Varlock: privacy preserving storage and dissemination of sequenced genomic data

Published: Sept. 16, 2020, 4:01 p.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.09.16.299594v1?rss=1

Authors: Hekel, R., Budis, J., Kucharik, M., Radvanszky, J., Szemes, T.

Abstract:

Introduction: Current and future applications of genomic data may raise ethical and privacy concerns. Processing and storing genomic data introduces a risk of abuse by a potential adversary, since the human genome contains information about sensitive personal traits. For this reason, we developed a privacy-preserving method, called Varlock, for secure storage and dissemination of sequenced genomic data.

Materials and methods: Varlock uses a set of population allele frequencies to mask personal alleles detected in genomic reads. Each detected allele is replaced by a population allele selected at random according to its frequency. Masked alleles are preserved in an encrypted confidential file that can be shared, in whole or in part, using public-key cryptography.

Results: Our method masked personal variants and introduced new variants called on an individual's genome; alternative alleles with lower population frequency were masked and introduced more often. We performed a joint PCA of personal and masked VCFs, showing that VCFs in the two groups cannot be trivially mapped to each other. Moreover, the method is reversible; therefore, personal alleles can be unmasked in specific genomic regions on demand.

Conclusion: Our method masks personal alleles within mapped reads while preserving valuable non-sensitive properties of sequenced DNA fragments for further research. Accordingly, masked reads can be stored publicly, since they are deprived of sensitive personal information. Personal alleles may be restored in arbitrary genomic regions for interested parties: patients, medical units, and researchers.

Keywords: genome, privacy, personal data

Copyright belongs to the original authors. Visit the link for more information.
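
The masking step described under Materials and methods can be illustrated with a small sketch. This is not the authors' implementation: the function names (mask_genotypes, unmask), the dictionary layout, and the example frequencies are all hypothetical, masking is shown at the genotype level rather than on mapped reads, and the encryption of the confidential record is omitted. It only shows the general idea of drawing replacement alleles in proportion to population frequency while keeping the originals so that chosen sites can be restored later.

```python
import random


def mask_genotypes(genotypes, allele_freqs, rng):
    """Replace each personal allele with one drawn by population frequency.

    genotypes:    {site: (allele, allele, ...)}  personal alleles (hypothetical layout)
    allele_freqs: {site: {allele: frequency}}    population allele frequencies
    rng:          a random.Random instance
    Returns (masked, confidential); `confidential` holds the original alleles
    and stands in for the encrypted file described in the abstract.
    """
    masked = {}
    confidential = {}
    for site, alleles in genotypes.items():
        freqs = allele_freqs[site]
        population_alleles = list(freqs)
        weights = [freqs[a] for a in population_alleles]
        # Weighted draw: frequent population alleles are chosen more often.
        drawn = rng.choices(population_alleles, weights=weights, k=len(alleles))
        masked[site] = tuple(drawn)
        confidential[site] = tuple(alleles)
    return masked, confidential


def unmask(masked, confidential, sites):
    """Restore the personal alleles only at the requested sites."""
    restored = dict(masked)
    for site in sites:
        restored[site] = confidential[site]
    return restored


# Illustrative data only; positions and frequencies are made up.
allele_freqs = {
    ("chr1", 1000): {"A": 0.9, "G": 0.1},
    ("chr1", 2000): {"C": 0.6, "T": 0.4},
}
personal = {
    ("chr1", 1000): ("G", "G"),
    ("chr1", 2000): ("C", "T"),
}

masked, confidential = mask_genotypes(personal, allele_freqs, random.Random(0))
partially_restored = unmask(masked, confidential, [("chr1", 1000)])
```

In this sketch, sharing only part of the confidential record corresponds to passing a subset of sites to unmask; every other site keeps its masked alleles.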