Genome-wide characterization of human minisatellite VNTRs: population-specific alleles and gene expression differences

Published: Nov. 5, 2020, 7:01 p.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.11.03.367367v1?rss=1

Authors: Eslami Rasekh, M., Hernandez, Y., Drinan, S. D., Fuxman Bass, J. I., Benson, G.

Abstract: Variable Number Tandem Repeats (VNTRs) are tandem repeat (TR) loci that vary in copy number across a population. Using our program, VNTRseek, we analyzed human whole-genome sequencing datasets from 2,770 individuals in order to detect minisatellite VNTRs, i.e., those with pattern sizes ranging from 7 bp to 126 bp, and with array lengths up to 230 bp. We detected 35,638 VNTR loci and classified 5,676 as common (occurring in >5% of the population). Common VNTR loci were found to be enriched in genomic regions with regulatory function, i.e., transcription start sites and enhancers. Investigation of the common VNTRs in the context of population ancestry revealed that 1,096 loci contained population-specific alleles and that those could be used to classify individuals into super-populations with near-perfect accuracy. Comparison of genotyping results with proximal genes indicated that in 120 cases (118 genes), expression differences correlated with VNTR genotype. We validated our predictions in several ways, including experimentally, through identification of predicted alleles in long reads, and by comparisons showing consistency between sequencing platforms. This study is the most comprehensive analysis of minisatellite VNTRs in the human population to date.

Copyright belongs to the original authors. Visit the link for more information.
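The two numeric criteria in the abstract (a minisatellite pattern size of 7 bp to 126 bp, and "common" meaning present in more than 5% of the population) can be illustrated with a minimal sketch. This is not VNTRseek code or the authors' method. The function names, the locus IDs, and the toy data are hypothetical, and the sketch assumes that "occurring in" means an individual carries at least one non-reference allele at the locus.

```python
from typing import Dict, List

# "Common" cutoff quoted in the abstract: more than 5% of the population.
COMMON_THRESHOLD = 0.05


def is_minisatellite(pattern_bp: int) -> bool:
    """Pattern-size range quoted in the abstract: 7 bp to 126 bp, inclusive."""
    return 7 <= pattern_bp <= 126


def common_loci(genotypes: Dict[str, List[bool]],
                threshold: float = COMMON_THRESHOLD) -> List[str]:
    """Return loci seen as variable in more than `threshold` of individuals.

    `genotypes` maps a hypothetical locus ID to one boolean per individual.
    True means that individual carries a non-reference allele, which is an
    assumption about how "occurring in the population" is counted.
    """
    result = []
    for locus, calls in genotypes.items():
        # Skip loci with no calls, then apply the strict ">" comparison.
        if calls and sum(calls) / len(calls) > threshold:
            result.append(locus)
    return result


# Toy data for 20 individuals (illustrative only, not from the study).
toy = {
    "locus_A": [True] * 3 + [False] * 17,  # 15% of individuals
    "locus_B": [True] * 1 + [False] * 19,  # exactly 5%, not strictly above
    "locus_C": [False] * 20,               # never variable
}

print(common_loci(toy))
```

The comparison is strict, so a locus seen in exactly 5% of individuals is not counted as common under this reading of ">5%".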