Variant pathogenic prediction by locus variability, the importance of the last picture of evolution.

Published: Nov. 8, 2020, 7:03 p.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.11.06.371195v1?rss=1

Authors: Cabrera, J. L., Enriquez, J. A., Garcia, J., Sanchez-Cabo, F.

Abstract: Accurate detection of pathogenic single nucleotide variants (SNVs) is key to variant ranking in whole exome sequencing studies. Several in silico tools have been developed to identify deleterious variants. Locus variability, computed as Shannon entropy from gnomAD/helixMTdb variant allele frequencies, can be used as a predictor of pathogenic variants. In this study we evaluate the use of Shannon entropy in non-coding mitochondrial DNA and also in coding regions subject to selective pressure beyond that imposed by the genetic code, such as splice sites. To benchmark this approach on non-coding mitochondrial variants, Shannon entropy was compared with the HmtVar disease score and outperformed it in non-coding SNVs (AUCH=0.99 in the ROC curve and PR-AUCH=1.00 in the precision-recall curve). Likewise, for splice-site variants, Shannon entropy was compared against two state-of-the-art ensemble predictors, ada score and rf score, matching their overall performance in both ROC curves (AUCH=0.95) and precision-recall curves (PR-AUC=0.97). Therefore, locus variability could aid the variant ranking process for these specific types of SNVs.

Copyright belongs to the original authors. Visit the link for more info.
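
The abstract describes locus variability as Shannon entropy computed from variant allele frequencies but does not give the exact formula. The following is a minimal sketch under stated assumptions: the function name `locus_entropy` and its input format are hypothetical, the entropy is taken in bits over all alleles at one position, and the reference allele is assumed to carry whatever frequency the alternate alleles leave over. The authors' actual computation may differ.

```python
import math


def locus_entropy(alt_allele_freqs):
    """Shannon entropy (in bits) of the allele distribution at one position.

    alt_allele_freqs: frequencies of the alternate alleles seen at the locus
    (hypothetical input, e.g. values read from a population database).
    Assumption: the reference allele holds the remaining frequency mass.
    """
    alt_total = sum(alt_allele_freqs)
    if any(f < 0 for f in alt_allele_freqs) or alt_total > 1 + 1e-12:
        raise ValueError("frequencies must be non-negative and sum to at most 1")

    ref_freq = max(0.0, 1.0 - alt_total)
    probs = [ref_freq, *alt_allele_freqs]

    # Alleles with zero frequency contribute nothing to the sum,
    # since p * log2(p) tends to 0 as p tends to 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)


# A position with no observed variation has zero entropy; a position where
# two alleles are equally common has one bit.
conserved = locus_entropy([0.001])
variable = locus_entropy([0.2])
```

In this sketch a low value marks a position that tolerates little variation in the population, which is presumably where a newly observed variant is more likely to be ranked as pathogenic.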