StartLink+: Prediction of Gene Starts in Prokaryotic Genomes by an Algorithm Integrating Independent Sources of Evidence

Published: Oct. 26, 2020, 8:03 p.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.10.25.352625v1?rss=1

Authors: Gemayel, K., Lomsadze, A., Borodovsky, M.

Abstract: Ab initio gene-finding algorithms have been shown to make sufficiently accurate predictions in prokaryotic genomes. Nonetheless, for up to 15-25% of genes per genome, the gene start predictions may differ even when made by the supposedly most accurate tools. To address this discrepancy, we have introduced StartLink+, an approach that combines ab initio and multiple sequence alignment based methods. StartLink+ makes predictions for a majority of genes per genome (73% on average); in tests on sets of genes with experimentally verified starts, StartLink+ accuracy was 98-99%. When StartLink+ predictions for a large set of prokaryotic genomes were compared with database annotations, the annotated gene starts deviated from the predictions, on average, for ~5% of genes in AT-rich genomes and for 10-15% of genes in GC-rich genomes.

Copyright belongs to the original authors. Visit the link for more information.
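The comparison between predicted and annotated gene starts described in the abstract can be illustrated with a minimal sketch. This is not the StartLink+ algorithm or its code. It only shows one plausible way to compute a deviation rate, assuming that a gene is identified by its sequence id, strand, and 3' end, and that two calls for the same gene disagree when their start coordinates differ. The function name, the key layout, and the toy coordinates are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical gene key: (sequence id, strand, 3' end coordinate).
# Matching genes by their 3' end is an assumption made for this sketch.
GeneKey = Tuple[str, str, int]


def start_deviation_rate(predicted: Dict[GeneKey, int],
                         annotated: Dict[GeneKey, int]) -> float:
    """Return the fraction of genes found in both sets whose starts differ."""
    shared = predicted.keys() & annotated.keys()
    if not shared:
        raise ValueError("no genes in common between the two sets")
    differing = sum(1 for key in shared if predicted[key] != annotated[key])
    return differing / len(shared)


# Toy data (made up for illustration): each value is a start coordinate.
predicted = {
    ("contig1", "+", 900): 100,
    ("contig1", "+", 2400): 1500,
    ("contig1", "-", 3000): 3900,
    ("contig1", "+", 5200): 4300,
}
annotated = {
    ("contig1", "+", 900): 100,
    ("contig1", "+", 2400): 1530,   # annotated start differs from prediction
    ("contig1", "-", 3000): 3900,
    ("contig1", "+", 5200): 4300,
    ("contig1", "+", 7000): 6100,   # no prediction for this gene; ignored
}

rate = start_deviation_rate(predicted, annotated)
print(f"Annotated starts deviate for {rate:.0%} of shared genes")
```

Genes without a prediction are left out of the denominator, which mirrors the abstract's point that predictions are made only for a subset of genes per genome.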