RaFAH: A superior method for virus-host prediction

Published: Sept. 27, 2020, 6:02 a.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.09.25.313155v1?rss=1

Authors: Hernandes Coutinho, F., Zaragosa-Solas, A., Lopez-Perez, M., Barylski, J., Zielezinski, A., Dutilh, B. E., Edwards, R., Rodriguez-Valera, F.

Abstract: Viruses of prokaryotes are extremely abundant and diverse. Culture-independent approaches have recently shed light on the biodiversity of these biological entities. One fundamental question when trying to understand their ecological roles is: which host do they infect? To tackle this issue, we developed a machine-learning approach named Random Forest Assignment of Hosts (RaFAH), based on the analysis of nearly 200,000 viral genomes. RaFAH outperformed other methods for virus-host prediction (F1-score = 0.97 at the level of phylum). RaFAH was applied to diverse datasets encompassing genomes of uncultured viruses derived from eight different biomes of medical, biotechnological, and environmental relevance, and was capable of accurately describing these viromes. This led to the discovery of 537 genomic sequences of archaeal viruses. These viruses represent previously unknown lineages, and their genomes encode novel auxiliary metabolic genes, which shed light on how these viruses interfere with the host molecular machinery. RaFAH is available at https://sourceforge.net/projects/rafah/.

Copyright belongs to the original authors. Visit the link for more information.
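
For readers unfamiliar with the F1-score quoted in the abstract: it is the harmonic mean of precision and recall, computed here per host label. The sketch below is only an illustration of that metric, not RaFAH's own evaluation code; the function name `f1_per_class` and the toy phylum labels are made up for this example, and the abstract does not say how per-class scores were averaged.

```python
from collections import Counter

def f1_per_class(true_labels, predicted_labels):
    """Return {label: F1} for each label seen in either list (hypothetical helper)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == pred:
            tp[truth] += 1          # correct host call
        else:
            fp[pred] += 1           # predicted label was wrong
            fn[truth] += 1          # true label was missed
    scores = {}
    for label in set(true_labels) | set(predicted_labels):
        # F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic
        # mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores

# Toy data, invented for illustration only (not results from the paper)
true_phyla = ["Proteobacteria", "Proteobacteria", "Firmicutes",
              "Firmicutes", "Euryarchaeota"]
pred_phyla = ["Proteobacteria", "Firmicutes", "Firmicutes",
              "Firmicutes", "Euryarchaeota"]

scores = f1_per_class(true_phyla, pred_phyla)
for phylum in sorted(scores):
    print(f"{phylum}: {scores[phylum]:.2f}")
```

A single summary figure such as the one in the abstract would come from combining per-class values (for example by averaging them), but which scheme the authors used is not stated here.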