A Tree of Human Gut Bacterial Species and its Applications to Metagenomics and Metaproteomics Data Analysis

Published: Sept. 25, 2020, 11:01 p.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.09.24.311720v1?rss=1

Authors: Stamboulian, M., Doak, T. G., Ye, Y.

Abstract:

Background: Recent advances in genome and metagenome sequencing have dramatically enriched the collection of genomes of bacterial species related to human health and disease. In metagenomic studies, phylogenetic trees are commonly used to depict, describe, and compare the bacterial members of the community under study. The most accurate tree-building algorithms now use large sets of marker genes taken from across genomes. However, many of the current bacterial genomes were assembled from metagenomic datasets (i.e., metagenome-assembled genomes, MAGs) and are often incomplete. It is therefore important to study how well phylogenetic approaches perform on such genomes. Further, phylogeny methods are not perfect, and it is important to know how reliable an inferred tree is.

Results: Here we examined the impact of genome incompleteness on tree reconstruction and showed that phylogeny approaches, including RAxML (which handles missing data explicitly) and FastTree, generally performed well on a simulated collection of 400 genomes with missing information. As RAxML is computationally prohibitive for the much larger collections of gut genomes, we chose FastTree to build a unified tree of human gut-associated bacterial species (referred to as the gut tree), including more than 3000 genomes, most of which are incomplete. We developed two downstream applications of the gut tree: peptide-centric analysis of metaproteomics datasets and taxonomic characterization of metagenomic sequences. In both applications, the gut tree provided the basis for quantification of species composition at various taxonomic resolutions.

Conclusions: The gut tree presented in this study provides a useful framework for taxonomic profiling of the human gut microbiome. Including MAGs in the tree provides a more comprehensive representation of the microbial species diversity associated with the human gut, which is important for studying the taxonomic composition of the gut microbiome.

Availability and Implementation: The tree construction pipeline and downstream applications of the gut tree are freely available at https://github.com/mgtools/guttree.

Copyright belongs to the original authors. Visit the link for more information.
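
To illustrate what quantifying composition "at various taxonomic resolutions" on a tree can look like, here is a minimal sketch in Python. It is not the authors' pipeline (see the GitHub repository for that). It assumes a rooted tree stored as a child-to-parent map and per-genome counts (for example, assigned reads or peptides), and the function name, node names, and numbers are all hypothetical. Each leaf count is added to every ancestor, so any internal node holds the total for the clade below it.

```python
from collections import defaultdict


def roll_up_counts(parent, leaf_counts):
    """Sum leaf counts into every ancestor node of a rooted tree.

    parent: dict mapping each node to its parent (the root maps to None).
    leaf_counts: dict mapping leaf (genome) names to counts.
    Returns a dict with the total count under every node that was reached.
    """
    totals = defaultdict(int)
    for leaf, count in leaf_counts.items():
        node = leaf
        # Walk from the leaf up to the root, adding the count at each step.
        while node is not None:
            totals[node] += count
            node = parent.get(node)
    return dict(totals)


# Hypothetical toy tree: two clades under one root (illustrative names only).
parent = {
    "root": None,
    "clade_1": "root",
    "clade_2": "root",
    "genome_A": "clade_1",
    "genome_B": "clade_1",
    "genome_C": "clade_2",
}

# Hypothetical counts assigned to each genome (leaf).
leaf_counts = {"genome_A": 10, "genome_B": 5, "genome_C": 20}

totals = roll_up_counts(parent, leaf_counts)
print(totals["clade_1"], totals["clade_2"], totals["root"])  # prints: 15 20 35
```

Reading the totals at nodes closer to the root gives a coarser view of the community, and reading them at the leaves gives a genome-level view.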