METAMVGL: a multi-view graph-based metagenomic contig binning algorithm by integrating assembly and paired-end graphs

Published: Oct. 19, 2020, 11:02 p.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.10.18.344697v1?rss=1

Authors: Zhang, Z., Zhang, L.

Abstract: Due to the complexity of metagenomic communities, de novo assembly of next-generation sequencing data is commonly unable to produce complete microbial genomes. Metagenomic binning is a crucial task that groups the fragmented contigs into clusters based on their nucleotide compositions and read depths. These features work well for long contigs but are not stable for short ones. Assembly and paired-end graphs can provide the connectedness between contigs, where linked contigs have a high chance of being derived from the same cluster.

Results: We developed METAMVGL, a multi-view graph-based metagenomic contig binning algorithm that integrates both assembly and paired-end graphs. It can rescue short contigs and correct binning errors arising from dead-end subgraphs. METAMVGL learns the graphs' weights automatically and predicts the contig labels in a uniform multi-view label propagation framework. In the experiments, we observed that METAMVGL significantly increased the number of high-confidence edges in the combined graph and linked dead ends to the main graph. It also outperformed many state-of-the-art binning methods (MaxBin2, MetaBAT2, MyCC, CONCOCT, SolidBin and GraphBin) on metagenomic sequencing data from a simulation, two mock communities and the real Sharon dataset.

Availability and implementation: The software is available at https://github.com/ZhangZhenmiao/METAMVGL.
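
To make the idea of label propagation over two graph views concrete, here is a toy sketch in Python. It is not the METAMVGL implementation: the view weights are fixed by hand (the abstract says METAMVGL learns them automatically), the update rule is a plain weighted neighbour vote, and the function names, contig IDs and bin names are all hypothetical. It only illustrates how a paired-end edge can carry a label to a contig that is unreachable in the assembly graph.

```python
from collections import defaultdict


def combine_graphs(views, weights):
    """Merge several undirected edge lists into one weighted adjacency map.

    Assumption: each view gets a fixed, hand-chosen weight. METAMVGL is
    described as learning these weights; that step is not shown here.
    """
    adj = defaultdict(lambda: defaultdict(float))
    for edges, weight in zip(views, weights):
        for u, v in edges:
            adj[u][v] += weight
            adj[v][u] += weight
    return adj


def propagate_labels(adj, seeds, nodes, max_iter=100):
    """Spread seed labels to unlabeled nodes by weighted neighbour voting.

    Seed labels are never overwritten. Labels are updated in place, so a
    label assigned earlier in a sweep can be used later in the same sweep.
    """
    labels = dict(seeds)
    for _ in range(max_iter):
        changed = False
        for node in nodes:
            if node in seeds:
                continue
            votes = defaultdict(float)
            for neighbour, weight in adj.get(node, {}).items():
                if neighbour in labels:
                    votes[labels[neighbour]] += weight
            if not votes:
                continue  # no labeled neighbour yet
            # Sort first so that ties are broken deterministically by name.
            best = max(sorted(votes), key=votes.get)
            if labels.get(node) != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Hypothetical toy data: c1 and c4 are long contigs with initial bin labels;
# c3 and c6 have no assembly-graph edges and are linked only by read pairs.
nodes = ["c1", "c2", "c3", "c4", "c5", "c6"]
assembly_edges = [("c1", "c2"), ("c4", "c5")]
paired_end_edges = [("c2", "c3"), ("c5", "c6")]
seeds = {"c1": "binA", "c4": "binB"}

assembly_only = propagate_labels(
    combine_graphs([assembly_edges], [1.0]), seeds, nodes
)
both_views = propagate_labels(
    combine_graphs([assembly_edges, paired_end_edges], [1.0, 0.5]), seeds, nodes
)

print(assembly_only)
print(both_views)
```

With the assembly graph alone, c3 and c6 stay unlabeled in this toy data; adding the paired-end view lets them receive the labels of c2 and c5.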