Nucleotide-resolution bacterial pan-genomics with reference graphs

Published: Nov. 12, 2020, 3:01 a.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.11.12.380378v1?rss=1

Authors: Colquhoun, R. M., Hall, M. B., Lima, L., Roberts, L. W., Malone, K. M., Hunt, M., Letcher, B., Hawkey, J., George, S., Pankhurst, L., Iqbal, Z.

Abstract: Bacterial genomes follow a U-shaped frequency distribution whereby most genomic loci are either rare (accessory) or common (core); the alignable fraction of two genomes from a single species might be only 50%. Standard tools therefore analyse mutations only in the core genome, ignoring accessory mutations. We present a novel pan-genome graph structure and algorithms implemented in the software pandora, which approximates a sequenced genome as a recombinant of reference genomes, detects novel variation and then pan-genotypes multiple samples. Constructing a reference graph from 578 E. coli genomes, we analyse a diverse set of 20 E. coli isolates. We show that, for rare variants, pandora recovers at least 13k more SNPs than single-reference-based tools, achieving error rates with Nanopore data equal to or better than those with Illumina data, and providing a stable framework for analysing diverse samples without reference bias. This is a significant step towards comprehensive analyses of bacterial genetic variation.
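The abstract's central idea, approximating a sample as a recombinant (mosaic) of reference genomes and calling variants against that mosaic instead of one fixed reference, can be illustrated with a toy Python sketch. This is not pandora's algorithm. It assumes each genome has already been reduced to equal-length allele sequences per locus, picks the nearest reference allele by plain Hamming distance, and every name in it (`call_against_mosaic`, `call_against_single`, the `geneA`/`geneB` loci, `ref1`/`ref2`) is hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy pan-genome: locus -> {reference genome name -> allele}.
# This only illustrates the "recombinant of references" idea; it is not
# how pandora represents or searches its graph.
PanGenome = Dict[str, Dict[str, str]]
Snp = Tuple[int, str, str]  # (position, reference base, sample base)


def hamming(a: str, b: str) -> int:
    """Count mismatching positions; this toy assumes equal-length alleles."""
    if len(a) != len(b):
        raise ValueError("toy sketch assumes equal-length alleles")
    return sum(x != y for x, y in zip(a, b))


def snps(ref: str, sample: str) -> List[Snp]:
    """List every position where the sample differs from the reference."""
    return [(i, r, s) for i, (r, s) in enumerate(zip(ref, sample)) if r != s]


def call_against_mosaic(pan: PanGenome, sample: Dict[str, str]) -> Dict[str, Tuple[str, List[Snp]]]:
    """Approximate the sample as a per-locus mosaic of references, then
    report SNPs relative to the nearest reference allele at each locus."""
    calls = {}
    for locus, seq in sample.items():
        alleles = pan.get(locus)
        if not alleles:
            continue  # locus absent from every reference: nothing to compare
        best = min(alleles, key=lambda name: hamming(alleles[name], seq))
        calls[locus] = (best, snps(alleles[best], seq))
    return calls


def call_against_single(pan: PanGenome, ref_name: str, sample: Dict[str, str]) -> Dict[str, List[Snp]]:
    """Baseline: compare every locus to one fixed reference genome.
    Loci that this reference lacks (accessory genes) are skipped."""
    calls = {}
    for locus, seq in sample.items():
        allele = pan.get(locus, {}).get(ref_name)
        if allele is None:
            continue  # accessory locus missing from the chosen reference
        calls[locus] = snps(allele, seq)
    return calls


if __name__ == "__main__":
    pan = {
        "geneA": {"ref1": "ACGTACGT", "ref2": "ACGTTCGT"},  # core: in both
        "geneB": {"ref2": "GGCCAATT"},                      # accessory: ref2 only
    }
    sample = {"geneA": "ACGTTCGA", "geneB": "GGCCTATT"}

    # {'geneA': ('ref2', [(7, 'T', 'A')]), 'geneB': ('ref2', [(4, 'A', 'T')])}
    print(call_against_mosaic(pan, sample))
    # {'geneA': [(4, 'A', 'T'), (7, 'T', 'A')]}
    print(call_against_single(pan, "ref1", sample))
```

In the toy data the single-reference baseline never examines `geneB`, because `ref1` lacks that accessory locus, so the SNP there is lost; it also reports position 4 of `geneA` as a difference even though that base is already present in `ref2`. Comparing against the per-locus mosaic avoids both effects, a miniature version of the reference-bias problem the abstract describes.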