RFPlasmid: Predicting plasmid sequences from short read assembly data using machine learning

Published: Aug. 2, 2020, 7:01 a.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.07.31.230631v1?rss=1

Authors: Graaf, L. v. d., Wagenaar, J. A., Zomer, A. L.

Abstract: Antimicrobial resistance (AMR) genes in bacteria are often carried on plasmids, and these plasmids can transfer AMR genes between bacteria. For molecular epidemiology purposes and risk assessment, it is important to know whether the genes are located on highly transferable plasmids or on the more stable chromosomes. However, draft whole-genome sequences are fragmented, making it difficult to discriminate between plasmid and chromosomal contigs. Current methods that predict plasmid sequences from draft genome sequences rely on single features, such as k-mer composition, circularity of the DNA molecule, copy number, or sequence identity to plasmid replication genes. Each of these has drawbacks, especially when faced with large single-copy plasmids, which often carry resistance genes. Our newly developed prediction tool, RFPlasmid, uses a combination of multiple features, including k-mer composition and databases of plasmid and chromosomal marker proteins, to predict whether a contig is likely of plasmid or chromosomal origin. RFPlasmid supports models for 17 different bacterial species, including Campylobacter, E. coli, and Salmonella, and has a species-agnostic model for metagenomic assemblies or unsupported organisms. RFPlasmid is available both as a standalone tool and via a web interface.

Copyright belongs to the original authors. Visit the link for more information.
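
As an illustration of one feature named in the abstract, k-mer composition, the sketch below turns a contig into a fixed-length vector of relative k-mer frequencies. It is a minimal example and not RFPlasmid's own code: the function name `kmer_frequencies`, the choice of k = 3, and the toy contig are all assumptions made for illustration. The abstract does not state which k-mer size the tool uses or how it combines these values with its marker protein features.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence: str, k: int = 3) -> dict:
    """Return the relative frequency of every possible DNA k-mer in `sequence`.

    Hypothetical helper for illustration; not part of RFPlasmid.
    """
    sequence = sequence.upper()
    valid_bases = set("ACGT")
    # Count only windows made up of unambiguous bases (skips N and other codes).
    counts = Counter(
        sequence[i:i + k]
        for i in range(len(sequence) - k + 1)
        if set(sequence[i:i + k]) <= valid_bases
    )
    total = sum(counts.values())
    # Enumerate all 4**k k-mers in a fixed order so that every contig
    # yields a feature vector with the same layout.
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


# Toy contig, made up for this example (assumption, not real data).
contig = "ATGCGATATGCGNATGC"
features = kmer_frequencies(contig, k=3)
```

A vector like `features` could serve as one group of input columns for a classifier, alongside other per-contig features such as hits against marker protein databases.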