Predicting drug resistance in M. tuberculosis using a Long-term Recurrent Convolutional Networks architecture

Published: Nov. 8, 2020, 4:04 a.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.11.07.372136v1?rss=1

Authors: Safari, A. H., Sedaghat, N., Forna, A., Zabeti, H., Chindelevitch, L., Libbrecht, M.

Abstract: Drug resistance in Mycobacterium tuberculosis (MTB) may soon be a leading worldwide cause of death. One way to mitigate the risk of drug resistance is through methods that predict drug resistance in MTB using whole-genome sequencing (WGS) data. Existing machine learning methods for this task featurize the WGS data from a given bacterial isolate by defining one input feature per SNP. Here, we introduce a gene-centric method for predicting drug resistance in TB. We define one feature per gene according to the number of mutations in that gene in a given isolate. This representation greatly decreases the number of model parameters. We further propose a model that considers gene order through a Long-term Recurrent Convolutional Network (LRCN) architecture, which combines convolutional and recurrent layers. We find that using these strategies yields a substantial, statistically significant improvement over the state of the art, and that this improvement is driven by the order of genes in the genome and their organization into operons.

Copyright belongs to the original authors. Visit the link for more information.
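The gene-centric featurization described in the abstract (one feature per gene, equal to the number of mutations in that gene for a given isolate) can be illustrated with a minimal sketch. This is not the authors' code: the gene names, coordinates, and the function `gene_mutation_counts` are hypothetical, and the sketch assumes genes are non-overlapping and sorted by start position.

```python
from bisect import bisect_right

# Hypothetical annotation: (gene name, start, end), inclusive coordinates,
# sorted by start and non-overlapping. Values are made up for illustration.
GENES = [
    ("geneA", 100, 500),
    ("geneB", 650, 1200),
    ("geneC", 1300, 2000),
]


def gene_mutation_counts(mutation_positions, genes=GENES):
    """Return one count per gene, in genome order.

    Mutations that fall outside every gene (intergenic) are ignored
    in this sketch.
    """
    starts = [start for _, start, _ in genes]
    counts = [0] * len(genes)
    for pos in mutation_positions:
        # Index of the last gene whose start is <= pos, or -1 if none.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos <= genes[i][2]:
            counts[i] += 1
    return counts


# One isolate with six mutated positions; two of them are intergenic.
isolate = [120, 480, 600, 700, 1999, 2500]
print(gene_mutation_counts(isolate))  # prints [2, 1, 1]
```

Because the counts are kept in genome order, the resulting vector has one entry per gene rather than one per SNP, and it preserves the gene ordering that a model combining convolutional and recurrent layers could exploit.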