Using Single Protein-Ligand Binding Models to Predict Active Ligands for Unseen Proteins

Published: Aug. 3, 2020, 6:02 a.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.02.233155v1?rss=1

Authors: Sundar, V., Colwell, L.

Abstract: Machine learning models that predict which small molecule ligands bind a single protein target report high levels of accuracy for held-out test data. An important challenge is to extrapolate and make accurate predictions for new protein targets. Improvements in drug-target interaction (DTI) models that address this challenge would have significant impact on drug discovery by eliminating the need for high-throughput screening experiments against new protein targets. Here we propose a data augmentation strategy that addresses this challenge to enable accurate prediction in cases where no experimental data is available. To proceed, we first build single protein-ligand binding models and use these models to predict whether additional ligands bind to each protein. We then use these predictions to augment the experimental data, train standard DTI models, and predict interactions between unseen test proteins and ligands. This approach consistently achieves an Area Under the Receiver Operating Characteristic curve (AUC) > 0.9 on test sets consisting exclusively of proteins and ligands for which the model is given no experimental data. We verify that performance improvements extend to held-out test proteins distant from the training set. Our data augmentation framework can be applied to any DTI model and enhances performance on a range of simple models.

Copyright belongs to the original authors.
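To make the four-step augmentation loop in the abstract concrete, here is a minimal, self-contained sketch under loudly stated assumptions: proteins and ligands are toy random feature vectors, a pair "binds" when their dot product is positive, logistic regression stands in for both the single-protein models and the DTI model, pair features are the element-wise product, and predicted labels use a 0.5 threshold. None of this is taken from the paper; the names `run`, `train_logreg`, `pair_features` and `roc_auc` are hypothetical. AUC is computed as the probability that a random positive pair outscores a random negative pair (ties count one half).

```python
import math
import random

# Hypothetical toy setup, NOT the paper's data: proteins and ligands are
# random feature vectors, and a pair binds when their dot product is > 0.
DIM = 4

def rand_vec(rng):
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]

def binds(p, l):
    return 1 if sum(a * b for a, b in zip(p, l)) > 0 else 0

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(xs, ys, lr=0.1, epochs=300, l2=0.01):
    """Full-batch gradient descent for L2-regularised logistic regression."""
    w, b, n = [0.0] * len(xs[0]), 0.0, len(xs)
    for _ in range(epochs):
        gw, gb = [l2 * wi for wi in w], 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i, xi in enumerate(x):
                gw[i] += err * xi / n
            gb += err / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return lambda x: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def pair_features(p, l):
    # Element-wise product: an assumed, very simple DTI featurisation.
    return [a * b for a, b in zip(p, l)]

def roc_auc(scores, labels):
    """AUC = P(random positive scores higher than random negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

def run(seed=0, n_train_prot=8, n_labeled=12, n_extra=40, n_test=20):
    rng = random.Random(seed)
    train_prots = [rand_vec(rng) for _ in range(n_train_prot)]
    test_prots = [rand_vec(rng) for _ in range(4)]    # unseen proteins
    extra_ligs = [rand_vec(rng) for _ in range(n_extra)]
    test_ligs = [rand_vec(rng) for _ in range(n_test)]  # unseen ligands

    experimental, predicted = [], []
    for p in train_prots:
        # "Experimental" data: a few measured ligands for this protein.
        ligs = [rand_vec(rng) for _ in range(n_labeled)]
        ys = [binds(p, l) for l in ligs]
        experimental += [(pair_features(p, l), y) for l, y in zip(ligs, ys)]
        # Step 1: a single-protein model on ligand features only.
        model = train_logreg(ligs, ys)
        # Step 2: predict labels for additional ligands with that model.
        predicted += [(pair_features(p, l), 1 if model(l) >= 0.5 else 0)
                      for l in extra_ligs]

    # Step 3: train one DTI model on experimental + predicted pairs.
    data = experimental + predicted
    dti = train_logreg([x for x, _ in data], [y for _, y in data])

    # Step 4: score pairs where neither protein nor ligand was seen.
    pairs = [(p, l) for p in test_prots for l in test_ligs]
    scores = [dti(pair_features(p, l)) for p, l in pairs]
    labels = [binds(p, l) for p, l in pairs]
    return len(experimental), len(data), roc_auc(scores, labels)
```

Calling `run()` returns the number of experimental pairs, the size of the augmented training set, and the AUC on unseen-protein/unseen-ligand pairs. Because the toy labels follow a simple synthetic rule, this number says nothing about the AUC > 0.9 reported in the paper; the sketch only shows how predictions from per-protein models can be folded into a DTI training set.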