Neural networks to learn protein sequence-function relationships from deep mutational scanning data

Published: Oct. 25, 2020, 6:02 p.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.10.25.353946v1?rss=1

Authors: Gelman, S., Romero, P. A., Gitter, A.

Abstract: The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein's behavior and properties. We present a supervised deep learning framework to learn the sequence-function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network's internal representation affects its ability to learn the sequence-function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks' ability to learn biologically meaningful information about protein structure and mechanism. Our software is available from https://github.com/gitter-lab/nn4dms.

Copyright belongs to the original authors. Visit the link for more info.
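The abstract's point about networks that "capture nonlinear interactions and share parameters across sequence positions" can be illustrated with a small sketch. This is not code from nn4dms. The function names, the "K2A" mutation notation (wild-type residue, 1-based position, new residue), and the toy kernel are all assumptions made for illustration. It one-hot encodes a variant and slides a single filter along the sequence, so the same weights are reused at every position, then applies a ReLU nonlinearity.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def apply_mutations(wild_type, mutations):
    """Apply substitutions written like 'K2A' (assumed notation, 1-based position)."""
    seq = list(wild_type)
    for mutation in mutations:
        old, pos, new = mutation[0], int(mutation[1:-1]), mutation[-1]
        if seq[pos - 1] != old:
            raise ValueError(f"{mutation}: expected {old} at position {pos}")
        seq[pos - 1] = new
    return "".join(seq)


def one_hot(seq):
    """Encode a sequence as a list of 20-element indicator vectors."""
    return [[1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS] for aa in seq]


def conv1d_relu(encoded, kernel, bias=0.0):
    """Slide one (width x 20) kernel along the sequence and apply ReLU.

    The same kernel weights are used at every window, which is the
    parameter sharing across positions that the abstract refers to.
    """
    width = len(kernel)
    outputs = []
    for start in range(len(encoded) - width + 1):
        total = bias
        for offset in range(width):
            for channel in range(len(AMINO_ACIDS)):
                total += kernel[offset][channel] * encoded[start + offset][channel]
        outputs.append(max(0.0, total))  # ReLU nonlinearity
    return outputs


# Toy kernel (hypothetical weights) that responds to 'A' followed by 'C'.
toy_kernel = [[0.0] * 20 for _ in range(2)]
toy_kernel[0][AMINO_ACIDS.index("A")] = 1.0
toy_kernel[1][AMINO_ACIDS.index("C")] = 1.0

variant = apply_mutations("MKV", ["K2A"])
features = conv1d_relu(one_hot("ACAC"), toy_kernel)
```

In a supervised setting, features like these would feed further layers trained to predict the measured functional score of each variant. That training loop is omitted here.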