NetQuilt: Deep Multispecies Network-based Protein Function Prediction using Homology-informed Network Similarity

Published: July 31, 2020, 10:27 p.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.07.30.227611v1?rss=1

Authors: Barot, M., Gligorijevic, V., Cho, K., Bonneau, R.

Abstract: Transferring knowledge between species is challenging: different species contain distinct proteomes and cellular architectures, which cause their proteins to carry out different functions via different interaction networks. Many approaches to proteome and biological network functional annotation use sequence similarity to transfer knowledge between species. These similarity-based approaches cannot produce accurate predictions for proteins without homologues of known function, as many functions require cellular or organismal context for meaningful function prediction. In order to supply this context, network-based methods use protein-protein interaction (PPI) networks as a source of information for inferring protein function and have demonstrated promising results in function prediction. However, the majority of these methods are tied to a network for a single species, and many species lack biological networks. In this work, we integrate sequence and network information across multiple species by applying an IsoRank-derived network alignment algorithm to create a meta-network profile of the proteins of multiple species. We then use this integrated multispecies meta-network as input features to train a maxout neural network with Gene Ontology terms as target labels. Our multispecies approach takes advantage of more training examples, and more diverse examples from multiple organisms, and consequently leads to significant improvements in function prediction performance. Further, we evaluate our approach in a setting in which an organism's PPI network is left out, using other organisms' network information and sequence homology in order to make predictions for the left-out organism, to simulate cases in which a newly sequenced species has no network information available.

Copyright belongs to the original authors.
Visit the link for more info
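
The abstract mentions training a maxout neural network on meta-network features with Gene Ontology terms as target labels. As a rough illustration of what a maxout layer computes, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy feature vector, all weights, and the per-term sigmoid output layer are assumptions made only for this example.

```python
import math


def maxout_layer(x, weights, biases):
    """Maxout layer: each unit outputs the max over several affine pieces.

    x:       list of d input features
    weights: weights[unit][piece] is a list of d coefficients
    biases:  biases[unit][piece] is a float
    """
    outputs = []
    for unit_w, unit_b in zip(weights, biases):
        pieces = [
            sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
            for w, b in zip(unit_w, unit_b)
        ]
        outputs.append(max(pieces))
    return outputs


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical 3-feature network profile for a single protein (made-up values).
profile = [0.2, 0.0, 0.7]

# Two maxout units with two affine pieces each (made-up values).
weights = [
    [[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]],
    [[-1.0, 0.5, 0.0], [0.5, 0.5, 0.5]],
]
biases = [[0.0, 0.1], [0.0, -0.2]]

hidden = maxout_layer(profile, weights, biases)

# Assumption: one independent sigmoid score per Gene Ontology term, since a
# protein can carry several terms at once. Two hypothetical terms are scored.
out_weights = [[1.5, -0.5], [-1.0, 2.0]]
out_biases = [0.0, 0.0]
scores = [
    sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
    for row, b in zip(out_weights, out_biases)
]

print(hidden)
print(scores)
```

Each maxout unit learns its own piecewise-linear activation by taking the maximum over its affine pieces, rather than applying a fixed nonlinearity such as ReLU. In a real model the input would be the much larger multispecies network profile described in the abstract, and the weights would be learned from annotated proteins.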