Proximity Measures as Graph Convolution Matrices for Link Prediction in Biological Networks

Published: Nov. 16, 2020, 1:02 a.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.11.14.382655v1?rss=1
Authors: Coskun, M., Koyuturk, M.
Abstract:
Motivation: Link prediction is an important and well-studied problem in computational biology. It has a broad range of applications, including disease gene prioritization, drug-disease association prediction, and drug response prediction in cancer. The general principle of link prediction is to use the topological characteristics of the nodes in a network, and their attributes if available, to predict links that are likely to emerge or disappear. Recently, graph representation learning methods, which learn low-dimensional representations of the topological characteristics and attributes of nodes, have drawn increasing attention as a way to solve the link prediction problem with these learned features. Most prominently, Graph Convolutional Network (GCN)-based network embedding methods have shown great promise in link prediction because they can capture non-linear information in the network. To date, GCN-based network embedding algorithms have used a Laplacian matrix as the convolution matrix in their convolution layers, and the effect of the convolution matrix on algorithm performance has not been comprehensively characterized for link prediction in biomedical networks. On the other hand, traditional node similarity measures such as Common Neighbors, Adamic-Adar, and others have shown promising results on a variety of biomedical link prediction tasks. Hence, there is a need to systematically evaluate node similarity measures as convolution matrices, in terms of both their usability and their potential to advance the state of the art.
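To make the contrast between a Laplacian-derived convolution matrix and similarity-based ones concrete, here is a minimal pure-Python sketch (not the authors' code) that builds three candidate matrices on a toy four-node graph: the renormalized adjacency D̃^(-1/2)(A + I)D̃^(-1/2) used by the standard GCN formulation (derived from the normalized graph Laplacian), the Common Neighbors matrix, and the Adamic-Adar matrix. The function names, the toy graph, and the choice to leave the diagonals of the similarity matrices at zero are illustrative assumptions.

```python
import math

def renormalized_adjacency(adj):
    """Standard GCN propagation matrix D~^(-1/2) (A + I) D~^(-1/2)."""
    n = len(adj)
    deg = [len(adj[u]) + 1 for u in range(n)]  # +1 for the added self-loop
    mat = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            if u == v or v in adj[u]:
                mat[u][v] = 1.0 / math.sqrt(deg[u] * deg[v])
    return mat

def common_neighbors(adj):
    """CN(u, v) = |N(u) & N(v)|; diagonal left at 0 (assumption)."""
    n = len(adj)
    return [[0.0 if u == v else float(len(adj[u] & adj[v])) for v in range(n)]
            for u in range(n)]

def adamic_adar(adj):
    """AA(u, v) = sum over shared neighbours w of 1 / log(deg(w)); diagonal left at 0."""
    n = len(adj)
    mat = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            if u != v:
                # deg(w) >= 2 for any shared neighbour of distinct u, v; the guard is defensive
                mat[u][v] = sum(1.0 / math.log(len(adj[w]))
                                for w in adj[u] & adj[v] if len(adj[w]) > 1)
    return mat

# Hypothetical toy graph as adjacency sets: edges 0-1, 0-2, 1-2, 2-3.
adj = [{1, 2}, {0, 2}, {0, 1, 3}, {2}]
print(common_neighbors(adj)[0])  # [0.0, 1.0, 1.0, 1.0]
print(adamic_adar(adj)[0])
print(renormalized_adjacency(adj)[0])
```

Note that Adamic-Adar down-weights shared neighbours with high degree, so node 0's score with node 2 (shared neighbour of degree 2) exceeds its score with node 3 (shared neighbour of degree 3), whereas Common Neighbors scores both pairs equally.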
Results: We select 8 representative node similarity measures as convolution matrices within the single-layered GCN graph embedding method and conduct a systematic comparison on 3 important biomedical link prediction tasks: drug-disease association (DDA) prediction, drug-drug interaction (DDI) prediction, and protein-protein interaction (PPI) prediction. Our experimental results demonstrate that node similarity-based convolution matrices significantly improve GCN-based embedding algorithms and deserve more attention in future biomedical link prediction research.
Availability: Our method is implemented as a Python library and is available at githublink
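The Results describe plugging a node similarity matrix into a single-layer GCN in place of the usual convolution matrix. A minimal forward-pass sketch of that idea follows, assuming the common formulation Z = ReLU(S X W) with an inner-product link decoder as in graph autoencoders. Here S is the Common Neighbors matrix, X is the identity (featureless nodes), and W is random rather than trained. All names, the toy graph, and these modelling choices are illustrative assumptions, not the authors' library.

```python
import math
import random

def common_neighbors_matrix(adj):
    """CN(u, v) = number of shared neighbours; diagonal left at 0 (assumption)."""
    n = len(adj)
    return [[0.0 if u == v else float(len(adj[u] & adj[v])) for v in range(n)]
            for u in range(n)]

def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def relu(m):
    return [[max(0.0, x) for x in row] for row in m]

def gcn_layer(conv, features, weights):
    """Single GCN layer: Z = ReLU(S X W), with S the chosen convolution matrix."""
    return relu(matmul(matmul(conv, features), weights))

def link_score(z, u, v):
    """Inner-product decoder: sigmoid(z_u . z_v) as the predicted link probability."""
    dot = sum(a * b for a, b in zip(z[u], z[v]))
    return 1.0 / (1.0 + math.exp(-dot))

# Hypothetical toy graph as adjacency sets: edges 0-1, 0-2, 1-2, 2-3.
adj = [{1, 2}, {0, 2}, {0, 1, 3}, {2}]
n, dim = len(adj), 2
features = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # X = I
rng = random.Random(0)
weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]  # untrained W

z = gcn_layer(common_neighbors_matrix(adj), features, weights)
print(link_score(z, 0, 3))  # score for the unobserved pair (0, 3)
```

Comparing convolution matrices only requires swapping the first argument of `gcn_layer` for another similarity matrix (possibly normalized); in practice W would be learned, for example by minimizing a reconstruction loss over observed edges and sampled non-edges.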