RNA secondary structure prediction using deep learning with thermodynamic integrations

Published: Aug. 11, 2020, 2:01 p.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.10.244442v1?rss=1

Authors: Sato, K., Akiyama, M., Sakakibara, Y.

Abstract: RNA secondary structure prediction is one of the key technologies for unveiling the essential roles of functional non-coding RNAs. Although richly parameterized machine learning-based models have achieved extremely high prediction accuracy, a risk of overfitting has been reported for such models. We propose a new algorithm for predicting RNA secondary structures using deep learning with thermodynamic integration, which enables robust predictions. Our algorithm computes folding scores for the nearest-neighbor loops using a deep neural network. As in our previous work, the folding scores are integrated with the traditional thermodynamic parameters to enable robust predictions. Zuker-style dynamic programming is then employed to find an optimal secondary structure that maximizes the folding score. We also propose a new regularization, called thermodynamic regularization, to train our deep neural network without overfitting to the training data. Our algorithm (MXfold2) achieved the most robust and accurate predictions in computational experiments designed to simulate the discovery of new non-coding RNAs. The source code of MXfold2 and the datasets used in the experiments are available at https://github.com/keio-bioinformatics/mxfold2/.

Copyright belongs to the original authors. Visit the link for more info.
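To make the idea in the abstract concrete, here is a minimal Python sketch of score-maximizing dynamic programming in which a learned score is added to a thermodynamic-style score. It is not MXfold2's implementation. It assumes a simplified Nussinov-style recursion over individual base pairs, whereas the paper scores nearest-neighbor loops with Zuker-style dynamic programming. The names `toy_thermo_score`, `fold`, `learned_score` and `min_loop`, and all numeric values, are hypothetical and chosen only for illustration.

```python
from typing import Callable, Tuple


def toy_thermo_score(a: str, b: str) -> float:
    """Hypothetical stand-in for thermodynamic parameters (made-up values)."""
    pair = {a, b}
    if pair == {"G", "C"}:
        return 3.0
    if pair == {"A", "U"}:
        return 2.0
    if pair == {"G", "U"}:
        return 1.0
    return float("-inf")  # non-canonical pairs are never chosen


def fold(
    seq: str,
    learned_score: Callable[[int, int], float],
    min_loop: int = 3,
) -> Tuple[str, float]:
    """Return (dot-bracket structure, total score) maximizing the summed score.

    The score of pairing positions i and k is the toy thermodynamic score plus
    learned_score(i, k), which stands in for a neural network output.
    """
    n = len(seq)
    # best[i][j] is the best score for the half-open interval seq[i:j].
    best = [[0.0] * (n + 1) for _ in range(n + 1)]
    # choice[i][j] is the partner of position i, or -1 if i stays unpaired.
    choice = [[-1] * (n + 1) for _ in range(n + 1)]

    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            best[i][j] = best[i + 1][j]  # case 1: position i is unpaired
            # case 2: i pairs with k, leaving at least min_loop bases between
            for k in range(i + min_loop + 1, j):
                pair = toy_thermo_score(seq[i], seq[k]) + learned_score(i, k)
                cand = pair + best[i + 1][k] + best[k + 1][j]
                if cand > best[i][j]:
                    best[i][j] = cand
                    choice[i][j] = k

    # Trace back through the recorded choices to recover the structure.
    structure = ["."] * n
    stack = [(0, n)]
    while stack:
        i, j = stack.pop()
        if j - i < 1:
            continue
        k = choice[i][j]
        if k < 0:
            stack.append((i + 1, j))
        else:
            structure[i], structure[k] = "(", ")"
            stack.append((i + 1, k))
            stack.append((k + 1, j))
    return "".join(structure), best[0][n]


# With a learned score of zero, only the toy thermodynamic term matters.
structure, score = fold("GGGAAAUCCC", lambda i, j: 0.0)
print(structure, score)  # (((....))) 9.0
```

In this sketch the learned term only shifts the score of each candidate pair, so the same recursion finds the optimum whatever the source of the scores. The thermodynamic regularization described in the abstract applies to training and is not modelled here.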