DeepCSO: a deep-learning network approach to predicting Cysteine S-sulphenylation sites

Published: Aug. 13, 2020, 10:02 a.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.12.248914v1?rss=1

Authors: Lyu, X., Zou, Y., Li, L.

Abstract: Cysteine S-sulphenylation (CSO), a novel post-translational modification (PTM), has emerged as a potential mechanism to regulate protein functions and affect signal networks. Because of its functional significance, several prediction approaches have been developed. Nevertheless, they are based on a limited dataset from Homo sapiens, and there is a lack of prediction tools for the CSO sites of other species. Recently, this modification has been investigated at the proteomics scale for a few species, and the number of identified CSO sites has significantly increased. Thus, it is essential to explore the characteristics of this modification across different species and to construct prediction models with better performance based on the enlarged dataset. In this study, we constructed a few classifiers and found that the long short-term memory model with the word-embedding encoding approach, dubbed LSTMWE, compares favorably with the traditional machine-learning models and other deep-learning models across different species, in both cross-validation and independent tests. The area under the ROC curve values for LSTMWE ranged from 0.82 to 0.85 for different organisms, which is superior to the reported CSO predictors. Moreover, we developed a general model based on the integrated data from different species, and it showed great universality and effectiveness. We provide an online prediction service called DeepCSO that includes both species-specific and general models, which is accessible through http://www.bioinfogo.org/DeepCSO.

Copyright belongs to the original authors. Visit the link for more info.