DHS-Crystallize: Deep-Hybrid-Sequence based method for predicting protein Crystallization

Published: Nov. 13, 2020, 1:02 a.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.11.13.381301v1?rss=1 Authors: Alavi, A., Ascher, D. C. Abstract: The key method for determining the structure of a protein to date is X-ray crystallography, which is a very expensive technique that suffers from high attrition rate. On the contrary, a sequence-based predictor that is capable of accurately determining protein crystallization property, would not only overcome such limitations, but also would reduce the trial-and-error settings required to perform crystallization. In this work, to predict protein crystallizability, we have developed a novel sequence-based hybrid method that employs two separate, yet fully automated, concepts for extracting features from protein sequences. Specifically, we use a deep convolutional neural network on a publicly available dataset to extract descriptive features directly from the sequences, then fuse such feature with structural-and-physio-chemical driven features (such as amino-acid composition or AAIndex-based physicochemical properties). Dimentionality reduction is then performed on the resulting features and the output vectors are applied to train optimized gradient boosting machine (XGBoostt). We evaluate our method through three publicly available test sets, and show that our proposed DHS-Crystallize algorithm outperforms state-of-the- art methods, and achieves higher performance compared to using DCNN-deriven features, or structural-and-physio-chemical driven features alone. Copy rights belong to original authors. Visit the link for more info