Protein Structural Alignments From Sequence

Published: Nov. 4, 2020, 8:01 p.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.11.03.365932v1?rss=1

Authors: Morton, J., Strauss, C., Blackwell, R., Berenberg, D., Gligorijevic, V., Bonneau, R.

Abstract: Computing sequence similarity is a fundamental task in biology, with alignment forming the basis for the annotation of genes and genomes and providing the core data structures for evolutionary analysis. Standard approaches are a mainstay of modern molecular biology and rely on variations of edit distance to obtain explicit alignments between pairs of biological sequences. However, sequence alignment algorithms struggle with remote homology tasks and cannot identify similarities between many pairs of proteins with similar structures and likely homology. Recent work suggests that using machine learning language models can improve remote homology detection. To this end, we introduce DeepBLAST, which obtains explicit alignments from residue embeddings learned from a protein language model integrated into an end-to-end differentiable alignment framework. This approach can be accelerated on GPU architectures and outperforms conventional sequence alignment techniques in terms of both speed and accuracy when identifying structurally similar proteins.

Copyright belongs to the original authors. Visit the link for more info.
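
The abstract contrasts classic edit-distance alignment with a differentiable alignment framework. As a rough illustration only, the sketch below scores a global alignment with Needleman-Wunsch dynamic programming and optionally swaps the hard maximum for a log-sum-exp smooth maximum, which is one common way to make such a recursion differentiable. Everything here is an assumption for illustration: the function names, the match/mismatch/gap values, and the `temperature` parameter are hypothetical and are not taken from the paper or from DeepBLAST's code, which derives its scores from residue embeddings rather than from exact character matches.

```python
import math


def smooth_max(values, temperature):
    """Return max(values), or its log-sum-exp relaxation if temperature > 0."""
    if temperature == 0:
        return max(values)
    top = max(values)  # subtract the max for numerical stability
    total = sum(math.exp((v - top) / temperature) for v in values)
    return top + temperature * math.log(total)


def alignment_score(a, b, match=1.0, mismatch=-1.0, gap=-1.0, temperature=0.0):
    """Global alignment score of strings a and b (Needleman-Wunsch recursion).

    Scoring values are illustrative assumptions, not the paper's parameters.
    """
    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against the empty string costs one gap per residue.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = smooth_max(
                [
                    score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                    score[i - 1][j] + gap,               # gap in b
                    score[i][j - 1] + gap,               # gap in a
                ],
                temperature,
            )
    return score[n][m]


hard = alignment_score("ACGT", "AGT")                   # hard max: 2.0
soft = alignment_score("ACGT", "AGT", temperature=0.1)  # slightly above the hard score
```

With `temperature=0` this is the usual edit-distance-style score. With a positive temperature every cell becomes a smooth function of the substitution and gap scores, so in an automatic differentiation framework gradients could flow back into whatever model produced those scores.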