Ontology-Aware Deep Learning Enables Ultrafast, Accurate and Interpretable Source Tracking among Sub-Million Microbial Community Samples from Hundreds of Niches

Published: Nov. 2, 2020, 2:03 a.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.11.01.364208v1?rss=1

Authors: Ning, K., Zha, Y., Chong, H., Qiu, H., Kang, K., Dun, Y., Chen, Z., Cui, X.

Abstract: The taxonomic structure of a microbial community sample is highly habitat-specific, which makes it possible to track the niches from which samples originate. Current methods struggle when the numbers of samples and niches are orders of magnitude larger than those in current use: under such conditions they cannot source-track samples accurately and in a timely manner, which makes knowledge discovery from sub-million heterogeneous samples difficult. Here, we introduce ONN4MST (https://github.com/HUST-NingKang-Lab/ONN4MST), a deep learning method based on an Ontology-aware Neural Network approach that takes into account the ontology structure of niches and the relationships among samples from these ontologically organized niches. ONN4MST's superior accuracy, speed and robustness have been demonstrated, for example by an accuracy of 0.99 and an AUC of 0.97 in a microbial source tracking experiment involving 125,823 samples and 114 niches. Moreover, ONN4MST has been applied in several source tracking scenarios, showing that it can provide highly interpretable results for samples from previously less-studied niches, detect microbial contaminants, and identify similar samples from ontologically remote niches with high fidelity.

Copyright belongs to the original authors. Visit the link for more information.
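The abstract says ONN4MST takes the ontology structure of niches into account but does not say how. As a rough illustration only, and not ONN4MST's actual implementation, the Python sketch below shows one generic way a classifier can respect a niche ontology. Niches are assumed to be written as colon-separated lineages from a root (the example niches are hypothetical). Prediction walks the tree top-down, picking at each layer the most probable child of the node chosen at the previous layer, so the predicted layers can never contradict one another. `build_ontology`, `decode_top_down` and the hand-written `scores` dictionary (a stand-in for a trained network's per-node outputs) are all hypothetical names.

```python
import math
from collections import defaultdict

# Hypothetical niche lineages, colon-separated from the root down to the leaf niche.
NICHES = [
    "root:Host-associated:Human:Digestive system",
    "root:Host-associated:Human:Skin",
    "root:Environmental:Aquatic:Marine",
    "root:Environmental:Terrestrial:Soil",
]


def build_ontology(lineages):
    """Map each internal node (a lineage prefix) to its sorted list of child nodes."""
    children = defaultdict(set)
    for lineage in lineages:
        parts = lineage.split(":")
        for depth in range(1, len(parts)):
            parent = ":".join(parts[:depth])
            child = ":".join(parts[:depth + 1])
            children[parent].add(child)
    return {parent: sorted(kids) for parent, kids in children.items()}


def softmax(values):
    """Numerically stable softmax over a list of raw scores."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def decode_top_down(children, score_fn, root="root"):
    """Greedy top-down decoding: at each layer, normalize the scores of the current
    node's children and move to the most probable one. Returns the predicted path
    and its probability (product of the per-layer conditional probabilities)."""
    node, path, prob = root, [root], 1.0
    while node in children:  # leaves have no entry in `children`
        kids = children[node]
        probs = softmax([score_fn(k) for k in kids])
        best = max(range(len(kids)), key=lambda i: probs[i])
        node = kids[best]
        prob *= probs[best]
        path.append(node)
    return path, prob


# Hypothetical per-node scores standing in for a neural network's outputs;
# nodes missing from the dict get a score of 0.0.
scores = {
    "root:Host-associated": 2.0,
    "root:Environmental": 0.5,
    "root:Host-associated:Human": 1.0,
    "root:Host-associated:Human:Digestive system": 3.0,
    "root:Host-associated:Human:Skin": 1.0,
}

ontology = build_ontology(NICHES)
path, p = decode_top_down(ontology, lambda k: scores.get(k, 0.0))
print(path[-1])  # prints "root:Host-associated:Human:Digestive system"
```

Greedy top-down decoding is the simplest way to keep predictions consistent with the ontology; a real model might instead score whole root-to-leaf paths or combine layer-wise outputs differently, and nothing here should be read as the method used by ONN4MST.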