Ultra-fast Prediction of Somatic Structural Variations by Reduced Read Mapping via Pan-Genome k-mer Sets

Published: Oct. 26, 2020, 7:01 p.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.10.25.354456v1?rss=1

Authors: Choi, M.-H., Sohn, J.-i., Yi, D., Menon, A. V., Kim, Y. J., Kyoung, S., Shin, S.-H., Na, B., Joung, J.-G., Yoon, S., Koh, Y., Baek, D., Kim, T.-M., Nam, J.-W.

Abstract: Genome rearrangements often result in copy number alterations of cancer-related genes and cause the formation of cancer-related fusion genes. Current structural variation (SV) callers, however, still produce massive numbers of false positives (FPs) and incur high computational costs. Here, we introduce an ultra-fast and high-performing somatic SV detector, called ETCHING, that significantly reduces the mapping cost by filtering reads matched to pan-genome and normal k-mer sets. To reduce the number of FPs, ETCHING takes advantage of a Random Forest classifier that utilizes six breakend-related features. We systematically benchmarked ETCHING against other SV callers on reference SV materials, validated SV biomarkers, tumor and matched-normal whole genomes, and tumor-only targeted sequencing datasets. For all datasets, our SV caller was much faster (≥15X) than other tools without compromising performance or memory use. Our approach would provide not only the fastest method for large-scale genome projects but also an accurate, clinically practical means for real-time precision medicine.

Copyright belongs to the original authors. Visit the link for more information.
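The read-filtering idea in the abstract can be illustrated with a toy sketch. This is not ETCHING's implementation: the function names, the k-mer length `K = 5`, the forward-strand-only handling, and the "keep a read if it has at least one unseen k-mer" rule are all assumptions made for illustration. The sketch collects the k-mers seen in reference and normal sequences, then keeps only the tumor reads that contain a k-mer outside that set, so that fewer reads would need to be mapped.

```python
from typing import Iterable, Iterator, List, Set


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every k-mer of seq (forward strand only, for brevity)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_kmer_set(sequences: Iterable[str], k: int) -> Set[str]:
    """Collect all k-mers found in the given sequences."""
    known: Set[str] = set()
    for seq in sequences:
        known.update(kmers(seq, k))
    return known


def filter_reads(reads: Iterable[str], known: Set[str], k: int) -> List[str]:
    """Keep reads that contain at least one k-mer absent from `known`.

    Reads shorter than k have no k-mers and are dropped.
    """
    return [r for r in reads if any(km not in known for km in kmers(r, k))]


# Toy data, all hypothetical.
K = 5  # illustrative value only
reference = ["ACGTACGTTAGC"]   # stands in for a pan-genome sequence
normal_reads = ["CGTACGTT"]    # stands in for matched-normal reads
known = build_kmer_set(reference + normal_reads, K)

tumor_reads = ["ACGTACGT", "GTTAGCAA", "TACGTTAG"]
candidates = filter_reads(tumor_reads, known, K)
print(candidates)  # ['GTTAGCAA']
```

Only the second read survives, because it carries k-mers (for example `TAGCA`) that appear in neither the reference nor the normal sequences. A Python `set` keeps the example short; a real tool working at genome scale would need a far more compact k-mer index.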