Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences

Published: Oct. 8, 2020, 1:01 a.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.10.08.330985v1?rss=1

Authors: Seiler, E., Mehringer, S., Darvish, M., Turc, E., Reinert, K.

Abstract: We present Raptor, a tool for approximately searching many queries in large collections of nucleotide sequences. In comparison with similar tools like Mantis and COBS, Raptor is 12 to 144 times faster and uses up to 30 times less memory. Raptor uses winnowing minimizers to define a set of representative k-mers, an extension of the Interleaved Bloom Filter (IBF) as a set membership data structure, and probabilistic thresholding for minimizers. Our approach allows compression and partitioning of the IBF to enable the effective use of secondary memory.

Copyright belongs to the original authors. Visit the link for more info.
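To make the abstract's two central ideas concrete, here is a minimal Python sketch of winnowing minimizers feeding an interleaved Bloom filter. It is not Raptor's code. Every name in it (hash64, minimizers, InterleavedBloomFilter, search), the BLAKE2b hashing, the toy sequences, and the parameters k=5 and w=4 are assumptions chosen for illustration. The fixed 70% hit fraction is a simple stand-in for the probabilistic thresholding the abstract mentions, and compression and partitioning are left out.

```python
import hashlib


def hash64(data: bytes, seed: int = 0) -> int:
    """64-bit hash of `data`; `seed` selects an independent hash function."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def minimizers(seq: str, k: int, w: int) -> set:
    """Winnowing: for every window of w consecutive k-mers, keep the
    smallest k-mer hash. Returns the set of selected hash values."""
    hashes = [hash64(seq[i:i + k].encode()) for i in range(len(seq) - k + 1)]
    if not hashes:
        return set()
    if len(hashes) < w:
        return {min(hashes)}
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


class InterleavedBloomFilter:
    """One Bloom filter per bin, stored interleaved: row p holds bit p of
    every bin, so a single lookup answers the query for all bins at once."""

    def __init__(self, bins: int, bits_per_bin: int, num_hashes: int = 2):
        self.bins = bins
        self.size = bits_per_bin
        self.num_hashes = num_hashes
        self.rows = [0] * bits_per_bin  # each int is a bitmask over bins

    def _positions(self, value: int) -> list:
        data = value.to_bytes(8, "little")
        return [hash64(data, seed + 1) % self.size
                for seed in range(self.num_hashes)]

    def insert(self, value: int, bin_id: int) -> None:
        for pos in self._positions(value):
            self.rows[pos] |= 1 << bin_id

    def contains(self, value: int) -> int:
        """Bitmask of bins that may contain `value` (false positives
        are possible, false negatives are not)."""
        mask = (1 << self.bins) - 1
        for pos in self._positions(value):
            mask &= self.rows[pos]
        return mask

    def count(self, values) -> list:
        """Per-bin number of `values` reported as present."""
        counts = [0] * self.bins
        for value in values:
            mask = self.contains(value)
            for b in range(self.bins):
                if (mask >> b) & 1:
                    counts[b] += 1
        return counts


def search(ibf, query: str, k: int, w: int, fraction: float = 0.7) -> list:
    """Report bins in which at least `fraction` of the query's minimizers
    are found. The fixed fraction is an assumption, not Raptor's threshold."""
    mins = minimizers(query, k, w)
    needed = max(1, int(fraction * len(mins)))
    return [b for b, c in enumerate(ibf.count(mins)) if c >= needed]


# Toy data (hypothetical sequences): one reference sequence per bin.
references = [
    "ACGTTGCATGCCGATAGCTAGGCTAACGTTAGC",
    "TTTTGGGGCCCCAAAATTGGCCAATTTGGGCCC",
]
ibf = InterleavedBloomFilter(bins=len(references), bits_per_bin=1024)
for bin_id, seq in enumerate(references):
    for m in minimizers(seq, k=5, w=4):
        ibf.insert(m, bin_id)

query = references[0][5:25]  # exact substring of the first reference
hits = search(ibf, query, k=5, w=4)
```

Because the query is an exact substring of the first reference and is longer than one window, each of its windows is also a window of that reference, so all of its minimizers were inserted into bin 0 and that bin is always reported. Other bins can appear only through Bloom filter false positives. Storing only minimizers instead of every k-mer is what shrinks the index in this sketch, and the interleaved layout is what lets one lookup test all bins together.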