Syncmers are more sensitive than minimizers for selecting conserved k-mers in biological sequences

Published: Oct. 1, 2020, 10:01 p.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.09.29.319095v1?rss=1

Authors: Edgar, R. C.

Abstract: Minimizers are widely used to select subsets of fixed-length substrings (k-mers) from biological sequences in applications ranging from read mapping to taxonomy prediction and indexing of large datasets. Syncmers are an alternative method for selecting a subset of k-mers. Unlike a minimizer, a syncmer is identified by its k-mer sequence alone and is therefore synchronized in the following sense: if a given k-mer is selected from one sequence, it will also be selected from any other sequence. Bounded syncmers are defined by a small and fast function of the k-mer sequence which exploits correlations between overlapping k-mers to guarantee that at least one syncmer must appear in a window of predetermined length, and therefore comprise a universal hitting set which does not require a precomputed lookup table. Bounded syncmers are shown to be unambiguously superior to minimizers because they achieve both lower density and better conservation in mutated sequences.

Copyright belongs to the original authors. Visit the link for more info.
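To make the contrast between the two selection schemes concrete, here is a minimal Python sketch. The abstract does not spell out the selection rule, so the sketch assumes one common formulation: a k-mer is selected when its smallest length-s substring (s-mer) sits at its first or last position, the variant often called a closed syncmer. The function names, the parameter s, and the use of plain lexicographic order to rank substrings are all illustrative assumptions, not the paper's code; practical tools typically rank by a hash instead.

```python
def is_closed_syncmer(kmer, s):
    """Return True if the smallest s-mer of `kmer` is its first or last s-mer.

    Assumption: s-mers are ranked lexicographically. The decision depends
    only on the k-mer itself, never on the surrounding sequence.
    """
    smers = [kmer[i:i + s] for i in range(len(kmer) - s + 1)]
    smallest = min(smers)
    return smers[0] == smallest or smers[-1] == smallest


def syncmer_positions(seq, k, s):
    """Start positions of all k-mers in `seq` that pass the syncmer test."""
    return [i for i in range(len(seq) - k + 1)
            if is_closed_syncmer(seq[i:i + k], s)]


def minimizer_positions(seq, k, w):
    """Start positions of (w, k) minimizers, for comparison.

    In each window of w consecutive k-mers the smallest k-mer is selected
    (leftmost on ties), so whether a k-mer is chosen depends on its
    neighbours in that particular sequence.
    """
    n_kmers = len(seq) - k + 1
    selected = set()
    for start in range(n_kmers - w + 1):
        window = range(start, start + w)
        selected.add(min(window, key=lambda i: (seq[i:i + k], i)))
    return sorted(selected)


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    seq = "".join(rng.choice("ACGT") for _ in range(200))
    k, s = 8, 3
    print("syncmer starts:  ", syncmer_positions(seq, k, s))
    print("minimizer starts:", minimizer_positions(seq, k, w=k - s))
```

Under this assumed rule, any run of k - s consecutive k-mers contains at least one selected k-mer: the smallest s-mer in the span those k-mers cover must be either the first s-mer of one of them or the last s-mer of another. That is the kind of window guarantee the abstract describes, obtained here without a lookup table.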