New targets acquired: improving locus recovery from the Angiosperms353 probe set

Published: Oct. 5, 2020, 12:03 a.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.10.04.325571v1?rss=1

Authors: McLay, T. G., Gunn, B. F., Ning, W., Tate, J. A., Nauheimer, L., Joyce, E. M., Simpson, L., Schmidt-Lebuhn, A. N., Baker, W. J., Forest, F., Jackson, C. J.

Abstract: Universal target enrichment kits maximise utility across a wide evolutionary breadth while minimising the number of baits required, which keeps the kit cost-efficient. Locus assembly requires a target reference, but the taxonomic breadth of such kits means that target reference files can be phylogenetically sparse. The Angiosperms353 kit has been used successfully to capture loci throughout the angiosperms, but its target file includes sequences from only 6-18 taxa per locus. Consequently, reads sequenced from on-target DNA molecules may fail to map to the references, leaving fewer on-target reads for assembly and reducing locus recovery. We expanded the Angiosperms353 target file by incorporating sequences from 566 transcriptomes, producing a mega353 target file in which each gene is represented by 17-373 taxa. The mega353 file is a drop-in replacement for the original Angiosperms353 target file in HybPiper analyses. We also provide tools to subsample the file based on user-selected taxon groups and to incorporate other transcriptome or protein-coding gene datasets. Compared with the default Angiosperms353 file, the mega353 file increased the percentage of on-target reads by an average of 31%, increased locus recovery at 75% of target length by 61.9%, and increased the total length of the concatenated loci by 30%. The mega353 file and associated scripts are available at: https://github.com/chrisjackson-pellicle/NewTargets
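The abstract mentions tools for subsampling the target file by user-selected taxon groups. As a rough illustration of that idea only, the Python sketch below filters a HybPiper-style target file down to the taxa in chosen groups and reports any gene left without a reference sequence. It assumes headers of the form `>{taxon}-{gene}` (HybPiper's convention, with the gene name after the last hyphen). The function names `read_fasta`, `subsample_targets` and `genes_lost`, the taxon-to-group mapping, and the toy sequences are all hypothetical and are not the actual NewTargets scripts or their interface.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Minimal FASTA parser: yields (header, sequence) pairs."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_header(header: str) -> Tuple[str, str]:
    """Assumed HybPiper-style header: taxon, then gene after the LAST hyphen."""
    taxon, _, gene = header.rpartition("-")
    return taxon, gene


def subsample_targets(records: Iterable[Record],
                      taxon_groups: Dict[str, str],
                      keep_groups: Set[str]) -> List[Record]:
    """Keep only records whose taxon maps to one of keep_groups.

    taxon_groups is a hypothetical user-supplied taxon -> group mapping;
    taxa missing from it are dropped.
    """
    kept = []
    for header, seq in records:
        taxon, gene = split_header(header)
        if not taxon or not gene:
            continue  # skip headers that don't follow the taxon-gene format
        if taxon_groups.get(taxon) in keep_groups:
            kept.append((header, seq))
    return kept


def genes_lost(all_records: Iterable[Record], kept: Iterable[Record]) -> Set[str]:
    """Genes present before filtering but with no sequence left afterwards."""
    before = {split_header(h)[1] for h, _ in all_records}
    after = {split_header(h)[1] for h, _ in kept}
    return before - after


if __name__ == "__main__":
    # Toy target file; taxa, gene IDs and sequences are illustrative only.
    toy = """>Amborella-4471
ATGGCT
>Oryza-4471
ATGGCA
>Arabidopsis-4471
ATGGCC
>Arabidopsis-4527
ATGAAA
""".splitlines()
    groups = {"Amborella": "ANA grade", "Oryza": "monocots",
              "Arabidopsis": "eudicots"}
    records = list(read_fasta(toy))
    kept = subsample_targets(records, groups, {"monocots"})
    print(kept)                       # [('Oryza-4471', 'ATGGCA')]
    print(genes_lost(records, kept))  # {'4527'}
```

Checking for lost genes matters in a sketch like this because a gene with no remaining reference sequence would have nothing for reads to map against, which is the opposite of what subsampling is meant to achieve.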