HTSeqQC: A Flexible and One-Step Quality Control Software for High-throughput Sequence Data Analysis

Published: July 24, 2020, 9 p.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.07.23.214536v1?rss=1

Authors: Bedre, R. H., Avila, C. A., Mandadi, K.

Abstract:

Motivation: High-throughput sequencing (HTS) has become indispensable in life science research. Raw HTS data contain several kinds of sequencing artifacts, and removing them is an essential first step for reliable downstream bioinformatics analysis. Although multiple stand-alone tools can perform the various quality control steps separately, an integrated tool that allows one-step, automated quality control analysis of HTS datasets would significantly improve the handling of large numbers of samples in parallel.

Results: Here, we developed HTSeqQC, a stand-alone, flexible, and easy-to-use software for one-step quality control analysis of raw HTS data. HTSeqQC can evaluate HTS data quality and perform filtering and trimming analysis in a single run. We evaluated the performance of HTSeqQC for batch analysis of HTS datasets using 322 sample datasets with an average of ~1M paired-end sequence reads per sample. HTSeqQC accomplished the QC analysis in ~3 hours in distributed mode and ~31 hours in shared mode, thus underscoring its utility and robust performance.

Availability and implementation: The HTSeqQC software, Docker image, and Nextflow template are available for download at https://github.com/reneshbedre/HTSeqQC, and a graphical user interface (GUI) is available in the CyVerse Discovery Environment (DE) (https://cyverse.org/). Documentation is available at https://reneshbedre.github.io/blog/htseqqc.html and https://cyverse-htseqqc-cyverse-tutorial.readthedocs-hosted.com/en/latest/ (for CyVerse).

Copyright belongs to the original authors. Visit the link for more information.
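
To make the "filtering and trimming" idea in the abstract concrete, here is a minimal Python sketch of a read-level quality control pass over FASTQ data. It is not HTSeqQC's code or command-line interface. The function names (`read_fastq`, `trim_3prime`, `passes_filter`, `run_qc`), the thresholds, and the example reads are all hypothetical, and the sketch assumes Phred+33 quality encoding with four lines per record.

```python
from io import StringIO

def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual

def phred_scores(qual, offset=33):
    """Convert a quality string to integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]

def trim_3prime(seq, qual, min_q=20):
    """Remove trailing bases whose quality is below min_q."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]

def passes_filter(seq, qual, min_mean_q=20, min_len=5):
    """Keep a read only if it is long enough and its mean quality is adequate."""
    if len(seq) < min_len:
        return False
    scores = phred_scores(qual)
    return sum(scores) / len(scores) >= min_mean_q

def run_qc(handle, min_q=20, min_mean_q=20, min_len=5):
    """Trim each read, then filter it; return the surviving records."""
    kept = []
    for header, seq, qual in read_fastq(handle):
        seq, qual = trim_3prime(seq, qual, min_q)
        if passes_filter(seq, qual, min_mean_q, min_len):
            kept.append((header, seq, qual))
    return kept

# Hypothetical example reads (not data from the paper).
EXAMPLE = (
    "@read1\nACGTACGTAC\n+\nIIIIIIII##\n"   # good read with a low-quality tail
    "@read2\nACGTACGTAC\n+\n##########\n"   # uniformly low quality
    "@read3\nTTGGCCAATT\n+\n5555555555\n"   # borderline quality (Q20)
)

if __name__ == "__main__":
    for header, seq, qual in run_qc(StringIO(EXAMPLE)):
        print(header, seq, qual)
```

In this sketch trimming runs before filtering, so a read with a poor 3' tail can still be kept once the tail is removed, while a read that is poor throughout is dropped. A real pipeline would also handle adapters, paired-end synchronization, and compressed input, none of which is shown here.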