RSGSA: a Robust and Stable Gene Selection Algorithm

Published: July 28, 2020, 10:11 p.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.07.27.216879v1?rss=1

Authors: Saha, S., Soliman, A., Rajasekaran, S.

Abstract: Nowadays we are observing an explosion of gene expression data with phenotypes. This enables researchers to efficiently identify genes responsible for certain medical conditions and to classify them as drug targets. Like other phenotype data in the medical domain, gene expression data with phenotypes suffer from being a highly underdetermined system. In domains with a very large set of features but a very small sample size (e.g., DNA microarray, RNA-seq, and GWAS data), it is often reported that several different spurious feature subsets may yield equally optimal results. This phenomenon is known as "instability". Considering these facts, we have developed a robust and stable supervised gene selection algorithm to select the most discriminating non-spurious set of genes from gene expression datasets with phenotypes. "Stability" and "robustness" are ensured by class-level and instance-level perturbations, respectively. We have performed rigorous experimental evaluations using 10 real gene expression microarray datasets with phenotypes. These evaluations revealed that our algorithm outperforms state-of-the-art algorithms with respect to stability and classification accuracy. We have also performed biological enrichment analysis based on gene ontology-biological processes (GO-BP) terms, disease ontology (DO) terms, and biological pathways.

Copyright belongs to the original authors. Visit the link for more info.
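To make the "instability" described in the abstract concrete, here is a minimal sketch of how selection stability might be measured under instance-level perturbation. It is not the RSGSA algorithm: the feature ranker (absolute difference of class means), the stratified bootstrap, the mean pairwise Jaccard index, and all function names and the synthetic data are assumptions chosen only for illustration.

```python
import random
from itertools import combinations


def top_k_features(X, y, k):
    """Select k features with the largest absolute difference of class means.

    This is a simple stand-in filter, NOT the RSGSA method.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        mean_pos = sum(r[j] for r in pos) / len(pos)
        mean_neg = sum(r[j] for r in neg) / len(neg)
        scores.append(abs(mean_pos - mean_neg))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return frozenset(ranked[:k])


def jaccard(a, b):
    """Jaccard index of two non-empty sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b)


def selection_stability(X, y, k, n_resamples, rng):
    """Mean pairwise Jaccard index of subsets selected on bootstrap resamples."""
    idx_pos = [i for i, label in enumerate(y) if label == 1]
    idx_neg = [i for i, label in enumerate(y) if label == 0]
    subsets = []
    for _ in range(n_resamples):
        # Stratified bootstrap: resample instances within each class,
        # so both classes are always present.
        sample = [rng.choice(idx_pos) for _ in idx_pos]
        sample += [rng.choice(idx_neg) for _ in idx_neg]
        X_s = [X[i] for i in sample]
        y_s = [y[i] for i in sample]
        subsets.append(top_k_features(X_s, y_s, k))
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def make_data(n_samples, n_features, n_informative, shift, rng):
    """Synthetic data (hypothetical): the first n_informative features are
    shifted by `shift` in class 1; all other features are pure noise."""
    y = [i % 2 for i in range(n_samples)]
    X = []
    for label in y:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += shift * label
        X.append(row)
    return X, y


if __name__ == "__main__":
    rng = random.Random(0)
    # Few samples, many features: the underdetermined regime from the abstract.
    X, y = make_data(n_samples=20, n_features=500, n_informative=5, shift=1.0, rng=rng)
    print(selection_stability(X, y, k=10, n_resamples=20, rng=rng))
```

A score near 1 means the same genes are chosen regardless of which samples are drawn; a score near 0 means the selected subset changes almost entirely between resamples, which is the instability the abstract refers to.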