Single-cell identity definition using random forests and recursive feature elimination (scRFE)

Published: Aug. 4, 2020, 6 p.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.03.233650v1?rss=1

Authors: Park, M., Vorperian, S., Wang, S., Pisco, A. O.

Abstract: Single-cell RNA sequencing (scRNA-seq) enables detailed examination of a cell's underlying regulatory networks and the molecular factors contributing to its identity. We developed scRFE (single-cell identity definition using random forests and recursive feature elimination, pronounced 'surf') with the goal of easily generating interpretable gene lists that can accurately distinguish observations (single cells) by their features (genes) given a class of interest. scRFE is an algorithm implemented as a Python package that combines the classical random forest method with recursive feature elimination and cross validation to find the features necessary and sufficient to classify cells in a single-cell RNA-seq dataset by ranking feature importance. The package is compatible with Scanpy, enabling seamless integration into any single-cell data analysis workflow that aims to identify minimal transcriptional programs relevant to describing metadata features of the dataset. We applied scRFE to the Tabula Muris Senis and reproduced commonly known aging patterns and transcription factor reprogramming protocols, highlighting the biological value of scRFE's learned features.

Copyright belongs to the original authors. Visit the link for more information.
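Illustrative sketch: the abstract describes combining a random forest with recursive feature elimination and cross validation to rank the genes that separate a class of interest from all other cells. The Python example below shows that general technique with scikit-learn's RandomForestClassifier and RFECV. It is not the scRFE package's own API. The function name select_genes, the synthetic data, the gene labels, and all parameter values are assumptions made for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV


def select_genes(X, y, gene_names, n_estimators=50, cv=3, step=0.2, seed=0):
    """Hypothetical helper: random forest + recursive feature elimination with CV.

    X is a cells-by-genes matrix, y marks each cell as belonging to the
    class of interest (1) or not (0). Returns (gene, importance) pairs for
    the retained genes, sorted from most to least important.
    """
    forest = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    # RFECV repeatedly drops the least important features (a fraction `step`
    # of them per round) and uses cross validation to pick the feature count.
    selector = RFECV(forest, step=step, cv=cv, scoring="accuracy")
    selector.fit(X, y)

    kept = np.asarray(gene_names)[selector.support_]
    # estimator_ is the forest refit on the retained features only.
    importances = selector.estimator_.feature_importances_
    order = np.argsort(importances)[::-1]
    return [(str(kept[i]), float(importances[i])) for i in order]


# Synthetic stand-in for an expression matrix: 150 cells, 30 genes,
# 5 of which carry signal about the binary label.
X, y = make_classification(
    n_samples=150, n_features=30, n_informative=5, n_redundant=0, random_state=0
)
gene_names = [f"gene_{i}" for i in range(X.shape[1])]

ranked = select_genes(X, y, gene_names)
for gene, importance in ranked[:5]:
    print(f"{gene}\t{importance:.3f}")
```

With real data loaded through Scanpy, X would presumably come from the AnnData object's X matrix (converted to a dense array if it is sparse), gene_names from its var_names, and y from comparing one column of obs against the class of interest. Which column to use depends on the dataset and is left open here.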