KMD clustering: Robust generic clustering of biological data

Published: Oct. 4, 2020, 6:02 a.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.10.04.325233v1?rss=1

Authors: Zelig, A., Kaplan, N.

Abstract: The challenges of clustering noisy high-dimensional biological data have spawned advanced clustering algorithms that are tailored for specific subtypes of biological datatypes. However, the performance of such methods varies greatly between datasets, they require post hoc tuning of cryptic hyperparameters, and they are often not transferable to other types of data. Here we present a novel generic clustering approach called k minimal distances (KMD) clustering, based on a simple generalization of single and average linkage hierarchical clustering. We show how a generalized silhouette-like function is predictive of clustering accuracy and exploit this property to eliminate the main hyperparameter k. We evaluated KMD clustering on standard simulated datasets, simulated datasets with high noise added, mass cytometry datasets and scRNA-seq datasets. When compared to standard generic and state-of-the-art specialized algorithms, KMD clustering's performance was consistently better than or comparable to that of the best algorithm on each of the tested datasets.

Copyright belongs to the original authors. Visit the link for more info.
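As a rough illustration of the idea named in the abstract, the Python sketch below assumes, as the method's name suggests, that the distance between two clusters is the mean of the k smallest pairwise distances between their members. Under that assumption k = 1 reduces to single linkage and k >= |A|·|B| reduces to average linkage. The helper names (`pairwise_distances`, `kmd_distance`, `kmd_cluster`, `silhouette`, `select_k`) are hypothetical, the number of clusters is taken as a given input, and the ordinary silhouette score stands in for the paper's generalized silhouette-like function. It is not the authors' implementation.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def kmd_distance(D, a, b, k):
    """Assumed KMD linkage: mean of the k smallest distances between clusters a and b.
    k=1 behaves like single linkage; k >= len(a)*len(b) like average linkage."""
    sub = D[np.ix_(a, b)].ravel()
    k = min(k, sub.size)
    return float(np.sort(sub)[:k].mean())

def kmd_cluster(D, n_clusters, k):
    """Naive agglomerative clustering with the assumed KMD linkage."""
    clusters = [[i] for i in range(D.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest KMD distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = kmd_distance(D, clusters[i], clusters[j], k)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    labels = np.empty(D.shape[0], dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels

def silhouette(D, labels):
    """Standard mean silhouette score (stand-in for the paper's generalized function)."""
    labels = np.asarray(labels)
    s = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        n_same = same.sum() - 1
        if n_same == 0:
            continue  # singleton clusters score 0 by convention
        a = D[i, same].sum() / n_same  # D[i, i] == 0, so self adds nothing
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom
    return float(s.mean())

def select_k(X, n_clusters, candidate_ks):
    """Pick the k whose clustering maximizes the silhouette score."""
    D = pairwise_distances(X)
    scored = []
    for k in candidate_ks:
        labels = kmd_cluster(D, n_clusters, k)
        scored.append((silhouette(D, labels), k, labels))
    score, k, labels = max(scored, key=lambda t: t[0])
    return k, labels, score
```

For example, `select_k([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], 2, [1, 2, 4])` separates the two blobs. The exhaustive pair search keeps the sketch short but scales poorly, so it is only meant for small toy inputs.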