Graph of graphs analysis for multiplexed data with application to imaging mass cytometry

Published: Aug. 24, 2020, 7:02 p.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.23.263467v1?rss=1 Authors: Lin, Y.-W. E., Shnitzer, T., Talmon, R., Villaroel-Espindola, F., Desai, S., Schalper, K., Kluger, Y. Abstract: Hyper spectral imaging, sensor networks, spatial multiplexed proteomics, and spatial transcriptomics assays is a representative subset of distinct technologies from diverse domains of science and engineering that share common data structures. The data in all these modalities consist of high-dimensional multivariate observations (m-dimensional feature space) collected at different spatial positions and therefore can be analyzed using similar computational methodologies. Furthermore, in many studies practitioners collect datasets consisting of multiple spatial assays of this type, each capturing such data from a single biological sample, patient, or hyper spectral image, etc. Each of these spatial assays could be characterized by several regions of interest (ROIs). The focus of this paper is on a particular application, imaging mass cytometry (IMC), which falls into this problem setup. To extract meaningful information from the multi-dimensional observations recorded at different ROIs across different assays, we propose to analyze such datasets using a two-step graph-based approach. We first construct for each ROI a graph representing the interactions between the m covariates and compute an m dimensional vector characterizing the steady state distribution among features. We then use all these mdimensional vectors to construct a graph between the ROIs from all assays. This second graph is subjected to a nonlinear dimension reduction analysis, retrieving the intrinsic geometric representation of the ROIs. Such a representation provides the foundation for efficient and accurate organization of the different ROIs that correlates with their phenotypes. Theoretically, we show that when the ROIs have a particular bi-modal distribution, the new representation gives rise to a better distinction between the two modalities compared to the maximum a posteriori (MAP) estimator. We applied our method to predict the sensitivity to PD-1 axis blockers treatment of lung cancer subjects based on IMC data, achieving 92% accuracy. This serves as empirical evidence that the graph of graphs approach enables us to integrate multiple ROIs and the intrarelationships between the features at each ROI, giving rise to an informative representation that is strongly associated with the phenotypic state of the entire image. Importantly, this approach is applicable to other modalities such as spatial transcriptomics. Copy rights belong to original authors. Visit the link for more info