SiGMoiD: A superstatistical generative model for binary data

Published: Oct. 15, 2020, 2:02 a.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.10.14.338277v1?rss=1

Authors: Dixit, P. D.

Abstract: In modern biological physics, there is great interest in building generative probabilistic models for ensembles of covarying binary variables. A popular approach is to use the maximum entropy principle: one builds generative models that use as constraints lower-level statistics estimated from the data. While extremely popular, maximum entropy models have conceptual as well as practical issues; they rely on the modelers' choice of constraints and are computationally expensive to infer when the number of variables is large (n > 100). Here, we address both these issues with the Superstatistical Generative Model for Binary Data (SiGMoiD). SiGMoiD is a maximum-entropy-based framework in which we imagine the data as arising from a superstatistical system; individual binary variables are coupled to the same bath, whose intensive variables fluctuate from sample to sample. Moreover, instead of choosing the constraints, in SiGMoiD we choose only the number of constraints and let the algorithm infer them from the data itself. Notably, we show that SiGMoiD is orders of magnitude faster than current maximum-entropy-based models and allows us to model collections of very large numbers of binary variables. We also discuss future directions.

Copyrights belong to the original authors. Visit the link for more info.
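The superstatistical picture described in the abstract can be illustrated with a minimal forward-sampling sketch. This is not the paper's inference algorithm; it only shows how binary variables that are conditionally independent given shared, fluctuating bath variables end up covarying across samples. The names `sample_superstatistical`, `energies`, and `beta_scale`, the sigmoid form of the on-probability, and the Gaussian distribution for the bath variables are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sample_superstatistical(energies, n_samples, beta_scale=1.0, seed=0):
    """Draw binary samples from a toy superstatistical model.

    energies[i][k] is a hypothetical coupling of variable i to bath
    variable k. The number of columns plays the role of the number of
    constraints, which is the only thing the modeler chooses.
    """
    rng = random.Random(seed)
    n_constraints = len(energies[0])
    samples = []
    for _ in range(n_samples):
        # Intensive bath variables fluctuate from sample to sample.
        # A Gaussian is assumed here purely for illustration.
        beta = [rng.gauss(0.0, beta_scale) for _ in range(n_constraints)]
        row = []
        for e_i in energies:
            # Given beta, each variable is an independent Bernoulli draw.
            field = sum(b * e for b, e in zip(beta, e_i))
            p_on = sigmoid(-field)
            row.append(1 if rng.random() < p_on else 0)
        samples.append(row)
    return samples
```

For example, calling `sample_superstatistical([[2.0], [2.0], [-2.0]], 2000)` uses a single bath variable: the first two variables share the same coupling and tend to agree across samples, while the third tends to take the opposite value, even though no pairwise interaction is specified anywhere.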