Training a neural network to learn other dimensionality reduction removes data size restrictions in bioinformatics and provides a new route to exploring data representations

Published: Sept. 3, 2020, 5:02 p.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.09.03.269555v1?rss=1

Authors: Dexter, A., Thomas, S. A., Steven, R. T., Robinson, K. N., Taylor, A. J., Elia, E., Nikula, C., Campbell, A. D., Panina, Y., Najumudeen, A. K., Murta, T., Yan, B., Grabowski, P., Hamm, G., Swales, J., Gilmore, I., Yuneva, M., Goodwin, R. J. A., Barry, S., Sansom, O. J., Takats, Z., Bunch, J.

Abstract: High-dimensionality omics and hyperspectral imaging datasets present difficult challenges for feature extraction and data mining because their huge numbers of features cannot be examined simultaneously. The sample numbers and variables produced by these methods grow constantly as new technologies are developed, and computational analysis needs to evolve to keep pace with this demand. Current state-of-the-art algorithms can handle some routine datasets but struggle once datasets grow beyond a certain size. We present the training of deep neural networks on non-linear dimensionality reduction, in particular t-distributed stochastic neighbour embedding (t-SNE), to overcome these prior limitations.
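To make the approach concrete, below is a minimal sketch of the general idea of learning a dimensionality reduction with a neural network: run conventional t-SNE on a subsample that the standard algorithm can handle, then fit a network to reproduce that mapping so it can be applied to the full dataset. The subsample size, synthetic data, and MLPRegressor architecture here are illustrative assumptions, not the authors' published configuration.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_full = rng.standard_normal((50_000, 100))  # stand-in for a large omics/imaging matrix

# 1. Conventional t-SNE on a subsample small enough to be tractable.
idx = rng.choice(len(X_full), size=5_000, replace=False)
X_sub = X_full[idx]
Y_sub = TSNE(n_components=2, random_state=0).fit_transform(X_sub)

# 2. Train a feed-forward network to map raw features to the t-SNE coordinates.
net = MLPRegressor(hidden_layer_sizes=(256, 128), max_iter=500, random_state=0)
net.fit(X_sub, Y_sub)

# 3. Embed the full dataset with the trained network, sidestepping
#    the data size restriction of running t-SNE directly.
Y_full = net.predict(X_full)
print(Y_full.shape)  # (50000, 2)
```

A side benefit of this parametric formulation is that new samples can be projected into the existing embedding by a single forward pass, without recomputing the reduction from scratch.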