Transfer learning from simulations improves the classification of OCT images of glandular epithelia

Published: Oct. 26, 2020, 9:02 p.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.10.26.355180v1?rss=1

Authors: Ostvar, S., Troung, H., Silver, E. R., Lightdale, C. J., Hur, C., Tatonetti, N. P.

Abstract: Esophageal adenocarcinoma (EAC) is a rare but lethal cancer whose incidence is rising in several global hotspots, including the United States. The five-year survival rate for patients diagnosed with advanced EAC can be as low as 5%, which makes early detection and preventive intervention crucial. The current standard of care for EAC targets patients with Barrett's esophagus (BE), the main precursor to EAC and a relatively common condition in adults with chronic acid reflux disease. Preventive care for EAC requires repeated surveillance endoscopies with biopsy sampling in BE patients, which can be intrusive, error-prone, and costly. Integrating minimally invasive subsurface tissue imaging into the current standard of care can reduce the need for exhaustive tissue sampling and improve quality of life for BE patients. Computer-aided detection (CAD) systems based on deep learning can facilitate the effective adoption of subsurface imaging in EAC care. Despite the recent successes of deep neural networks in lung and breast cancer imaging, developing them for rare conditions such as EAC remains challenging because of data scarcity, heavy bias in existing datasets toward non-cases, and uncertainty in image labels. Here we explore the use of synthetic datasets, specifically data derived from simulations of optical back-scattering during imaging, in developing deep-learning-based CAD systems. As a proof of concept, we studied the binary classification of esophageal optical coherence tomography (OCT) images into normal squamous mucosa and glandular mucosa typical of BE. We found that deep convolutional networks trained on synthetic data outperformed models trained on clinical datasets with uncertain labels. When training on synthetic data, model performance also improved with dataset size.
Our findings demonstrate the utility of transfer from simulations to real data in medical imaging, especially in severely data-poor regimes and when significant label uncertainty is present, and they motivate further development of simulation-based transfer learning to aid the development of CAD for rare malignancies.
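The toy sketch below illustrates the general simulate-then-transfer workflow under heavily simplified, hypothetical assumptions; it is not the authors' method or code. It assumes a single-scattering (Beer-Lambert) back-scattering model with multiplicative speckle in place of the paper's simulations, replaces the deep convolutional networks with a one-feature nearest-centroid classifier, and uses a second simulated set with different attenuation and gland contrast as a stand-in for real clinical OCT data. All function names (`simulate_profile`, `log_residual_variance`, `fit_threshold`, and so on) and parameter values are hypothetical. Because labels come straight from the simulator, the training set carries no label uncertainty, which is the property the abstract highlights.

```python
import math
import random

# All constants below are hypothetical, chosen only for illustration.
N_DEPTH = 64      # depth samples per A-scan
DZ = 0.01         # depth step (mm)
N_LATERAL = 16    # neighbouring A-scans averaged into one depth profile


def simulate_profile(glandular, rng, mu_t=4.0, gland_contrast=0.1):
    """Toy single-scattering OCT depth profile (not the authors' simulator).

    Mean signal is mu_b(z) * exp(-2 * mu_t * z) (Beer-Lambert round trip);
    each A-scan is multiplied by fully developed speckle (exponential,
    mean 1), and N_LATERAL A-scans are averaged. Glandular profiles get
    one band of reduced back-scattering standing in for a gland lumen.
    """
    mu_b = [1.0] * N_DEPTH
    if glandular:
        start = rng.randrange(8, N_DEPTH - 20)
        width = rng.randrange(10, 16)
        for i in range(start, start + width):
            mu_b[i] = gland_contrast
    profile = []
    for i in range(N_DEPTH):
        mean = mu_b[i] * math.exp(-2.0 * mu_t * i * DZ)
        speckle = sum(rng.expovariate(1.0) for _ in range(N_LATERAL)) / N_LATERAL
        profile.append(mean * speckle)
    return profile


def log_residual_variance(profile):
    """Log of the variance of log-intensity around a straight-line fit in depth.

    Homogeneous (squamous-like) tissue decays exponentially, i.e. linearly in
    log-intensity, so only speckle remains; a gland band adds a large step.
    """
    ys = [math.log(v) for v in profile]
    xs = range(len(ys))
    x_bar = sum(xs) / len(ys)
    y_bar = sum(ys) / len(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    residuals = [y - y_bar - slope * (x - x_bar) for x, y in zip(xs, ys)]
    return math.log(sum(r * r for r in residuals) / len(residuals))


def make_dataset(n, rng, **sim_kwargs):
    """Balanced (feature, label) pairs; label 1 = glandular, 0 = squamous.

    Labels come straight from the simulator, so they are exact.
    """
    data = []
    for k in range(n):
        label = k % 2
        profile = simulate_profile(label == 1, rng, **sim_kwargs)
        data.append((log_residual_variance(profile), label))
    return data


def fit_threshold(data):
    """1-D nearest-centroid rule: midpoint between the two class means."""
    neg = [f for f, y in data if y == 0]
    pos = [f for f, y in data if y == 1]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2.0


def accuracy(threshold, data):
    """Fraction of samples where (feature > threshold) matches the label."""
    return sum((f > threshold) == (y == 1) for f, y in data) / len(data)


rng = random.Random(0)
train = make_dataset(200, rng)  # source domain: simulated, exactly labelled
threshold = fit_threshold(train)
# Hypothetical stand-in for the target domain: different attenuation and
# weaker gland contrast than anything seen during training.
shifted = make_dataset(200, rng, mu_t=6.0, gland_contrast=0.15)
source_acc = accuracy(threshold, train)
shifted_acc = accuracy(threshold, shifted)
print(f"threshold={threshold:.2f} source={source_acc:.2f} shifted={shifted_acc:.2f}")
```

Each stage (simulate, extract a feature, fit, evaluate on a shifted domain) is kept separate so that the toy simulator or the classifier could be swapped for something more realistic, such as a Monte Carlo photon-transport model or a convolutional network, without changing the rest of the pipeline.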