Surrogate- and invariance-boosted contrastive learning for data-scarce applications in science

Rumen Dangovski, Thomas Christensen, Charlotte Loh, Marin Soljačić, Samuel Kim

doi:10.1038/s41467-022-31915-y

Open Access

https://doi.org/10.1038/s41467-022-31915-y

Journal: Nature Communications
Publication Date: Jul 21, 2022
Citations: 4
License type: open-access

Affiliation: Massachusetts Institute of Technology

Abstract

Deep learning techniques have been increasingly applied to the natural sciences, e.g., for property prediction and optimization or material discovery. A fundamental ingredient of such approaches is the vast quantity of labeled data needed to train the model. This poses severe challenges in data-scarce settings where obtaining labels requires substantial computational or labor resources. Noting that problems in natural sciences often benefit from easily obtainable auxiliary information sources, we introduce surrogate- and invariance-boosted contrastive learning (SIB-CL), a deep learning framework which incorporates three inexpensive and easily obtainable auxiliary information sources to overcome data scarcity. Specifically, these are: abundant unlabeled data, prior knowledge of symmetries or invariances, and surrogate data obtained at near-zero cost. We demonstrate SIB-CL’s effectiveness and generality on various scientific problems, e.g., predicting the density-of-states of 2D photonic crystals and solving the 3D time-independent Schrödinger equation. SIB-CL consistently results in orders of magnitude reduction in the number of labels needed to achieve the same network accuracies.

Full Text