Conspectus
Designing new materials is vital for addressing pressing societal challenges in health, energy, and sustainability. The combination of physicochemical laws and empirical trial and error has long guided material design, but this approach is limited by the cost of experiments and the difficulty of deriving complex guiding principles. The space of hypothetical materials to be considered is incredibly large, and only a small fraction of possible compounds can ever be tested experimentally. The computational techniques of atomistic simulation and machine learning (ML) offer an avenue to rapidly invent new materials and navigate this enormous space. Together, they can be used to infer complex design principles and identify high-quality candidates more rapidly than trial-and-error experimentation. In this Account, we review our group’s recent contributions to simulation and ML for materials design. We begin by discussing the numerical representation of materials for use in ML. Representations can be produced through deterministic algorithms, learnable encodings, or physics-based methods and lead to vector, graph, and matrix outputs. We describe how these different approaches offer distinct material- and application-specific advantages. We provide demonstrations from our own work on small-molecule drugs, macromolecules, dyes, electrolytes, and zeolites. In several cases, we show how the appropriate representation led to guiding principles that facilitated experimental materials design. Next, we highlight the development of ML methods for enhancing atomistic simulation. These advances help to improve simulation accuracy and expand the time and length scales that can be explored. They include differentiable atomistic simulations in which ensemble-averaged quantities are differentiated with respect to system parameters, and novel autoregressive methods for enhanced sampling of challenging physical distributions.
Other developments include learnable coarse-grained models, which can accelerate molecular dynamics while minimizing the loss of all-atom information, and ML interatomic potentials, which can be trained on maximally informative quantum chemistry data through active learning and adversarial uncertainty attacks. Next, we show how these combined computational advances have enabled high-throughput virtual screening. This has led to the discovery of low-cost organic structure-directing agents for zeolite synthesis, polymer electrolytes, and efficient photoswitches for targeted medicine. We conclude by discussing the limitations of ML and simulation. These include the large data requirements and limited chemical transferability of the former and the speed–accuracy trade-offs of the latter. We predict that advancements in quantum chemistry will further accelerate simulations, while the incorporation of physical principles will improve the reliability of ML.