A Design-to-Device Pipeline for Data-Driven Materials Discovery

Jacqueline M Cole

doi:10.1021/acs.accounts.9b00470

Abstract

The world needs new materials to stimulate the chemical industry in key sectors of our economy: environment and sustainability, information storage, optical telecommunications, and catalysis. Yet nearly all functional materials are still discovered by "trial-and-error", and this lack of predictability creates a major materials bottleneck to technological innovation. The average "molecule-to-market" lead time for materials discovery is currently 20 years. This is far too long for industrial needs, as highlighted by the Materials Genome Initiative, which has ambitious targets of up to 4-fold reductions in average molecule-to-market lead times. Such a large step change in progress can only be realistically achieved if one adopts an entirely new approach to materials discovery. Fortunately, a fundamentally new approach to materials discovery has been emerging, whereby data science with artificial intelligence offers a prospective solution to speed up these average molecule-to-market lead times.

This approach is known as data-driven materials discovery. Its broad prospects have only recently become a reality, given the timely and major advances in "big data", artificial intelligence, and high-performance computing (HPC). Access to massive data sets has been stimulated by government-regulated open-access requirements for data and literature. Natural-language processing (NLP) and machine-learning (ML) tools that can mine data and find patterns therein are becoming mainstream. Exascale HPC capabilities that can aid data mining and pattern recognition and also generate their own data from calculations are now within our grasp. These timely advances present an ideal opportunity to develop data-driven materials-discovery strategies to systematically design and predict new chemicals for a given device application.

This Account shows how data science can afford materials discovery via a four-step "design-to-device" pipeline that entails (1) data extraction, (2) data enrichment, (3) material prediction, and (4) experimental validation. Massive databases of cognate chemical and property information are first forged from "chemistry-aware" natural-language-processing tools, such as ChemDataExtractor, and enriched using machine-learning methods and high-throughput quantum-chemical calculations. New materials for a bespoke application can then be predicted by mining these databases with algorithmic encodings of relationships between chemical structures and physical properties that are known to deliver functional materials. These may take the form of classification, enumeration, or machine-learning algorithms. A data-mining workflow short-lists these predictions to a handful of lead candidate materials that go forward to experimental validation. This design-to-device approach is being developed to offer a roadmap for the accelerated discovery of new chemicals for functional applications. Case studies presented demonstrate its utility for photovoltaic, optical, and catalytic applications. While this Account is focused on applications in the physical sciences, the generic pipeline discussed is readily transferable to other scientific disciplines such as biology and medicine.
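
The short-listing step of the pipeline can be pictured as a sequence of filters applied to a property database, each one encoding a structure-property criterion. The following is a minimal sketch under stated assumptions, not the workflow used in the Account: the `Candidate` record, its field names, the placeholder entries, and the numeric thresholds are all hypothetical and chosen only to illustrate the funnel idea.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical database record; field names are illustrative only."""
    name: str
    absorption_max_nm: float  # placeholder optical property
    has_anchor_group: bool    # placeholder structural flag


# A filter is any function that decides whether a record survives a stage.
Filter = Callable[[Candidate], bool]


def shortlist(records: Iterable[Candidate], filters: List[Filter]) -> List[Candidate]:
    """Apply each filter in turn, keeping only records that pass all of them."""
    survivors = list(records)
    for keep in filters:
        survivors = [r for r in survivors if keep(r)]
    return survivors


# Placeholder entries, not real compounds or measured values.
database = [
    Candidate("compound-A", 520.0, True),
    Candidate("compound-B", 350.0, True),
    Candidate("compound-C", 610.0, False),
    Candidate("compound-D", 480.0, True),
]

# Illustrative structure-property criteria; the thresholds are arbitrary.
filters = [
    lambda r: r.has_anchor_group,
    lambda r: 450.0 <= r.absorption_max_nm <= 700.0,
]

leads = shortlist(database, filters)
print([r.name for r in leads])
```

Expressing each criterion as a separate function keeps the stages independent, so criteria can be reordered or swapped for a classifier's prediction without changing the funnel itself.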

Journal: Accounts of Chemical Research
Publication Date: Feb 25, 2020
Citations: 73
