Chemistry Informer Libraries: Conception, Early Experience, and Role in the Future of Cheminformatics.

Spencer D. Dreher, Shane W. Krska

doi:10.1021/acs.accounts.0c00760

Abstract

The synthetic chemistry literature traditionally reports the scope of new methods using simple, nonstandardized test molecules that have uncertain relevance in applied synthesis. In addition, published examples heavily favor positive reaction outcomes, and failure is rarely documented. In this environment, synthetic practitioners have inadequate information to know whether any given method is suitable for the task at hand. Moreover, the incomplete nature of published data makes it poorly suited for the creation of predictive reactivity models via machine learning approaches. In 2016, we reported the concept of chemistry informer libraries as standardized sets of medium- to high-complexity substrates with relevance to pharmaceutical synthesis as demonstrated using a multidimensional principal component analysis (PCA) comparison to the physicochemical properties of marketed drugs. We showed how informer libraries could be used to evaluate leading synthetic methods with the complete capture of success and failure and how this knowledge could lead to improved reaction conditions with a broader scope with respect to relevant applications. In this Account, we describe the progress made and lessons learned in subsequent studies using informer libraries to profile eight additional reaction classes. Examining broad trends across multiple types of bond disconnections against a standardized chemistry "measuring stick" has enabled comparisons of the relative potential of different methods for applications in complex synthesis and has identified opportunities for further development. Furthermore, the powerful combination of informer libraries and 1536-well-plate nanoscale reaction screening has allowed the parallel evaluation of scores of synthetic methods in the same experiment and as such illuminated an important role for informers as part of a larger data generation workflow for predictive reactivity modeling.
Using informer libraries as problem-dense, strong filters has allowed broad sets of reaction conditions to be narrowed down to those that display the highest tolerance to complex substrates. These best conditions can then be used to survey broad swaths of substrate space using nanoscale chemistry approaches. Our experiences and those of our collaborators from several academic laboratories applying informer libraries in these contexts have helped us identify several areas for potential improvements to the approach that would increase their ease of use, utility in generating interpretable results, and resulting uptake by the broader community. As we continue to evolve the informer library concept, we believe it will play an ever-increasing role in the future of the democratization of high-throughput experimentation and data science-driven synthetic method development.

Full Text