Abstract
The development of next-generation polymers requires optimizing several key properties simultaneously, a task that is expensive and often infeasible with traditional trial-and-error experimentation. A promising alternative is to combine machine learning (ML) with physics-based tools to rapidly screen the polymer design space and suggest new polymers that meet the critical properties required for industrial applications. In this study, we introduce a comprehensive workflow that uses machine learning and molecular modeling to design new polymers, with a focus on improving five polymer properties: (1) glass transition temperature, (2) dielectric constant, (3) refractive index, (4) stress optic coefficient, and (5) linear coefficient of thermal expansion. Using a small dataset (<200 unique polymers), we developed quantitative structure-property relationship (QSPR) models to accurately predict the experimental properties of both homopolymer and copolymer systems. We tested several ML algorithms and identified the best models for predicting these properties, achieving a test-set R² greater than 0.77 across all properties. We then explored new polymers by creating a library of ∼10,000 homopolymers using R-group enumeration tools and applied the trained QSPR models to rapidly predict the five properties. The QSPR predictions were combined into a multi-parameter optimization score, which helped downselect the large polymer space to ∼10 promising candidates. The properties of these selected candidates were subsequently validated with classical molecular dynamics simulations and density functional theory, revealing a strong correlation with the QSPR model predictions. Finally, one of the top candidates was validated experimentally, showing good agreement with both the QSPR and physics-based predictions.
Our workflow underscores the power of combining data-driven and theoretical methods in polymer design even when only a small dataset is available, offering a valuable resource for experimentalists looking to leverage computer-aided strategies in materials innovation.
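
To make the downselection step concrete, the following is a minimal sketch of how a multi-parameter optimization score could be built from predicted properties. It is not the scoring function used in this study: the linear desirability form, the equal weights, the target ranges, the optimization directions, the property keys, and the candidate names (`poly_A`, `poly_B`, `poly_C`) are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Target:
    """Hypothetical design target for one predicted property."""
    low: float           # value mapped to one end of the desirability scale
    high: float          # value mapped to the other end
    maximize: bool       # True if larger values are preferred (an assumption)
    weight: float = 1.0  # relative importance (equal weights assumed here)


def desirability(value: float, target: Target) -> float:
    """Map a property value linearly onto [0, 1], clipping outside the range."""
    frac = (value - target.low) / (target.high - target.low)
    frac = min(1.0, max(0.0, frac))
    return frac if target.maximize else 1.0 - frac


def mpo_score(props: Dict[str, float], targets: Dict[str, Target]) -> float:
    """Weighted average of the per-property desirabilities."""
    total_weight = sum(t.weight for t in targets.values())
    weighted = sum(t.weight * desirability(props[name], t)
                   for name, t in targets.items())
    return weighted / total_weight


def rank(candidates: Dict[str, Dict[str, float]],
         targets: Dict[str, Target],
         top_n: int) -> List[Tuple[str, float]]:
    """Return the top_n candidates sorted by descending MPO score."""
    scored = [(name, mpo_score(props, targets))
              for name, props in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Placeholder targets: ranges and directions are NOT taken from the study.
TARGETS = {
    "tg": Target(100.0, 250.0, maximize=True),
    "dielectric_constant": Target(2.5, 4.0, maximize=False),
    "refractive_index": Target(1.45, 1.70, maximize=True),
    "stress_optic_coefficient": Target(0.0, 100.0, maximize=False),
    "clte": Target(20.0, 100.0, maximize=False),
}

# Made-up "predicted" properties standing in for QSPR model output.
CANDIDATES = {
    "poly_A": {"tg": 250.0, "dielectric_constant": 2.5, "refractive_index": 1.70,
               "stress_optic_coefficient": 0.0, "clte": 20.0},
    "poly_B": {"tg": 100.0, "dielectric_constant": 4.0, "refractive_index": 1.45,
               "stress_optic_coefficient": 100.0, "clte": 100.0},
    "poly_C": {"tg": 175.0, "dielectric_constant": 3.0, "refractive_index": 1.60,
               "stress_optic_coefficient": 40.0, "clte": 50.0},
}

if __name__ == "__main__":
    for name, score in rank(CANDIDATES, TARGETS, top_n=2):
        print(f"{name}: {score:.2f}")
```

In a screen of this kind, `CANDIDATES` would hold the model predictions for every enumerated homopolymer, and `top_n` would be set to the number of candidates that can be carried forward to simulation. A weighted average is only one choice; a geometric mean, for example, would penalize candidates that fail badly on any single property.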