In this study, we employed machine learning techniques to improve sustainable materials design by examining how different latent space representations affect model performance in property prediction. We compared three fingerprinting methodologies: (a) neural networks trained on specific properties, (b) encoder–decoder architectures, and (c) traditional Morgan fingerprints. Their encoding quality was compared quantitatively by using each fingerprint as input to a simple regression model (Random Forest) that predicts the glass transition temperature (Tg), a critical parameter in determining material performance. The task-specific neural networks achieved the highest accuracy, with a mean absolute percentage error (MAPE) of 10% and an R² of 0.9, significantly outperforming encoder–decoder models (MAPE: 19%, R²: 0.76) and Morgan fingerprints (MAPE: 24%, R²: 0.6). In addition, we used dimensionality reduction techniques, such as principal component analysis (PCA) and t-distributed stochastic neighbour embedding (t-SNE), to gain insight into how well the models learn molecular features relevant to Tg. By offering a deeper understanding of how chemical structures influence AI-based property predictions, this approach enables the efficient identification of high-performing materials, in applications ranging from water decontamination to polymer recyclability, with minimal experimental effort, promoting a circular economy in materials science.
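For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how MAPE and R² are conventionally computed from true and predicted values. The function names and the Tg values (assumed here to be in kelvin) are illustrative assumptions and are not taken from the study; the study's own evaluation code may differ.

```python
from typing import Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Assumes no true value is zero (the ratio would be undefined).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Made-up Tg values (assumed kelvin), for illustration only; not data from the study.
tg_true = [300.0, 350.0, 400.0, 450.0]
tg_pred = [330.0, 350.0, 380.0, 450.0]

print(f"MAPE: {mape(tg_true, tg_pred):.2f}%")
print(f"R2:   {r2(tg_true, tg_pred):.3f}")
```

Note that MAPE depends on the temperature scale: the same absolute errors give a different percentage in kelvin than in degrees Celsius, so the unit should be stated alongside the metric.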