The recent shift towards clean energy has significantly increased the demand for both fuel cells (used as clean power generators) and water electrolyzers (used for hydrogen supply). The anion exchange membrane (AEM) is a core component of these devices, but its low anion conductivity and limited durability hinder their commercialization. Much research and development (R&D) has sought to improve AEMs [1], but the current empirical approach consumes significant resources in cost, labor, and time. To reduce this resource consumption, it is important to implement materials informatics (MI), which enables high-speed screening of materials through a pre-trained machine learning (ML) model for AEM polymers. However, AEMs are made of polymers whose chemical structures are complex and hard to represent in a machine-understandable form. Fingerprints, which are numerical representations of chemical structures generated by an algorithm [2], are often used for this purpose. The majority of these fingerprints are designed for small molecules rather than polymers, and they are usually unintuitive and difficult to understand because of their topological nature. In contrast, nuclear magnetic resonance (NMR) chemical shifts have long been used in chemistry to identify the chemical structure of a sample [3]. In this study, we aim to use the high-resolution, structure-identifying nature of NMR chemical shifts as a chemical structure fingerprint for ML models, so that a polymer-suited and highly explainable fingerprint can be developed.

First, an AEM database containing structural and experimental-condition information was built using data extracted from 62 papers. The experimental conditions included were the measurement temperature for anion conductivity and the alkaline stability test conditions (test temperature, test duration, and concentration of the alkaline solution). Then, the ¹³C NMR chemical shifts of the chemical structures in the structural information were calculated using ChemDraw.
The obtained chemical shifts were converted to numerical strings, which we named the “NMR fingerprint”. A new AEM database containing both structural information (the molar ratio of each building block and the NMR fingerprints) and experimental conditions was used as the training database for the ML models. The target variable was anion conductivity, and the remaining variables were explanatory variables. The ML model used was XGBoost. Cross-validation was used to evaluate the ability of the ML models to predict the anion conductivity of novel AEM polymers. The prediction logic was analyzed using Shapley additive explanations (SHAP) values.

The database contains data from 62 AEM papers, with 2,197 anion conductivity data points. Each AEM chemical structure in the database was converted to NMR fingerprints using NMR chemical shifts, yielding around 2,000 NMR fingerprints for each AEM polymer unit. Together with the experimental conditions and structural information in the database, these data were used as the training-validation dataset for XGBoost. The coefficient of determination (R²) of the cross-validated model was 0.9235, implying that the model, aided by the experimental conditions, learned the relationship between anion conductivity and AEM polymer structure with high accuracy. The prediction logic of the ML model was then explored using SHAP values, which are derived from coalitional game theory and are used to increase the transparency and interpretability of ML models. A plot of SHAP values for the 20 most important variables used in XGBoost showed that the measurement temperature for anion conductivity ranked highest, which is consistent with the well-known behavior of AEM polymers. In addition, variables other than experimental conditions, such as 29.8_A, ranked among the top 3 most important variables.
29.8_A is the chemical shift of the alkyl groups located between two imidazolium groups of the AEM polymer, suggesting that the presence or absence of more than one imidazolium group per side chain is important in determining the anion conductivity of an AEM polymer. The SHAP values for 29.8_A show that a higher feature value (pink color) has a more positive impact (positive region of the x-axis) on the target variable, implying that having alkyl groups between imidazolium groups benefits anion conductivity. This ability to explain the prediction logic of the ML model shows that using NMR chemical shifts as fingerprints for AEM polymer structures provides an intuitive, human-understandable explanation of ML predictions. Together with the high cross-validation accuracy, NMR chemical shifts have the potential not only to become a gold standard for expressing polymer structures in machine-understandable form, but also to strongly promote the adoption of ML in the AEM polymer field, creating a paradigm shift in AEM R&D.

References
1. S. Gottesfeld et al., J. Power Sources 2018, 375, 170-184.
2. J. Bajorath, J. Chem. Inf. Comput. Sci. 2001, 41, 2, 233-245.
3. P. Jezzard et al., Adv. Mater. 1992, 4, 2, 82-90.

Figure 1
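
The conversion of chemical shifts into named fingerprint features, as described in the methods, might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the helper name `nmr_fingerprint`, the rounding to 0.1 ppm, the `<shift>_<block>` naming (modelled on the `29.8_A` feature discussed above), and the use of molar-ratio-weighted carbon counts as feature values are all hypothetical, and the input shift values are invented for the example.

```python
from collections import Counter


def nmr_fingerprint(blocks, decimals=1):
    """Turn predicted 13C shifts into a sparse {feature_name: value} mapping.

    `blocks` maps a building-block label (e.g. "A") to a tuple of
    (molar_ratio, list_of_shifts_in_ppm). Naming and weighting are
    assumptions made for this sketch only.
    """
    features = Counter()
    for label, (molar_ratio, shifts) in blocks.items():
        for shift in shifts:
            # Feature name such as "29.8_A": rounded shift plus block label.
            key = f"{round(shift, decimals):.{decimals}f}_{label}"
            # Each carbon at this shift contributes the block's molar ratio.
            features[key] += molar_ratio
    return dict(features)


# Hypothetical polymer with two building blocks (illustrative values only).
polymer = {
    "A": (0.75, [29.8, 29.8, 136.2]),
    "B": (0.25, [128.4]),
}
fingerprint = nmr_fingerprint(polymer)
```

Under these assumptions, each key would become one column of the training table, alongside the molar ratios and experimental conditions.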