Abstract

Abstract Supervised machine-learning models are trained with various molecular descriptors to predict infrared (IR) emission spectra of interstellar polycyclic aromatic hydrocarbons. We demonstrate that a feature importance analysis based on the random forest algorithm can be utilized to explore the physical correlation between emission features. Astronomical correlations between IR bands are analyzed as examples of demonstration by finding the common molecular fragments responsible for different bands, which improves the current understanding of the long-observed correlations. We propose a way to quantify the band correlation by measuring the similarity of the feature importance arrays of different bands, by which a correlation map is obtained for emissions in the out-of-plane bending region. Moreover, a comparison between the predictions using different combinations of descriptors underscores the strong prediction power of the extended-connectivity molecular fingerprint, and shows that the combinations of multiple descriptors of other types in general lead to improved predictivity.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call