The description and analysis of chemical bonds have remained difficult since the popularization of electronic structure calculations. Although many attempts have been made from the perspective of electronic structure, the sheer volume of information it contains has left contemporary chemical bond analysis methods grappling with an inescapable "Trilemma": model brevity, generality, and descriptiveness (descriptive power) cannot be obtained simultaneously. To push generality and descriptiveness to their extremes, a general machine learning-based framework is introduced herein that compresses chemical bonds into a detailed residue-by-residue "genome" with matched encoding and decoding tools. The framework fuses quantum mechanical aspects, automatic feature extraction, nanostructures and/or simulations, and generative models. The encoded genomes are information-dense and decodable, with 100% generality guaranteed, and their descriptiveness appears to be broader than that of most known models. As a proof of concept, the realization presented in this work compresses the complete information on two critical chemical bonds in thiolate-protected gold nanoclusters, the S-Au and Au-Au bonds, from a bosonic-fermionic character perspective into 8-valued genomes. The machine learning component is trained on 26,528 electron localization function (ELF) images simulated with density functional theory (DFT). Exploring the space spanned by the genome reveals bond polarization, hybridization, intrusion of other atoms, alignments, crystal orientation, atomic motions, and further details. Furthermore, extensive generation tests show that molecules and solids can be integrated in a more concise manner than is typically achieved with purely geometric representations. To showcase the intraclass complexity of S-Au and Au-Au bonds visually, a roadmap is plotted by summarizing and correlating the similarities of the 8-valued genomes.
Furthermore, genomes can easily be associated with realistic indices by using a simple multilayer perceptron architecture as a calculating tool. In addition, three sets of applications, covering chemisorption, molecular dynamics analysis, and ultrafast processes, showcase the interpretive potential of interatomic genomes for the geometric structures, kinetic properties, and vibrational characteristics of molecular systems. Because the framework rose to the challenge of nanoclusters, a complicated mesoscopic family of materials, the displayed generality and comprehensiveness indicate that the model may "understand" chemical bonds in a machine's way.
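To make the genome-to-index step concrete, the following is a minimal sketch of how a small multilayer perceptron could map one 8-valued genome to a single scalar index. Only the genome length of 8 and the use of a multilayer perceptron come from the abstract. The layer sizes, the tanh activation, the random untrained weights, the example genome values, and the names `init_mlp` and `mlp_forward` are illustrative assumptions, not the authors' architecture or trained model.

```python
import math
import random

GENOME_SIZE = 8  # the abstract describes 8-valued genomes


def init_mlp(sizes, seed=0):
    """Build randomly initialised layers (hypothetical, untrained weights)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def mlp_forward(layers, genome):
    """Map one genome to the output layer: tanh hidden units, linear output."""
    if len(genome) != GENOME_SIZE:
        raise ValueError(f"expected {GENOME_SIZE} genome values, got {len(genome)}")
    activations = list(genome)
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activations)) + b
             for row, b in zip(weights, biases)]
        is_last = i == len(layers) - 1
        activations = z if is_last else [math.tanh(v) for v in z]
    return activations


# Assumed shape: 8 genome values -> 16 hidden units -> 1 scalar index.
layers = init_mlp([GENOME_SIZE, 16, 1])
genome = [0.1, -0.4, 0.7, 0.0, 0.3, -0.2, 0.5, -0.6]  # made-up example values
index = mlp_forward(layers, genome)
```

In practice the weights would be fitted against reference values of the chosen index; the sketch covers only the forward pass, so the number it produces carries no chemical meaning.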