Abstract

Fingerprint distances, which measure the similarity of atomic environments, are commonly calculated from atomic environment fingerprint vectors. In this work, we present the simplex method that can perform the inverse operation, i.e., calculating fingerprint vectors from fingerprint distances. The fingerprint vectors found in this way point to the corners of a simplex. For a large dataset of fingerprints, we can find a particular largest simplex, whose dimension gives the effective dimension of the fingerprint vector space. We show that the corners of this simplex correspond to landmark environments that can be used in a fully automatic way to analyze structures. In this way, we can, for instance, detect atoms in grain boundaries or on edges of carbon flakes without any human input about the expected environment. By projecting fingerprints on the largest simplex, we can also obtain fingerprint vectors that are considerably shorter than the original ones but whose information content is not significantly reduced.
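The inverse operation described above can be illustrated with a standard textbook construction. The following is a minimal sketch, not the authors' implementation: it assumes plain Euclidean distances, places the first point at the origin, and recovers the remaining corners from a Cholesky factorisation of the Gram matrix. The function name `vectors_from_distances` and the tolerance `1e-12` are hypothetical choices made for illustration.

```python
import math


def vectors_from_distances(dist):
    """Place n+1 points so that their pairwise Euclidean distances match dist.

    dist is a symmetric (n+1) x (n+1) list of lists with a zero diagonal.
    Point 0 is put at the origin; corner i has at most i nonzero coordinates.
    """
    n = len(dist) - 1
    # Gram matrix relative to point 0: gram[i][j] = <x_(i+1) - x_0, x_(j+1) - x_0>.
    gram = [
        [0.5 * (dist[i + 1][0] ** 2 + dist[0][j + 1] ** 2 - dist[i + 1][j + 1] ** 2)
         for j in range(n)]
        for i in range(n)
    ]
    # Cholesky factorisation gram = L L^T; row i of L is the vector of corner i+1.
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = gram[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                # Assumed tolerance: a vanishing pivot means the corner lies in
                # the span of the previous ones, so the simplex is degenerate.
                if s <= 1e-12:
                    raise ValueError("degenerate simplex at corner %d" % (i + 1))
                lower[i][j] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return [[0.0] * n] + lower


# Usage: three points with mutual distances 3, 4 and 5 (a right triangle).
corners = vectors_from_distances([[0, 3, 4], [3, 0, 5], [4, 5, 0]])
```

In this sketch a failed factorisation signals that the points do not span a full simplex, which is loosely related to the idea of an effective dimension mentioned in the abstract.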

Highlights

  • Materials science has become, to a large extent, a data-driven science

  • The atomic environments for all of the carbon atoms are equivalent. This is no longer true if the fullerene has a so-called Stone–Wales defect (ref. 40). We look at such a structure as well as a 60-atom graphite flake and categorize the atoms according to their fingerprint distance to the landmark environments, i.e., the corners of the largest simplex (LS)

  • We have introduced an algorithm to construct a largest simplex in the space spanned by a large set of atomic environment fingerprint vectors
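To make the idea of selecting simplex corners from a pool of fingerprint vectors more concrete, here is a minimal sketch of a greedy heuristic. It is an assumption made for illustration and not the algorithm of the paper: each new corner is the vector with the largest height above the affine span of the corners chosen so far, which greedily increases the simplex volume but is not guaranteed to find the largest simplex. The function name `greedy_simplex`, the starting rule, and the tolerance `tol` are hypothetical.

```python
import math


def greedy_simplex(vectors, tol=1e-8):
    """Return indices of greedily chosen simplex corners.

    vectors is a list of equal-length lists of floats. The number of
    returned corners minus one is a rough estimate of the dimension
    spanned by the data (up to the assumed tolerance tol).
    """
    dim = len(vectors[0])
    # Hypothetical starting rule: begin with the vector of largest norm.
    first = max(range(len(vectors)), key=lambda i: sum(x * x for x in vectors[i]))
    corners = [first]
    # Residuals: every vector expressed relative to the first corner.
    resid = [[a - b for a, b in zip(v, vectors[first])] for v in vectors]
    while len(corners) <= dim:
        # Height of each point above the span of the current corners.
        heights = [math.sqrt(sum(x * x for x in r)) for r in resid]
        best = max(range(len(vectors)), key=heights.__getitem__)
        if heights[best] <= tol:
            break  # no remaining point adds a new dimension
        corners.append(best)
        unit = [x / heights[best] for x in resid[best]]
        # Gram-Schmidt step: remove the new direction from all residuals.
        for r in resid:
            proj = sum(a * b for a, b in zip(r, unit))
            for k in range(dim):
                r[k] -= proj * unit[k]
    return corners


# Usage: five 3-component vectors that all lie in the plane z = 0.
points = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0.3, 0.3, 0], [2, 0, 0]]
corner_ids = greedy_simplex(points)
```

For this planar toy data the sketch stops after three corners, consistent with a two-dimensional simplex (a triangle); the selected corners would play the role of the landmark environments in such a simplified setting.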


Summary

INTRODUCTION

Materials science has become, to a large extent, a data-driven science. Several data banks exist that contain structural data and calculated properties; many contain more than hundreds of thousands of structural properties, and this number is growing dramatically. Molecular dynamics (MD) simulations typically generate very large datasets. Atomic environment fingerprints are used as inputs for supervised machine learning schemes of potential energy surfaces. For such a use, it is desirable that the fingerprint can detect any difference in the environment while the fingerprint vector is kept as short as possible. SOAP5 fingerprints coupled to machine learning methods were recently used to predict properties of grain boundaries. Several other methods exist in the computational physics and machine learning communities for the selection of fingerprint components and atomic environments. We introduce a method that selects all the relevant structures fully automatically from a large pool of structures. The method is applicable without any adjustments to any molecular system whose atomic environments can be represented by fingerprints.

Fingerprints and fingerprint distances
Obtaining fingerprint vectors from fingerprint distances
Construction of the largest simplex
APPLICATIONS
C60 clusters
Grain boundary networks in nanocrystalline Al
The compression of the fingerprints
CONCLUSION
