Abstract

Materials science research has witnessed an increasing use of data mining techniques in establishing process-structure-property relationships. Significant advances in high-throughput experiments and computational capability have resulted in the generation of huge amounts of data. Various statistical methods are currently employed to reduce the noise, redundancy, and dimensionality of the data to make analysis more tractable. Popular methods for reduction (like principal component analysis) assume a linear relationship between the input and output variables. Recent developments in non-linear reduction (neural networks, self-organizing maps), though successful, have computational issues associated with convergence and scalability. Another significant barrier to using dimensionality reduction techniques in materials science is their lack of ease of use, owing to their complex mathematical formulations. This paper reviews various spectral-based techniques that efficiently unravel linear and non-linear structures in the data, which can subsequently be used to tractably investigate process-structure-property relationships. In addition, we describe techniques (based on graph-theoretic analysis) to estimate the optimal dimensionality of the low-dimensional parametric representation. We show how these techniques can be packaged into a modular, computationally scalable software framework with a graphical user interface, the Scalable Extensible Toolkit for Dimensionality Reduction (SETDiR). This interface helps to separate the mathematical and computational aspects from the materials science applications, thus significantly enhancing utility to the materials science community. The applicability of this framework in constructing reduced order models of complicated materials datasets is illustrated with an example dataset of apatites described in structural descriptor space. Cluster analysis of the low-dimensional plots yielded interesting insights into the correlation between several structural descriptors, like ionic radius and covalence, and characteristic properties like apatite stability. This information is crucial as it can promote the use of apatite materials as a potential host system for immobilizing toxic elements.

Highlights

  • Data mining techniques for probing and establishing process-structure-property relationships have attracted growing interest owing to their ability to accelerate the tailoring of materials by design

  • Apatites are conveniently described by the general formula AI4AII6(BO4)6X2, where AI and AII are distinct crystallographic sites that usually accommodate larger monovalent (Na+, Li+, etc.), divalent (Ca2+, Sr2+, Ba2+, Pb2+, etc.), and trivalent (Y3+, Ce3+, La3+, etc.) cations, the B-site is occupied by smaller tetrahedrally coordinated cations (Si4+, P5+, V5+, Cr5+, etc.), and the X-site is occupied by halides (F−, Cl−, Br−), oxides, and hydroxides

  • In this paper, we have detailed a mathematical framework of various data dimensionality reduction techniques for constructing reduced order models of complicated datasets and discussed the key questions involved in data selection


Summary

Background

Data mining techniques for probing and establishing process-structure-property relationships have attracted growing interest owing to their ability to accelerate the tailoring of materials by design. Dimensionality reduction (DR) techniques like PCA or factor analysis, used to establish process-structure-property relationships, traditionally assume a linear relationship among the variables. This is often not strictly valid; the data usually lie on a non-linear manifold (or surface) [13,19]. Non-linear dimensionality reduction (NLDR) techniques can be applied to unravel the non-linear structure from unordered data. An example of such an application, constructing a low-dimensional stochastic representation of property variations in random heterogeneous media, is given in [19]. The key mathematical idea underpinning DR can be explained as follows: we encode the desired information about X, i.e., topology or distance, in its entirety by considering all pairs of points in X. This encoding is represented as an n × n matrix A.

Construct the low-dimensional representation in R^d from the eigenpairs:
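The two steps described above (encoding pairwise information in an n × n matrix A, then building the embedding from its eigenpairs) can be illustrated with a minimal sketch. It assumes classical multidimensional scaling as the spectral method, with A taken as the double-centered matrix of squared Euclidean distances; the text here does not say which encoding is used, so this is one possible instance rather than the paper's own procedure. The function names and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np


def pairwise_sq_distances(X):
    """Squared Euclidean distances between all pairs of rows of X (n x n)."""
    sq_norms = np.sum(X ** 2, axis=1)
    D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return np.maximum(D2, 0.0)  # clip tiny negatives caused by round-off


def classical_mds(X, d):
    """Embed the rows of X in R^d (assumed method: classical MDS).

    Returns the n x d embedding and all eigenvalues of A, largest first.
    """
    n = X.shape[0]
    D2 = pairwise_sq_distances(X)
    H = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    A = -0.5 * H @ D2 @ H                  # n x n encoding of pairwise distances
    eigvals, eigvecs = np.linalg.eigh(A)   # symmetric solver, ascending order
    order = np.argsort(eigvals)[::-1]      # re-sort, largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    lam = np.maximum(eigvals[:d], 0.0)     # guard against small negative values
    Y = eigvecs[:, :d] * np.sqrt(lam)      # coordinates from the top d eigenpairs
    return Y, eigvals


# Hypothetical data: 50 points on a 2-D plane linearly embedded in 5-D.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
X = latent @ rng.normal(size=(2, 5))

Y, spectrum = classical_mds(X, d=2)
```

For this linear example the two-dimensional embedding preserves all pairwise distances, and the eigenvalues beyond the second are numerically zero. A non-linear method would change only how A is built (for example, from graph-based distances), while the eigenpair step would stay the same in this sketch.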
Results and discussion
Conclusions