A software framework for data dimensionality reduction: application to chemical crystallography

Sai Samudrala, Jaroslaw Zola, Baskar Ganapathysubramanian, Krishna Rajan, Prasanna Balachandran

doi:10.1186/preaccept-5129428151148844

Abstract

Materials science research has witnessed an increasing use of data mining techniques in establishing process-structure-property relationships. Significant advances in high-throughput experiments and computational capability have resulted in the generation of huge amounts of data. Various statistical methods are currently employed to reduce the noise, redundancy, and dimensionality of the data to make analysis more tractable. Popular reduction methods (like principal component analysis) assume a linear relationship between the input and output variables. Recent developments in non-linear reduction (neural networks, self-organizing maps), though successful, have computational issues associated with convergence and scalability. Another significant barrier to the use of dimensionality reduction techniques in materials science is the lack of ease of use owing to their complex mathematical formulations. This paper reviews various spectral-based techniques that efficiently unravel linear and non-linear structures in the data, which can subsequently be used to tractably investigate process-structure-property relationships. In addition, we describe techniques (based on graph-theoretic analysis) to estimate the optimal dimensionality of the low-dimensional parametric representation. We show how these techniques can be packaged into a modular, computationally scalable software framework with a graphical user interface: the Scalable Extensible Toolkit for Dimensionality Reduction (SETDiR). This interface helps to separate the mathematical and computational aspects from the materials science applications, thus significantly enhancing utility to the materials science community. The applicability of this framework in constructing reduced order models of complicated materials datasets is illustrated with an example dataset of apatites described in structural descriptor space. Cluster analysis of the low-dimensional plots yielded interesting insights into the correlation of several structural descriptors, like ionic radius and covalence, with characteristic properties like apatite stability. This information is crucial as it can promote the use of apatite materials as a potential host system for immobilizing toxic elements.

Highlights

The use of data mining techniques to probe and establish process-structure-property relationships has attracted growing interest owing to its ability to accelerate the process of tailoring materials by design.
Apatites are conveniently described by the general formula AI4AII6(BO4)6X2, where AI and AII are distinct crystallographic sites that usually accommodate larger monovalent (Na+, Li+, etc.), divalent (Ca2+, Sr2+, Ba2+, Pb2+, etc.), and trivalent (Y3+, Ce3+, La3+, etc.) cations, the B-site is occupied by smaller tetrahedrally coordinated cations (Si4+, P5+, V5+, Cr5+, etc.), and the X-site is occupied by halides (F−, Cl−, Br−), oxides, and hydroxides.
In this paper, we have detailed a mathematical framework of various data dimensionality reduction techniques for constructing reduced order models of complicated datasets and discussed the key questions involved in data selection.

Summary

Background

The use of data mining techniques to probe and establish process-structure-property relationships has attracted growing interest owing to its ability to accelerate the process of tailoring materials by design. Dimensionality reduction techniques like PCA or factor analysis, when used to establish process-structure-property relationships, traditionally assume a linear relationship among the variables. This is often not strictly valid; the data usually lies on a non-linear manifold (or surface) [13,19]. Non-linear dimensionality reduction (NLDR) techniques can be applied to unravel the non-linear structure from unordered data. One example of such an application is the construction of a low-dimensional stochastic representation of property variations in random heterogeneous media [19]. The key mathematical idea underpinning dimensionality reduction (DR) can be explained as follows: we encode the desired information about X, i.e., topology or distance, in its entirety by considering all pairs of points in X. This encoding is represented as an n × n matrix A, where n is the number of points in X.

Construct the low-dimensional representation in R^d from the eigenpairs:
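As an illustration of this eigenpair construction, here is a minimal sketch using classical multidimensional scaling, one well-known spectral method. It is not taken from SETDiR: the function name `spectral_embedding`, the choice of squared Euclidean distance as the pairwise encoding, and the scaling of eigenvectors by the square roots of their eigenvalues are assumptions made for the example.

```python
import numpy as np

def spectral_embedding(X, d):
    """Hypothetical helper: classical MDS embedding of n points into R^d."""
    n = X.shape[0]
    # Encode pairwise information: squared Euclidean distance for every pair of points.
    sq_norms = np.sum(X**2, axis=1)
    D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Double centering converts squared distances into an n x n Gram-like matrix A.
    H = np.eye(n) - np.ones((n, n)) / n
    A = -0.5 * H @ D2 @ H
    # Eigenpairs of the symmetric matrix A (eigh returns eigenvalues in ascending order).
    eigvals, eigvecs = np.linalg.eigh(A)
    order = np.argsort(eigvals)[::-1][:d]          # indices of the d largest eigenvalues
    lam = np.clip(eigvals[order], 0.0, None)       # guard against tiny negative round-off
    # Low-dimensional coordinates: eigenvectors scaled by sqrt of their eigenvalues.
    Y = eigvecs[:, order] * np.sqrt(lam)
    return Y, lam

# Usage on synthetic data (illustrative only, not the apatite dataset).
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 6))
Y, lam = spectral_embedding(X, d=2)
print(Y.shape, lam)
```

Other spectral methods differ mainly in how the matrix A is built (for example, from graph-based distances rather than Euclidean ones), while the eigen-decomposition step stays the same.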

Organized settings tabs

Results and discussion

Normalized UnNormalized

Conclusions

Journal: Integrating Materials and Manufacturing Innovation
Publication Date: Jan 1, 2014
License type: cc-by
