Abstract Kernel principal component analysis (kernel PCA) is a nonlinear dimensionality reduction technique that uses kernel functions to map data into a high-dimensional feature space, extending linear PCA to nonlinear data and enabling the extraction of informative principal components. However, kernel PCA requires manipulating large matrices, which incurs high computational complexity and hinders efficient implementation in big-data settings. Quantum computing has recently been integrated with kernel methods in machine learning, enabling effective analysis of input data in classically intractable feature spaces. Although existing quantum kernel PCA proposals promise exponential speedups, they impose stringent requirements on quantum hardware that are difficult to satisfy. In this work, we propose a quantum algorithm for kernel PCA that establishes a connection between quantum kernel methods and block encoding, thereby diagonalizing the centered kernel matrix on a quantum computer. The query complexity is logarithmic in the dimension of the data vectors, $D$, and linear in the size of the dataset. An exponential speedup is therefore possible when the dataset consists of a few high-dimensional vectors, i.e., when the dataset size is polynomial in $\log(D)$ and $D$ is very large. Compared with existing work, our algorithm improves the efficiency of quantum kernel PCA and relaxes the requirements on quantum hardware. Furthermore, we show that the block-encoding-based algorithm matches the query-complexity lower bound, indicating that it is nearly optimal. Our work opens new pathways for developing quantum machine learning algorithms that address tangible real-world problems and demonstrate quantum advantages in machine learning.
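For reference, the classical procedure the abstract describes (forming a kernel matrix, centering it in feature space, and diagonalizing it) can be sketched concisely. The following minimal NumPy sketch is an illustration of standard kernel PCA, not the authors' quantum algorithm; the RBF kernel, the `gamma` parameter, and the function names are assumed choices made here for the example. Its $O(n^3)$ eigendecomposition of the $n \times n$ centered kernel matrix is the step whose cost the proposed quantum algorithm aims to reduce.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gaussian (RBF) kernel from pairwise squared Euclidean distances
    # (an assumed example kernel; any positive-definite kernel works).
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_pca(X, n_components=2, gamma=1.0):
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    # Center the kernel matrix in feature space:
    # K' = K - 1_n K - K 1_n + 1_n K 1_n, where 1_n is the n x n
    # matrix with all entries equal to 1/n.
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # Diagonalize the centered kernel matrix -- the O(n^3) classical
    # step that the quantum algorithm performs on a quantum computer.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    # eigh returns eigenvalues in ascending order; keep the top ones.
    idx = np.argsort(eigvals)[::-1][:n_components]
    alphas = eigvecs[:, idx] / np.sqrt(np.maximum(eigvals[idx], 1e-12))
    # Projections of the training data onto the kernel principal components.
    return Kc @ alphas

# Example: n = 5 data vectors of dimension D = 4, projected onto 2 components.
X = np.random.default_rng(0).normal(size=(5, 4))
print(kernel_pca(X, n_components=2).shape)  # (5, 2)
```

Note that the classical cost depends on both the dataset size $n$ and the vector dimension $D$ (building $K$ alone costs $O(n^2 D)$), which is why a query complexity logarithmic in $D$ can yield an exponential advantage when $D$ dominates.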