Dimensionality Reduction Scheme Research Articles

AbstractFood computing has emerged as a promising research field, employing artificial intelligence, deep learning, and data science methodologies to enhance various stages of food production pipelines. To this end, the food computing community has compiled a variety of data sets and developed various deep-learning architectures to perform automatic classification. However, automated food classification presents a significant challenge, particularly when it comes to local and regional cuisines, which are often underrepresented in available public-domain data sets. Nevertheless, obtaining high-quality, well-labeled, and well-balanced real-world labeled images is challenging since manual data curation requires significant human effort and is time-consuming. In contrast, the web has a potentially unlimited source of food data but tapping into this resource has a good chance of corrupted and wrongly labeled images. In addition, the uneven distribution among food categories may lead to data imbalance problems. All these issues make it challenging to create clean data sets for food from web data. To address this issue, we present AutoCleanDeepFood, a novel end-to-end food computing framework for regional gastronomy that contains the following components: (i) a fully automated pre-processing pipeline for custom data sets creation related to specific regional gastronomy, (ii) a transfer learning-based training paradigm to filter out noisy labels through loss ranking, incorporating a Russian Roulette probabilistic approach to mitigate data imbalance problems, and (iii) a method for deploying the resulting model on smartphones for real-time inferences. We assess the performance of our framework on a real-world noisy public domain data set, ETH Food-101, and two novel web-collected datasets, MENA-150 and Pizza-Styles. We demonstrate the filtering capabilities of our proposed method through embedding visualization of the feature space using the t-SNE dimension reduction scheme. Our filtering scheme is efficient and effectively improves accuracy in all cases, boosting performance by 0.96, 0.71, and 1.29% on MENA-150, ETH Food-101, and Pizza-Styles, respectively.

Read full abstract

BackgroundMachine learning (ML) classifiers are increasingly used for predicting cardiovascular disease (CVD) and related risk factors using omics data, although these outcomes often exhibit categorical nature and class imbalances. However, little is known about which ML classifier, omics data, or upstream dimension reduction strategy has the strongest influence on prediction quality in such settings. Our study aimed to illustrate and compare different machine learning strategies to predict CVD risk factors under different scenarios.MethodsWe compared the use of six ML classifiers in predicting CVD risk factors using blood-derived metabolomics, epigenetics and transcriptomics data. Upstream omic dimension reduction was performed using either unsupervised or semi-supervised autoencoders, whose downstream ML classifier performance we compared. CVD risk factors included systolic and diastolic blood pressure measurements and ultrasound-based biomarkers of left ventricular diastolic dysfunction (LVDD; E/e' ratio, E/A ratio, LAVI) collected from 1,249 Finnish participants, of which 80% were used for model fitting. We predicted individuals with low, high or average levels of CVD risk factors, the latter class being the most common. We constructed multi-omic predictions using a meta-learner that weighted single-omic predictions. Model performance comparisons were based on the F1 score. Finally, we investigated whether learned omic representations from pre-trained semi-supervised autoencoders could improve outcome prediction in an external cohort using transfer learning.ResultsDepending on the ML classifier or omic used, the quality of single-omic predictions varied. Multi-omics predictions outperformed single-omics predictions in most cases, particularly in the prediction of individuals with high or low CVD risk factor levels. Semi-supervised autoencoders improved downstream predictions compared to the use of unsupervised autoencoders. In addition, median gains in Area Under the Curve by transfer learning compared to modelling from scratch ranged from 0.09 to 0.14 and 0.07 to 0.11 units for transcriptomic and metabolomic data, respectively.ConclusionsBy illustrating the use of different machine learning strategies in different scenarios, our study provides a platform for researchers to evaluate how the choice of omics, ML classifiers, and dimension reduction can influence the quality of CVD risk factor predictions.

Read full abstract

Dimensionality Reduction Scheme Research Articles

Related Topics

Articles published on Dimensionality Reduction Scheme

A leader-adaptive particle swarm optimization with dimensionality reduction strategy for feature selection

Black-box adversarial patch attacks using differential evolution against aerial imagery object detectors

Classification of interacting Dirac semimetals

Autocleandeepfood: auto-cleaning and data balancing transfer learning for regional gastronomy food computing

Hyperspectral remote sensing of forage stoichiometric ratios in the senescent stage of alpine grasslands

A comprehensive benchmarking of machine learning algorithms and dimensionality reduction methods for drug sensitivity prediction.

Hybrid control strategy for efficiency enhancement of a raft-type wave energy converter

Exploring machine learning strategies for predicting cardiovascular disease risk factors from multi-omic data

Fully unsupervised wear anomaly assessment of aero-bearings enhanced by multi-representation learning of deep features

A Bayesian Structural Modal Updating Method Based on Sparse Grid and Ensemble Kalman Filter

Dimensional reduction technique-based maximum entropy principle method for safety degree analysis under twofold random uncertainty

Soft computing methods in the solution of an inverse heat transfer problem with phase change: A comparative study

Mining the High Dimensional Biological Dataset Using Optimized Colossal Pattern with Dimensionality Reduction

A novel approach for detecting sensor-based semiconductor fault yield classification using convolutional neural networks

Predicting soil erosion and sediment delivery in large, data‐sparse, mountainous basins

HCDR-based motion planning for mobile manipulators with the continuous trajectory task

Wind energy assessment considering a truncated distribution of probabilistic turbulence power spectral parameters

Hybrid Causal Feature Selection for Cancer Biomarker Identification from RNA-seq Data.

Fast Bayesian Linearized Inversion With an Efficient Dimension Reduction Strategy

Evaluating dimensionality reduction strategies on mixed-type datasets: a comparative analysis using Python, R and SPSS

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Dimensionality Reduction Scheme Research Articles

Related Topics

Articles published on Dimensionality Reduction Scheme

A leader-adaptive particle swarm optimization with dimensionality reduction strategy for feature selection

Black-box adversarial patch attacks using differential evolution against aerial imagery object detectors

Classification of interacting Dirac semimetals

Autocleandeepfood: auto-cleaning and data balancing transfer learning for regional gastronomy food computing

Hyperspectral remote sensing of forage stoichiometric ratios in the senescent stage of alpine grasslands

A comprehensive benchmarking of machine learning algorithms and dimensionality reduction methods for drug sensitivity prediction.

Hybrid control strategy for efficiency enhancement of a raft-type wave energy converter

Exploring machine learning strategies for predicting cardiovascular disease risk factors from multi-omic data

Fully unsupervised wear anomaly assessment of aero-bearings enhanced by multi-representation learning of deep features

A Bayesian Structural Modal Updating Method Based on Sparse Grid and Ensemble Kalman Filter

Dimensional reduction technique-based maximum entropy principle method for safety degree analysis under twofold random uncertainty

Soft computing methods in the solution of an inverse heat transfer problem with phase change: A comparative study

Mining the High Dimensional Biological Dataset Using Optimized Colossal Pattern with Dimensionality Reduction

A novel approach for detecting sensor-based semiconductor fault yield classification using convolutional neural networks

Predicting soil erosion and sediment delivery in large, data‐sparse, mountainous basins

HCDR-based motion planning for mobile manipulators with the continuous trajectory task

Wind energy assessment considering a truncated distribution of probabilistic turbulence power spectral parameters

Hybrid Causal Feature Selection for Cancer Biomarker Identification from RNA-seq Data.

Fast Bayesian Linearized Inversion With an Efficient Dimension Reduction Strategy

Evaluating dimensionality reduction strategies on mixed-type datasets: a comparative analysis using Python, R and SPSS