Structure Of High-dimensional Data Research Articles

This study analyzes the quality of Vocational High Schools (VHS) using data that have a hierarchical structure and more than one response variable. The data cover South Sulawesi for four years (2018 to 2021): raw variables characterizing VHS quality come from the Basic Education Data (DAPODIK), and the other independent variables come from the Ministry of Finance of the Republic of Indonesia (KEMENKEU) and Statistics Indonesia (BPS). Because each regency-level explanatory variable is observed in all four years, the data have a multi-year, high-dimensional structure, and Principal Component Analysis (PCA) is used to reduce it. The modelling uses multivariate multilevel models (MVMM) with one-level and two-level structures. The study aims to model the average National Examination and Accreditation scores of Vocational High Schools in South Sulawesi with an MVMM that accounts for the regency/city level, and to identify the factors that influence these scores. The results show that the two-level multivariate model with a random intercept as the hierarchical component fits better than the one-level model, based on its smaller Deviance Information Criterion (DIC) value. Jointly, at the 5% significance level, the model identifies variables that contribute significantly to the quality of Vocational High Schools in South Sulawesi Province. At the school level, these are the ratio of students per study group, the percentage of certified teachers among all teachers, the ratio of students per toilet, the ratio of laboratory availability, and the ratio of supporting-room availability.
Meanwhile, at the regency level, the poverty percentage and Gross Regional Domestic Product (GRDP) have a significant effect on the quality of Vocational High Schools.
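
The abstract says only that PCA is used to tame the multi-year regency variables. As a minimal sketch, assuming each regency-level indicator (for example, the poverty percentage) is observed once per year from 2018 to 2021 and the four yearly columns are summarized by their first principal component, the code below standardizes the columns, builds their correlation matrix, and finds the leading eigenvector by power iteration. The function names, the choice to standardize, and keeping a single component are illustrative assumptions, not the authors' documented procedure; the toy columns are made up and are not study data.

```python
import math
import random
from statistics import mean, stdev


def standardize(columns):
    """Z-score each column. Assumption: one column per year, one row per regency."""
    out = []
    for col in columns:
        m, s = mean(col), stdev(col)  # stdev uses the n - 1 denominator
        out.append([(x - m) / s for x in col])  # a constant column would divide by zero
    return out


def correlation_matrix(z_cols):
    """Pearson correlations of already standardized columns."""
    n, k = len(z_cols[0]), len(z_cols)
    return [[sum(a * b for a, b in zip(z_cols[i], z_cols[j])) / (n - 1)
             for j in range(k)] for i in range(k)]


def first_principal_component(columns, iters=500, seed=0):
    """First PC via power iteration (hypothetical helper).

    Returns (loadings, share of total variance explained, score per regency).
    """
    z = standardize(columns)
    r = correlation_matrix(z)
    k = len(r)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(k)]  # strictly positive start vector
    for _ in range(iters):
        w = [sum(r[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if sum(v) < 0:  # eigenvectors are sign-ambiguous; fix a convention
        v = [-x for x in v]
    # Rayleigh quotient v'Rv gives the leading eigenvalue.
    eigenvalue = sum(v[i] * r[i][j] * v[j] for i in range(k) for j in range(k))
    share = eigenvalue / k  # a k x k correlation matrix has trace k
    scores = [sum(v[j] * z[j][t] for j in range(k)) for t in range(len(z[0]))]
    return v, share, scores


# Toy example: four yearly columns for five regencies that move perfectly together.
years = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [1.5, 2.5, 3.5, 4.5, 5.5],
    [10, 20, 30, 40, 50],
]
loadings, share, scores = first_principal_component(years)
print([round(x, 3) for x in loadings], round(share, 3))  # [0.5, 0.5, 0.5, 0.5] 1.0
```

Power iteration is used here only to keep the sketch dependency-free; a real analysis would more likely call an eigendecomposition or SVD routine from a numerical library and inspect how many components to retain.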

Visual summarization of clinical data collected on patients within the electronic health record (EHR) may enable precise and rapid triage when a patient presents to an emergency department (ED). Triage is critical for allocating resources appropriately and for anticipating eventual patient disposition, typically admission to the hospital or discharge home. EHR data are high-dimensional and complex, but they offer the opportunity to discover and characterize underlying data-driven patient phenotypes. Data-driven phenotypes are intended to reduce reliance on weak labels such as diagnosis codes and to help identify the existing patients most similar to a specific patient. These phenotypes will enable improved, personalized therapeutic decision making and prognostication. In this work, we focus on the challenge of projecting patients into two dimensions, since a low-dimensional embedding offers visual interpretability that is lost in higher dimensions. Linear dimensionality reduction techniques such as principal component analysis are often used for this purpose, but they cannot adequately describe the variance of patient data because a linear reduction does not account for higher-order, non-linear interactions between variables. We therefore employ a recently described non-linear embedding technique, uniform manifold approximation and projection (UMAP), which seeks to capture both local and global structure in high-dimensional data. We then use Gaussian mixture models to identify clusters in the embedded data and use the adjusted Rand index (ARI) to establish the stability of the discovered clusters. We apply this technique to five common clinical chief complaints from a real-world ED EHR dataset and describe the emergent properties of the discovered clusters.
We observe clinically relevant cluster attributes, suggesting that visual embeddings of EHR data built with non-linear dimensionality reduction are a promising approach for revealing data-driven patient phenotypes. Across the five chief complaints, we find between 2 and 6 clusters, with the peak mean pairwise ARI between subsequent training iterations ranging from 0.35 to 0.74.
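
The stability check above compares cluster assignments across training iterations with the adjusted Rand index. The following is a minimal pure-Python sketch of the standard (Hubert and Arabie) ARI, assuming each run returns one cluster label per patient (for instance, Gaussian mixture labels fitted on a UMAP embedding). Averaging ARI over consecutive runs is one plausible reading of "mean pairwise ARI between subsequent training iterations", not a confirmed detail of the study, and the helper names and toy labels are hypothetical. scikit-learn's `sklearn.metrics.adjusted_rand_score` computes the same index.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index (Hubert and Arabie) between two flat clusterings."""
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two equal-length labelings of at least 2 items")
    pairs_total = comb(len(labels_a), 2)
    # Pairs of patients placed together in both runs (contingency-table cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / pairs_total
    max_index = (together_a + together_b) / 2
    if max_index == expected:
        # Only when both runs use a single cluster or both use all singletons:
        # the partitions are identical, so ARI is defined as 1.
        return 1.0
    return (together_both - expected) / (max_index - expected)


def mean_consecutive_ari(runs):
    """Mean ARI between each clustering run and the run that follows it (assumed reading)."""
    if len(runs) < 2:
        raise ValueError("need at least two runs")
    scores = [adjusted_rand_index(a, b) for a, b in zip(runs, runs[1:])]
    return sum(scores) / len(scores)


# Hypothetical labels for four patients from three runs (not study data).
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 2]]
print(round(adjusted_rand_index(runs[1], runs[2]), 4))  # 0.5714 (= 4/7)
print(round(mean_consecutive_ari(runs), 4))  # 0.7857 (= 11/14)
```

Because ARI depends only on which patients are grouped together, the first two runs score 1.0 even though their label numbers are swapped; this invariance is what makes it suitable for comparing independently trained clusterings.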

Articles published on Structure Of High-dimensional Data

Statistical Embedding: Beyond Principal Components

Stability and machine learning applications of persistent homology using the Delaunay-Rips complex

Enhancing Feature Selection for Imbalanced Alzheimer’s Disease Brain MRI Images by Random Forest

Unsupervised Subspace Learning With Flexible Neighboring

Frequency-domain physical constrained neural network for nonlinear system dynamic prediction

Target Parameter Estimation Algorithm Based on Real-Valued HOSVD for Bistatic FDA-MIMO Radar

Proteomics biomarker discovery for individualized prevention of familial pancreatic cancer using statistical learning

KODAMA exploratory analysis in metabolic phenotyping

Long-Range Dependence Involutional Network for Logo Detection

Cglasso: An R Package for Conditional Graphical Lasso Inference with Censored and Missing Values

Information Retrieval With Chessboard-Shaped Topology for Hyperspectral Target Detection

Multivariate Multilevel Modelling to Assess Factors Affecting the Quality of Vocational High Schools in South Sulawesi Province

Bayesian learners in gradient boosting for linear mixed models

Hyperspectral image classification using meta-heuristics and artificial neural network

Unsupervised machine learning methods and emerging applications in healthcare

Shape-aware stochastic neighbor embedding for robust data visualisations

The Intrinsic Structure of High-Dimensional Data According to the Uniqueness of Constant Mean Curvature Hypersurfaces

Robust Covariance Matrix Estimation for High-Dimensional Compositional Data with Application to Sales Data Analysis

Visualization of emergency department clinical data for interpretable patient phenotyping

Large-Scale Subspace Clustering by Independent Distributed and Parallel Coding
