Small Sample Size Datasets Research Articles

Alzheimer's disease and related dementias (ADRD) present a looming public health crisis, affecting roughly 5 million people and 11% of older adults in the United States. Despite nationwide efforts for timely diagnosis of patients with ADRD, more than 50% of them remain undiagnosed and are unaware of their disease. To address this challenge, we developed ADscreen, an innovative speech-processing based ADRD screening algorithm for the protective identification of patients with ADRD. ADscreen consists of five major components: (i) noise reduction for reducing background noise in the audio-recorded patient speech, (ii) modeling the patient's ability in phonetic motor planning using acoustic parameters of the patient's voice, (iii) modeling the patient's ability in semantic and syntactic levels of language organization using linguistic parameters of the patient's speech, (iv) extracting vocal and semantic psycholinguistic cues from the patient's speech, and (v) building and evaluating the screening algorithm. To identify important speech parameters (features) associated with ADRD, we used Joint Mutual Information Maximization (JMIM), an effective feature selection method for high-dimensional, small sample size datasets. The relationship between speech parameters and the outcome variable (presence/absence of ADRD) was modeled using three different machine learning (ML) architectures capable of joining informative acoustic and linguistic parameters with contextual word embedding vectors obtained from DistilBERT (a distilled version of Bidirectional Encoder Representations from Transformers). We evaluated the performance of ADscreen on audio recordings of patients' speech (verbal descriptions) for the Cookie-Theft picture description task, which are publicly available in the dementia databank. The joint fusion of acoustic and linguistic parameters with contextual word embedding vectors of DistilBERT achieved F1-score = 84.64 (standard deviation [std] = ±3.58) and AUC-ROC = 92.53 (std = ±3.34) on the training dataset, and F1-score = 89.55 and AUC-ROC = 93.89 on the test dataset. In summary, ADscreen has strong potential to be integrated into the clinical workflow to address the need for an ADRD screening tool, so that patients with cognitive impairment can receive appropriate and timely care.
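
The JMIM selection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the ADscreen implementation. It assumes the features have already been discretized (real acoustic and linguistic parameters are continuous and would need binning first), and the function names `mutual_information` and `jmim_select` and the toy data are hypothetical. The greedy rule is the usual JMIM one: start with the feature that shares the most information with the class, then repeatedly add the candidate whose worst-case joint mutual information with any already selected feature is largest.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def jmim_select(features, labels, k):
    """Greedy JMIM over a dict mapping feature name -> list of discrete values.

    Returns up to k feature names in selection order.
    """
    remaining = set(features)
    # Step 1: the feature with maximum mutual information with the class.
    first = max(sorted(remaining),
                key=lambda f: mutual_information(features[f], labels))
    selected = [first]
    remaining.remove(first)

    # Step 2: add the candidate maximizing min over selected s of I(f, s; C).
    while remaining and len(selected) < k:
        def score(f):
            return min(
                mutual_information(list(zip(features[f], features[s])), labels)
                for s in selected
            )
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): "a" is a noisy copy of the label, "a_dup" repeats
# it, and "b" marks exactly the samples where "a" is wrong.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a":     [0, 0, 0, 1, 1, 1, 1, 0],
    "a_dup": [0, 0, 0, 1, 1, 1, 1, 0],
    "b":     [0, 0, 0, 1, 0, 0, 0, 1],
}
chosen = jmim_select(features, labels, k=2)
print(chosen)
```

In this toy case `b` carries no information about the label on its own, yet it is preferred over the redundant `a_dup` because the pair (`a`, `b`) determines the label completely. That preference for complementary over redundant features is why joint criteria such as JMIM are attractive when samples are few and features are many.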

Background: Predicting sepsis mortality among intensive care unit (ICU) patients with machine learning (ML) methods is hypothesized to perform as well as or better than the prognostic scores. This paper aims to show that ML can estimate sepsis mortality more accurately than, and can complement, the traditional SAPS II, APACHE II, and SOFA scores, even under small sample restrictions. Methods: Data collected retrospectively from patients (n = 200) admitted to the ICU of Acibadem Hospital, Istanbul, Turkey, between 2015 and 2020 are used to estimate sepsis mortality risk with eight ML methods and a generated ensemble model, alongside the traditional prognostic scores. Mortality, as the decisive indicator, is evaluated against the explanatory variables used to compute the traditional scores. For calibration, five different predetermined splits of random samples are used for training and validating the ML methods. The efficiency of the predictions from the ML methods and from the traditional scoring methods is assessed with AUC-ROC curves and other accuracy indicators. Box-Cox and Min-Max transformations are applied consecutively to the data, and parameters are optimized, to increase the efficiency of the algorithms. Results: The Multi-Layer Perceptron algorithm achieves the best mortality prediction accuracy, outperforming SAPS II and APACHE II and performing as well as SOFA. The prediction power of the best-performing ML methods for APACHE II, SAPS II, and SOFA is 84.45%, 85.25%, and 73.47%, respectively. The ensemble of eight ML methods increases performance by around 2% for the APACHE II score. Conclusions: The outcomes of this study have clinical merit in evaluating the potential of ML methods to predict ICU mortality better than the traditional scores APACHE II and SAPS II, and as well as SOFA. Additionally, it explores which of the variables contributing to sepsis mortality risk should be taken as a priori information in treating patients, and requires fewer explanatory variables, with reliable prediction power even for considerably small sample size datasets.
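
The preprocessing and evaluation steps named in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the study's pipeline: the Box-Cox parameter `lam` is fixed by hand here (in practice it is usually estimated from the data, for example by maximum likelihood), the function names are hypothetical, and the numbers are made up. The sketch applies Box-Cox and then Min-Max scaling to one variable, and computes AUC-ROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one.

```python
from math import log


def box_cox(values, lam):
    """Box-Cox transform: (x**lam - 1) / lam, or ln(x) when lam == 0."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def min_max(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


def auc_roc(scores, labels):
    """AUC-ROC via pairwise comparison; tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical skewed variable, transformed with an assumed lam = 0.5.
raw = [1.0, 4.0, 9.0]
scaled = min_max(box_cox(raw, lam=0.5))
print(scaled)

# Hypothetical model scores and outcomes (1 = died, 0 = survived).
auc = auc_roc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(auc)
```

The pairwise AUC computation is quadratic in the number of cases, which is not a concern at the sample sizes discussed here (n = 200).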

Articles published on Small Sample Size Datasets

TabDEG: Classifying differentially expressed genes from RNA-seq data based on feature extraction and deep learning framework.

Sparse L0-norm least squares support vector machine with feature selection

ADscreen: A speech processing-based screening system for automatic identification of patients with Alzheimer's disease and related dementia

Research on Fire Detection in Laboratories Based on CNN and Transfer Learning

Real-time classification for Φ-OTDR vibration events in the case of small sample size datasets

Feature fusion Siamese network for breast cancer detection comparing current and prior mammograms.

Estimating Rubber Covered Conveyor Belting Cure Times Using Multiple Simultaneous Optimizations Ensemble

Reconstructing codependent cellular cross-talk in lung adenocarcinoma using REMI.

The prediction power of machine learning on estimating the sepsis mortality in the intensive care unit

Multi-task manifold learning for small sample size datasets

Deep belief networks with self-adaptive sparsity

A Bonferroni Mean Based Fuzzy K Nearest Centroid Neighbor Classifier

A Cascade Flexible Neural Forest Model for Cancer Subtypes Classification on Gene Expression Data.

A Robust Least Squares Support Vector Machine Based on L∞-norm

Efficient Feature Selection via $\ell_{2,0}$-norm Constrained Sparse Regression

A Practical Solution to the Small Sample Size Bias and Uncertainty Problems of Model Selection Criteria in Two-Input Process Multiple Response Surface Methodology Datasets

A semi-supervised learning approach for model selection based on class-hypothesis testing

Bayesian nonparametric estimation of hazard rate in monotone Aalen model

Two-way analysis of high-dimensional collinear data

Integration of prior knowledge of measurement noise in kernel density classification
