Music and dance videos have attracted growing research attention in recent years. Music is one of the most important forms of human communication: it carries rich emotional information and is increasingly analyzed with computational tools. Despite the relevance of human-computer interaction and multimedia technologies to sound and music matching tasks, most existing machine learning approaches suffer from information loss or insufficient feature extraction during feature engineering. Multifeature fusion is widely applied in education, aerospace, intelligent transportation, biomedicine, and other fields, and it plays a critical role in how humans acquire information. In this paper, we propose an efficient simulation method for matching dance technique movements with music based on multifeature fusion. First, music beat extraction theory is used to segment the synchronized dance movement and music data and to locate mutation points in the music, and the pheromones are updated dynamically according to the quality of the dance motions. Audio features are then extracted from the music accompanying the dance video to obtain an audio feature sequence, and the two sequences are fused into an entropy value sequence that reflects audio variations. In simulation experiments comparing the consistency of several dance movement optimization approaches, the optimized method proposed here achieves an average consistency of 87%. Consequently, even for more complex dance background music, where the background and the subject are easily confused, the proposed algorithm maintains a stable recognition rate and guarantees a certain level of accuracy.
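To make the segmentation step concrete, the sketch below illustrates beat-based segmentation and per-segment feature extraction under stated assumptions: it uses librosa's beat tracker as the beat extraction component and onset-strength peaks as a stand-in for the mutation points. The file name, the parameter values, and the choice of averaged MFCCs as the audio feature are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: beat-based segmentation and a per-segment audio feature
# sequence. All parameter values below are illustrative assumptions.
import numpy as np
import librosa

def beat_segments_and_features(audio_path, n_mfcc=13):
    """Segment the accompanying music at detected beats and return one
    averaged MFCC feature vector per beat-to-beat segment."""
    y, sr = librosa.load(audio_path, sr=22050)

    # Beat tracking supplies the segment boundaries.
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_samples = librosa.frames_to_samples(beat_frames)

    # Onset-strength peaks serve here as "mutation points" (abrupt changes);
    # the peak-picking thresholds are assumed, not taken from the paper.
    onset_env = librosa.onset.onset_strength(y=y, sr=sr)
    mutation_frames = librosa.util.peak_pick(
        onset_env, pre_max=3, post_max=3, pre_avg=3, post_avg=5,
        delta=0.5, wait=10)

    # One averaged MFCC vector per beat-to-beat segment.
    features = []
    bounds = np.concatenate(([0], beat_samples, [len(y)]))
    for start, end in zip(bounds[:-1], bounds[1:]):
        if end - start < sr // 10:   # skip segments shorter than 100 ms
            continue
        mfcc = librosa.feature.mfcc(y=y[start:end], sr=sr, n_mfcc=n_mfcc)
        features.append(mfcc.mean(axis=1))
    return tempo, np.array(features), mutation_frames
```

Each row of the returned feature array corresponds to one beat-to-beat music segment, giving the audio feature sequence that is subsequently fused with the segmented dance movement sequence.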
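The paper does not define how the entropy value sequence is computed, so the following is a minimal sketch under one common formulation: the per-frame Shannon entropy of the normalized magnitude spectrum, in which sharp changes mark abrupt audio variations. The n_fft and hop_length values are assumed defaults.

```python
# Hypothetical sketch of the "entropy value sequence": per-frame Shannon
# entropy of the normalized power spectrum. The exact entropy definition
# used in the paper is not stated; this formulation is an assumption.
import numpy as np
import librosa

def entropy_sequence(y, sr, n_fft=2048, hop_length=512):
    """Return one entropy value per STFT frame; low entropy indicates a
    concentrated spectrum, and sudden jumps indicate audio variations."""
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length)) ** 2
    P = S / (S.sum(axis=0, keepdims=True) + 1e-12)  # frame-wise distribution
    return -(P * np.log2(P + 1e-12)).sum(axis=0)
```

Frames where this sequence changes sharply can then be aligned with the segmented dance movements when constructing the fused sequence.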
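The dynamic pheromone update is described only qualitatively. The canonical Ant System rule, which the description appears to echo, is

\[
\tau_{ij}(t+1) = (1-\rho)\,\tau_{ij}(t) + \sum_{k=1}^{m} \Delta\tau_{ij}^{k},
\qquad
\Delta\tau_{ij}^{k} =
\begin{cases}
Q / L_k & \text{if ant } k \text{ uses edge } (i,j),\\
0 & \text{otherwise},
\end{cases}
\]

where \(\rho \in (0,1)\) is the evaporation rate, \(m\) is the number of ants, \(Q\) is a constant, and \(L_k\) is the cost of ant \(k\)'s candidate solution. In this setting, the merit score of a candidate dance motion would play the role of \(1/L_k\); this mapping is our assumption, not a rule stated in the paper.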