Generated Data Points Research Articles

BackgroundLarge-scale health data has significant potential for research and innovation, especially with longitudinal data offering insights into prevention, disease progression, and treatment effects. Yet, analyzing this data type is complex, as data points are repeatedly documented along the timeline. As a consequence, extracting cross-sectional tabular data suitable for statistical analysis and machine learning can be challenging for medical researchers and data scientists alike, with existing tools lacking balance between ease-of-use and comprehensiveness. ObjectiveThis paper introduces HERALD, a novel domain-specific query language designed to support the transformation of longitudinal health data into cross-sectional tables. We describe the basic concepts, the query syntax, a graphical user interface for constructing and executing HERALD queries, as well as an integration into Informatics for Integrating Biology and the Bedside (i2b2). MethodsThe syntax of HERALD mimics natural language and supports different query types for selection, aggregation, analysis of relationships, and searching for data points based on filter expressions and temporal constraints. Using a hierarchical concept model, queries are executed individually for the data of each patient, while constructing tabular output. HERALD is closed, meaning that queries process data points and generate data points. Queries can refer to data points that have been produced by previous queries, providing a simple, but powerful nesting mechanism. ResultsThe open-source implementation consists of a HERALD query parser, an execution engine, as well as a web-based user interface for query construction and statistical analysis. The implementation can be deployed as a standalone component and integrated into self-service data analytics environments like i2b2 as a plugin. HERALD can be valuable tool for data scientists and machine learning experts, as it simplifies the process of transforming longitudinal health data into tables and data matrices. ConclusionThe construction of cross-sectional tables from longitudinal data can be supported through dedicated query languages that strike a reasonable balance between language complexity and transformation capabilities.

Read full abstract

The classification in imbalanced datasets is one of the main problems for machine learning techniques. Support vector machine (SVM) is biased to the majority class samples, and the minority class samples may incorrectly be considered as noise. Therefore, SVM has poor predictive accuracy for imbalanced datasets and generates inaccurate classification models. Existing class imbalance learning (CIL) techniques can make SVM less sensitive to class imbalance, but these methods suffer from issues related to noise and outliers. Moreover, despite the solid theoretical basis and good classification performance, SVM is not appropriate for the classification of large-scale datasets because the training complexity of SVM is closely related to the dataset size. Class imbalance learning (CIL) using Fuzzy adaptive resonance theory (ART) and intuitionistic fuzzy twin SVM (CIL-FART-IFTSVM), which can be applied to address the class imbalance issue in the presence of noise and outliers and large scale datasets, is proposed to overcome these substantial difficulties. In this method, we modify the distribution of the datasets using fuzzy adaptive resonance theory (Fuzzy ART) as a clustering method to overcome the imbalance problem. Then, after data reduction, IFTSVM is utilized to find excellent non-parallel hyperplanes in the generated data points. Finally, a coordinate descent system with shrinking by an active set is applied to reduce the computational complexity. Forty-five imbalanced datasets are considered to validate the performance of the proposed CIL-FART-IFTSVM method. The Friedman test and the bootstrap technique with 95% confidence intervals are applied to quantify the results statistically. The experimental results indicate that the method proposed in this paper has a better performance compared with other methods, and the training time is significantly better than that of other classifiers for large-scale datasets.

Read full abstract

Generated Data Points Research Articles

Articles published on Generated Data Points

A criterion for selecting the appropriate one from the trained models for model‐based offline policy evaluation

HERALD: A domain-specific query language for longitudinal health data analytics

Feature selection via robust weighted score for high dimensional binary class-imbalanced gene expression data

Learning Deep Generative Clustering via Mutual Information Maximization.

Online Symbolic Regression with Informative Query

Adversarial Perturbation Elimination with GAN Based Defense in Continuous-Variable Quantum Key Distribution Systems

TLMOTE: A Topic-based Language Modelling Approach for Text Oversampling

Augmentation of Human Action Datasets with Suboptimal Warping and Representative Data Samples.

A Critical Evaluation and Modification of the Padé-Laplace Method for Deconvolution of Viscoelastic Spectra.

Class imbalance learning using fuzzy ART and intuitionistic fuzzy twin support vector machines

Improving Commonsense Causal Reasoning by Adversarial Training and Data Augmentation

Experimental and numerical study of hot-rolled duplex stainless steel CHS columns

Data driven robust optimization of grinding process under uncertainty

Global temperature calibration of the Long chain Diol Index in marine surface sediments

Modeling of overall heat transfer coefficient of a concentric double pipe heat exchanger with limited experimental data by using curve fitting and ANN combination

Streaming Synthetic Time Series for Simulated Condition Monitoring

Improve the robustness of data mining algorithm against adversarial evasion attack

Analysis of adsorption kinetics at solid/solution interface using a hyperbolic tangent model

Imbalanced data classification via support vector machines and genetic algorithms

False positives with visual analysis for nonconcurrent multiple baseline designs and ABAB designs: Preliminary findings

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Generated Data Points Research Articles

Articles published on Generated Data Points

A criterion for selecting the appropriate one from the trained models for model‐based offline policy evaluation

HERALD: A domain-specific query language for longitudinal health data analytics

Feature selection via robust weighted score for high dimensional binary class-imbalanced gene expression data

Learning Deep Generative Clustering via Mutual Information Maximization.

Online Symbolic Regression with Informative Query

Adversarial Perturbation Elimination with GAN Based Defense in Continuous-Variable Quantum Key Distribution Systems

TLMOTE: A Topic-based Language Modelling Approach for Text Oversampling

Augmentation of Human Action Datasets with Suboptimal Warping and Representative Data Samples.

A Critical Evaluation and Modification of the Padé-Laplace Method for Deconvolution of Viscoelastic Spectra.

Class imbalance learning using fuzzy ART and intuitionistic fuzzy twin support vector machines

Improving Commonsense Causal Reasoning by Adversarial Training and Data Augmentation

Experimental and numerical study of hot-rolled duplex stainless steel CHS columns

Data driven robust optimization of grinding process under uncertainty

Global temperature calibration of the Long chain Diol Index in marine surface sediments

Modeling of overall heat transfer coefficient of a concentric double pipe heat exchanger with limited experimental data by using curve fitting and ANN combination

Streaming Synthetic Time Series for Simulated Condition Monitoring

Improve the robustness of data mining algorithm against adversarial evasion attack

Analysis of adsorption kinetics at solid/solution interface using a hyperbolic tangent model

Imbalanced data classification via support vector machines and genetic algorithms

False positives with visual analysis for nonconcurrent multiple baseline designs and ABAB designs: Preliminary findings