Rule-based Classification Research Articles

Contextual variables that capture the characteristics of delimited geographic or jurisdictional areas are vital for health and social research. However, obtaining data sets with contextual-level data can be challenging in the absence of monitoring systems or public census data. We describe and implement an 8-step method that combines web scraping, text mining, and spatial overlay analysis (WeTMS) to transform extensive text data from government websites into analyzable data sets containing contextual data for jurisdictional areas. This tutorial describes the method and provides resources for its application by health and social researchers. We used this method to create data sets of health assets aimed at enhancing older adults' social connections (eg, activities and resources such as walking groups and senior clubs) across the 374 health jurisdictions in Catalonia from 2015 to 2022. These assets are registered on a web-based government platform by local stakeholders from various health and nonhealth organizations as part of a national public health program. Steps 1 to 3 involved defining the variables of interest, identifying data sources, and using Python to extract information from 50,000 websites linked to the platform. Steps 4 to 6 comprised preprocessing the scraped text, defining new variables to classify health assets based on social connection constructs, analyzing word frequencies in titles and descriptions of the assets, creating topic-specific dictionaries, implementing a rule-based classifier in R, and verifying the results. Steps 7 and 8 integrated the spatial overlay analysis to determine the geographic location of each asset. We conducted a descriptive analysis of the data sets to report the characteristics of the assets identified and the patterns of asset registrations across areas. We identified and extracted data from 17,305 websites describing health assets. The titles and descriptions of the activities and resources contained 12,560 and 7301 unique words, respectively. After applying our classifier and spatial analysis algorithm, we generated 2 data sets containing 9546 health assets (5022 activities and 4524 resources) with the potential to enhance social connections among older adults. Stakeholders from 318 health jurisdictions registered the identified assets on the platform between July 2015 and December 2022. The agreement rate between the classification algorithm and verified data sets ranged from 62.02% to 99.47% across variables. Leisure and skill development activities were the most prevalent (1844/5022, 36.72%). Leisure and cultural associations, such as social clubs for older adults, were the most common resources (878/4524, 19.41%). Health asset registration varied across areas, ranging between 0 and 263 activities and 0 and 265 resources. The sequential use of WeTMS offers a robust method for generating data sets containing contextual-level variables from internet text data. This study can guide health and social researchers in efficiently generating ready-to-analyze data sets containing contextual variables.
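The dictionary-driven classification and verification steps described in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation (the abstract states that their classifier was written in R), and the topic names, dictionary words, and function names below are hypothetical placeholders chosen only for illustration. The sketch assumes that an asset is assigned a topic when any word from that topic's dictionary appears in its title or description, and that the agreement rate is the percentage of items on which the classifier and a manually verified label coincide.

```python
import re

# Hypothetical topic dictionaries; the study's actual dictionaries are not shown in the abstract.
DICTIONARIES = {
    "physical_activity": {"walking", "gym", "yoga"},
    "leisure_skills": {"club", "workshop", "choir", "crafts"},
}

def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode letters are kept)."""
    return re.findall(r"\w+", text.lower())

def classify(title, description):
    """Return the sorted list of topics whose dictionary shares a word with the text."""
    tokens = set(tokenize(title)) | set(tokenize(description))
    return sorted(topic for topic, words in DICTIONARIES.items() if tokens & words)

def agreement_rate(predicted, verified):
    """Percentage of items where the classifier label equals the verified label."""
    matches = sum(p == v for p, v in zip(predicted, verified))
    return 100.0 * matches / len(verified)

if __name__ == "__main__":
    print(classify("Walking group", "Weekly walks for seniors"))
    print(classify("Senior club", "Choir and crafts workshop"))
    print(agreement_rate([True, False, True, True], [True, True, True, True]))
```

Exact word matching keeps each rule easy to inspect, at the cost of missing inflected forms (for example, "walks" does not match "walking"); stemming or lemmatizing the tokens before matching would be one way to address that.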

There are various paradigms of learning. One such paradigm is learning from knowledge assimilation. In machine learning, this paradigm applies when concepts (classification models) generated by different algorithms from the same training data are combined into a single, improved integrated concept (classification model). This differs from combining the outputs of various models, as in Bagging or Boosting. Learning from knowledge assimilation thus creates alternative models that work on different parts of the search space. Such models can be optimized separately so that the best-performing model can be selected for the given problem. Several algorithms are available for optimization; Chemical Reaction Optimization (CRO) is a chemistry-based, nature-inspired optimization algorithm in which a set of optimization approaches is introduced corresponding to various types of chemical reactions. Classification models in our approach are optimized using a technique called Elitist CRO (ECRO). ECRO is an improvement over CRO in which the optimization approaches are modified so that the algorithm converges better than CRO. Although CRO models based on neural networks exist for classification, they suffer from poor interpretability. This paper proposes an application of a new Hybrid Elitist Chemical Reaction Optimization (HECRO) algorithm to generate interpretable classification rules. The proposed algorithm extends CRO and ECRO in how it explores the search space: following learning from knowledge assimilation, the explored region is expanded by using classification rules generated by various algorithms as the initial population. Experimental results with 20 datasets drawn from the UCI Machine Learning data repository show improvement on several performance measures compared with the base classifiers used for population generation in HECRO.
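The two ideas the abstract above combines, pooling rules from several learners into one initial population and retaining the best rules across iterations (elitism), can be sketched in Python as follows. This is only an illustration under stated assumptions, not the HECRO algorithm: the abstract does not specify the rule encoding, the fitness function, or the chemical reaction operators, so the single-threshold rule format, the `rule_accuracy` fitness, and the random `perturb` step are all hypothetical stand-ins.

```python
import random

# Hypothetical rule encoding: (feature_index, threshold, predicted_class),
# read as "if x[feature_index] <= threshold then predict predicted_class".

def rule_accuracy(rule, X, y):
    """Fitness: accuracy of the rule on the rows it covers (0.0 if it covers none)."""
    feature, threshold, label = rule
    covered = [target for row, target in zip(X, y) if row[feature] <= threshold]
    if not covered:
        return 0.0
    return sum(target == label for target in covered) / len(covered)

def perturb(rule, rng, step=0.5):
    """Stand-in for a reaction operator: shift the rule's threshold by a random amount."""
    feature, threshold, label = rule
    return (feature, threshold + rng.uniform(-step, step), label)

def optimize(rule_sets, X, y, iterations=50, seed=0):
    """Pool rules from several learners, then repeatedly perturb and keep the fittest."""
    rng = random.Random(seed)
    # Knowledge assimilation (assumed form): rules from every source form one population.
    population = [rule for rules in rule_sets for rule in rules]
    size = len(population)
    for _ in range(iterations):
        candidates = population + [perturb(rule, rng) for rule in population]
        candidates.sort(key=lambda rule: rule_accuracy(rule, X, y), reverse=True)
        population = candidates[:size]  # elitism: the best rules always survive
    return population

if __name__ == "__main__":
    X = [[1.0], [2.0], [3.0], [4.0]]
    y = [0, 0, 1, 1]
    rule_sets = [[(0, 3.5, 0)], [(0, 1.5, 0), (0, 4.5, 1)]]  # e.g. rules from two learners
    for rule in optimize(rule_sets, X, y):
        print(rule, rule_accuracy(rule, X, y))
```

Because parents compete with their perturbed copies and only the top-ranked rules are kept, the best fitness in the population can never decrease between iterations. A fitness based only on covered rows favors narrow rules, so a practical version would likely also reward coverage.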

Articles published on Rule-based Classification

Patient portal messages to support an age-friendly health system for persons with dementia.

An artificial intelligence platform for automated PFAS subgroup classification: A discovery tool for PFAS screening

Rule-based continuous line classification using shape and positional relationships between objects in piping and instrumentation diagram

A heuristic method for discovering multi-class classification rules from multi-source data in cloud–edge system

A process-aware framework to support Process Mining from blockchain applications

Rule-Based Text Classification of Dental Diagnosis.

Relation Detection to Identify Stroke Assertions from Clinical Notes Using Natural Language Processing.

Generating Contextual Variables From Web-Based Data for Health Research: Tutorial on Web Scraping, Text Mining, and Spatial Overlay Analysis.

Optimizing ant colony system algorithm with rule-based data classification for smart aquaculture

ANALYSIS ON CREATING MINIMUM CLASSIFICATION GUIDELINES FOR BREAST CANCER DIAGNOSIS THROUGH THE APPLICATION OF SUPPORT VECTOR MACHINE ALGORITHMS

Robust Epileptic Seizure Detection Based on Biomedical Signals Using an Advanced Multi-View Deep Feature Learning Approach.

FGRBC: A Novel Fuzzy Granular Rule-Based Classifier Using the Justifiable Granularity Principle and a Fusion Strategy

A Hybrid Metaheuristic Algorithm Using Elitist Chemical Reaction Optimization and Learning from Knowledge Assimilation for Improving Rule-based Classification Models

Fuzzy rule-based classification of complex biogas energy projects

On the Suitability of Fuzzy Rule-Based Classification Systems with Noisy Data

Golden Jackal Optimization with Neutrosophic Rule-Based Classification System for Enhanced Traffic Sign Detection

DIACRITIC-AWARE ALIGNMENT AND CLASSIFICATION IN ARABIC SPEECH: A FUSION OF FUZTPI AND ML MODELS

Identifying Old-Growth Forests in Complex Landscapes: A New LiDAR-Based Estimation Framework and Conservation Implications

Analyzing of iron-deficiency anemia in pregnancy using rule-based intelligent classification models

Data Mining Approach to Select Parameters of Swarm Intelligence Algorithms for Optimal Placement Reactive Power Compensation
