Information Gained Research Articles

In supervised classification, decision trees are among the most popular learning algorithms and are used in many practical applications because of their simplicity, adaptability, and other advantages. Developing effective and efficient decision trees remains a major focus in machine learning, and the literature provides various node-splitting measures for building them, including Information Gain, Gain Ratio, Average Gain, and Gini Index. This paper presents a new node-splitting measure based on preordonance theory. Its primary benefit is that it handles categorical or numerical attributes directly, without discretization. Building on this measure, the “Preordonance-based decision tree” (P-Tree) approach is developed. P-Tree can handle both multiclass classification problems and imbalanced data sets. It also addresses the over-partitioning problem by introducing a threshold ϵ as a stopping condition: if the percentage of instances in a node falls below the predetermined threshold, expansion of the tree is halted. P-Tree is evaluated on fourteen benchmark data sets of different sizes and compared with five existing decision tree methods using a variety of evaluation metrics. The experimental results show that the P-Tree model performs well across all of the tested data sets and is comparable overall to the other five decision tree algorithms. In addition, an ensemble technique called “ensemble P-Tree” is proposed to mitigate the instability frequently associated with tree-based algorithms. This ensemble method leverages the strengths of the P-Tree approach to enhance predictive performance through collective decision-making. It is evaluated by comparing its performance with that of two top-performing ensemble decision tree methods, and the experimental findings indicate that it is competitive with them. Despite its strong performance, the P-Tree approach still faces obstacles on larger data sets, such as memory restrictions, time complexity, and data complexity. Parallel computing is effective in resolving this kind of problem. Hence, the MR-P-Tree decision tree technique, a parallel implementation of P-Tree in the Map-Reduce framework, is also designed. MR-P-Tree rests on three parallel procedures: MR-SA-S for choosing the optimal splitting attributes, MR-SP-S for choosing the optimal splitting points, and MR-S-DS for dividing the training data set. Finally, experimental studies on ten additional data sets illustrate the viability of the MR-P-Tree technique and its strong parallel performance.
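For orientation, the classical splitting measures named in this abstract and the ϵ stopping rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it does not implement the preordonance-based measure (the abstract does not define it), it covers categorical attributes only, and it assumes the "percentage of instances in a node" is measured against the full training set. All function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of hashable labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a sequence of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(values, labels):
    """Group class labels by the categorical attribute value they occur with."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return groups


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after the split."""
    n = len(labels)
    remainder = sum(
        len(g) / n * entropy(g) for g in partition(values, labels).values()
    )
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalized by the entropy of the attribute itself."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


def should_stop(node_size, total_size, epsilon):
    """Assumed reading of the stopping rule: halt expansion when the node
    holds less than a fraction epsilon of the training instances."""
    return node_size / total_size < epsilon
```

With a toy attribute that separates two balanced classes perfectly, `information_gain(["a", "a", "b", "b"], [0, 0, 1, 1])` evaluates to one bit, and `should_stop(3, 100, 0.05)` halts expansion because the node holds 3% of the instances.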


Material types of asteroids provide key clues to their evolutionary history and contained resources. The Gaia mission has released extensive low-resolution spectral observation data of small Solar System bodies. However, methods for classifying asteroids based on low-resolution space-based spectra are still inadequate, and do not fully leverage the complementary features of spectra and multiple intrinsic attributes of asteroids to achieve precise material classification. Our goal is to propose a method with higher generalization accuracy for asteroid material classification by integrating multi-source information, identifying optimal feature combinations for model inputs, and deepening the understanding of relationships among asteroid parameters. Effective asteroid photometric, physical, and orbital parameters were screened using the information gain ratio and Spearman's rank correlation coefficient. Artificial intelligence techniques were then employed to combine asteroid spectra with the selected parameters for six-class material classification. By comparing five machine learning models, we identified network structures with higher validation accuracy and stable generalization performance. Feature ablation experiments were also conducted to determine the input parameter combinations suitable for different scenarios. Finally, based on the statistical results and model outputs, the constraint relationships among asteroid parameters were visualized and analyzed. The proposed AsterRF model achieved a validation accuracy of 92.2%, an improvement of approximately 7.8 percentage points compared to existing methods that use only spectra. V-type asteroids exhibited the highest classification accuracy, followed by A-type and D-type. X-type asteroids had the lowest precision and recall, and were easily confused with C-type. The model generally showed higher classification confidence for S-type asteroids. The top five attributes that the model focused on are the phase slope parameter (G), orbital type, albedo, H magnitude, and effective diameter. Additionally, the correlations between asteroid materials and other parameters were generally below 0.4. Incorporating optimal asteroid parameter combinations can significantly enhance classification accuracy based on spectra. A dual-channel network that processes spectra and parameter inputs separately, and employs a self-attention mechanism for feature fusion, is effective in combining multi-source asteroid information. Both the statistical correlations and the model performance-based importance rankings of parameters contribute to understanding the constraint relationships among asteroid attributes.
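The Spearman screening step mentioned in this abstract can be illustrated with a small Python sketch. The abstract does not say how the coefficient was applied, so the greedy redundancy filter below is only one plausible use, stated here as an assumption; the function names and the threshold are hypothetical, and the sketch is not the AsterRF pipeline.

```python
from math import sqrt


def average_ranks(xs):
    """1-based ranks of xs, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def drop_redundant(features, threshold):
    """Assumed screening rule: walk the features in order and keep one only
    if |rho| against every already-kept feature is below the threshold."""
    kept = []
    for name, column in features.items():
        if all(abs(spearman(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

On made-up columns, `drop_redundant({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [3, 1, 4, 2]}, 0.9)` keeps `a` and `c` and drops `b`, which is a monotone function of `a`.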



Articles published on Information Gained

Effect of second-order network structure on link prediction

A preordonance-based decision tree method and its parallel implementation in the framework of Map-Reduce

Comparative Analysis of ML Models with Selection Methods for Early Predictive Analytics of Sepsis in ICU

A New Breast Cancer Discovery Strategy: A Combined Outlier Rejection Technique and an Ensemble Classification Method

Quantum-inspired Attribute Selection Algorithms

Diagnosis of heart disease using an advanced triple hybrid algorithm combining machine learning techniques

Model-Based Sequential Design of Experiments with Machine Learning for Aerospace Systems

A focus group study of students’ expectations of digital onboarding tools in higher education

Improving Fault Classification Accuracy Using Wavelet Transform and Random Forest with STATCOM Integration

Adaptive fuzzy neighborhood decision tree

Asteroid material classification based on multi-parameter constraints using artificial intelligence

EMSIG: Uncovering Factors Influencing COVID-19 Vaccination Across Different Subgroups Characterized by Embedding-Based Spatial Information Gain

Optimal experimental design for identification of hydrodynamic loading models

Research and performance analysis of random forest-based feature selection algorithm in sports effectiveness evaluation

Experimental of information gain and AdaBoost feature for machine learning classifier in media social data

Optimizing prediction of stainless steel mechanical properties with random forest: a comparison of feature selection methods

Less is More: Unlocking Semi-Supervised Deep Learning for Vulnerability Detection

Machine learning for air quality index (AQI) forecasting: shallow learning or deep learning?

Information-guided adaptive learning approach for active surveillance of infectious diseases

Assessment of land degradation susceptibility within the Shaqlawa subregion of Northern Iraq-Kurdistan Region via synergistic application of remotely acquired datasets and advanced predictive models.

