Big Data Classification Problems Research Articles

The massive increase of information in the big data era has not only created data processing problems, but also the data security issues. These big data cyber security issues can be handled effectively using machine learning algorithms among which the Support Vector Machines (SVM) has better results on big data classification problems. Defining the proper configuration of the SVM requires expert knowledge in selecting the kernel function and other parameters and this can significantly improve its classification results. In this paper, the SVM configuration process is modelled as a multi-objective optimization problem by considering the false positive rate, false negative rate and model complexity parameters. A Hyper-Heuristic Improved Particle Swarm Optimization (HHIPSO) framework is developed to optimize the SVM multi-objective optimization problem by incorporating the hyper-heuristics and improved particle swarm optimization algorithm. The proposed hyper-heuristic framework includes the high-level strategy for controlling the selection of low-level heuristics by search process and the low-level heuristics generate the new SVM configuration solutions using different rules of PSO. The effective selection of the kernel function and the respective parameters of the SVM should result in better values of false positive rate and false negative rate and also reduce the complexity. The evaluation of the proposed HHIPSO is performed on two cyber security problems and the obtained results illustrated that the proposed approach is effective in improving the classification of big data cyber security problems than the other algorithms.

Interpretability has always been a major concern for fuzzy rule-based classifiers. The usage of human-readable models allows them to explain the reasoning behind their predictions and decisions. However, when it comes to Big Data classification problems, fuzzy rule-based classifiers have not been able to maintain the good trade-off between accuracy and interpretability that has characterized these techniques in non-Big Data environments. The most accurate methods build too complex models composed of a large number of rules and fuzzy sets, while those approaches focusing on interpretability do not provide state-of-the-art discrimination capabilities. In this paper, we propose a new distributed learning algorithm named CFM-BD to construct accurate and compact fuzzy rule-based classification systems for Big Data. This method has been specifically designed from scratch for Big Data problems and does not adapt or extend any existing algorithm. The proposed learning process consists of three stages: 1) pre-processing based on the probability integral transform theorem; 2) rule induction inspired by CHI-BD and Apriori algorithms; 3) rule selection by means of a global evolutionary optimization. We conducted a complete empirical study to test the performance of our approach in terms of accuracy, complexity, and runtime. The results obtained were compared and contrasted with four state-of-the-art fuzzy classifiers for Big Data (FBDT, FMDT, Chi-Spark-RS, and CHI-BD). According to this study, CFM-BD is able to provide competitive discrimination capabilities using significantly simpler models composed of a few rules of less than 3 antecedents, employing 5 linguistic labels for all variables.

Big Data Classification Problems Research Articles

Related Topics

Articles published on Big Data Classification Problems

DK-MS: an efficient method for solving imbalanced Big Data classification problems

Distributed independent vector machine for big data classification problems

Classification method for imbalanced LiDAR point cloud based on stack autoencoder

Improved cost-sensitive representation of data for solving the imbalanced big data classification problem

Using Pyspark Environment for Solving a Big Data Problem: Searching for Supersymmetric Particles

FUZZ-EQ: A data equalizer for boosting the discrimination power of fuzzy classifiers

Redundancy and Complexity Metrics for Big Data Classification: Towards Smart Data

Multi-Objective Hyper-Heuristic Improved Particle Swarm Optimization Based Configuration of Support Vector Machines for Big Data Cyber Security

CFM-BD: A Distributed Rule Induction Algorithm for Building Compact Fuzzy Models in Big Data Classification Problems

Enabling Smart Data: Noise filtering in Big Data classification

A Micro-Extended Belief Rule-Based System for Big Data Multiclass Classification Problems

Machine learning materials physics: Surrogate optimization and multi-fidelity algorithms predict precipitate morphology in an alternative to phase field dynamics

FCNB: Fuzzy Correlative Naive Bayes Classifier with MapReduce Framework for Big Data Classification

A scalable and distributed dendritic cell algorithm for big data classification

A New MapReduce Approach with Dynamic Fuzzy Inference for Big Data Classification Problems

CHI-PG: A fast prototype generation algorithm for Big Data classification problems

CHI-BD: A fuzzy rule-based classification system for Big Data classification problems

Big Data Classification Using the SVM Classifiers with the Modified Particle Swarm Optimization and the SVM Ensembles

A MapReduce Approach to Address Big Data Classification Problems Based on the Fusion of Linguistic Fuzzy Rules

MRPR: A MapReduce solution for prototype reduction in big data classification

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Big Data Classification Problems Research Articles

Related Topics

Articles published on Big Data Classification Problems

DK-MS: an efficient method for solving imbalanced Big Data classification problems

Distributed independent vector machine for big data classification problems

Classification method for imbalanced LiDAR point cloud based on stack autoencoder

Improved cost-sensitive representation of data for solving the imbalanced big data classification problem

Using Pyspark Environment for Solving a Big Data Problem: Searching for Supersymmetric Particles

FUZZ-EQ: A data equalizer for boosting the discrimination power of fuzzy classifiers

Redundancy and Complexity Metrics for Big Data Classification: Towards Smart Data

Multi-Objective Hyper-Heuristic Improved Particle Swarm Optimization Based Configuration of Support Vector Machines for Big Data Cyber Security

CFM-BD: A Distributed Rule Induction Algorithm for Building Compact Fuzzy Models in Big Data Classification Problems

Enabling Smart Data: Noise filtering in Big Data classification

A Micro-Extended Belief Rule-Based System for Big Data Multiclass Classification Problems

Machine learning materials physics: Surrogate optimization and multi-fidelity algorithms predict precipitate morphology in an alternative to phase field dynamics

FCNB: Fuzzy Correlative Naive Bayes Classifier with MapReduce Framework for Big Data Classification

A scalable and distributed dendritic cell algorithm for big data classification

A New MapReduce Approach with Dynamic Fuzzy Inference for Big Data Classification Problems

CHI-PG: A fast prototype generation algorithm for Big Data classification problems

CHI-BD: A fuzzy rule-based classification system for Big Data classification problems

Big Data Classification Using the SVM Classifiers with the Modified Particle Swarm Optimization and the SVM Ensembles

A MapReduce Approach to Address Big Data Classification Problems Based on the Fusion of Linguistic Fuzzy Rules

MRPR: A MapReduce solution for prototype reduction in big data classification