Abstract
One of the most common tasks in big data environments is classifying large amounts of data. Numerous classification models are designed to perform best in particular environments and datasets, each with its own advantages and disadvantages. When dealing with big data, however, their performance degrades significantly because they are not designed for, and often not capable of, handling very large datasets. The approach proposed here exploits the dynamics of skyline queries to efficiently identify the decision boundary and classify big data. A comparison against the popular k-nearest neighbor (k-NN), support vector machine (SVM) and naïve Bayes classification algorithms shows that the proposed method is faster than both k-NN and SVM. The novelty of the method lies in the fact that only a small number of computations is needed to make a prediction, and its full potential is revealed on very large datasets.
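To make the central building block concrete, the following is a minimal Python sketch of a skyline query (the set of points not dominated by any other point). The naive quadratic scan and the "larger is better" dominance convention are illustrative assumptions, not the paper's implementation, which would need to be far more efficient for big data.

```python
# Minimal skyline sketch. Assumption: larger values are better in every
# dimension; the paper's dominance convention and algorithm may differ.

def dominates(p, q):
    """True if p is at least as good as q everywhere and strictly better somewhere."""
    return all(pi >= qi for pi, qi in zip(p, q)) and any(pi > qi for pi, qi in zip(p, q))

def skyline(points):
    """Naive O(n^2) scan: keep the points that no other point dominates."""
    return [p for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if i != j)]

if __name__ == "__main__":
    data = [(1, 9), (3, 7), (5, 5), (4, 4), (9, 1), (2, 2)]
    print(skyline(data))  # -> [(1, 9), (3, 7), (5, 5), (9, 1)]
```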
Highlights
The increased amount of high-volume, high-velocity, high-variety and high-veracity data produced in the last decade has created the need to develop cost-effective techniques to manage it, which fall under the term big data [1].
Single curve with polynomial curve fitting: Throughout our experimental phase, we observed that many, and in some cases all, of the skyline points are a subset of the support vectors used by the final support vector machine (SVM) (Figure 8a).
For the synthetic datasets consisting of 1 M points, the naïve Bayes approach finished in less than a second, the SVM took several minutes (Table 2, times in milliseconds), and the k-nearest neighbor (k-NN) did not finish in a reasonable time.
Summary
The increased amount of high-volume, high-velocity, high-variety and high-veracity data produced in the last decade has created the need to develop cost-effective techniques to manage it, which fall under the term big data [1]. ML methods have reached a point at which we can combine even a set of weak classifiers using ensemble learning techniques [12] to produce good results. With this in mind, each time a new classifier is proposed, questions arise as to whether we really need one more [13]. Even with these techniques, it is not always feasible to perform a classification task with low processing costs in a big data environment, since traditional classification algorithms are designed primarily to achieve exceptional accuracy, with tradeoffs in space or time complexity. Computing the skyline is equivalent to the maximal vector problem [17]. To our knowledge, this is the first work that tries to harvest the power of skyline queries in a classification process for big data.
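Although the exact classification procedure is not spelled out in this summary, one plausible reading is that each class is reduced to its skyline, which then acts as a compact proxy for the decision boundary. The sketch below is purely hypothetical: the `classify` and `nearest_distance` helpers and the nearest-skyline-point assignment rule are illustrative assumptions, not the authors' method.

```python
# Hypothetical use of per-class skylines as a compact decision boundary.
# Illustrative guess at the idea, not the paper's algorithm: each class is
# reduced to its skyline, and a query point is assigned to the class whose
# skyline contains the closest point.
import math

def nearest_distance(point, skyline_points):
    """Euclidean distance from point to its closest skyline point."""
    return min(math.dist(point, s) for s in skyline_points)

def classify(point, class_skylines):
    """class_skylines: dict mapping label -> list of that class's skyline points."""
    return min(class_skylines, key=lambda label: nearest_distance(point, class_skylines[label]))

if __name__ == "__main__":
    skylines = {"A": [(1.0, 9.0), (5.0, 5.0)], "B": [(9.0, 1.0)]}
    print(classify((7.0, 2.0), skylines))  # -> "B"
```

Under this reading, only the skyline points are touched at prediction time, which is consistent with the claim that a small number of computations suffices to make a prediction.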