Abstract

Feature ranking, a subcategory of feature selection, is an essential preprocessing technique that orders all features of a dataset so that the most informative features appear first. Ensemble learning has two advantages. First, it is based on the assumption that combining the outputs of different models can lead to a better outcome than the output of any individual model. Second, scalability is an intrinsic characteristic that is crucial for coping with large-scale datasets. In this paper, a homogeneous ensemble feature ranking algorithm is considered, and the nine rank fusion methods used in this algorithm are compared. The experimental studies are performed on six real medium-sized datasets, and performance is assessed with the area-under-the-feature-forward-addition-curve criterion. Finally, statistical analysis by repeated-measures analysis of variance reveals that the differences in performance among the rank fusion methods applied in homogeneous ensemble feature ranking are small; however, the difference is statistically significant, and the B-Min method performs slightly better.
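As a rough illustration of how such an algorithm fits together, the sketch below is our own minimal example, not the paper's implementation: the correlation-based base ranker, the bootstrap resampling, and all parameter names are illustrative assumptions. It runs the same ranker on several bootstrap samples and fuses the resulting rank lists, with min-rank fusion in the spirit of the B-Min method named above and mean-rank fusion as a common alternative.

```python
import numpy as np

def correlation_ranker(X, y):
    """Rank features by absolute correlation with the label (a simple
    stand-in for whichever base ranker the ensemble uses).
    Rank 0 means most important."""
    scores = np.abs(np.array([np.corrcoef(X[:, j], y)[0, 1]
                              for j in range(X.shape[1])]))
    order = np.argsort(-scores)           # feature indices, best first
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(order))  # rank position of each feature
    return ranks

def homogeneous_ensemble_ranking(X, y, n_runs=10, fusion="min", seed=0):
    """Run the SAME base ranker on bootstrap samples, then fuse the ranks."""
    rng = np.random.default_rng(seed)
    runs = []
    for _ in range(n_runs):
        idx = rng.integers(0, len(y), size=len(y))  # bootstrap resample
        runs.append(correlation_ranker(X[idx], y[idx]))
    runs = np.array(runs)                 # shape: (n_runs, n_features)
    if fusion == "min":
        fused = runs.min(axis=0)   # a feature is as good as its best run
    else:
        fused = runs.mean(axis=0)  # mean-rank fusion
    return np.argsort(fused)       # final ordering, best feature first

# Toy usage: 8 features, only the first two carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200) > 0).astype(int)
print(homogeneous_ensemble_ranking(X, y, fusion="min"))
```

Min-rank fusion rewards a feature for its best showing across runs, while mean-rank fusion averages out instability; the nine fusion methods compared in the paper differ in precisely how these per-run ranks are combined.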

Highlights

  • During recent years, the amount of data generated daily has grown dramatically

  • Ensemble learning has been broadly applied to classification in the last decade; its effectiveness extends to other machine learning disciplines, such as feature selection and feature ranking, as well [12]

  • The ensemble learning approach to feature selection, called Ensemble Feature Selection (EFS), has received increased attention in recent years [13,14,15,16,17]


Introduction

The amount of data generated daily has grown dramatically. IBM estimated that 2.5 quintillion bytes of data are created every day and that 90% of the data in the world today was created in the last two years. Such voluminous data are now known as big data. Analyzing such massive data on a single machine is impossible, or at best very slow and time-consuming. Feature selection (FS) is a crucial preprocessing technique for dealing with the high-dimensional datasets that are common in the big data era. According to the final result, feature selection techniques can be categorized into two subcategories: feature-subset selection (FSS) and feature ranking (FR). Depending on whether the label of each instance is available, feature selection can be classified into supervised and unsupervised types [5,6,7,8,9].
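To make the FSS/FR distinction concrete, here is a small hypothetical sketch (the correlation-based scorer and the 0.3 threshold are illustrative choices, not from the paper): feature ranking orders every feature, whereas feature-subset selection returns only the features that pass a criterion.

```python
import numpy as np

def score_features(X, y):
    """Supervised relevance score per feature; absolute correlation with
    the label is used purely as an illustrative criterion."""
    return np.abs(np.array([np.corrcoef(X[:, j], y)[0, 1]
                            for j in range(X.shape[1])]))

def feature_ranking(X, y):
    """FR: returns ALL feature indices, ordered most to least relevant."""
    return np.argsort(-score_features(X, y))

def feature_subset_selection(X, y, threshold=0.3):
    """FSS: returns only the features whose score clears a threshold."""
    return np.flatnonzero(score_features(X, y) >= threshold)

# Toy usage: 5 features, only feature 2 carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 2] + 0.1 * rng.normal(size=300) > 0).astype(int)
print(feature_ranking(X, y))           # e.g. [2 0 4 1 3]: every feature ranked
print(feature_subset_selection(X, y))  # e.g. [2]: only strong features kept
```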
