Mutual Information-based Feature Selection Approach to Reduce High Dimension of Big Data

Thee Zin Win, Nang Saing Moon Kham

doi:10.1145/3278312.3278316

Publication Date: Sep 28, 2018

Citations: 4

Affiliation: University of Computer Studies Yangon

Abstract

As the volume of data grows, it demands effective and efficient mining strategies, and practitioners and researchers are working to develop scalable data mining and machine learning algorithms that can turn mountains of data into nuggets. High-dimensional data significantly increases the memory and computational costs of data analytics. Reducing dimensionality can therefore improve three aspects of data mining performance: learning speed, predictive accuracy, and the simplicity and comprehensibility of the mined results. Feature selection, a data preprocessing technique, is effective and efficient in data mining, data analytics, and machine learning problems, particularly for reducing high dimensionality. Most feature selection algorithms eliminate only irrelevant features, not redundant ones, yet both irrelevant and redundant features can degrade learning performance. This work proposes a mutual information-based feature selection approach that removes both irrelevant and redundant features.
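The idea of using mutual information to filter out both kinds of unhelpful features can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes discrete features, estimates I(X; Y) from simple counts, treats a feature as irrelevant when its mutual information with the class label is near zero, and treats it as redundant when it shares at least as much information with an already selected feature as it does with the label. The function names, the threshold `min_relevance`, the scoring rule, and the toy data are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Estimate I(X; Y) in bits from two equal-length sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum of p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts.
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def select_features(columns, target, k, min_relevance=1e-9):
    """Greedy filter (illustrative heuristic, not the paper's method).

    columns: dict mapping a feature name to its list of discrete values.
    Score = relevance to the target minus the highest mutual information
    with any feature that has already been selected.
    """
    relevance = {name: mutual_information(col, target) for name, col in columns.items()}
    # Step 1: discard irrelevant features (no information about the target).
    candidates = [name for name in columns if relevance[name] > min_relevance]
    selected = []

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = max(mutual_information(columns[name], columns[s]) for s in selected)
        return relevance[name] - redundancy

    # Step 2: add features one at a time, skipping redundant ones.
    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        if selected and score(best) <= 0:
            break  # every remaining candidate is redundant under this heuristic
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data (made up): the label depends on a and b; a_copy duplicates a; noise is unrelated.
columns = {
    "a":      [0, 0, 0, 0, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "b":      [0, 0, 1, 1, 0, 0, 1, 1],
    "noise":  [0, 1, 0, 1, 0, 1, 0, 1],
}
target = [a + b for a, b in zip(columns["a"], columns["b"])]

chosen = select_features(columns, target, k=3)
print(chosen)  # ['a', 'b']
```

In this toy case `noise` is dropped as irrelevant because its mutual information with the label is zero, and `a_copy` is dropped as redundant because it carries exactly the same information as `a`. Continuous features would need discretization or a different estimator before a count-based estimate like this one applies.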
