Abstract
Learning is the heart of intelligence. Machine learning focuses on automating methods that achieve objectives, improve predictions, or encourage informed behavior. Feature selection is a vital step in data analysis: it reduces the dimensionality of a dataset by eliminating irrelevant or redundant attributes, which simplifies the learning process or improves the quality of its outcomes. This research critically analyses several filter methods based on ranking procedures (Information Gain (IG), Chi-square (CHI), V-score, Fisher Score, mRMR, Va and ReliefF) and identifies the challenges that arise when they are used. We concentrate in particular on how the choice of threshold applied to the ranked scores affects the results of each filter method. We show that this issue is vital, especially in the era of big data, in which users deal with attributes numbering in the tens of thousands but only a limited number of instances.
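To make the threshold issue concrete, the following is a minimal sketch rather than the procedure used in this research. It ranks three discrete features by Information Gain and then applies different cut-offs to the ranked scores. The toy data, the feature names (`f_strong`, `f_medium`, `f_noise`), the helper functions, and the threshold values are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG = H(labels) - H(labels | feature) for one discrete feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return (name, score) pairs sorted from highest to lowest score."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def select_by_threshold(ranked, threshold):
    """Keep every feature whose score is at least the threshold."""
    return [name for name, score in ranked if score >= threshold]


# Hypothetical toy data: 8 instances, binary class, 3 binary features.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "f_strong": [1, 1, 1, 1, 0, 0, 0, 0],  # identical to the class
    "f_medium": [1, 1, 1, 0, 0, 0, 0, 1],  # agrees on 6 of 8 instances
    "f_noise":  [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the class
}

ranked = rank_features(columns, labels)

# The same ranking yields different subsets under different cut-offs.
for threshold in (0.5, 0.1, 0.0):
    print(threshold, select_by_threshold(ranked, threshold))
```

The ranking itself never changes here; only the cut-off does. With the stricter threshold a moderately informative feature is discarded, while the most permissive one lets pure noise through, which is the sensitivity the abstract refers to.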