Abstract

Feature selection is one of the most significant steps in machine learning: it reduces the feature space to achieve faster learning and yields simpler models with high accuracy and interpretability. With the rapid development of data-generating technologies, large-scale, high-dimensional datasets are now common, and they degrade the performance of traditional feature selection techniques, which suffer from scalability issues. Parallel feature selection is a natural solution to this problem, and with the advent of many distributed computing frameworks, scalable computation has become a viable strategy for feature selection. The present work proposes a distributed parallel feature selection technique that partitions the dataset vertically (by features) to exploit parallel computation. It uses an information gain filter-based ranking method that evaluates multiple disjoint feature subsets of the dataset in parallel. The key idea is to distribute the evaluation and rank generation of features across several computing nodes. Experiments performed on multiple large-scale, high-dimensional datasets show a significant reduction in overall computation time.
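
The core scheme described above (vertical partitioning of feature columns, per-partition information gain scoring, and a merged global ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes discrete-valued features, substitutes Python's multiprocessing module for a true distributed framework, and every name in it (entropy, ig_scores, rank_features_parallel) is illustrative.

```python
# Minimal sketch of vertically partitioned, information-gain-based feature
# ranking. Feature columns are split into disjoint subsets, each subset is
# scored on a separate worker, and the per-subset scores are merged into one
# global ranking. Assumes discrete features; not the authors' code.
import numpy as np
from multiprocessing import Pool

def entropy(labels):
    """Shannon entropy of a discrete label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def ig_scores(args):
    """Information gain of each feature column in one vertical partition."""
    X_part, y, col_ids = args
    h_y = entropy(y)
    scores = []
    for j, col in zip(col_ids, X_part.T):
        values, counts = np.unique(col, return_counts=True)
        # Conditional entropy H(y | feature j), weighted over feature values.
        h_y_given_x = sum(
            (c / len(col)) * entropy(y[col == v])
            for v, c in zip(values, counts)
        )
        scores.append((int(j), h_y - h_y_given_x))
    return scores

def rank_features_parallel(X, y, n_workers=4):
    """Split columns into disjoint subsets, score each subset in parallel,
    and merge the (feature, gain) pairs into one descending ranking."""
    splits = np.array_split(np.arange(X.shape[1]), n_workers)
    tasks = [(X[:, idx], y, idx) for idx in splits]
    with Pool(n_workers) as pool:
        merged = [s for part in pool.map(ig_scores, tasks) for s in part]
    return sorted(merged, key=lambda t: t[1], reverse=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(1000, 20))   # discrete features
    y = (X[:, 0] + X[:, 5] > 2).astype(int)   # labels depend on columns 0 and 5
    print(rank_features_parallel(X, y)[:5])   # top-5 (feature, gain) pairs
```

In the distributed setting the abstract describes, each partition would reside on a separate computing node rather than a local worker process, but the merge step remains cheap either way because only (feature index, score) pairs travel back to the coordinator.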
