Abstract

Feature selection is a vital dimensionality reduction technique in machine learning and data mining that aims to select a minimal subset of features from the original feature space. Traditional feature selection methods assume that all features are available before learning, whereas in some real-world applications features arrive as a stream. Online streaming feature selection was therefore proposed to process streaming features on the fly. When the feature dimension is extremely high or even infinite, however, it is time-consuming or impractical to wait for all the streaming features to arrive. Motivated by this, we study and address, for the first time, the question of whether online streaming feature selection can be terminated early for efficiency while maintaining satisfactory performance. Specifically, we first formally define the problem of online early terminated streaming feature selection and summarize two properties that an early terminated mapping function should satisfy. We then choose the dependency degree function from Rough Set theory as our early terminated mapping function and show that it satisfies both properties. Building on this, we propose a novel Early Terminated Online Streaming Feature Selection framework, named OSFS-ET, which can terminate streaming feature selection before the stream of features ends while guaranteeing competitive performance with the features selected so far. Extensive experiments on twelve real-world datasets demonstrate that OSFS-ET can be far faster than state-of-the-art streaming feature selection methods while maintaining excellent predictive accuracy.
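To make the dependency degree concrete, the following minimal sketch computes the standard Rough Set quantity γ_B(D) = |POS_B(D)| / |U| on discrete data. It pairs that quantity with a hypothetical streaming loop that stops once γ reaches 1. The helper names `dependency_degree` and `stream_select_with_early_stop`, the greedy "keep a feature only if γ increases" rule, and the γ = 1 stopping threshold are all assumptions for illustration. This is not the OSFS-ET algorithm itself. Stopping at γ = 1 is safe in this toy because adding features only refines the equivalence classes, so γ never decreases, and it is bounded above by 1.

```python
from collections import defaultdict


def dependency_degree(rows, cond_idx, dec_idx):
    """Rough Set dependency degree gamma_B(D) = |POS_B(D)| / |U| for discrete data.

    rows: list of tuples (objects of the universe U)
    cond_idx: indices of the condition features B
    dec_idx: index of the decision attribute D
    """
    if not rows:
        return 0.0
    # Partition U into equivalence classes by the objects' values on B.
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in cond_idx)
        classes[key].append(row[dec_idx])
    # A class belongs to the positive region iff all its objects share one decision.
    pos = sum(len(labels) for labels in classes.values() if len(set(labels)) == 1)
    return pos / len(rows)


def stream_select_with_early_stop(rows, feature_stream, dec_idx, target=1.0):
    """Hypothetical greedy streaming selector with an early stop (illustration only)."""
    selected = []
    gamma = dependency_degree(rows, selected, dec_idx)
    seen = 0
    for f in feature_stream:  # features arrive one at a time
        seen += 1
        new_gamma = dependency_degree(rows, selected + [f], dec_idx)
        if new_gamma > gamma:  # keep the feature only if it raises gamma
            selected.append(f)
            gamma = new_gamma
        if gamma >= target:  # gamma cannot exceed 1, so stop consuming the stream
            break
    return selected, gamma, seen


# Toy data: three condition features (indices 0-2) and a decision at index 3.
rows = [
    (0, 0, 1, "a"),
    (0, 1, 0, "a"),
    (1, 0, 1, "b"),
    (1, 1, 1, "c"),
]

print(stream_select_with_early_stop(rows, [0, 1, 2], dec_idx=3))
```

On this toy data the loop keeps features 0 and 1, reaches γ = 1 after seeing two features, and never examines feature 2.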
