Abstract
In recent years, data volumes have grown rapidly along with the fast development of Internet technologies. Some datasets contain a huge number of labels, dimensions, and data points. As a result, some of them cannot be loaded by typical classifiers, and others require unacceptably long execution times. Extreme multi-label classification is designed to address these challenges. It differs from traditional multi-label classification in a number of ways, including the need for lower execution time and for training at an extreme scale, with millions of data points, features, and labels. To make such methods more practical, this paper focuses on designing an extreme multi-label classification approach that can run on a single personal computer. We devise a two-phase framework to deal with the above issues. In the reweighting phase, prediction precision is improved by paying more attention to hard-to-classify instances and by increasing the diversity of the model. In the pretesting phase, lower-quality trees are removed from the prediction model, which reduces the model size and increases prediction precision. Experiments on real-world datasets verify that the proposed method generates better prediction results and that the model size is successfully reduced.
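The two phases can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this paper: the abstract does not specify the weight update rule or the tree-quality measure, so the multiplicative `boost` factor, the use of precision at 1 on held-out data as the quality score, the `keep_ratio` parameter, and all function names are assumptions made for illustration. Each "tree" is modeled as any callable that maps an instance to a single predicted label.

```python
from typing import Callable, List, Sequence, Set

Tree = Callable[[int], int]  # hypothetical stand-in for a trained tree


def reweight(weights: Sequence[float], hard: Sequence[bool],
             boost: float = 2.0) -> List[float]:
    """Reweighting phase (assumed rule): scale up the weights of
    hard-to-classify instances, then renormalize so they sum to 1."""
    scaled = [w * boost if h else w for w, h in zip(weights, hard)]
    total = sum(scaled)
    return [w / total for w in scaled]


def precision_at_1(tree: Tree, X: Sequence[int],
                   Y: Sequence[Set[int]]) -> float:
    """Fraction of instances whose predicted label is a true label."""
    hits = sum(1 for x, y in zip(X, Y) if tree(x) in y)
    return hits / len(X)


def pretest(trees: Sequence[Tree], X_val: Sequence[int],
            Y_val: Sequence[Set[int]], keep_ratio: float = 0.5) -> List[Tree]:
    """Pretesting phase (assumed rule): rank trees by held-out
    precision at 1 and keep only the best fraction (at least one)."""
    ranked = sorted(trees, key=lambda t: precision_at_1(t, X_val, Y_val),
                    reverse=True)
    keep = max(1, int(len(ranked) * keep_ratio))
    return ranked[:keep]


# Toy usage: one hard instance out of four, and two candidate trees.
new_weights = reweight([0.25, 0.25, 0.25, 0.25], [True, False, False, False])

good_tree: Tree = lambda x: x % 2   # always matches the toy labels below
bad_tree: Tree = lambda x: 99       # never matches
X_val = [0, 1, 2, 3]
Y_val = [{0}, {1}, {0}, {1}]
kept = pretest([bad_tree, good_tree], X_val, Y_val, keep_ratio=0.5)
```

In this toy run the hard instance ends up with twice the weight of each other instance, and pretesting keeps only the tree with the higher held-out precision, which halves the ensemble size.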