Abstract

Engineering practice often requires consulting a large volume of specialized data to support theoretical innovation. Classifying and screening this material manually, item by item, takes considerable time and effort, and the resulting classification accuracy cannot meet requirements. To improve efficiency, a multi-level feature selection algorithm based on MapReduce is proposed. An improved chi-square (CHI) feature selection algorithm performs the initial screening, and a mutual information method then filters out noise words and low-quality features. Experimental results show that the algorithm keeps the time complexity of processing big data low while also improving the accuracy of text classification.
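
The two-stage idea can be illustrated with a minimal single-machine sketch. It is not the paper's implementation: the abstract does not specify the improved CHI formula, so the textbook chi-square statistic is used instead, mutual information is computed over a 2x2 term/class table, and the names `map_doc`, `reduce_counts`, `select_features`, `chi_keep`, and `mi_keep` are hypothetical. The map and reduce steps are plain Python functions standing in for a real MapReduce job.

```python
import math
from collections import Counter
from itertools import chain


def map_doc(label, tokens):
    """Map step: emit ((term, label), 1) once per distinct term in a document."""
    return [((term, label), 1) for term in set(tokens)]


def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each (term, label) key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def chi_square(a, b, c, d):
    """Standard chi-square for a 2x2 table (assumption: not the paper's improved CHI).

    a = term and class, b = term and other classes,
    c = no term but class, d = neither.
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def mutual_information(a, b, c, d):
    """Mutual information between term presence and class membership, in bits."""
    n = a + b + c + d
    cells = ((a, a + b, a + c), (b, a + b, b + d),
             (c, c + d, a + c), (d, c + d, b + d))
    # Empty cells contribute nothing, which also avoids log(0).
    return sum(j / n * math.log2(j * n / (row * col)) for j, row, col in cells if j)


def select_features(docs, chi_keep, mi_keep):
    """Stage 1: keep the top chi_keep terms by chi-square. Stage 2: re-rank by MI."""
    counts = reduce_counts(chain.from_iterable(map_doc(l, t) for l, t in docs))
    label_docs = Counter(label for label, _ in docs)
    doc_freq = Counter()
    for (term, _), value in counts.items():
        doc_freq[term] += value
    n = len(docs)

    def best(term, score):
        # Score the term against every class and keep the strongest association.
        results = []
        for label, n_label in label_docs.items():
            a = counts[(term, label)]
            b = doc_freq[term] - a
            c = n_label - a
            d = n - a - b - c
            results.append(score(a, b, c, d))
        return max(results)

    def top(terms, score, k):
        # Ties are broken alphabetically so the output is deterministic.
        return sorted(terms, key=lambda t: (-best(t, score), t))[:k]

    stage1 = top(doc_freq, chi_square, chi_keep)
    return top(stage1, mutual_information, mi_keep)


if __name__ == "__main__":
    toy_docs = [
        ("sport", ["ball", "goal", "the"]),
        ("sport", ["ball", "team", "the"]),
        ("tech", ["code", "bug", "the"]),
        ("tech", ["code", "chip", "the"]),
    ]
    print(select_features(toy_docs, chi_keep=4, mi_keep=2))
```

In this toy corpus the word "the" occurs in every document, so its chi-square score is zero and it is dropped in the first stage. In an actual MapReduce deployment the counting would be distributed across mappers and reducers, while the scoring functions would stay the same.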
