Abstract

With the development of the Internet and e-commerce, product reviews have become an important source for collecting user opinions and improving product quality. Feature extraction is the basis of text sentiment analysis and directly affects the accuracy of data mining results. Among text feature selection methods, mutual information has become important because of its low time complexity. However, it ignores differences in how often feature terms occur, as well as differences in how feature terms are distributed within the same category. In addition, because the candidate feature matrix is very large, the method is slow and its accuracy in practical applications is low. This paper therefore first reduces the feature set with the TF-IDF method. Then, building on mutual information, it introduces a relative word frequency factor and combines it with the weights of feature terms to adaptively improve the mining process. This lets the mutual information model make effective use of the frequency information of feature terms and addresses its shortcomings in text feature selection, improving both the accuracy and the efficiency of feature selection. Experimental results show that the optimized algorithm selects features better than the traditional mutual information feature mining algorithm.
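The abstract does not give the paper's formulas, so the following is only a minimal Python sketch of the general two-stage idea, not the authors' algorithm. It rests on several assumptions. Documents are pre-tokenized lists of terms. The TF-IDF stage keeps the `k_tfidf` terms with the highest per-document TF-IDF. Mutual information is the document-level form log(P(t|c) / P(t)). The "relative word frequency factor" is approximated by a hypothetical factor: the share of a term's occurrences that fall in class c. How the paper combines feature weights is not stated, so here TF-IDF is used only as the reduction step. All function names are illustrative.

```python
import math
from collections import Counter


def tfidf_reduce(docs, k):
    """Keep the k terms with the highest maximum TF-IDF over all documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    best = {}
    for doc in docs:
        for term, cnt in Counter(doc).items():
            weight = (cnt / len(doc)) * math.log(n / df[term])
            best[term] = max(best.get(term, 0.0), weight)
    # Descending weight; ties broken alphabetically so the result is deterministic.
    return sorted(best, key=lambda t: (-best[t], t))[:k]


def weighted_mi(docs, labels, term, cls):
    """Document-level MI log(P(t|c)/P(t)) scaled by a relative-frequency factor.

    The factor (fraction of the term's occurrences found in class `cls`) is a
    hypothetical stand-in for the paper's relative word frequency factor.
    Returns None when the term never occurs in a document of class `cls`.
    """
    n = len(docs)
    in_cls = [y == cls for y in labels]
    n_c = sum(in_cls)
    n_t = sum(term in doc for doc in docs)
    n_tc = sum(term in doc for doc, c in zip(docs, in_cls) if c)
    if n_tc == 0:
        return None
    mi = math.log((n_tc / n_c) / (n_t / n))
    freq_c = sum(doc.count(term) for doc, c in zip(docs, in_cls) if c)
    freq_all = sum(doc.count(term) for doc in docs)
    return (freq_c / freq_all) * mi


def select_features(docs, labels, k_tfidf, k_final):
    """Stage 1: TF-IDF reduction. Stage 2: rank survivors by best weighted MI."""
    candidates = tfidf_reduce(docs, k_tfidf)
    scores = {}
    for term in candidates:
        vals = [weighted_mi(docs, labels, term, c) for c in set(labels)]
        # Every candidate occurs in some document, so at least one value is not None.
        scores[term] = max(v for v in vals if v is not None)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k_final]


if __name__ == "__main__":
    reviews = [
        ["great", "battery", "great"],
        ["great", "screen"],
        ["poor", "battery"],
        ["poor", "screen", "poor"],
    ]
    sentiment = ["pos", "pos", "neg", "neg"]
    print(select_features(reviews, sentiment, k_tfidf=4, k_final=2))
```

On this toy review set, "battery" and "screen" occur equally in both classes, so their weighted MI is 0. The class-specific terms "great" and "poor" score log 2 and are the two features kept.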
