Abstract

The traditional random forest algorithm classifies text poorly when the extracted text features are of low quality, so a random forest method for text information is proposed. Because the quality of the decision trees in a traditional random forest is difficult to control, a weighted voting mechanism is proposed to improve the quality of the decision trees. The algorithm uses the tr-k method, which is based on text feature extraction, to improve the quality and diversity of the text features, and uses the BERT word vector generation model to represent the text. Experimental data obtained in a Python environment show that this method achieves better text classification results than both the IDF-based random forest algorithm and the original random forest algorithm.
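
The abstract does not say how the voting weights are computed, so the following is only a minimal sketch of weighted voting in general, not the method of this paper. It assumes each tree's weight is some quality score, for example its accuracy on held-out data; the function name `weighted_vote` and the sample labels and weights are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label whose supporting trees have the largest total weight.

    predictions: one predicted label per decision tree.
    weights: one non-negative quality score per tree (assumed here to be
             something like held-out accuracy; the abstract does not specify).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")

    # Sum the weights of all trees that voted for each label.
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight

    # The label with the highest accumulated weight wins.
    return max(totals, key=totals.get)


# Hypothetical example: two weak trees vote "politics", one strong tree votes "sports".
tree_predictions = ["sports", "politics", "politics"]
tree_weights = [0.9, 0.4, 0.3]
print(weighted_vote(tree_predictions, tree_weights))
```

In this example a plain majority vote would pick "politics", while the weighted vote picks "sports" because the single high-weight tree outweighs the two low-weight ones.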
