Abstract

To extract useful information from English text accurately, this paper studies English text analysis combined with a genetic algorithm and builds a text analysis system. The work has three parts. First, a text tendency analysis algorithm based on a genetic-algorithm language model is proposed, and a Doc2vec text feature representation algorithm that integrates the LDA model is designed. Second, parallelization of the text analysis algorithm is studied, and a parallel model of the algorithm is designed on the Spark big data platform. Third, the process of English text tendency analysis is studied, and a text analysis system is designed and implemented on the big data platform, with modules for corpus intake, corpus annotation, corpus storage, model training, and model verification, among others. To verify feasibility, the accuracy of the LDA-fused Doc2vec text feature representation algorithm is tested in the prototype system. The experimental results show that the fused text representation model achieves high recognition performance, with an area under the ROC curve (AUC) of 0.95. The text analysis algorithms used in the system are also tested, and the results show that the parallel algorithm greatly improves the efficiency of the system.
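
For readers unfamiliar with the metric, the AUC reported above can be read as the probability that a randomly chosen positive document receives a higher score than a randomly chosen negative one. The following is a minimal sketch of that pairwise definition, not the paper's evaluation code; the function name `roc_auc` and the toy labels and scores are assumptions made up for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise-ranking definition.

    Counts the fraction of (positive, negative) pairs in which the positive
    example has the higher score; tied pairs count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy labels and scores, invented for illustration only (not the paper's data).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The double loop is quadratic in the number of documents, which is fine for a small check like this; a rank-based formulation would be preferable on a large corpus.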
