Abstract

Bilingual topic detection is an important application of natural language processing in the "Internet Plus" era and amid economic globalization. Existing bilingual topic detection methods cannot handle the inconsistent distribution of topics across the two languages. To address this shortcoming, this paper proposes a maximal-clique-based method for bilingual topic detection over Chinese and Thai feature words. First, news information and the keywords of each Chinese and Thai document are extracted with the TextRank algorithm. Next, candidate translations are disambiguated using word similarity combined with a Chinese-Thai dictionary. Then, credible association rules are used to cluster the Chinese and Thai feature words, which generates maximal cliques of bilingual topics. Finally, similar maximal cliques are clustered to obtain the final set of topics. The method can recommend bilingual topics of different sizes according to user needs. Tests on Chinese and Thai news texts from January 2016 gave good results. By treating the task as cross-language word clustering, the algorithm effectively addresses the inconsistency of bilingual topic distribution, needs no estimate of the number of topics, and has low time complexity, so it is suitable for online bilingual topic discovery.
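The clustering step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the credibility measure (co-occurrence count divided by the larger of the two word counts), the threshold `min_conf`, and the function names are assumptions made for illustration, and English tokens stand in for Chinese and Thai feature words that have already been mapped through the dictionary. Maximal cliques are enumerated with the basic Bron-Kerbosch recursion.

```python
from collections import Counter
from itertools import combinations


def credible_pairs(docs, min_conf=0.5):
    """Return the two-item sets (word pairs) that co-occur credibly.

    Assumption: credibility = pair count / max(count of either word).
    The paper's exact measure may differ.
    """
    word_count = Counter()
    pair_count = Counter()
    for words in docs:
        unique = sorted(set(words))  # sorted so each pair has one canonical key
        word_count.update(unique)
        pair_count.update(combinations(unique, 2))
    return {
        (a, b)
        for (a, b), n in pair_count.items()
        if n / max(word_count[a], word_count[b]) >= min_conf
    }


def maximal_cliques(pairs):
    """Enumerate maximal cliques of the graph whose edges are `pairs`."""
    adj = {}
    for a, b in pairs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    cliques = []

    def expand(current, candidates, excluded):
        # A clique is maximal when no vertex can extend it.
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical feature word sequences, one list per news document.
docs = [
    ["election", "vote", "bangkok"],
    ["election", "vote", "bangkok"],
    ["flood", "rain"],
    ["flood", "rain", "bangkok"],
]
topics = maximal_cliques(credible_pairs(docs))
```

Each returned clique is a candidate topic. Because the cliques come directly from the graph, the number of topics does not have to be fixed in advance, which matches the advantage claimed in the abstract.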

Highlights

  • With the development of information technology, the Internet has become an important channel for organizations and individuals to follow developments in other countries

  • This paper proposes a method for online Chinese-Thai bilingual topic detection based on maximal clique clustering

  • This paper addresses the problem of counting co-occurrence relationships between Chinese and Thai feature words by using a Chinese-Thai bilingual dictionary and a disambiguation algorithm based on the similarity of Chinese candidate translations

Summary

Introduction

With the development of information technology, the Internet has become an important channel for organizations and individuals to follow developments in other countries. To address the problems facing bilingual topic detection, this paper proposes a method for online Chinese-Thai bilingual topic detection based on maximal clique clustering. First, the method uses the TextRank algorithm [6] to extract keywords from Chinese and Thai news texts and combines them with named entities and first-paragraph information to generate a feature word sequence for each news item. Experiments on a Chinese-Thai news corpus from January 2016 compare different ways of generating maximal clique topics and compare the method with other bilingual topic detection methods. The experimental results show that the proposed method has the best time efficiency and achieves an F value of 69.03%.
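The keyword extraction step can be sketched as follows. This is a simplified, unweighted TextRank over an already tokenized text, not the paper's code: the function name, the window size, the damping factor, and the fixed iteration count are assumptions, and word segmentation for Chinese and Thai, part-of-speech filtering, named entities, and first-paragraph information are all left out.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words by a PageRank-style score on a co-occurrence graph."""
    # Link each word to the words that follow it within the window.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Each word repeatedly receives an equal share of its neighbors' scores.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical token sequence standing in for a segmented news text.
tokens = ["flood", "bangkok", "rain", "bangkok", "river", "bangkok", "storm"]
keywords = textrank_keywords(tokens)
```

Words that co-occur with many different neighbors, such as "bangkok" in this toy sequence, receive the highest scores. In the method described above, the selected keywords would then be merged with the other feature sources to form the feature word sequence of a news item.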

The cross-language topic detection online process
Chinese-Thai feature words similarity disambiguation algorithm
The Chinese-Thai mapping dictionary of one-to-many
The semantic similarity translation of Chinese-Thai words
The construction of two-item credible set of Chinese-Thai feature words
Unearthing the maximum clique of topic feature words
Personalized recommendation of topics
Experiments and results
Findings
Conclusion