Abstract
Bilingual topic detection is an important application of natural language processing in the Internet Plus era and amid economic globalization. Existing bilingual topic detection methods cannot handle the inconsistent distribution of topics across the two languages. To address this shortcoming, this paper introduces a maximal-clique-based method that detects bilingual topics from Chinese and Thai feature words. First, the TextRank algorithm extracts keywords from each Chinese and Thai news document. Next, candidate translations are disambiguated using similarity combined with a Chinese-Thai dictionary. Then, credible association rules are used to cluster Chinese and Thai feature words, generating maximal cliques that represent bilingual topics. Finally, similar maximal cliques are clustered to obtain the final set of topics. The method can recommend bilingual topics of different sizes according to user needs. Experiments on Chinese and Thai news texts from January 2016 produced good results. By treating the task as cross-language word clustering, the algorithm effectively addresses the inconsistency of bilingual topic distribution; it does not require estimating the number of topics and has low time complexity, so it is suitable for online bilingual topic discovery.
Highlights
With the development of information technology, the Internet has become an important channel for organizations and individuals to follow trends in other countries
This paper proposes an online Chinese-Thai bilingual topic detection method based on maximal clique clustering
This paper addresses the problem of counting co-occurrence relationships between Chinese and Thai feature words by using a Chinese-Thai bilingual dictionary and a disambiguation algorithm based on the similarity of Chinese candidate translations
Summary
With the development of information technology, the Internet has become an important channel for organizations and individuals to follow trends in other countries. To solve the above problems, this paper proposes an online Chinese-Thai bilingual topic detection method based on maximal clique clustering. First, the method uses the TextRank algorithm [6] to extract keywords from Chinese and Thai news texts, and combines them with named entities and first-paragraph information to generate a feature word sequence for each news item. Experiments on a Chinese-Thai news corpus from January 2016 compared different ways of generating maximal clique topics and compared the method with other bilingual topic detection methods. The results show that the proposed method has the best time efficiency and achieves an F value of 69.03%.
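To make the maximal clique step concrete, the following is a minimal sketch and not the paper's implementation. It assumes that feature words have already been extracted and aligned across languages, and it substitutes a plain document co-occurrence threshold (the hypothetical `min_support` parameter) for the credible association rules, whose exact form is not given here. The function names, the `zh:`/`th:` prefixes, and the toy documents are all illustrative. Maximal cliques are enumerated with the standard Bron-Kerbosch algorithm with pivoting.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(docs, min_support=2):
    """Link two feature words that co-occur in at least min_support documents.

    Assumption: this simple threshold stands in for credible association rules.
    """
    pair_counts = defaultdict(int)
    for words in docs:
        for a, b in combinations(sorted(set(words)), 2):
            pair_counts[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), count in pair_counts.items():
        if count >= min_support:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with Bron-Kerbosch (with pivoting)."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed vertices
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pick the pivot with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


# Toy feature-word sequences; the prefixes mark the language of each word.
docs = [
    ["zh:election", "th:election", "zh:vote"],
    ["zh:election", "th:election", "zh:vote"],
    ["zh:flood", "th:flood"],
    ["zh:flood", "th:flood", "zh:vote"],
]
topics = maximal_cliques(build_graph(docs))
```

Each returned clique is a set of mutually associated Chinese and Thai words and can be read as one candidate bilingual topic. Because cliques are derived from the graph itself, the number of topics does not need to be fixed in advance, which matches the advantage claimed for the method. Merging similar cliques into final topics would be a further step that is not shown.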