Abstract

Studies in lexicography have focused on automatic dictionary creation. In this article, an English document is given as the initial reference, and the meaningful words that represent it are identified by applying the Helmholtz Principle. These meaningful words, which we call the seed, form the first entries of the dictionary. Then, in a loop, a Web search is performed in the Azure Web Cognitive Web Search system using the meaningful words from the most recently processed document. The meaningful words of the first document in the search results are extracted with the Helmholtz Principle, in the same way as for the reference document. This time, the meaningful words found during the cycle are not added directly to the dictionary: to avoid deviations from the subject, the similarity of each meaningful word to the dictionary built so far is calculated using WordNet. Words whose similarity is higher than a given threshold are added to the dictionary, and the search cycle is repeated using these words; the process ends when the desired number of dictionary words is reached. The performance of the resulting dictionary was also measured with a WordNet similarity calculation. In tests performed with reference documents on different subjects, the generated dictionaries reached an average similarity of 38.93%.
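
The expansion loop described above can be sketched roughly as follows. This is only an illustrative outline under stated assumptions, not the authors' implementation: the function name `build_dictionary` and the three callables (`search_first_document`, `extract_meaningful_words`, `similarity_to_dictionary`) are hypothetical placeholders for the Web search step, the Helmholtz Principle extraction, and the WordNet-based similarity, none of which are specified in detail in the abstract. Stopping when a round accepts no new word, and the `max_rounds` cap, are also assumptions added so the sketch always terminates.

```python
from typing import Callable, Iterable, List, Set


def build_dictionary(
    seed_words: Iterable[str],
    search_first_document: Callable[[List[str]], str],
    extract_meaningful_words: Callable[[str], List[str]],
    similarity_to_dictionary: Callable[[str, Set[str]], float],
    threshold: float,
    target_size: int,
    max_rounds: int = 100,  # assumption: safety cap, not mentioned in the abstract
) -> Set[str]:
    """Grow a dictionary from seed words by repeated search and filtering."""
    # The seed (meaningful words of the reference document) starts the dictionary.
    dictionary: Set[str] = set(seed_words)
    # The first query uses the seed words themselves.
    query_words: List[str] = sorted(dictionary)

    for _ in range(max_rounds):
        if len(dictionary) >= target_size or not query_words:
            break

        # Hypothetical search step: text of the first document in the results.
        document = search_first_document(query_words)
        # Hypothetical extraction step (Helmholtz Principle in the paper).
        candidates = extract_meaningful_words(document)

        accepted: List[str] = []
        for word in candidates:
            if word in dictionary:
                continue
            # Keep only words whose similarity to the current dictionary
            # is higher than the threshold.
            if similarity_to_dictionary(word, dictionary) > threshold:
                dictionary.add(word)
                accepted.append(word)
                if len(dictionary) >= target_size:
                    break

        # The next search uses the words accepted in this round.
        # Assumption: if nothing was accepted, the loop ends on the next check.
        query_words = accepted

    return dictionary
```

Passing the three steps in as callables keeps the sketch independent of any particular search service or similarity measure, so each one can be replaced by a stub for testing or by a real implementation later.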
