Abstract
We present a methodology for clustering texts based on word frequencies in English translations of selected ancient Greek texts. As the basis for categorizing the literary masterpieces, we used the classification system of the ancient Library of Alexandria, devised by the prominent Greek scholar-poet Callimachus in the 3rd century BC. In our content analysis, we determined a triplet of values a, b, c describing a power function that appropriately fits the curve defined by the word frequencies in the texts. In addition, we identified 16 special features of the texts that correspond to the token categories investigated in each text, such as the part of speech of a word in its context, numerals, subordinating conjunctions, symbols, etc. We developed a cognitive model in which several hundred different subtexts were used for supervised learning with the aim of recognizing subtext classes. For 200 subtexts, the triplet of a, b, c values, the subtext classes, and the 16-dimensional feature vectors were used to train a Recurrent Neural Network (RNN). The Long Short-Term Memory (LSTM) RNN could efficiently predict the class into which a chosen subtext falls without interpreting its content. We also investigated how the non-zero error rate of new communication services affects the meaning of the transferred texts. The impact of noise on classification accuracy was found to depend linearly on the character error rate.
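To make the curve-fitting step concrete, the following is a minimal sketch of how a triplet a, b, c might be obtained from a text's word frequencies. The abstract does not state the functional form, so the form f(r) = a * r**b + c over frequency rank r is an assumption, as are the function names `rank_frequencies` and `fit_power_function`, the tokenization rule, and the grid-search fitting procedure. None of these are taken from the paper.

```python
import math
import re
from collections import Counter


def rank_frequencies(text):
    """Return word counts sorted from most to least frequent."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return sorted(Counter(words).values(), reverse=True)


def fit_power_function(freqs, steps=400):
    """Fit f(r) = a * r**b + c to rank-ordered frequencies (r = 1, 2, ...).

    The functional form is an assumption, not the paper's stated model.
    For each candidate offset c on a grid below min(freqs), a and b are
    found by least squares in log-log space; the candidate with the
    smallest squared error in the original space is returned.
    Requires at least two positive frequencies.
    """
    ranks = range(1, len(freqs) + 1)
    log_r = [math.log(r) for r in ranks]
    mean_x = sum(log_r) / len(log_r)
    var_x = sum((x - mean_x) ** 2 for x in log_r)
    lowest = min(freqs)
    best = None
    for i in range(steps):
        c = lowest * i / steps  # candidate offsets in [0, lowest)
        log_y = [math.log(f - c) for f in freqs]
        mean_y = sum(log_y) / len(log_y)
        # Slope and intercept of the regression line log(f - c) = log(a) + b * log(r).
        b = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_r, log_y)) / var_x
        a = math.exp(mean_y - b * mean_x)
        sse = sum((a * r ** b + c - f) ** 2 for r, f in zip(ranks, freqs))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1:]
```

Under these assumptions, `fit_power_function(rank_frequencies(subtext))` would yield one (a, b, c) triplet per subtext, which could then be combined with the 16-dimensional feature vector as classifier input. A library optimizer would likely be preferable in practice; the grid search is used here only to keep the sketch dependency-free.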