Abstract
This paper develops methods for automated text analysis, based on two models, the “bag-of-words” model and a graph model, with the aim of classifying natural language texts and randomly generated documents. Within the “bag-of-words” model, the authors find that the primary Zipf’s law, which states that in a corpus of natural language utterances the frequency of any word is inversely proportional to its rank in the frequency table, does not always hold. They propose modifications to this law that allow texts to be classified more efficiently. Using a graph model of the text, which takes into account the co-occurrence of any two words in a sentence, the authors demonstrate that the median degree of the graph’s vertices can be used to differentiate meaningless texts from meaningful ones even when the word lists of the two texts are identical.
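The two quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `tokenize`, `rank_frequency` and `median_degree` are hypothetical, the tokenizer is a deliberately crude assumption, and the graph is assumed to link every pair of distinct words that share a sentence, which is one plausible reading of the graph model described above.

```python
import re
from collections import Counter
from itertools import combinations
from statistics import median


def tokenize(sentence):
    """Lowercase a sentence and extract word tokens (crude, assumed tokenizer)."""
    return re.findall(r"[a-z']+", sentence.lower())


def rank_frequency(sentences):
    """Bag-of-words view: (rank, word, count) triples, most frequent first.

    Under the primary Zipf's law, count would be roughly proportional
    to 1 / rank; plotting count against rank shows where that fails.
    """
    counts = Counter(w for s in sentences for w in tokenize(s))
    # Sort by descending count, breaking ties alphabetically for determinism.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]


def median_degree(sentences):
    """Graph view: median vertex degree of the sentence co-occurrence graph.

    Vertices are distinct words; an edge joins two words that appear
    together in at least one sentence (an assumption of this sketch).
    """
    neighbours = {}
    for s in sentences:
        words = set(tokenize(s))
        for w in words:
            neighbours.setdefault(w, set())
        for a, b in combinations(words, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return median(len(n) for n in neighbours.values())


if __name__ == "__main__":
    text = ["The cat sat.", "The dog ran."]
    for rank, word, count in rank_frequency(text):
        print(rank, word, count)
    print("median degree:", median_degree(text))
```

Because the word counts depend only on the word list while the graph depends on how words are grouped into sentences, two texts with identical rank-frequency tables can still produce different co-occurrence graphs. The toy input here only shows the computation and says nothing about the paper's results.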
More From: IOP Conference Series: Materials Science and Engineering