Abstract

A text can be considered as a one-dimensional array of words. The locations of each word type in this array form a fractal pattern with a certain fractal dimension. We observe that important words, which are responsible for conveying the meaning of a text, have dimensions considerably different from one, while the fractal dimensions of unimportant words are close to one. We introduce an index that quantifies the importance of the words in a given text using their fractal dimensions, and we then rank the words by this index. The index measures the difference between the fractal pattern of a word in the original text and its pattern in a shuffled version of the text. Because the shuffled text is meaningless (i.e., its words have no importance), the difference between the original and shuffled texts can be used to determine a word's degree of fractality. The degree of fractality may be used for automatic keyword detection: words whose degree of fractality exceeds a threshold value are taken as the retrieved keywords of the text. We evaluate the efficiency of our method for keyword extraction by comparing it with two other well-known methods of automatic keyword extraction.
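The idea of comparing a word's pattern in the original and shuffled text can be illustrated with a minimal box-counting sketch. This is not the index defined in the paper: the function names, the choice of box sizes, and the use of a plain difference between two fitted dimensions are all assumptions made for illustration.

```python
import math
import random


def box_counts(positions, box_sizes):
    """For each box size s, count the boxes that hold at least one occurrence."""
    return [len({p // s for p in positions}) for s in box_sizes]


def box_counting_dimension(positions, box_sizes):
    """Estimate the dimension as the least-squares slope of log N(s) vs log(1/s)."""
    if not positions or len(box_sizes) < 2:
        raise ValueError("need at least one occurrence and two box sizes")
    xs = [math.log(1.0 / s) for s in box_sizes]
    ys = [math.log(n) for n in box_counts(positions, box_sizes)]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def word_positions(words, target):
    """Indices at which the target word occurs in the word array."""
    return [i for i, w in enumerate(words) if w == target]


def dimension_gap(words, target, box_sizes, seed=0):
    """Hypothetical score: shuffled-text dimension minus original-text dimension."""
    rng = random.Random(seed)
    shuffled = list(words)
    rng.shuffle(shuffled)
    d_original = box_counting_dimension(word_positions(words, target), box_sizes)
    d_shuffled = box_counting_dimension(word_positions(shuffled, target), box_sizes)
    return d_shuffled - d_original


if __name__ == "__main__":
    # Toy text: "key" is clustered at the start, the rest is filler.
    text = ["key"] * 64 + ["filler"] * (4096 - 64)
    print(dimension_gap(text, "key", [64, 128, 256, 512]))
```

In this toy text the clustered word yields a positive gap, because shuffling spreads its occurrences over many more boxes. On real text the estimate depends strongly on the range of box sizes, so the result should be read as a rough indication only.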

Highlights

  • Language can be regarded as a complex system [1], where words are constituents which interact with each other to form particular patterns

  • Language is the human capability for communication via vocal or visual signs

  • One of the most well-known power laws is Zipf’s law, which shows that if we rank the words in a text from the most common to the least, the frequency of each word is inversely proportional to its rank [2]
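The rank-frequency relation in Zipf's law can be checked on any word list with a few lines of Python. The sketch below is illustrative only; the function names are hypothetical, and the toy word list is constructed so that frequency times rank is exactly constant, which real texts only approximate.

```python
from collections import Counter


def rank_frequency(words):
    """Return (rank, word, frequency) triples, most frequent word first."""
    counts = Counter(words)
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ranked, start=1)]


def rank_times_frequency(words):
    """Under Zipf's law, rank * frequency stays roughly constant across ranks."""
    return [rank * freq for rank, _, freq in rank_frequency(words)]


if __name__ == "__main__":
    toy = ["a"] * 6 + ["b"] * 3 + ["c"] * 2
    print(rank_frequency(toy))
    print(rank_times_frequency(toy))
```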


Summary

Introduction

Language can be regarded as a complex system [1], where words are constituents which interact with each other to form particular patterns. Such patterns represent human thoughts, feelings, will, and knowledge, which together are called meaning. Texts, as the written form of language, inherit its complexity. Research has shown that regularity in a text can be expressed as a power-law relationship. The Menzerath-Altmann law states that there is a relation between the size of a construct and the size of its constituents: when the size of a construct increases, the size of its constituents decreases, and this holds at every level. The fractal dimension of a given text is the average value of the fractal dimensions of its levels [8].

