Abstract

As cloud-stored text documents grow in size, the time and cost of string search and keyword search increase. However, when searching documents for words or sentences, most string search algorithms do not take into account the lexical structure used in the real world or the compositional characteristics of characters. In particular, previous string search algorithms have not considered that well-formatted official documents (articles, news, novels, academic papers, patents, etc.) draw on a limited set of characters and compositions. In this paper, we propose a vowel-oriented binary tree that reflects the probability of occurrence of each character in real-world documents and its compositional characteristics in well-formatted documents and words. Building on this tree, we propose a vowel-centered string search algorithm that searches for a specific word in a document. We analyzed the frequency and patterns of occurrence of vowels and consonants using several dictionaries (Free Dictionary Project Dictionary, Scrabble Helper). We then propose a strategy and an algorithm for constructing a vowel-oriented binary tree that expresses the frequency and probability patterns of vowel occurrence. The tree is reconstructed according to the characteristics of vowel occurrence, and the consonants that appear between vowels are distinguished and represented. Finally, based on the vowel-oriented binary tree, we propose an enhanced vowel-oriented string search algorithm that quickly searches for words likely to occur in real-world documents.
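
The dictionary analysis mentioned above can be illustrated with a minimal Python sketch. It covers only the counting step (how often vowels occur and which vowel sequences words contain) and does not attempt to reproduce the vowel-oriented binary tree or the search algorithm, whose details are not given here. The function names, the sample word list, and the choice of `aeiou` as the vowel set (with `y` treated as a consonant) are assumptions made for illustration.

```python
from collections import Counter

# Assumption: the five English vowel letters; 'y' is treated as a consonant.
VOWELS = set("aeiou")


def vowel_share(words):
    """Return the fraction of alphabetic characters in `words` that are vowels."""
    letters = [ch for w in words for ch in w.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in VOWELS for ch in letters) / len(letters)


def vowel_pattern(word):
    """Return the word's vowels in order, dropping all consonants."""
    return "".join(ch for ch in word.lower() if ch in VOWELS)


def pattern_counts(words):
    """Count how many words share each vowel pattern."""
    return Counter(vowel_pattern(w) for w in words)


# Hypothetical sample standing in for a real dictionary word list.
sample = ["search", "string", "vowel", "tree"]
print(vowel_pattern("search"))   # ea
print(pattern_counts(sample))
print(vowel_share(sample))
```

Run against a full dictionary file, the same counts would give an empirical picture of which vowel patterns are common, which is the kind of statistic a frequency-aware index could be organized around.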
