Abstract

String matching is considered one of the most interesting research areas in computer science because it underlies many essential applications, such as intrusion detection, search analysis, text editors, internet search engines, information retrieval, and computational biology. Two main factors are used to evaluate the performance of a string matching algorithm during the matching process: the total number of character comparisons and the total number of attempts. This study aims to produce an efficient hybrid exact string matching algorithm, called the Sinan Sameer Tuned Boyer Moore-Quick Skip Search (SSTBMQS) algorithm, by blending the best features of two original algorithms: Tuned Boyer-Moore (TBM) and Quick-Skip Search. The SSTBMQS hybrid algorithm was tested on benchmark datasets of different sizes and with different pattern lengths. The sequential version of the proposed hybrid algorithm produces better results than its original algorithms (TBM and Quick-Skip Search) and than the Maximum-Shift hybrid algorithm, which is considered one of the most recent hybrid algorithms. The proposed hybrid algorithm requires fewer attempts and fewer character comparisons.

Highlights

  • String matching, which involves locating all occurrences of a particular pattern in a large text, is considered one of the primary problems in computer science

  • The Quick-Skip Search algorithm does not check the rightmost character in the text window as the first step before character comparison. Its advantage is that it first examines m − text characters to specify a starting search point; in the case of a mismatch or a match of the entire pattern, the shift distance depends on the Skip Search bucket and the Quick Search bad character table

  • After the new position is computed, if the character at that position does not appear in the pattern, the SSTBMQS algorithm continues shifting the pattern to the next potential starting search point and then goes to Step 2

Summary

INTRODUCTION

String matching, which involves locating all occurrences of a particular pattern in a large text, is considered one of the primary problems in computer science. String matching algorithms are basic components of existing applications, such as text processing, intrusion detection, search analysis, information retrieval, and computational biology [6]. Because of advances in technology, all these applications involve large amounts of data, and they involve different types of alphabets. The Quick-Skip Search algorithm does not check the rightmost character in the text window as the first step before character comparison. Its advantage is that it first examines m − text characters to specify a starting search point; in the case of a mismatch or a match of the entire pattern, the shift distance depends on the Skip Search bucket and the Quick Search bad character table.
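
To make the two preprocessing structures named above more concrete, the following is a minimal Python sketch of a Quick Search bad character table and Skip Search buckets, together with a plain Quick Search loop that counts attempts and character comparisons. It is not the SSTBMQS algorithm or Quick-Skip Search as defined by the authors; all function names are illustrative assumptions, and the pattern is assumed to be non-empty.

```python
from collections import defaultdict


def quick_search_shift_table(pattern):
    """Quick Search bad character table (illustrative helper).

    Maps each pattern character to the shift applied when that character
    is the text character immediately to the right of the current window.
    """
    m = len(pattern)
    # Later indices overwrite earlier ones, so the rightmost occurrence wins.
    return {ch: m - i for i, ch in enumerate(pattern)}


def skip_search_buckets(pattern):
    """Skip Search buckets (illustrative helper).

    Maps each pattern character to the list of positions where it occurs.
    """
    buckets = defaultdict(list)
    for i, ch in enumerate(pattern):
        buckets[ch].append(i)
    return dict(buckets)


def quick_search(text, pattern):
    """Plain Quick Search, instrumented with the two evaluation metrics.

    Returns (match_positions, attempts, comparisons). This is the classic
    algorithm, used here only to show how the metrics can be counted.
    """
    n, m = len(text), len(pattern)
    shift = quick_search_shift_table(pattern)
    matches, attempts, comparisons = [], 0, 0
    pos = 0
    while pos <= n - m:
        attempts += 1  # one attempt per window alignment
        j = 0
        while j < m:
            comparisons += 1  # one text/pattern character comparison
            if text[pos + j] != pattern[j]:
                break
            j += 1
        if j == m:
            matches.append(pos)
        if pos + m >= n:
            break  # no character to the right of the window
        # Characters absent from the pattern shift the window by m + 1.
        pos += shift.get(text[pos + m], m + 1)
    return matches, attempts, comparisons
```

For example, `quick_search("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG")` returns `([5], 5, 15)`: one match at position 5, found in 5 attempts with 15 character comparisons. A hybrid algorithm would be compared against baselines on these same two counters.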

PREVIOUS WORKS
THE PROPOSED ALGORITHM
Searching Phase
SSTBMQS ALGORITHM TRACING EXAMPLE
Experimental Databases
Performance and Evaluation
Sequential Program Execution
Analyzing Number of Character Comparisons
Analyzing Number of Attempts
CONCLUSION AND FUTURE WORK