Negative Factor

Xiaochun Yang, Bin Wang, Chen Li, Baihua Zheng, Tao Qiu, Yaoshu Wang

doi:10.1145/2847525

Abstract

The problem of finding matches of a regular expression (RE) on a string arises in many applications, such as text editing, biosequence search, and shell commands. Existing techniques first identify candidates using substrings in the RE, then verify each of them using an automaton. These techniques become inefficient when there are many candidate occurrences that need to be verified. In this article, we propose a novel technique that prunes false positives by utilizing negative factors, which are substrings that cannot appear in an answer. A main advantage of the technique is that it can be integrated with many existing algorithms to improve their efficiency significantly. We present a detailed description of this technique. We develop an efficient algorithm that utilizes negative factors to prune candidates, then improve it by using bit operations to process negative factors in parallel. We show that negative factors, when used with necessary factors (substrings that must appear in each answer), can achieve much better pruning power. We analyze the large number of negative factors, and develop an algorithm for finding a small number of high-quality negative factors. We conducted a thorough experimental study of this technique on real datasets, including DNA sequences, proteins, and text documents, and the results show that it improves the performance of state-of-the-art tools by an order of magnitude.
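The pruning idea in the abstract can be illustrated with a small sketch. It is not the authors' algorithm (which works on automata and uses bit operations); it is a simplified version under stated assumptions. The example RE `a[bc]*d`, the hand-picked factor lists, and the function names `negative_limits` and `search` are all hypothetical and chosen for illustration. A candidate is a pair of a prefix-factor occurrence and a suffix-factor occurrence, and it is skipped without verification if a negative factor lies entirely inside it.

```python
import re

def negative_limits(text, negative_factors):
    """limit[i] = smallest exclusive end of any negative-factor
    occurrence that starts at or after position i (n + 1 if none).
    Assumes every negative factor is a non-empty string."""
    n = len(text)
    limit = [n + 1] * (n + 1)
    for i in range(n - 1, -1, -1):
        limit[i] = limit[i + 1]
        for f in negative_factors:
            if text.startswith(f, i):
                limit[i] = min(limit[i], i + len(f))
    return limit

def search(text, pattern, prefix, suffix, negative_factors=()):
    """Return (matches, number_of_verifications).

    prefix / suffix are assumed to be necessary factors that every
    match starts / ends with. A candidate text[i:j] is pruned when
    some negative factor occurs entirely inside it."""
    regex = re.compile(pattern)
    limit = negative_limits(text, negative_factors)
    starts = [i for i in range(len(text)) if text.startswith(prefix, i)]
    ends = [j for j in range(1, len(text) + 1) if text.endswith(suffix, 0, j)]
    matches, verified = [], 0
    for i in starts:
        for j in ends:
            if j <= i or j >= limit[i]:
                continue  # empty span, or a negative factor lies inside
            verified += 1  # stand-in for the automaton verification step
            if regex.fullmatch(text, i, j):
                matches.append((i, j))
    return matches, verified

# Hypothetical example: in any match of a[bc]*d, 'a' appears only first
# and 'd' only last, so none of these substrings can occur in an answer.
NEGATIVE = ["aa", "da", "dd", "ba", "ca", "db", "dc"]
text = "abdacbdabcd"
plain = search(text, r"a[bc]*d", "a", "d")
pruned = search(text, r"a[bc]*d", "a", "d", NEGATIVE)
```

On this input both calls return the same three matches, but the pruned run verifies three candidates instead of six, which is the effect the abstract describes: fewer verifications with no change to the answer set.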


Journal: ACM Transactions on Database Systems
Publication Date: Jan 20, 2016
Citations: 15
License type: cc-by-nc-nd

Similar Papers

Improving regular-expression matching on strings using negative factors
Xiaochun Yang ... Tao Qiu
22 Jun 2013

Efficient regular expression matching on LZ77 compressed strings using negative factors
Yutong Han ... Huaijie Zhu
World Wide Web | VOL. 22
23 Mar 2019

Filtering Techniques for Regular Expression Matching in Strings
Tao Qiu ... Xiaochun Yang
01 Jan 2018

Don't Fear the Command Line!
Olga G Troyanskaya
Cell | VOL. 144
01 Mar 2011
