Taoxin Peng, Calum Mackay

doi:10.5220/0004892802170224

Abstract

Data quality is key to the success of any business that relies on information applications, such as data integration for data warehouses, text and web mining, information retrieval, and search engines for web applications. String matching is a common task in such applications, and a number of approximate string matching techniques are available. However, one question remains unanswered: for a given dataset, how should one select an appropriate technique, and the threshold value that technique requires, for string matching? To address this question, this paper analyses and evaluates a set of popular token-based string matching techniques on several carefully designed datasets. A thorough experimental comparison confirms that there is no clear overall best technique, although some techniques do perform significantly better in some cases. The paper presents suggestions that can guide researchers and practitioners in selecting an appropriate string matching technique and a corresponding threshold value for a given dataset.
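The abstract does not name the specific techniques evaluated. As a hedged illustration only, the sketch below shows two widely used token-based measures, Jaccard similarity and cosine similarity over whitespace-separated tokens, each combined with a match threshold. The function names (`tokenize`, `jaccard`, `cosine`, `is_match`), the simple lowercase/whitespace tokenizer, and the example threshold of 0.7 are assumptions for illustration, not the paper's method or settings.

```python
import math
from collections import Counter


def tokenize(s):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    # Real systems often also strip punctuation or use q-grams.
    return s.lower().split()


def jaccard(a, b):
    # Jaccard similarity over token sets: |intersection| / |union|.
    ta, tb = set(tokenize(a)), set(tokenize(b))
    if not ta and not tb:
        return 1.0  # treat two empty strings as identical
    return len(ta & tb) / len(ta | tb)


def cosine(a, b):
    # Cosine similarity over token-count (bag-of-words) vectors.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    if na == 0 or nb == 0:
        return 1.0 if na == nb else 0.0  # both empty -> 1.0, one empty -> 0.0
    return dot / (na * nb)


def is_match(a, b, measure, threshold):
    # Declare a match when the chosen measure reaches the (assumed) threshold.
    return measure(a, b) >= threshold
```

For example, comparing "Taoxin Peng" with "Peng Taoxin T" gives a Jaccard score of 2/3 (about 0.67) but a cosine score of 2/sqrt(6) (about 0.82). With the assumed threshold of 0.7, Jaccard rejects the pair while cosine accepts it, which illustrates why the choice of technique and the choice of threshold have to be made together.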

Similar Papers

Solution for String Matching Problem of Indian Alphabetical Letters
Adnan I Al Rabea ... A.V Senthil Kumar
01 Nov 2009

Review of data, text and web mining software
Qingyu Zhang ... Richard S Segall
Kybernetes | VOL. 39
04 May 2010

String Matching Technique Based on Hardware: A Comparative Analysis
Aakanksha Pandey ... Nilay Khare
01 Jan 2012

Proper nouns in English–Arabic cross language information retrieval
Abdelghani Bellaachia ... Ghita Amor‐Tijani
Journal of the American Society for Information Science and Technology | VOL. 59
09 Jul 2008
