Abstract

Entity resolution (ER) is the task of identifying records that refer to the same real-world entity. It is used to link records across datasets and to match query records against existing datasets in real time. Indexing is a major step in the ER process because it reduces the search space. Most existing indexing techniques used in ER are designed to work with English datasets and may not be suitable for other languages, such as Arabic. This paper proposes an enhancement that adapts indexing techniques designed for English datasets to Arabic by transliterating Arabic strings before the indexing step of the ER process. The proposed approach is evaluated experimentally and compared with using word stems as blocking keys in the indexing step. The results show better matching accuracy with transliteration than with word stems.
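
As a rough illustration of the idea, the sketch below transliterates Arabic strings into Latin characters and then derives a blocking key with a phonetic encoding originally designed for English (Soundex). The character map `TRANSLIT`, the choice of Soundex, and the helper names `transliterate`, `soundex`, and `build_blocks` are all assumptions made for this example; the abstract does not specify which transliteration scheme or indexing technique the paper uses.

```python
from collections import defaultdict

# Hypothetical, deliberately simplified Arabic-to-Latin character map.
# This is an assumption for illustration, not the paper's scheme.
TRANSLIT = {
    "ا": "a", "أ": "a", "إ": "i", "آ": "a", "ب": "b", "ت": "t", "ث": "th",
    "ج": "j", "ح": "h", "خ": "kh", "د": "d", "ذ": "dh", "ر": "r", "ز": "z",
    "س": "s", "ش": "sh", "ص": "s", "ض": "d", "ط": "t", "ظ": "z", "ع": "a",
    "غ": "gh", "ف": "f", "ق": "q", "ك": "k", "ل": "l", "م": "m", "ن": "n",
    "ه": "h", "ة": "h", "و": "w", "ي": "y", "ى": "a", "ء": "",
}

# Standard Soundex digit classes for Latin consonants.
SOUNDEX_CODES = {
    letter: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
        "4": "l", "5": "mn", "6": "r",
    }.items()
    for letter in letters
}


def transliterate(text):
    """Map each Arabic character to Latin; leave unknown characters as-is."""
    return "".join(TRANSLIT.get(ch, ch) for ch in text)


def soundex(word):
    """Return a 4-character Soundex code, ignoring non-ASCII characters."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        # 'h' and 'w' do not separate repeated codes; vowels do.
        if c not in "hw":
            prev = digit
    return (code + "000")[:4]


def build_blocks(records):
    """Group (record_id, arabic_name) pairs by transliterated Soundex key."""
    blocks = defaultdict(list)
    for rec_id, name in records:
        blocks[soundex(transliterate(name))].append(rec_id)
    return dict(blocks)


# Two spellings of the same name (with and without hamza) and another name.
records = [(1, "احمد"), (2, "أحمد"), (3, "خالد")]
blocks = build_blocks(records)
```

Only records that share a block would then be compared in the later matching step, which is how indexing reduces the search space. A stem-based baseline would replace the key function with a stemmer's output; it is not shown here because the abstract does not name the stemmer used.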

