Abstract

In this paper, we address the problem of rule-based entity resolution on imprecise temporal data. Entity resolution (ER) has been widely explored in the research community, but ER on temporal data, especially data without available timestamps, has not yet been well studied. As time passes, records that refer to the same entity but were observed in different time periods may differ. Beyond traditional similarity-based ER approaches, carefully exploring data quality rules such as matching dependencies and data currency constraints yields much information that helps to cope with this problem. We use such rules to derive the time order of temporal records and the trends by which their attributes evolve over time. Specifically, we first partition records into smaller blocks and then, by exploiting data currency constraints, propose a two-step temporal clustering approach consisting of skeleton clustering and banding clustering. Experimental results on both real and synthetic data show that our entity resolution method achieves both high accuracy and efficiency on datasets with hidden temporal information.
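
To make the first two ideas concrete, the following is a minimal sketch of blocking followed by ordering records with a currency constraint when no timestamps are available. It is an illustration under stated assumptions, not the method of the paper: the record fields, the blocking key (lowercased first name), the `STATUS_ORDER` constraint, and all function names are hypothetical, and the skeleton and banding clustering steps are not implemented here.

```python
from collections import defaultdict

# Hypothetical currency constraint (an assumption for illustration):
# marital status is taken to evolve only in this direction over time.
STATUS_ORDER = {"single": 0, "married": 1, "divorced": 2}


def block_records(records, key):
    """Group records by a cheap blocking key so that only records
    in the same block need to be compared with each other."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[key(rec)].append(rec)
    return dict(blocks)


def is_older(a, b):
    """Return True if the constraint implies a was observed before b."""
    return STATUS_ORDER[a["status"]] < STATUS_ORDER[b["status"]]


def order_block(block):
    """Sort a block into the time order implied by the constraint.
    Records with equal status keep their input order (stable sort)."""
    return sorted(block, key=lambda r: STATUS_ORDER[r["status"]])


# Toy records without timestamps (made-up data).
records = [
    {"id": 1, "name": "Mary Smith", "status": "married"},
    {"id": 2, "name": "Mary Smith", "status": "single"},
    {"id": 3, "name": "John Doe", "status": "single"},
]

blocks = block_records(records, key=lambda r: r["name"].split()[0].lower())
ordered = {k: [r["id"] for r in order_block(v)] for k, v in blocks.items()}
```

Here the two "Mary Smith" records land in the same block, and the assumed constraint places the "single" record before the "married" one even though neither carries a timestamp. A real system would combine several such constraints with matching dependencies rather than rely on a single attribute.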
