Abstract

Entity resolution (ER) approaches typically consist of a blocker and a matcher. The two share the same goal and cooperate in different roles: the blocker first quickly removes obvious non-matches, and the matcher then determines whether the remaining pairs refer to the same real-world entity. Although deep learning methods achieve state-of-the-art performance in ER, they often rely on large amounts of labeled training data, which can be difficult or costly to obtain. Thus, there is a need for ER systems that remain effective in low-resource settings. In this work, we propose an end-to-end iterative Co-learning framework for ER that jointly trains the blocker and the matcher by leveraging their cooperative relationship. In particular, the blocker and the matcher share their learned knowledge with each other via iteratively updated pseudo labels, which broaden the supervision signals. To mitigate the impact of noise in the pseudo labels, we develop optimization techniques covering three aspects: label generation, label selection, and model training. Through extensive experiments on benchmark datasets, we demonstrate that our proposed framework outperforms baselines by an average of 9.13% to 51.55%. Furthermore, our analysis confirms that the framework achieves mutual benefits between the blocker and the matcher.
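To make the blocker and matcher roles concrete, the following is a minimal toy sketch, not the framework proposed in this work. It assumes token overlap as a stand-in blocker and Jaccard similarity as a stand-in matcher, in place of the learned models. The function names (`block`, `match`, `select_pseudo_labels`), the thresholds, and the sample records are all hypothetical. The last function only illustrates the general idea of keeping confident predictions as pseudo labels.

```python
from itertools import product

def tokens(record):
    """Lowercased whitespace tokens of a record (assumed representation)."""
    return set(record.lower().split())

def jaccard(a, b):
    """Jaccard similarity between the token sets of two records."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def block(left, right, min_shared=1):
    """Toy blocker: cheaply drop pairs that share fewer than min_shared tokens."""
    return [(a, b) for a, b in product(left, right)
            if len(tokens(a) & tokens(b)) >= min_shared]

def match(pairs, threshold=0.5):
    """Toy matcher: keep surviving pairs whose similarity reaches the threshold."""
    return [(a, b) for a, b in pairs if jaccard(a, b) >= threshold]

def select_pseudo_labels(pairs, hi=0.8, lo=0.2):
    """Keep only confident pairs as pseudo labels (1 = match, 0 = non-match).

    Pairs with a score strictly between lo and hi are left unlabeled,
    which is one simple way to limit label noise (hypothetical thresholds).
    """
    labels = {}
    for a, b in pairs:
        score = jaccard(a, b)
        if score >= hi:
            labels[(a, b)] = 1
        elif score <= lo:
            labels[(a, b)] = 0
    return labels

# Hypothetical sample records from two sources.
left = ["apple iphone 13 128gb", "samsung galaxy s21"]
right = ["iphone 13 apple 128gb", "galaxy s21 case black", "dell xps 13 laptop"]

candidates = block(left, right)            # blocker output
matches = match(candidates)                # matcher decisions
pseudo = select_pseudo_labels(candidates)  # confident labels only
```

In this sketch the blocker reduces six possible pairs to three candidates, the matcher accepts one of them, and the ambiguous pair (`samsung galaxy s21`, `galaxy s21 case black`) receives no pseudo label. In an iterative setting, such confident labels would be used as additional supervision in the next training round.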
