Abstract

Record linkage is an important application area of text pattern analysis. In this paper we propose a new sequence labeling method that can be used to extract entities from a string for record linkage. The proposed method combines a classifier and a Hidden Markov Model (HMM) to utilize both syntactical and textual information from the string. We first describe the model used in the proposed method and then discuss the parameter estimation for this model. The proposed method incorporates a classifier for handling textual information and integrates the classifier with the HMM statistically by estimating the error probability of the classifier. We applied the proposed method to the bibliographic sequence labeling problem, in which bibliographic components are extracted from reference strings. We compared the proposed method with other methods that use textual or syntactical information alone and showed that the proposed method outperforms them.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.