Abstract

Many use cases depend on author identities, yet determining them is non-trivial. Author disambiguation (AD) is a special case of entity resolution that resolves author mentions to real-world authors. As in other entity resolution tasks, AD methods are strongly constrained by scale and by person name conventions. So far, this has been addressed by static blocking methods, which cannot adapt to such collection-dependent properties. We address this gap by presenting the first progressive method of author disambiguation. Progressive entity resolution tackles large-scale conflation problems by repeatedly increasing the number of pairs compared for potential equivalence. Our method uses lattice structures to model name inclusion in a way that is adaptive and more efficient than traditional blocking techniques based on alphabetical order or fixed-level generalization. Our work offers additional insights into the relationships among name matching, different blocking schemes, blocking and clustering, and cost and benefit. Using the Web of Science as large-scale annotated test data, we observe our model's performance over time and compare it with various configurations and baselines. Our approach consistently outperforms state-of-the-art blocking methods, underlining its contribution to the field of author disambiguation. It offers a novel alternative for tackling ambiguity in entity resolution, which is a major challenge for many information systems.
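
To make the notion of name inclusion concrete, the following is a minimal illustrative sketch, not the method evaluated in this work. It assumes a name mention can be encoded as a set of features (surname, initials, full given names), treats one name as including another when its feature set is a superset, and schedules candidate pairs by how far apart the two names are in that partial order. The functions `name_features` and `progressive_pairs`, the feature encoding, and the "gap" ordering are all hypothetical choices made for illustration.

```python
from itertools import combinations

def name_features(surname, *given):
    """Encode a name mention as a set of (slot, value) features (assumed encoding)."""
    feats = {("surname", surname.lower())}
    for i, g in enumerate(given):
        g = g.lower().rstrip(".")
        if not g:
            continue
        feats.add((f"init{i}", g[0]))      # initial is implied by any given name
        if len(g) > 1:
            feats.add((f"given{i}", g))    # full given name, if present
    return frozenset(feats)

def progressive_pairs(mentions):
    """Yield comparable pairs, smallest generalization gap first.

    Two mentions are comparable when one feature set includes the other,
    i.e. they lie on a common chain of the inclusion order.
    """
    pairs = []
    for a, b in combinations(sorted(mentions), 2):
        fa, fb = mentions[a], mentions[b]
        if fa <= fb or fb <= fa:
            pairs.append((len(fa ^ fb), a, b))  # gap = number of differing features
    yield from sorted(pairs)

mentions = {
    "m1": name_features("Smith", "John", "A."),
    "m2": name_features("Smith", "John"),
    "m3": name_features("Smith", "J."),
    "m4": name_features("Smith", "Jane"),
}
schedule = list(progressive_pairs(mentions))
for gap, a, b in schedule:
    print(gap, a, b)
```

In this toy example, "Smith, John A." and "Smith, Jane" are never paired because neither name includes the other, while both are compared with the more general "Smith, J.". A fixed-level blocking scheme keyed on surname plus first initial would instead place all four mentions in one block.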
