Abstract

Person Name Disambiguation on the Web is the problem of grouping the web pages that a search engine returns for a person-name query according to the individual each page refers to. This problem has been addressed in a monolingual scenario, where all the search results are written in the same language. However, search engines can also return links to web pages written in different languages. We study how to handle multilingualism in this problem using the MC4WePS data set, a recent gold standard that includes real search results written in different languages. To this end, we first analyze whether a translation tool is suitable for dealing with multilingualism when combined with two state-of-the-art clustering algorithms. Because such tools increase the processing time of the disambiguation process, we then propose an approach to multilingualism that generalizes the monolingual scenario and does not require any translation resources. On the gold standard, our approach obtains better results than the translation-based approaches, which makes it a competitive choice in a real scenario.
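As a rough illustration of the task itself, and not of the method proposed in this work, person name disambiguation can be framed as clustering the result pages so that each cluster corresponds to one individual. The sketch below assumes a plain bag-of-words representation, cosine similarity and single-link grouping with a hypothetical threshold of 0.5. The names `tokenize`, `cosine` and `cluster_pages` are illustrative, and the example pages are invented. Because the sketch relies only on lexical overlap, two pages that describe the same person in different languages would typically share few terms beyond the name itself. This is the difficulty that multilingual search results pose.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words over word tokens (Unicode-aware \\w)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_pages(pages: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Single-link grouping (hypothetical baseline, not the paper's method):
    any two pages with similarity >= threshold end up in the same cluster."""
    vectors = [tokenize(p) for p in pages]
    parent = list(range(len(pages)))  # union-find forest over page indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(pages)):
        for j in range(i + 1, len(pages)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    groups: dict[int, list[int]] = {}
    for i in range(len(pages)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Invented search results for the query "John Smith".
    results = [
        "John Smith football striker scored goal league",
        "John Smith striker football transfer league club",
        "John Smith professor chemistry university research",
        "John Smith chemistry professor published research paper",
    ]
    print(cluster_pages(results))  # prints [[0, 1], [2, 3]]
```

Here the two football pages and the two chemistry pages each form one cluster. A lower threshold (e.g. 0.2) would merge all four pages, because the shared name tokens alone already give every pair some similarity.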
