Abstract

Political news reports are published worldwide in many languages. Automatically detecting the geolocation of the events these reports describe is valuable for understanding those events. Although various open-source and commercial tools exist for identifying geolocations, they fail to resolve locations at a fine granularity, such as the locality or city level, and they do not support most languages. Most existing techniques treat the problem as Named Entity Recognition (NER) and identify the geolocation of a given text only at the country level. In this paper, we consider English, Spanish and Arabic news articles from different publishers. We define the primary focus location as the location where the event actually occurred, as distinct from the other focus locations mentioned in the report. Our aim is to extract the primary focus location from articles published by different news agencies, regardless of language. We propose a mechanism that uses NER to identify candidate sentences containing focus locations. We then compute sentence embeddings over words from the different languages and apply a supervised classifier to predict the primary focus location. We also apply a suitable adaptation mechanism to the training data to reduce its sampling bias. Our method trains a classifier on bias-corrected data from news articles published by one agency in one language and tests it on news articles published by another agency in a different language. Empirical results on real-world English, Spanish and Arabic news articles show that our method outperforms the baseline approaches.
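
The abstract does not name the adaptation mechanism used for bias correction. As a minimal sketch, assuming the correction is done by importance weighting, each training example can be weighted by the ratio p_test(b) / p_train(b) of how often its value b of some discrete attribute appears in the test and training data. The attribute (here a hypothetical sentence-position bucket such as "lead" or "body"), the function name `importance_weights`, and the smoothing parameter `alpha` are illustrative assumptions, not the authors' method.

```python
from collections import Counter


def importance_weights(train_bins, test_bins, alpha=1.0):
    """Return one importance weight p_test(b) / p_train(b) per training example.

    Assumption: the sampling bias is expressed over a discrete attribute
    (a hypothetical bucket label per candidate sentence). Probabilities are
    estimated from counts with additive smoothing (pseudo-count `alpha`), so a
    bucket missing from one sample never yields a zero or infinite ratio.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive to avoid division by zero")
    bins = set(train_bins) | set(test_bins)
    k = len(bins)
    train_counts, test_counts = Counter(train_bins), Counter(test_bins)
    n_train, n_test = len(train_bins), len(test_bins)

    def prob(counts, n, b):
        # Smoothed relative frequency of bucket b.
        return (counts[b] + alpha) / (n + alpha * k)

    ratio = {b: prob(test_counts, n_test, b) / prob(train_counts, n_train, b)
             for b in bins}
    return [ratio[b] for b in train_bins]


# Training data over-represents "lead" sentences relative to the test data,
# so "lead" examples are down-weighted and "body" examples up-weighted.
weights = importance_weights(["lead", "lead", "lead", "body"],
                             ["lead", "body", "body", "body"])
print(weights)
```

The resulting weights can be passed as per-sample weights to any classifier that accepts them, for example through the `sample_weight` argument of scikit-learn's `LogisticRegression.fit`, so that the classifier's training loss approximates the loss on the test distribution.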
