An augmented semantic search tool for multilingual news analytics

Sandhya Harikumar,Gnana Venkata Naga Sai Kalyan Karumudi,Rohit Sathyajit

doi:10.3233/jifs-221184

Sandhya Harikumar, Gnana Venkata Naga Sai Kalyan Karumudi + Show 1 more

https://doi.org/10.3233/jifs-221184

Copy DOI

Abstract

News feeds generate colossal amount of data consisting of important information hidden in the intricacies. State of the art methods are still at infancy in providing a very generic and publicly available solution to skim through the important information in the news from various sources and an ability to search using specific keywords in different languages. This paper focuses on designing a tool to extract semantic details from news articles published through various internet sources in various languages. The semantic information is stored within DBMS for ease of organizing and retrieving the data. Further, a querying facility to search through entire articles based on the keyword or date-based search is also proposed to view the crisp content. The news articles in English, and two Indian languages - Hindi and Malayalam are considered for experimentation. The proposed strategy consists of two main components namely, Generative model creation and Query engine. Generative model aims to extract important entities and keywords along with their relevance to the article and other similar articles using Latent Dirichlet Allocation(LDA) and Named Entity Recognition(NER). Query engine is to facilitate on the fly retrieval of semantic content from the database, based on user keyword. The search engine, along with database indexing, reduces the access time to the database thereby retrieving the information in less time. Experimental results show that the proposed method is effective in terms of quality of information and time consumed for information retrieval.

Full Text