Abstract
The news media play a major role in disseminating information worldwide, and many events are relevant across several regions. However, some news events receive only limited coverage confined to specific geographic areas because of barriers that hinder the spread of information. These barriers can be political, geographical, economic, cultural, or linguistic. In this research, we propose an approach for classifying these barriers by extracting semantic information from news articles using Wikipedia concepts. We first collect news articles and annotate each one with the barrier types that apply to it, using metadata from news publishers. We then use Wikipedia concepts, together with the content of the news articles, as features to determine the barriers to news dissemination. We compare our approach with traditional text classification techniques, deep learning methods, and transformer-based models. Experiments were performed on news articles from ten topic categories, including health, sports, and business. The findings indicate that 1) utilizing semantic knowledge yields distinct concepts across the ten categories, thereby enhancing the effectiveness and speed of the classification model, and 2) incorporating Wikipedia-concept-based semantic knowledge improves barrier classification compared with using only the body text of news articles. Specifically, the average F1-score increases for four of the five barriers: economic from 0.65 to 0.68, linguistic from 0.71 to 0.72, political from 0.68 to 0.70, and geographical from 0.63 to 0.68.
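The abstract does not specify which classifier the authors use or how concepts are extracted. As a rough illustration of combining concept features with body text, here is a minimal Python sketch under stated assumptions. It assumes Wikipedia concepts have already been extracted for each article (that step is not shown) and builds a plain bag of words in which concepts carry a hypothetical `wiki:` prefix. Each barrier gets its own binary multinomial Naive Bayes classifier, a stand-in chosen for brevity rather than the authors' model. All function names and the toy articles are hypothetical.

```python
import math
from collections import Counter

# The five barrier types named in the abstract, used as labels.
BARRIERS = ("political", "geographical", "economic", "cultural", "linguistic")


def build_features(body_text, concepts):
    """Bag of words over the body text plus one feature per Wikipedia concept.

    Assumption: concepts are prefixed with 'wiki:' (illustrative convention only)
    so that a concept never collides with an ordinary body-text token.
    """
    tokens = body_text.lower().split()
    tokens += ["wiki:" + c for c in concepts]
    return Counter(tokens)


class BinaryNaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing for one barrier (present/absent).

    Stand-in classifier for illustration, not the model from the paper.
    """

    def fit(self, feats, labels):
        self.vocab = set()
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter(labels)
        for f, y in zip(feats, labels):
            self.word_counts[y].update(f)
            self.vocab.update(f)
        self.totals = {y: sum(c.values()) for y, c in self.word_counts.items()}
        return self

    def predict(self, f):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = False, -math.inf
        for y in (True, False):
            if self.doc_counts[y] == 0:
                continue  # class never seen in training, so it cannot be predicted
            score = math.log(self.doc_counts[y] / n_docs)
            for w, n in f.items():
                if w in self.vocab:  # ignore tokens never seen in training
                    score += n * math.log((self.word_counts[y][w] + 1) / (self.totals[y] + v))
            if score > best_score:
                best, best_score = y, score
        return best


def train_barrier_models(articles):
    """Train one binary classifier per barrier (one-vs-rest, multi-label)."""
    feats = [build_features(a["body"], a["concepts"]) for a in articles]
    return {
        b: BinaryNaiveBayes().fit(feats, [b in a["barriers"] for a in articles])
        for b in BARRIERS
    }


def predict_barriers(models, body, concepts):
    """Return the set of barriers whose classifier predicts 'present'."""
    f = build_features(body, concepts)
    return {b for b, m in models.items() if m.predict(f)}


# Toy, made-up training articles; the concept lists stand in for the output
# of a concept-extraction step that is not shown here.
train = [
    {"body": "election results announced in the capital",
     "concepts": ["Election", "Parliament"], "barriers": {"political"}},
    {"body": "minister resigns after vote",
     "concepts": ["Parliament", "Prime_minister"], "barriers": {"political"}},
    {"body": "local team wins the football cup",
     "concepts": ["Association_football"], "barriers": {"geographical"}},
    {"body": "striker scores twice in cup final",
     "concepts": ["Association_football", "Cup_final"], "barriers": {"geographical"}},
]
models = train_barrier_models(train)
print(predict_barriers(models, "parliament vote delayed", ["Parliament"]))  # {'political'}
```

A one-vs-rest setup is used here because a single article can face more than one barrier. In this toy example, the query's body word "parliament" never occurs in training bodies, so the `wiki:Parliament` concept feature is what carries most of the political signal. This loosely mirrors the abstract's claim that concepts add information beyond the body text.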