Abstract
The year 2020 was atypical, mainly because of the onset of the Covid-19 pandemic, which became a widely discussed subject worldwide. Unsurprisingly, online news websites followed this trend while continuing to publish on traditional subjects (e.g., sports, business, and politics). Understanding how these subjects interact with each other over the year is a challenge. In this paper, we build a 2020 timeline based on the subjects and their similarity, using a topic modeling approach (LDA) and a novel topic similarity metric. To accomplish that, we scrape news websites to build a collection of 2020 news articles. The collection is then pre-processed and sliced by month. We use LDA to discover the latent topics in each of the temporal collections. Next, we calculate the similarity between the topics across 2020 using five semantic correlations: born, death, keep, merge, and split. The discovered topics and the semantic drift between them show that building a meaningful 2020 timeline is possible.
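As a rough illustration of how the five correlations could be assigned between two consecutive monthly slices, the sketch below represents each topic by its set of top words and links topics whose overlap exceeds a threshold. It is not the paper's method: the Jaccard overlap stands in for the novel similarity metric, which the abstract does not define, and the function name `label_transitions`, the threshold of 0.3, and the toy topics are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-set overlap; an assumed stand-in for the paper's similarity metric."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def label_transitions(
    prev: Dict[str, Set[str]],
    nxt: Dict[str, Set[str]],
    threshold: float = 0.3,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[str, object, object]]:
    """Label topic changes between two consecutive time slices."""
    # A link exists when two topics are similar enough.
    links = [
        (p, n) for p in prev for n in nxt
        if jaccard(prev[p], nxt[n]) >= threshold
    ]
    successors = {p: [n for q, n in links if q == p] for p in prev}
    predecessors = {n: [p for p, m in links if m == n] for n in nxt}

    events: List[Tuple[str, object, object]] = []
    for p, succ in successors.items():
        if not succ:
            events.append(("death", p, None))         # topic disappears
        elif len(succ) > 1:
            events.append(("split", p, tuple(succ)))  # one topic becomes several
    for n, pred in predecessors.items():
        if not pred:
            events.append(("born", None, n))          # topic has no ancestor
        elif len(pred) > 1:
            events.append(("merge", tuple(pred), n))  # several topics become one
        elif len(successors[pred[0]]) == 1:
            events.append(("keep", pred[0], n))       # one-to-one continuation
    return events


# Toy topics (invented for illustration, not data from the paper).
march = {
    "covid": {"virus", "covid", "cases", "lockdown"},
    "sports": {"league", "match", "season", "goal"},
    "olympics": {"tokyo", "games", "athletes", "olympic"},
}
april = {
    "pandemic": {"virus", "covid", "cases", "vaccine"},
    "football": {"league", "match", "season", "club"},
    "economy": {"market", "stocks", "jobs", "recession"},
}
events = label_transitions(march, april)
```

Running the same function over every pair of adjacent months would give a chain of labeled links, which is one plausible way to assemble a timeline. In practice the top-word sets would come from the per-month LDA models.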