Abstract

Text documents from multiple sources are now available in many fields, and studying the relationships between documents from different sources and uncovering the underlying causality is of great interest. Zhu et al. (2021) proposed a joint dynamic topic model (JDTM), which classifies all topics into three groups and uses "shared topics" with a fixed time lag order to characterize the information shared between two corpora. Although JDTM is a powerful tool for discovering lead-lag relationships, it has two potential shortcomings. First, different shared topics have distinct meanings and may therefore exhibit different time lag orders between the two corpora. Second, in dynamic document collections not every topic appears in every time slice, so topic sparsity should be taken into account. To address these two problems, we propose a sparse joint dynamic topic model (SJDTM) with a flexible lead-lag order. We assume a birth-and-death mechanism for all topics and allow a different lead-lag order for each shared topic. We evaluate the performance of SJDTM on synthetic data and on two real text corpora consisting of conference papers and journal papers.
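The abstract does not spell out SJDTM's generative process or inference, so the following is only a toy sketch of the two ideas it names: topics that are active only between a birth and a death time (sparsity), and shared topics whose lead-lag order differs from topic to topic. The uniform prevalence series, the birth/death ranges, the noise level, and the helper names `simulate_shared_topics` and `estimate_lag` are all hypothetical. Each lag is recovered by plain cross-correlation, a stand-in chosen for illustration and not the paper's estimation procedure.

```python
import numpy as np


def simulate_shared_topics(lags, T=60, noise=0.05, seed=0):
    """Toy prevalence series for shared topics in two corpora (hypothetical).

    Topic k is active only between a random birth and death time (zero
    elsewhere, i.e. sparse), and corpus B echoes corpus A's prevalence of
    topic k after a topic-specific lag lags[k], plus Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    K = len(lags)
    a = np.zeros((K, T))
    b = np.zeros((K, T))
    for k, lag in enumerate(lags):
        birth = rng.integers(0, T // 4)       # topic is "born" early on
        death = rng.integers(3 * T // 4, T)   # and "dies" late, before T
        signal = rng.random(T)
        active = np.zeros(T, dtype=bool)
        active[birth:death] = True
        a[k] = np.where(active, signal, 0.0)  # inactive slices carry no mass
        b[k, lag:] = a[k, :T - lag]           # corpus B lags corpus A by `lag`
        b[k] += rng.normal(0.0, noise, T)     # observation noise on corpus B
    return a, b


def estimate_lag(x, y, max_lag):
    """Return the lag L in [0, max_lag] maximising corr(x[t], y[t + L])."""
    T = len(x)
    best_lag, best_corr = 0, -np.inf
    for L in range(max_lag + 1):
        xs, ys = x[:T - L], y[L:]
        if xs.std() == 0 or ys.std() == 0:
            continue  # correlation undefined for a constant series
        c = np.corrcoef(xs, ys)[0, 1]
        if c > best_corr:
            best_lag, best_corr = L, c
    return best_lag


if __name__ == "__main__":
    true_lags = [1, 3, 5]  # a different lead-lag order per shared topic
    a, b = simulate_shared_topics(true_lags)
    print([estimate_lag(a[k], b[k], max_lag=8) for k in range(len(true_lags))])
```

Estimating one lag per topic, rather than a single lag for all shared topics, is what distinguishes the flexible lead-lag setting from a fixed-order one in this toy.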
