Data lake management using topic modeling techniques

Mohamed Cherradi, Anass El Haddadi

doi:10.56294/dm2024282

Abstract

With the rapid rise of information technology, the volume of unstructured data in data lakes is growing quickly, and analyzing, organizing, and automatically classifying it to derive meaningful information for a data-driven business has become a major challenge. Scientific documents consist of unlabeled text, which makes it difficult to assign them to topics, and building a topic view of a heterogeneous dataset in a big data lake is a complex problem. Manual classification of text documents requires significant financial and human resources. Topic modeling techniques could streamline this process, improving our understanding of word meanings and potentially reducing the resource burden. This paper presents a comparative study of metadata-based classification of a dataset of scientific documents, applying two well-known machine learning-based topic modeling approaches, Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA). To assess their effectiveness, we conducted an evaluation centered on three metrics: coherence score, perplexity, and log-likelihood. The evaluation was carried out on a corpus of scientific publications, using information from the title, abstract, keywords, authors, affiliation, and other metadata fields. The results show that LDA outperforms LSA, with a coherence value of 0.884 compared with 0.768 for LSA.
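To make the evaluation metrics concrete, here is a minimal sketch of how a coherence score and a perplexity value can be computed from scratch. It is not the authors' pipeline: the abstract does not say which coherence measure was used, so the sketch assumes the UMass variant (averaged over word pairs, with smoothing constant `eps`), and the toy corpus, the topic word lists, and the function names `umass_coherence` and `perplexity` are all hypothetical.

```python
import math


def umass_coherence(top_words, documents, eps=1.0):
    """Average UMass coherence of one topic (assumed variant, see lead-in).

    For each pair (w_i, w_j) with w_j ranked above w_i, the score is
    log((D(w_i, w_j) + eps) / D(w_j)), where D counts the documents
    that contain all the given words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    scores = []
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_freq(top_words[j])
            if d_j == 0:
                continue  # skip words that never occur in the corpus
            d_ij = doc_freq(top_words[i], top_words[j])
            scores.append(math.log((d_ij + eps) / d_j))
    return sum(scores) / len(scores) if scores else float("nan")


def perplexity(total_log_likelihood, n_tokens):
    """Perplexity as exp of the negative per-token log-likelihood."""
    return math.exp(-total_log_likelihood / n_tokens)


# Hypothetical toy corpus of tokenized document metadata.
docs = [
    ["data", "lake", "metadata", "storage"],
    ["topic", "model", "lda", "corpus"],
    ["data", "lake", "topic", "model"],
    ["metadata", "catalog", "data", "lake"],
]

coherent_topic = ["data", "lake", "metadata"]  # words that co-occur often
mixed_topic = ["data", "lda", "catalog"]       # words that rarely co-occur

print(umass_coherence(coherent_topic, docs))
print(umass_coherence(mixed_topic, docs))
```

Under this measure, a topic whose top words tend to appear in the same documents scores higher than one whose words rarely co-occur. Perplexity works in the opposite direction: a lower value means the model assigns higher likelihood to the tokens it is evaluated on. UMass scores are not on the same scale as the values reported in the abstract, so the numbers from this sketch are not comparable with them.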


Journal: Data and Metadata	Publication Date: Jan 1, 2024
Citations: 1	License type: CC BY 4.0


Similar Papers

TEXT MINING: CLUSTERING USING BERT AND PROBABILISTIC TOPIC MODELING
Kavitha Datchanamoorthy ... Anandha Mala G S
Social Informatics Journal | VOL. 2
31 Dec 2024

Applying Topic Modeling to Railroad Grade Crossing Accident Report Text
Trefor Williams ... John Betak
23 Mar 2015

Broadening the Research Pathways in Smart Agriculture: Predictive Analysis Using Semiautomatic Information Modeling
Komal Sharma ... Chetan Sharma
Journal of Sensors | VOL. 2022
06 Oct 2022

Extracting Primary Emotions and Topics from the Al-Hayat Media Centre Magazine Publications, Using Topic Modelling and Lexicon-Based Approaches
Konstantinos E Maragkos ... Petros E Maravelakis
Social Science Computer Review | VOL. 41
13 May 2022
