Field Of Text Mining Research Articles

Topic discovery, as an effective tool for semantic mining and a key means of extracting new features from original text, plays an important role in text mining and knowledge discovery. To address problems encountered in traditional topic models, such as the loss of semantic information, the ambiguity of topic concepts, and the crossover and coverage among topics, we propose a semantic topic discovery method based on the conditional co-occurrence degree (CCOD_STDM). First, every document is split into multiple subdocuments according to the semantic structure of the document and a set of independence decision rules. Second, combinatorial words with strong semantic relevance are extracted based on the conditional co-occurrence degree within the subdocuments. Based on these combinatorial words, new subdocuments are formed by feature expansion and content reconstruction. Third, the “topic-word” distributions and “document-topic” distributions of the new subdocuments are obtained by topic modeling with Gibbs sampling. Finally, the “document-topic” distributions of the original documents are obtained by merging the new subdocuments’ “document-topic” distributions with specific strategies. In numerical experiments, CCOD_STDM is compared with six topic models using two evaluation methods on seven public corpora, and the results verify its superiority and efficiency in topic discovery. More importantly, a case study illustrates that the combinatorial words can effectively avoid the polysemy problem and can facilitate the condensation and summarization of topics.
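The abstract does not define the conditional co-occurrence degree, so the following is only a minimal sketch under a stated assumption: the degree of word b given word a is taken to be the fraction of subdocuments containing a that also contain b. The function names, the symmetric threshold rule, and the toy subdocuments are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def conditional_cooccurrence(subdocs):
    """Return {(a, b): share of subdocuments containing a that also contain b}.

    Assumption: this conditional-frequency definition is illustrative only;
    the paper's actual formula is not given in the abstract.
    """
    word_df = Counter()  # number of subdocuments containing each word
    pair_df = Counter()  # number of subdocuments containing both words
    for tokens in subdocs:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    scores = {}
    for (a, b), n_ab in pair_df.items():
        scores[(a, b)] = n_ab / word_df[a]
        scores[(b, a)] = n_ab / word_df[b]
    return scores


def strong_pairs(subdocs, threshold=0.8):
    """Keep word pairs whose degree meets the threshold in both directions."""
    scores = conditional_cooccurrence(subdocs)
    return sorted(
        (a, b)
        for (a, b), s in scores.items()
        if a < b and s >= threshold and scores[(b, a)] >= threshold
    )


# Hypothetical, already-tokenized subdocuments.
subdocs = [
    ["apple", "pie", "recipe"],
    ["apple", "pie", "oven"],
    ["apple", "stock", "price"],
    ["stock", "price", "market"],
]
scores = conditional_cooccurrence(subdocs)
pairs = strong_pairs(subdocs)
```

In this toy data "apple" appears in both cooking and finance contexts, so no pair involving it passes the two-sided threshold, while "price" and "stock" always occur together. That is the kind of candidate a combinatorial-word step might keep, though the paper's actual selection rule may differ.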


The growing role of social media in our lives has popularized the posting of short texts. Short texts carry limited context and have unique characteristics that make them difficult to handle. Every day, billions of short texts are produced in the form of tags, keywords, tweets, phone messages, messenger conversations, social network posts, etc. The analysis of these short texts is imperative in the field of text mining and content analysis. Extracting precise topics from large-scale short text documents is a critical and challenging task. Conventional approaches fail to capture word co-occurrence patterns in topics because of the sparsity problem in short texts, such as text on the web, social media like Twitter, and news headlines. Therefore, in this paper, the sparsity problem is ameliorated by presenting a novel fuzzy topic modeling (FTM) approach for short text from a fuzzy perspective. In this research, the local and global term frequencies are computed through a bag-of-words (BOW) model. To remove the negative impact of high dimensionality on the global term weighting, principal component analysis is adopted; thereafter, the fuzzy c-means algorithm is employed to retrieve the semantically relevant topics from the documents. The experiments are conducted on three real-world short text datasets: the snippets dataset is a small dataset, whereas the other two datasets, Twitter and questions, are larger. Experimental results show that the proposed approach discovers topics more precisely and performs better than other state-of-the-art baseline topic models such as GLTM, CSTM, LTM, LDA, Mix-gram, BTM, SATM, and DREx+LDA. The performance of FTM is also demonstrated in terms of classification, clustering, topic coherence, and execution time. FTM classification accuracy is 0.95, 0.94, 0.91, 0.89, and 0.87 on the snippets dataset with 50, 75, 100, 125, and 200 topics, respectively. The classification accuracy of FTM on the questions dataset is 0.73, 0.74, 0.70, 0.68, and 0.78 with 50, 75, 100, 125, and 200 topics, respectively. The classification accuracies of FTM on the snippets and questions datasets are higher than those of the state-of-the-art baseline topic models.
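The clustering step described above can be sketched as follows. This is a dependency-free illustration, not the authors' implementation: it builds plain term-frequency vectors and runs a textbook fuzzy c-means loop on them. The PCA step and the global term weighting from the abstract are omitted, and the function names, the farthest-first seeding, the fixed iteration count, and the toy documents are all assumptions made for the example.

```python
def bow_vectors(docs):
    """Turn whitespace-tokenized documents into term-frequency vectors."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        tokens = d.split()
        vec = [0.0] * len(vocab)
        for w in tokens:
            vec[index[w]] += 1.0 / len(tokens)
        vectors.append(vec)
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, c):
    """Deterministic seeding (an assumption): start at point 0, then
    repeatedly add the point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < c:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, q) for q in centers))
        )
    return [list(p) for p in centers]


def fuzzy_c_means(points, c=2, m=2.0, iters=50, eps=1e-12):
    """Textbook fuzzy c-means with fuzzifier m and a fixed iteration count."""
    centers = farthest_first(points, c)
    power = 1.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        # Membership update: u_ij is proportional to (1 / d_ij^2)^(1/(m-1)).
        memberships = []
        for p in points:
            inv = [(1.0 / (sq_dist(p, q) + eps)) ** power for q in centers]
            total = sum(inv)
            memberships.append([v / total for v in inv])
        # Center update: mean of the points weighted by u_ij^m.
        for j in range(c):
            weights = [u[j] ** m for u in memberships]
            wsum = sum(weights)
            centers[j] = [
                sum(w * p[k] for w, p in zip(weights, points)) / wsum
                for k in range(len(points[0]))
            ]
    return memberships, centers


# Hypothetical short texts: three about travel, three about programming.
docs = [
    "cheap flights hotel deals",
    "hotel deals cheap travel",
    "flights travel hotel",
    "python code bug fix",
    "fix python bug",
    "code bug python error",
]
vocab, vectors = bow_vectors(docs)
memberships, centers = fuzzy_c_means(vectors, c=2)
# Hard label per document: the cluster with the largest membership.
labels = [max(range(2), key=u.__getitem__) for u in memberships]
```

Each row of `memberships` sums to 1 and gives a document's soft assignment to every cluster, which is what distinguishes fuzzy c-means from hard k-means; the heaviest dimensions of each entry in `centers` can then be read back through `vocab` as the cluster's characteristic words.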


Articles published on Field Of Text Mining

An Efficient Topic Modeling Approach for Text Mining and Information Retrieval through K-means Clustering

DOMAIN SPECIFIC KEY FEATURE EXTRACTION USING KNOWLEDGE GRAPH MINING

BTM and GloVe Similarity Linear Fusion-Based Short Text Clustering Algorithm for Microblog Hot Topic Discovery

Using openEHR Archetypes for Automated Extraction of Numerical Information from Clinical Narratives.

A text semantic topic discovery method based on the conditional co-occurrence degree

A Tool for Visualization and Analysis of Single-Cell RNA-Seq Data Based on Text Mining

Text mining in education

Bi-Lingual (English, Punjabi) Sarcastic Sentiment Analysis by using Classification Methods

Searching Activity Trajectories with Semantics

Fuzzy topic modeling approach for text mining over short text

Design-by-Analogy: Exploring for Analogical Inspiration With Behavior, Material, and Component-Based Structural Representation of Patent Databases

Non-word Attributes’ Efficiency in Text Mining Authorship Prediction

Learning document representation via topic-enhanced LSTM model

Joint sentiment/topic modeling on text data using a boosted restricted Boltzmann Machine

Towards Identifying Author Confidence in Biomedical Articles

Lyric Text Mining Of Dangdut: Visualizing The Selected Words And Word Pairs Of The Legendary Rhoma Irama’s Dangdut Song In The 1970s Era

A New LSA and Entropy-Based Approach for Automatic Text Document Summarization

Using discussion logic in analyzing online group discussions: A text mining approach

String Matching Algorithms

Document Level Sentiment Analysis: A survey
