Semantic Network Analysis Pipeline—Interactive Text Mining Framework for Exploration of Semantic Flows in Large Corpus of Text

Martin Cenek, Ashika Mulagada, Boyd Ching, Rowan Bulkow, Eric Pak, Levi Oyster

doi:10.3390/app9245302

Open Access

https://doi.org/10.3390/app9245302

Abstract

Historical topic modeling and the exploration of semantic concepts in a large corpus of unstructured text remain a hard, open problem. Despite advances in natural language processing tools, statistical linguistics models, graph theory, and visualization, there is no framework that combines these piecewise tools under one roof. We designed and built the Semantic Network Analysis Pipeline (SNAP), an open-source web service that implements the work-flow a data scientist needs to explore historical semantic concepts in a text corpus. We define a graph-theoretic notion of a semantic concept as a flow of closely related tokens through the corpus of text. The modular work-flow pipeline processes text with natural language processing tools, narrows the content statistically, creates semantic networks by lexical token chaining, performs social network analysis of the token networks, and creates a 3D visualization of the semantic concept flows through the corpus for interactive concept exploration. Finally, we illustrate the framework’s utility by extracting information from three text corpora: Herman Melville’s novel Moby Dick, the transcript of the 2015–2016 United States (U.S.) Senate Hearings on Environment and Public Works, and the Australian Broadcasting Corporation’s short news articles on rural and science topics.
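The first stages of the work-flow described in the abstract (token extraction, content narrowing, and token chaining into a network) can be sketched roughly as follows. This is a minimal illustration and not SNAP's implementation: the stop-word list, the sliding-window chaining rule, the `window` parameter, and the function names `tokenize`, `build_network`, and `weighted_degree` are all assumptions introduced here.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "on", "for"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def build_network(tokens, window=3):
    """Chain each token to the tokens that follow it inside the window.

    Returns a Counter mapping an undirected edge (a, b), with a < b,
    to the number of times the two tokens were chained.
    """
    edges = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1:i + window]:
            if left != right:  # ignore self-loops
                edges[tuple(sorted((left, right)))] += 1
    return edges


def weighted_degree(edges):
    """Sum the edge weights touching each token (a simple centrality measure)."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


tokens = tokenize("The whale hunts the ship")
network = build_network(tokens, window=2)
print(weighted_degree(network).most_common())
```

Weighted degree stands in here for the social network analysis step; other centrality measures could be swapped in without changing the edge representation.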

Highlights

Historical semantic concepts (HSC) modeling aims to understand what the key concepts discussed in a text corpus are, how those concepts evolve over time, and in what context the concepts are used, both in relation to each other and in relation to their supporting sub-concepts
The modular framework relies on mature linguistic tools that can be swapped to customize the mechanics of the computational linguistics processing
One such customization might include the implementation of a workflow to analyze the sentiment concept flows, where a sentiment concept flow would track and connect tokens coded with a sentiment label

Summary

Introduction

Historical semantic concepts (HSC) modeling aims to understand what the key concepts discussed in a text corpus are, how those concepts evolve over time, and in what context the concepts are used, both in relation to each other and in relation to their supporting sub-concepts. While semantic networks can be used to capture the relationships among co-occurring words in a single document [1,2], interactive HSC exploration requires a multi-step computational linguistics work-flow that processes unstructured text and extracts information from many documents in order to synthesize knowledge about the different concepts found in the corpus. Existing text analysis techniques often extract a set of discrete textual memes from a text corpus, which preserves neither a meme’s context, nor its relationship to other memes, nor how these relationships change throughout the corpus. To illustrate these shortcomings, let us consider a toy example of three newspaper articles that were published sequentially on the topic of “salmon” and a set of key textual memes extracted from each article: environment, cost, salmon, economy, harvest, ecology, economy, investment, and global, economy, salmon, environment, cost. The framework’s project management allows for the inspection and validation of the intermediate text-processing steps and the management of large data sets, and it provides data security.
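A small sketch can make the toy example concrete: once the order of the articles is kept, it becomes possible to see which terms persist from one article to the next, which a flat bag of memes cannot show. Note that the split of the thirteen memes into three articles below (four, four, and five terms) is an assumption, since the text lists them without explicit grouping; the function names `term_timeline` and `carried_over` are likewise hypothetical.

```python
# ASSUMED grouping of the memes into three sequential articles;
# the source text lists the memes flat, without article boundaries.
articles = [
    {"environment", "cost", "salmon", "economy"},
    {"harvest", "ecology", "economy", "investment"},
    {"global", "economy", "salmon", "environment", "cost"},
]


def term_timeline(articles):
    """Map each term to the ordered list of article indices in which it appears."""
    timeline = {}
    for index, memes in enumerate(articles):
        for term in sorted(memes):
            timeline.setdefault(term, []).append(index)
    return timeline


def carried_over(articles):
    """For each consecutive pair of articles, list the terms present in both."""
    return [sorted(a & b) for a, b in zip(articles, articles[1:])]


timeline = term_timeline(articles)
print(timeline["economy"])     # articles in which "economy" appears
print(timeline["salmon"])      # "salmon" drops out and later returns
print(carried_over(articles))  # terms that link neighbouring articles
```

Under this assumed grouping, only "economy" links each pair of neighbouring articles, while "salmon" disappears in the middle article and reappears in the last one; this is the kind of change over time that a set of discrete memes does not preserve.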

Background

Work-Flow

From Unstructured Text to Semantic Flows

Natural Language Processing

Term Frequency and Stop Word Removal

Semantic Concept

Semantic Flows

Implementation Notes

Sample Corpus Analysis

Moby Dick

Australian Broadcast Commission

Discussion and Conclusions

Journal: Applied Sciences
Publication Date: Dec 5, 2019
Citations: 1
License type: CC BY 4.0
