Abstract

Rapid advances in information and web technology have driven a sharp increase in the collection and storage of digital data. With the development of online environmental monitoring and internet technology, more and more environmental data are stored on the Internet and shared on social networks. There is therefore growing interest in environmental big data mining and in automatically identifying the environmental factors that contribute to public environmental risks, for example by mining online content about water quality, air pollution, and soil problems. A better understanding of these factors, together with analysis of the data, will enable more precise prediction of the location and time of high-risk events for environmental management. The environmental data considered here are collected from social networks by running a web crawler on Twitter. Early research on environmental data analysis focused on field-specific analysis of traditional data and did not consider the data relationships and data structure found on social networks. Traditional environmental data analysis methods have been studied extensively, but no algorithms have been designed for analyzing environmental data on social networks. This paper proposes a novel probabilistic generative model based on LDA, called ED-LDA. The model not only takes the traditional environmental data analysis method into account but also includes the relationships and structure of environmental data, so that it can mine the relationship between users and the environmental data they post on social networks and help clarify what the data mean for environmental management. We present a Gibbs sampling implementation for inference in our model and use it to discover environmental data topics on Twitter. The model can also be applied to many other environmental contexts.
The experimental results show that, compared with the traditional LDA clustering algorithm, the ED-LDA method can effectively mine and classify environmental data. The method can serve as a powerful computational approach for clustering environmental data on the Internet.

Highlights

  • Topic modeling provides a suite of algorithms to discover hidden thematic structure in large collections of texts

  • Research on web content such as Twitter data has mostly remained at the analysis of relationships between users and community structure, and lacks early warning of user behavior based on text content analysis [3, 4]. Traditional data mining algorithms are better suited to traditional corpora; they do not consider data with a special network structure and are therefore not suitable for building data models in a specific field

  • In LDA, each document may be viewed as a mixture of various topics. This is similar to probabilistic latent semantic analysis, except that in LDA the topic distribution is assumed to have a Dirichlet prior

Summary

Introduction

Topic modeling provides a suite of algorithms to discover hidden thematic structure in large collections of texts. A topic model takes a collection of texts as input. It discovers a set of “topics” (recurring themes that are discussed in the collection) and the degree to which each document exhibits those topics. [3, 4] Traditional data mining algorithms are better suited to traditional corpora; they do not consider data with a special network structure and are therefore not suitable for building data models in a specific field. This paper describes LDA (latent Dirichlet allocation), the simplest topic model [5, 6], and explains what a “topic” is from a mathematical perspective and why algorithms can discover topics from collections of environmental text content.
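
To make the idea of a “topic” concrete, the following is a minimal sketch of the generative process assumed by standard LDA, not of the ED-LDA model itself. Each topic is a probability distribution over the vocabulary, and each document is a mixture of topics drawn from a Dirichlet prior. The function names, the toy vocabulary, and the hyperparameter values `alpha` and `beta` are illustrative assumptions and do not come from the paper.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw an index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(num_docs, doc_len, vocab, num_topics,
                    alpha=0.5, beta=0.5, seed=0):
    """Generate a toy corpus following the standard LDA generative story.

    alpha and beta are symmetric Dirichlet hyperparameters (assumed values).
    """
    rng = random.Random(seed)
    # A topic is a probability distribution over the whole vocabulary.
    topics = [sample_dirichlet([beta] * len(vocab), rng)
              for _ in range(num_topics)]
    mixtures, docs = [], []
    for _ in range(num_docs):
        # Each document gets its own mixture over topics.
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_len):
            z = sample_categorical(theta, rng)      # pick a topic for this word
            w = sample_categorical(topics[z], rng)  # pick a word from that topic
            words.append(vocab[w])
        mixtures.append(theta)
        docs.append(words)
    return topics, mixtures, docs


if __name__ == "__main__":
    # Hypothetical toy vocabulary of environmental terms.
    vocab = ["water", "river", "air", "smog", "soil", "waste"]
    topics, mixtures, docs = generate_corpus(3, 8, vocab, num_topics=2)
    for theta, words in zip(mixtures, docs):
        print([round(p, 2) for p in theta], " ".join(words))
```

Inference, for example by Gibbs sampling, runs this story in reverse: given only the observed words, it estimates the topic distributions and the per-document mixtures that most plausibly produced them.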

Traditional Topic Mining Algorithms
Topic Mining Algorithm Based on Linear Algebra
Text Generation Model LDA
Environmental Data on Social Networks Generation Model ED-LDA
Topic Mining and Model Derivation
Experimental Preparation
Results
Conclusion