Abstract

Analyzing and mining industrial big data is extremely complicated because the data come from multiple sources, have heterogeneous structures, and are linked by complex correlations; the continued growth in data volume compounds the difficulty. Traditional analysis approaches based on relational databases or data warehouses are not flexible enough to handle multi-source heterogeneous data, and they are inefficient for search and analysis operations. Building on Spark and Elasticsearch, this paper presents a multi-dimensional analysis method and system for industrial big data. An OLAP model architecture based on a JSON document structure is proposed, which uses key-value pairs to define diverse industrial data flexibly and yields a multi-dimensional model that is easy to query and analyze. The table structure of the dimension information is converted into a JSON-based document structure, and the dimension information referenced by the fact table is stored as nested documents. Elasticsearch stores the resulting document structure tree and builds an inverted index over it, which improves the efficiency of analytical queries. Query and analysis operations are thereby transformed into traversal and query operations over the document content. In terms of time efficiency, the multi-dimensional analysis system based on Elasticsearch performs much better than the same analysis based on Hive.
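The conversion from fact and dimension tables to nested documents can be illustrated with a minimal sketch. It is not the system described in the paper: the table contents, the field names (`equipment`, `time`, `output`), and the helper functions `to_document`, `get_path`, and `roll_up` are hypothetical, and the sketch uses plain Python dictionaries rather than calling Spark or Elasticsearch. It only shows how foreign keys in a fact row might be replaced by embedded dimension documents, and how a roll-up then becomes a traversal of document content.

```python
import json
from collections import defaultdict

# Hypothetical dimension tables, keyed by primary key.
equipment_dim = {
    "E1": {"name": "pump-01", "line": "A", "plant": "north"},
    "E2": {"name": "pump-02", "line": "B", "plant": "north"},
}
time_dim = {
    "T1": {"day": "2020-01-01", "month": "2020-01"},
    "T2": {"day": "2020-01-02", "month": "2020-01"},
}

# Hypothetical fact table: two foreign keys plus one measure.
fact_rows = [
    {"equipment_id": "E1", "time_id": "T1", "output": 120.0},
    {"equipment_id": "E2", "time_id": "T1", "output": 95.0},
    {"equipment_id": "E1", "time_id": "T2", "output": 130.0},
]


def to_document(row):
    """Replace each foreign key with a copy of the dimension row it points to."""
    return {
        "equipment": dict(equipment_dim[row["equipment_id"]]),
        "time": dict(time_dim[row["time_id"]]),
        "output": row["output"],
    }


def get_path(doc, path):
    """Walk a dotted path such as 'equipment.plant' through nested dicts."""
    value = doc
    for key in path.split("."):
        value = value[key]
    return value


def roll_up(docs, path, measure):
    """Sum a measure grouped by one dimension attribute (an OLAP roll-up)."""
    totals = defaultdict(float)
    for doc in docs:
        totals[get_path(doc, path)] += doc[measure]
    return dict(totals)


documents = [to_document(row) for row in fact_rows]
by_line = roll_up(documents, "equipment.line", "output")

print(json.dumps(documents[0], indent=2))
print(by_line)
```

Because each document already carries its dimension attributes, grouping by `equipment.line` or `time.month` needs no join at query time. In a document store, each dotted path would correspond to an indexed field, so the linear scan in `roll_up` stands in for an index-backed aggregation.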
