Study on Building and Applying Technology of Multi-level Data Warehouse

Lihua Song, Yebai Li, Ying Zhan

doi:10.2991/ameii-15.2015.110

Abstract

Building on applications of multi-database management systems, we adopt multi-level data warehouses. This paper proposes an incremental method for updating the result data set and summarizes two transmission strategies. We verify the data-driven strategy through analysis in an experimental simulation platform environment and conclude that the method is feasible and effective and is more suitable for platform applications.

Introduction

With information growing rapidly, data has become massive, distributed, and heterogeneous. As a result, the processing capacity of a centralized data warehouse increasingly limits data analysis. Because distributed data warehouses offer low maintenance cost, data integrity, and high tolerance of system failures [1], they are more competitive in some special cases, such as banking and e-commerce platforms. In recent years, Hive has been the main technology used to build distributed data warehouses [2], but analysis tools are not yet available for Hadoop. To avoid complex development work in the application presentation layer, we use database technology combined with open source analysis and presentation tools (e.g. Mondrian and JPivot).

Related research on multi-level data warehouse structures, in the typical distributed form comprising a global data warehouse and local data warehouses, includes the following. Paper [3] presented a double-channel view updating algorithm that maintains multiple views online and executes OLAP queries in parallel, ensuring data consistency and improving query efficiency. Paper [4] proposed a distributed data warehouse model for sales decisions, based on large-scale clothing enterprises, and also presented a solution for transmitting data from the local data warehouses to the global data warehouse.

In this paper, the global data warehouse and the local data warehouses are referred to as the platform data warehouse (PDW) and the enterprise data warehouses (EDWs), respectively.
The PDW is built for the e-commerce platform, and the EDWs are built for the enterprise users registered on the platform. There are two strategies for transmission between the PDW and the EDWs: a) round-robin scheduling, in which the PDW extracts data from the EDWs; and b) data-driven transmission, in which each EDW transmits data to the PDW immediately after completing its update. Both involve data exchange across database servers. In strategy a), the PDW determines the priority of the EDWs based on specific conditions. Paper [5] used a round-robin scheduling strategy in distributed data warehouses to address poor flexibility and poor real-time performance, but the global data warehouse had to maintain additional views for updates, and the communication frequency increased. Related work also includes papers [6] and [7]. In strategy b), the EDWs extract data in parallel and then push it to the PDW; this strategy offers strong continuity, low network communication frequency, and high concurrency. Its drawback is that it is very likely to cause conflicts between update transactions; paper [8] proposed a solution to such conflicts from a data storage perspective.
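The contrast between the two transmission strategies can be illustrated with a minimal in-memory sketch. It is not the paper's implementation: the class names `EDW` and `PDW`, the methods `update`, `drain`, `receive`, and `poll`, and the `transfers` counter are all hypothetical, and the round-robin pass simply visits the EDWs in list order rather than by a computed priority. The sketch only shows who initiates each exchange and how many cross-server contacts result.

```python
from dataclasses import dataclass, field


@dataclass
class EDW:
    """Hypothetical enterprise data warehouse holding rows not yet transmitted."""
    name: str
    pending: list = field(default_factory=list)

    def update(self, rows, pdw=None):
        """Complete a local update; if a PDW is given, push the increment at once."""
        self.pending.extend(rows)
        if pdw is not None:  # strategy b): data-driven push
            pdw.receive(self.name, self.drain())

    def drain(self):
        """Return the pending rows and clear them (the increment to transmit)."""
        rows, self.pending = self.pending, []
        return rows


@dataclass
class PDW:
    """Hypothetical platform data warehouse collecting rows from the EDWs."""
    rows: list = field(default_factory=list)
    transfers: int = 0  # number of cross-server exchanges, including empty ones

    def receive(self, source, rows):
        self.transfers += 1
        self.rows.extend((source, r) for r in rows)

    def poll(self, edws):
        """Strategy a): visit each EDW in turn and extract its pending rows."""
        for edw in edws:
            self.receive(edw.name, edw.drain())


edws = [EDW("edw1"), EDW("edw2"), EDW("edw3")]

# Round-robin: one EDW has new data, but the PDW contacts all three.
pull = PDW()
edws[0].update(["order-1", "order-2"])
pull.poll(edws)

# Data-driven: only the EDW that finished an update communicates.
push = PDW()
edws[0].update(["order-3"], pdw=push)

print(pull.transfers, push.transfers)
```

In this toy run the polling pass makes one exchange per EDW regardless of whether it has new data, while the push makes a single exchange, which mirrors the lower communication frequency attributed to strategy b). The sketch does not model the update-transaction conflicts that concurrent pushes can cause.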
