Abstract

The traditional architecture for data storage and analysis is no longer adequate for today's rapid flow of information, whereas big data technology offers clear gains in efficiency and productivity. A successful migration to big data, however, requires an efficient architecture. In this paper, we propose an architecture that imports the existing power data storage system of our campus into a big data platform built around a Data Lake. We use Apache Sqoop to transfer historical data into Apache Hive for storage. Apache Kafka ensures the integrity of the streaming data and serves as the input source for Spark Streaming, which writes the data to HBase. To integrate the data, we apply the Data Lake concept on top of Hive and HBase. Impala and Apache Phoenix serve as the query engines for Hive and HBase, respectively. Apache Spark can quickly analyze and compute over the data in the Data Lake, and we choose Apache Superset as the visualization solution.
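The streaming half of this pipeline can be sketched as follows. The snippet below is a minimal illustration, not the paper's implementation: it assumes a Kafka topic power-readings, an HBase table power_readings reachable through a Thrift gateway, and a simple meter-reading schema, all of which are hypothetical. Spark Structured Streaming stands in for the paper's unspecified Spark Streaming API, and each micro-batch is written to HBase through the happybase client.

```python
# Sketch: Kafka -> Spark Structured Streaming -> HBase.
# Topic, table, hosts, and schema are illustrative assumptions.
import happybase  # HBase Thrift client, assumed installed on the driver
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("power-data-ingest").getOrCreate()

# Hypothetical schema for one campus power-meter reading.
schema = StructType([
    StructField("meter_id", StringType()),
    StructField("timestamp", StringType()),
    StructField("kwh", DoubleType()),
])

readings = (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "kafka:9092")  # assumed broker
            .option("subscribe", "power-readings")            # assumed topic
            .load()
            .select(from_json(col("value").cast("string"), schema).alias("r"))
            .select("r.*"))

def write_batch_to_hbase(batch_df, batch_id):
    """Write one micro-batch into HBase via the Thrift gateway."""
    conn = happybase.Connection("hbase-thrift")   # assumed Thrift host
    table = conn.table("power_readings")          # assumed table name
    for row in batch_df.collect():                # acceptable for small batches
        row_key = f"{row.meter_id}#{row.timestamp}".encode()
        table.put(row_key, {b"d:kwh": str(row.kwh).encode()})  # assumed column family "d"
    conn.close()

(readings.writeStream
 .foreachBatch(write_batch_to_hbase)
 .option("checkpointLocation", "/tmp/ckpt/power")  # enables recovery after failure
 .start()
 .awaitTermination())
```

The checkpoint location lets the stream resume from the last committed Kafka offsets after a failure; a production deployment would likely replace the per-row Thrift writes with a bulk HBase connector.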
