Abstract

Big data analytics pipeline becomes popular for large volume data processing, Apache Zeppelin provides an integrated environment for data ingestion, data discovery, data analytics and data visualization and collaboration with an extended framework which allows different programming languages and data processing back ends to be plugged in. The supported languages include Scala, Python, SQL, and Shell script as well as big data processing back ends including Hadoop, Spark and Hive. With the necessary tool sets, an interactive and dynamic data analysis can be done on the fly with heterogeneous programming interfaces. Although Zeppelin is great for code development and interactive analysis with small scale data set for proof-of-concept or use-case presentations, running the data processing pipeline in the batch mode is still needed for performance, robustness to fit in an automated workflow in some cases. We are developing a tool to convert Zeppelin notebook into a workflow with a set of codes that can run in a batch mode through command line interface without requiring running Zeppelin, so that the prototype code can be seamlessly deployed on the production cluster after demo stage. The entire workflow can be preserved, configured manually and run automatically. Zeppelin also provides a flexible way to integrate the visualization functionality, another contribution of this paper is to extend the Zeppelin's existing built-in visualization component for D3Network. With two added features described above, Zeppelin can help users to develop big data pipeline and visualizing graph data quickly and efficiently.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.