Abstract

The rapid development of data science and big data technology stacks has driven continuous, iterative updates to data science research and working methods. The division of labor between data science and big data is now more fine-grained. Traditional working methods, which span everything from building the infrastructure environment to data modelling and analysis, greatly reduce work and research efficiency. In this paper, we aim to support friendly collaboration within data science teams by building a data science and big data analysis application platform, based on a microservices architecture, for education and nonprofessional research fields. Within a microservice-based environment that makes it easy to update individual components, the platform provides a personal code experiment environment that integrates JupyterHub with Spark and HDFS for multiuser use, and a visualized modelling tool that follows the modular design of data science engineering and is based on Greenplum in-database analysis. The entire web service system is developed with Spring Boot.

Highlights

  • In recent years, data science and big data technology stacks have achieved explosive growth

  • We performed an experimental evaluation of Qunxian with all server-side platform components deployed on Google Cloud Engine. The boot disk is based on a CentOS 7 OS image with a 2 TB standard persistent disk

  • We have presented Qunxian, a new microservice-based big data analysis platform, which is deployed on Google Cloud Engine running across distributed computing resources


Summary

Introduction

Data science and big data technology stacks have achieved explosive growth. Traditional working methods, from setting up an experimental environment to data acquisition, data processing, model training, and data prediction, are often performed in a unified manner [3]. Users need a multiuser infrastructure environment platform [6]. Workers with big data needs must integrate an experimental environment based on Hadoop [10], Spark, and JupyterHub. In data sharing and data stream processing, data security is often the most troublesome problem when analysing real business scenarios. The Qunxian platform is a big data analysis platform based on microservices. It helps to realize the sharing of data, computing power, and infrastructure resources. It gives users a friendlier environment in which to record and share the experimental process and the visualized model.

Related Works
Hardware Layer
Two Web-Based Applications
Experiments
Database Service
Conclusion
