Abstract

This paper presents an in-depth analysis of the automatic selection and parameter configuration of core Big Data software components. To address the component-selection problem in Big Data application development, we establish standardized requirement indicators and, based on a retention model, use a decision tree model to select components automatically from the three categories of user requirements: storage, computation, and analysis. To address the problem of undetectable packet loss in data transmission on existing IoT and Web service platforms, we propose a data transmission intermediate platform with bidirectional data detection; its data communication module enables mutual monitoring and detection of data interaction between IoT smart terminals and cloud platforms. A retention model is built separately for each requirement category to realize the automatic selection of Big Data components. Starting from several mainstream distributed storage systems, we use Cassandra as an example for experiments and tests. We build a performance model for hardware parameters by multiple regression fitting, take user requirements as input, and use the performance model to configure the system's hardware parameters; by studying the system's principles, architecture, features, and application scenarios, we build a software parameter configuration knowledge base to guide software parameter configuration. Together, these methods address the difficult problems of selecting, deploying, and configuring parameters for Big Data applications.
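As a minimal sketch of the multiple-regression approach the abstract describes, the snippet below fits a linear performance model over hardware parameters and then scans candidate configurations against a user-supplied throughput requirement. The feature set (cores, memory, disks), the sample measurements, and the selection heuristic are illustrative assumptions, not the paper's actual model or data.

```python
# Sketch of a multiple-regression performance model for hardware
# parameters. All numbers below are synthetic placeholders.
import numpy as np

# Hypothetical benchmark observations: [CPU cores, memory (GB), disk count]
X = np.array([
    [4,  16, 1],
    [8,  32, 2],
    [16, 64, 4],
    [8,  16, 2],
    [16, 32, 4],
], dtype=float)
# Observed throughput (ops/s) for each configuration (made-up values).
y = np.array([12000, 25000, 48000, 20000, 42000], dtype=float)

# Fit y ~ b0 + b1*cores + b2*mem + b3*disks by least squares.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_throughput(cores: float, mem_gb: float, disks: float) -> float:
    """Predicted throughput for a candidate hardware configuration."""
    return float(coef @ np.array([1.0, cores, mem_gb, disks]))

# Take the user requirement as input and pick the smallest feasible
# configuration, mirroring the "requirements in, parameters out" flow.
required = 30000  # assumed user requirement in ops/s
candidates = [(c, m, d) for c in (4, 8, 16)
                        for m in (16, 32, 64)
                        for d in (1, 2, 4)]
feasible = [cfg for cfg in candidates if predict_throughput(*cfg) >= required]
print(min(feasible))  # e.g. the cheapest configuration meeting the target
```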

Highlights

  • Big Data technology is no longer unfamiliar to us, and applications of Big Data technology are everywhere

  • Analysis of automatic selection and parameter configuration results: the experiment tests the maximum performance achievable by continuously increasing the number of client threads. The test uses 10 columns per row, with an average of 10 characters per column

  • For storage and compute requirements, the selections of the storage and compute systems should be output at the same time; if the user selects analysis requirements, the selections of the storage, compute, and analysis systems should be output at the same time. The paper gives pseudocode for this component selection process (a sketch follows this list)
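The excerpt does not include the paper's pseudocode, so the following is a hedged reconstruction of the selection rule the highlight states: storage and compute choices are emitted together, and an analysis requirement implies the full stack. The requirement names and the candidate systems (Cassandra, Spark, Hive) are illustrative assumptions.

```python
# Hedged sketch of the component-selection rule described above; the
# candidate systems are placeholders, not the paper's decision output.
def select_components(requirements: set) -> dict:
    """Map user requirement categories to Big Data component choices."""
    selection = {}
    if "storage" in requirements or "compute" in requirements:
        # Storage and compute selections are always output together.
        selection["storage"] = "Cassandra"  # example choice
        selection["compute"] = "Spark"      # example choice
    if "analysis" in requirements:
        # An analysis requirement implies storage + compute + analysis.
        selection.setdefault("storage", "Cassandra")
        selection.setdefault("compute", "Spark")
        selection["analysis"] = "Hive"      # example choice
    return selection

print(select_components({"storage", "compute"}))  # storage + compute
print(select_components({"analysis"}))            # full stack
```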


Introduction

Big Data technology is no longer unfamiliar to us, and its applications are everywhere. How to exploit the value of Big Data more effectively has become a direction of effort for many people [1]. To take advantage of that value, we must process the data. Common Big Data task processing steps include data decompression, data cleansing, data loading, data conversion, and data backup [2]. Scheduling systems exploit the interdependencies among these tasks, automatically scheduling each job according to its dependencies to reduce manual operations [3]. A Big Data application scheduling system can handle both the scheduling of these simple tasks and the scheduling management of complex Big Data tasks [4].
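To illustrate the dependency-driven scheduling just described, the sketch below orders the pipeline stages from the Introduction so that each task runs only after its prerequisites finish. The dependency graph is an assumption for illustration; a real scheduler would dispatch distributed jobs rather than print names.

```python
# Illustrative sketch of scheduling tasks by their interdependencies,
# using a topological order over the pipeline stages named above.
from graphlib import TopologicalSorter

# Assumed dependency graph: each task maps to its prerequisites.
dag = {
    "decompress": set(),
    "clean":      {"decompress"},
    "load":       {"clean"},
    "convert":    {"load"},
    "backup":     {"convert"},
}

for task in TopologicalSorter(dag).static_order():
    print(f"running {task}")  # a real system would submit the job here
```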

