The Comprehensive Approach to Big Data Preprocessing

Larysa Globa, Mariya Grebinichenko, Rina Novogrudska

doi:10.1007/978-3-031-16368-5_6

Abstract

Big Data research is making significant progress. This paper is devoted to optimizing Big Data pre-processing. It identifies the shortcomings of input datasets that reduce data quality in Big Data processing systems and reviews the main methods of dataset pre-processing. It describes Big Data cleaning techniques that make it possible to correct distorted data. Existing approaches to designing the architecture of Big Data processing systems are analyzed, and a microservice architecture is adopted to make processing flexible. The capabilities of Big Data pre-processing are extended by an improved data cleaning method based on text data processing templates. The proposed flexible set of algorithms for Big Data pre-processing, which has a high level of fault tolerance, increases the accuracy of subsequent data processing. A software implementation (web applications) of the proposed set of data cleaning algorithms, with the proposed improvements and a microservice architecture, was developed. The efficiency of the proposed microservice-based architecture for the Big Data pre-processing system is demonstrated in practice.

Keywords

Big Data, Preprocessing, Data cleaning, Algorithm, Text data

Full Text