Abstract

Big Data research is making significant progress. This paper is devoted to optimizing Big Data pre-processing. Shortcomings of input datasets that reduce their quality in Big Data processing systems are identified. The main methods of dataset pre-processing are considered. Approaches to Big Data cleaning that make it possible to correct distorted data are described. Existing approaches to designing the architecture of Big Data processing systems are analyzed, and a microservice architecture is adopted for flexible processing. The possibilities of Big Data pre-processing are expanded by an improved data cleaning method based on text data processing templates. The proposed flexible complex of algorithms for Big Data pre-processing, with a high level of fault tolerance, increases the accuracy of subsequent data processing. A software implementation (web applications) of the proposed complex of algorithms, incorporating the improved data cleaning methods and the microservice architecture, was developed. The efficiency of the proposed microservice-based architecture for the Big Data pre-processing system is demonstrated in practice.

Keywords: Big Data; Preprocessing; Data cleaning; Algorithm; Text data

