Abstract

Context. Considered question correct interpretation information flow in distributed information systems. The object of study methods are promotion “big data” on cluster system.Objective. Is the study promising areas and technology for the analysis of structures data in distributed information systems.Method. The big data tendency prospects as well as timeliness of the problem are studied in this paper. The principles of work with them are addressed. Big data processing technologies are provided. The analysis of each one is performed. An example of “MapReduce” paradigm application, uploading of big volumes of data, processing and analyzing of unstructured information and its distribution into the clustered database is provided. The article summarizes the concept of “big data”. Examples of methods for working with arrays of unstructured data. Dedicated scientific guidance for analyzing big data. The principles of unstructured data in distributed information systems. Driven work platform “Hadoop MapReduce” and “Apache Spark”. Analyzed their properties and given the differences. An analysis of comparative performance against both platforms – the performance of the number of iterations. Consider ways to create RDD: parallelization transmitted collection program and a link to an external file system in “Hadoop”. There is an example rozparalelenoyi system RDD. Proposed work lone class for basic database operations: database connection, create a table, a table, get in line id, returning all elements of the database, update, delete and create the line.Results. The analysis Models Spark and Hadoop MapReduce for phased construction distributed information system. built up SparkConf object, containing information about applique and is the final version of the experiment.Conclusions. Conducted experiment confirmed efficiency the proposed method, are capable process horizontal data arrays, that parallelization by defective presentation of information. These promising areas of analyze structure data for the purpose of forecast results and create algorithms advanced correlation, contributing new understanding activity distributed information systems further research can consist in wide use information systems, that would provide a full range technological process adaptation information flows in clusters.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.