The article considers modern technological approaches to organizing the collection and preparation of big data, as well as to distributing and running applications with the Docker platform, which loads and launches sets of applications using a container-based approach to virtualization. The basics of formulating big data processing tasks are briefly outlined, together with the technical problems of scaling the platform and assuring data quality. Using the example of deploying the PostgreSQL database management system with its administration tools and the Apache Superset analytics platform, a modern approach to acquiring and transforming data and then presenting it as a dashboard is demonstrated. Work with data is illustrated by the full processing cycle of a real open dataset, covering all the main stages: loading, cleaning, preliminary analysis, and visualization. In the context of these technologies and of organizing the workflow as a pipeline, examples of typical DevOps engineer and data engineer tasks are given, and the need for them to work as part of a team is emphasized. The technical solutions demonstrated in the article can also be used when studying the topics "Databases", "Infographics", and "Data analysis" in an advanced informatics course with a technological profile, as an illustration of the complexity of the relevant processes, and in the training of future informatics teachers.
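To make the described processing cycle concrete, below is a minimal Python sketch of such a pipeline, assuming pandas, SQLAlchemy, and matplotlib; the dataset URL, column names, and database credentials are hypothetical stand-ins for the article's actual open dataset and deployment, not details taken from the article itself.

```python
# Minimal sketch of the open-dataset processing cycle described in the
# abstract: loading, cleaning, preliminary analysis, and visualization,
# with the cleaned data written to PostgreSQL for use by Apache Superset.
# The URL, column names, and connection string are illustrative assumptions.
import pandas as pd
import matplotlib.pyplot as plt
from sqlalchemy import create_engine

# Loading: read a hypothetical open dataset published as CSV.
df = pd.read_csv("https://example.org/open-data/air_quality.csv")  # assumed URL

# Cleaning: drop duplicates, normalize column names, remove rows with
# missing key fields, and coerce the timestamp column to a proper type.
df = df.drop_duplicates()
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
df = df.dropna(subset=["measured_at", "pm25"])  # assumed column names
df["measured_at"] = pd.to_datetime(df["measured_at"])

# Preliminary analysis: basic descriptive statistics for a first look.
print(df.describe())

# Write the cleaned data to PostgreSQL so Superset can query it; the DSN
# assumes a PostgreSQL container exposed on the default port 5432.
engine = create_engine("postgresql://superset:superset@localhost:5432/demo")
df.to_sql("air_quality", engine, if_exists="replace", index=False)

# Visualization: a quick sanity-check plot before building the dashboard
# in Superset itself.
daily = df.set_index("measured_at")["pm25"].resample("D").mean()
daily.plot(title="Daily mean PM2.5")
plt.tight_layout()
plt.savefig("pm25_daily.png")
```

In a container-based setup of the kind the article describes, a script like this would typically run as one stage of the pipeline, after the PostgreSQL and Superset containers have been started, with the dashboard itself then configured in Superset against the resulting table.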