The subject of the article is the models, methods, and information technologies used to aggregate monitoring data. The goal is to determine the best deep learning model for reducing the dimensionality of monitoring data from dynamic systems. The following tasks were solved: analysis of existing dimensionality reduction approaches; description of the general architecture of vanilla and variational autoencoders; design of custom architectures for both; development of software for training and testing the autoencoders; and a study of how well the autoencoders perform on the dimensionality reduction problem. The following models and methods were used: data processing and preparation, and data dimensionality reduction. The software was developed in Python, with Scikit-learn, Pandas, PyTorch, NumPy, argparse, and other auxiliary libraries. Results: the work presents a classification of models and methods for dimensionality reduction and general reviews of vanilla and variational autoencoders, covering the models, their properties, their loss functions, and their application to dimensionality reduction. Custom autoencoder architectures were created and are documented with diagrams and a description of each component. Software for training and testing the autoencoders was developed, and the dynamic system monitoring data set and the steps used to preprocess it are described. The metric used to evaluate model quality is also described, along with the configuration and training of the autoencoders. Conclusions: the vanilla autoencoder recovers the data much better than the variational one.
Since the two architectures are identical apart from the features specific to each autoencoder type, it can be noted that the vanilla autoencoder compresses the data better, retaining in the bottleneck more of the variables that are useful for later recovery. In addition, training with different bottleneck sizes makes it possible to find the size at which the data is recovered best, that is, the size at which the most important variables are preserved. Overall, the autoencoders work effectively for the dimensionality reduction task: the data recovery quality metric shows that they recover the data well, with an error whose first nonzero digit falls in the third or fourth decimal place. In conclusion, the vanilla autoencoder is the best deep learning model for aggregating monitoring data of dynamic systems.
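As an illustration only, the comparison described above can be sketched in PyTorch. This is a minimal sketch and not the authors' software: the class names, the hidden width of 16, the mean squared error used as the recovery metric, the unweighted KL term, the training settings, and the random stand-in data are all assumptions made for the example. It shows the two model types sharing the same layer layout, the extra sampling step and KL term that distinguish the variational autoencoder, and a sweep over bottleneck sizes.

```python
import torch
from torch import nn
import torch.nn.functional as F

HIDDEN = 16  # hypothetical hidden width; the article's real layer sizes are not given


class VanillaAE(nn.Module):
    """Encoder -> bottleneck -> decoder, trained on reconstruction error only."""

    def __init__(self, n_features, bottleneck):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, HIDDEN), nn.ReLU(), nn.Linear(HIDDEN, bottleneck))
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, HIDDEN), nn.ReLU(), nn.Linear(HIDDEN, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))


class VariationalAE(nn.Module):
    """Same layout, but the bottleneck is a sampled Gaussian latent vector."""

    def __init__(self, n_features, bottleneck):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(n_features, HIDDEN), nn.ReLU())
        self.mu = nn.Linear(HIDDEN, bottleneck)
        self.log_var = nn.Linear(HIDDEN, bottleneck)
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, HIDDEN), nn.ReLU(), nn.Linear(HIDDEN, n_features))

    def forward(self, x):
        h = self.hidden(x)
        mu, log_var = self.mu(h), self.log_var(h)
        # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)
        return self.decoder(z), mu, log_var


def vae_loss(x_hat, x, mu, log_var):
    """Reconstruction error plus KL divergence to a standard normal prior."""
    recon = F.mse_loss(x_hat, x)
    kl = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl


def train(model, x, epochs=50, lr=1e-2):
    """Full-batch training loop (assumed settings, for illustration)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        out = model(x)
        if isinstance(out, tuple):  # variational model returns (x_hat, mu, log_var)
            loss = vae_loss(out[0], x, out[1], out[2])
        else:
            loss = F.mse_loss(out, x)
        loss.backward()
        opt.step()
    return model


def reconstruction_mse(model, x):
    """Recovery quality: mean squared error between input and reconstruction."""
    with torch.no_grad():
        out = model(x)
        x_hat = out[0] if isinstance(out, tuple) else out
        return F.mse_loss(x_hat, x).item()


torch.manual_seed(0)
x = torch.rand(256, 8)  # random stand-in data, NOT the article's monitoring data set

# Train one vanilla autoencoder per bottleneck size and keep the best-recovering size.
errors = {k: reconstruction_mse(train(VanillaAE(8, k), x), x) for k in (2, 4, 6)}
best_size = min(errors, key=errors.get)
```

Swapping `VanillaAE` for `VariationalAE` in the last lines runs the same sweep for the variational model, so both types can be compared at each bottleneck size under identical settings.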