Convolutional autoencoders have proven to be a suitable tool for reduced-order modeling of high-dimensional nonlinear dynamical systems. Their goal is to reduce the dimensionality strongly while preserving the most characteristic features of the system. Here, we show that these models depend sensitively on the completeness of the provided data. This is particularly challenging for fully turbulent flows, whose coherent structures range from large-scale superstructures to dissipative eddies over orders of magnitude in time and space. As a result, an unrealistically large number of data snapshots would be required to properly cover all the essential dynamics: features on small time and length scales are already well represented by a few snapshots of the respective flow, whereas the long-lasting large-scale structures are difficult to characterize, either numerically or experimentally. We demonstrate for three types of flows that a missing representation of large-scale turbulent structures leads to failures in the training process. We suggest a method to mitigate this shortcoming: data samples are transformed into new large-scale structures, which enriches the dataset, and augmentations that are detrimental to model performance are skipped. We evaluate our method on three datasets, two from numerical simulations of turbulent Rayleigh–Bénard convection flows and one from a laboratory experiment of the flow past an array of cylinders. We show that the method can substantially improve model utility for high-dimensional data. In this way, we avoid an intensive grid search through possible augmentation combinations without further knowledge of the underlying system.
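To make the two ingredients of the abstract concrete, the sketch below shows a minimal convolutional autoencoder for 2D flow snapshots together with an augmentation that generates new large-scale structure arrangements. This is an illustrative sketch, not the authors' implementation: the architecture, the 64x64 field resolution, the latent dimension, and the choice of periodic horizontal shifts and mirror reflections as transformations are all assumptions made here for demonstration.

```python
# Minimal sketch (illustrative assumptions, not the paper's code) of
# (1) a convolutional autoencoder for dimensionality reduction and
# (2) an augmentation producing new large-scale structure arrangements.
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self, channels: int = 1, latent_dim: int = 64):
        super().__init__()
        # Encoder: strided convolutions compress a (channels, 64, 64)
        # snapshot into a low-dimensional latent vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 16, 4, stride=2, padding=1), nn.ReLU(),  # 32x32
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),        # 16x16
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),        # 8x8
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, latent_dim),
        )
        # Decoder: transposed convolutions mirror the encoder back to
        # the original snapshot resolution.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 8 * 8),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, channels, 4, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

def augment_snapshot(x: torch.Tensor) -> torch.Tensor:
    """Create a new large-scale structure arrangement from one snapshot.

    Assumption: the domain is horizontally periodic (as in many
    Rayleigh-Benard setups), so a random periodic shift relocates the
    large-scale circulation and a mirror reflection flips its
    orientation; both yield admissible new training samples.
    """
    shift = int(torch.randint(0, x.shape[-1], (1,)))
    x = torch.roll(x, shifts=shift, dims=-1)  # periodic horizontal shift
    if torch.rand(1) < 0.5:
        x = torch.flip(x, dims=[-1])          # horizontal reflection
    return x
```

In this spirit, only symmetry-preserving transformations are applied, while augmentations found to hurt reconstruction quality would simply be omitted from the pipeline, in line with the selective strategy the abstract describes.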