Abstract

This paper explores automated approaches for the analysis and categorization of turbulent flow data as a means of assessing the quality of a turbulence dataset used for constructing data-driven turbulence closures. Single-point statistics from several high-fidelity turbulent flow simulation datasets are differentiated into groups using a Gaussian mixture model clustering algorithm. Candidate features are proposed, and a feature selection algorithm is applied to the data in a sequential fashion, flow by flow, to identify a good feature set and an optimal number of clusters for each dataset. Clusters are first identified for plane channel flows, producing results that agree with existing theory and empirical observations. Further clusters are then identified in an incremental fashion for flow over a wavy-walled channel, flow over a bump in a channel, and flow past a square cylinder. Some clusters are closely identified with the anisotropy state of the turbulence, whereas others can be connected to physical phenomena, such as boundary-layer separation and free shear layers. Exemplar points from the clusters, or prototypes, are then identified using a prototype placement method. These exemplars effectively summarize the dataset using a greatly reduced collection of data points. The clusters and their prototypes are used to assess the quality of a training dataset constructed by simply pooling the four flows. We enumerate the dataset’s shortcomings and state the limits of generalizability of any data-driven closure trained on it.
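To make the pipeline concrete, the sketch below shows one plausible realization of the core steps described above: fitting a Gaussian mixture model to standardized single-point statistics, choosing the number of clusters by minimizing an information criterion, and extracting one prototype point per cluster. This is a minimal illustration, not the authors' code; the feature matrix is synthetic, the use of scikit-learn and BIC is an assumption, and the max-responsibility prototype rule stands in for the paper's prototype placement method.

```python
# Minimal sketch (not the authors' implementation): GMM clustering of
# single-point turbulence statistics, cluster-count selection via BIC,
# and one prototype (exemplar) per cluster.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-in for candidate features (e.g., anisotropy invariants); the real
# features would come from the high-fidelity simulation datasets.
X_raw = rng.normal(size=(5000, 4))

# Standardize so no single statistic dominates the mixture fit.
X = StandardScaler().fit_transform(X_raw)

# Sweep candidate cluster counts; keep the model with the lowest BIC.
models = [
    GaussianMixture(n_components=k, covariance_type="full", random_state=0).fit(X)
    for k in range(1, 9)
]
best = min(models, key=lambda m: m.bic(X))
print(f"selected {best.n_components} clusters")

# Posterior responsibilities gamma[i, j] = p(cluster j | point i).
gamma = best.predict_proba(X)

# Simple stand-in for prototype placement: for each cluster, take the
# data point with the highest posterior responsibility.
prototypes = X[gamma.argmax(axis=0)]
```

In this reading, the prototypes summarize each cluster with a single representative sample, so a training set of thousands of points can be audited, or compared against a new flow, through a handful of exemplars.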
