Obey validity limits of data-driven models through topological data analysis and one-class classification

Artur M. Schweidtmann, Alexander Mitsos, Jana M. Weber, Linus Netze, Christian Wende

doi:10.1007/s11081-021-09608-0

Abstract

Data-driven models are becoming increasingly popular in engineering, on their own or in combination with mechanistic models. Commonly, the trained models are subsequently used in model-based optimization of design and/or operation of processes. Thus, it is critical to ensure that data-driven models are not evaluated outside their validity domain during process optimization. We propose a method to learn this validity domain and encode it as constraints in process optimization. We first perform a topological data analysis using persistent homology identifying potential holes or separated clusters in the training data. In case clusters or holes are identified, we train a one-class classifier, i.e., a one-class support vector machine, on the training data domain and encode it as constraints in the subsequent process optimization. Otherwise, we construct the convex hull of the data and encode it as constraints. We finally perform deterministic global process optimization with the data-driven models subject to their respective validity constraints. To ensure computational tractability, we develop a reduced-space formulation for trained one-class support vector machines and show that our formulation outperforms common full-space formulations by a factor of over 3000, making it a viable tool for engineering applications. The method is ready-to-use and available open-source as part of our MeLOn toolbox (https://git.rwth-aachen.de/avt.svt/public/MeLOn).
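The cluster-detection part of the topological analysis can be illustrated with a minimal sketch. It computes only 0-dimensional persistence (connected components) with a union-find over sorted pairwise distances; detecting holes needs 1-dimensional persistence, which this sketch does not cover. The function names, the `gap_ratio` heuristic, and the sample data are assumptions made for illustration and are not taken from the paper or from MeLOn.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Death scales of connected components in a Vietoris-Rips filtration.

    Every point is born as its own component at scale 0. Pairwise distances
    are processed in increasing order; each edge that joins two different
    components kills one of them, and that distance is its death scale.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(dist)
    return deaths  # n - 1 values in ascending order; one component never dies


def count_clusters(points, gap_ratio=3.0):
    """Heuristic (assumed): a component is 'persistent' if it dies at a scale
    much larger than the median death scale."""
    deaths = h0_persistence(points)
    median = deaths[len(deaths) // 2]
    long_lived = [d for d in deaths if d > gap_ratio * median]
    return 1 + len(long_lived)


# Hypothetical training inputs: two small 3x3 grids that lie far apart.
cluster_a = [(0.1 * i, 0.1 * j) for i in range(3) for j in range(3)]
cluster_b = [(5.0 + 0.1 * i, 5.0 + 0.1 * j) for i in range(3) for j in range(3)]
data = cluster_a + cluster_b

print(count_clusters(data))       # two separated clusters
print(count_clusters(cluster_a))  # a single cluster
```

In this sketch, a result greater than one would point towards the one-class classifier branch of the method, while a single cluster without holes would point towards the convex hull branch.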

Highlights

Supervised machine-learning techniques have been re-emerging as a promising avenue for data-driven modeling in various engineering disciplines (Venkatasubramanian 2019).
We develop a reduced-space formulation for trained one-class support vector machines and show that our formulation outperforms common full-space formulations by a factor of over 3000, making it a viable tool for engineering applications.
Note that we refer to the validity domain of individual data-driven models throughout this work, but the concept can be applied to hybrid models (Kahrs and Marquardt 2007).
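To make the one-class support vector machine constraint from the highlights more concrete, the following sketch evaluates the standard decision function of a trained one-class SVM as one explicit function of the inputs. It assumes a radial basis function kernel, which this summary does not state, and uses made-up support vectors, coefficients, and offset. Under our reading of "reduced-space", the optimizer would see only the inputs `x` and this single inequality, whereas a full-space formulation would expose every intermediate distance and kernel value as its own variable with an equality constraint.

```python
import math

# Hypothetical trained model. In practice, the support vectors, dual
# coefficients, and offset would come from a trained one-class SVM.
SUPPORT_VECTORS = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
ALPHAS = [0.25, 0.25, 0.25, 0.25]
GAMMA = 2.0  # RBF kernel parameter (assumed)
RHO = 0.2    # decision offset (assumed)


def decision_function(x):
    """f(x) = sum_i alpha_i * exp(-gamma * ||x - x_i||^2) - rho."""
    total = 0.0
    for alpha, sv in zip(ALPHAS, SUPPORT_VECTORS):
        sq_dist = sum((xk - sk) ** 2 for xk, sk in zip(x, sv))
        total += alpha * math.exp(-GAMMA * sq_dist)
    return total - RHO


def is_valid(x):
    """Validity constraint: a point is accepted when f(x) >= 0."""
    return decision_function(x) >= 0.0


print(is_valid((0.5, 0.5)))  # inside the region spanned by the support vectors
print(is_valid((3.0, 3.0)))  # far away from all support vectors
```

Because `decision_function` contains no auxiliary variables, the number of optimization variables in this sketch does not grow with the number of support vectors.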

Summary

Introduction

Supervised machine-learning techniques have been re-emerging as a promising avenue for data-driven modeling in various engineering disciplines (Venkatasubramanian 2019). The vast majority of previous publications use box constraints (i.e., hyperrectangles) to bound the inputs of data-driven models, so that each variable has independent bounds. This approach is practical when the training data is obtained from simulations based on regular grids or Latin hypercubes that are sufficiently dense. As proposed by Courrieu (1994), a few previous works in process systems engineering (PSE) constructed the convex hull of the training data points to describe the validity domain and integrated it as a set of linear constraints in optimization problems (Kahrs and Marquardt 2007; Zhang et al. 2016; Asprion et al. 2019). In this work, we first use persistent homology to check the training data for holes or separated clusters. In case clusters or holes are identified, we train a one-class support vector machine (SVM) on the training data domain of the data-driven models and encode it as constraints in the subsequent process optimization; otherwise, we construct the convex hull of the data and encode it as constraints. We demonstrate the potential of our method on a set of illustrative mathematical case studies and an engineering case study, i.e., the open-loop control of a sulfur recovery unit.
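The difference between box constraints and convex hull constraints can be shown with a small two-dimensional sketch. It builds the hull with Andrew's monotone chain algorithm and checks membership with one linear inequality per hull edge. The data, the query point, and all function names are hypothetical, and the code is limited to two inputs; it is not the implementation used in the cited works.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def in_box(x, points):
    """Box constraints: independent lower and upper bounds per input."""
    return all(
        min(p[k] for p in points) <= x[k] <= max(p[k] for p in points)
        for k in range(len(x))
    )


def in_hull(x, hull, tol=1e-12):
    """Convex hull constraints: one linear inequality per hull edge."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], x) >= -tol for i in range(n))


# Hypothetical training inputs with strongly correlated variables.
data = [(0.0, 0.0), (1.0, 0.5), (1.0, 1.5), (2.0, 1.5), (2.0, 2.5), (3.0, 3.0)]
hull = convex_hull(data)
query = (3.0, 0.0)  # corner of the bounding box, far from every sample

print(in_box(query, data))   # the box accepts the corner point
print(in_hull(query, hull))  # the hull rejects it
```

For correlated inputs like these, the bounding box admits regions without any training data, which the hull inequalities exclude. The hull is still convex, so it cannot exclude holes or gaps between clusters.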

Methodology

Topological data analysis using persistent homology

Learn validity domain using one-class support vector machines

Optimization with classifier as constraint

Illustrative case studies

Topological data analysis

Validity domain modeling

Engineering application

Conclusion

Compliance with ethical standards

Journal: Optimization and Engineering
Publication Date: May 12, 2021
Citations: 28
License type: open-access
