Abstract
We consider the data-driven discovery of governing equations from time-series data in the limit of high noise. The algorithms developed describe an extensive toolkit of methods for circumventing the deleterious effects of noise in the context of the sparse identification of nonlinear dynamics (SINDy) framework. We offer two primary contributions, both focused on noisy data acquired from a system $\dot{\boldsymbol{x}} = \boldsymbol{f}(\boldsymbol{x})$. First, we propose, for use in high-noise settings, an extensive toolkit of critically enabling extensions for the SINDy regression method, which progressively cull functionals from an over-complete library and yield a set of sparse equations that regress to the derivative $\dot{\boldsymbol{x}}$. This toolkit includes: (regression step) weighting timepoints based on estimated noise, using ensembles to estimate coefficients, and regressing using FFTs; (culling step) leveraging linear dependence of functionals, and restoring and protecting culled functionals based on Figures of Merit (FoMs). In a novel Assessment step, we define FoMs that compare model predictions to the original time series (i.e., $\boldsymbol{x}(t)$ rather than $\dot{\boldsymbol{x}}(t)$).
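The regression and culling steps above build on SINDy's sequentially thresholded least squares (STLSQ). The sketch below shows one way to combine STLSQ with two of the listed ideas: per-timepoint weights and an ensemble of fits whose coefficients are combined by the median. It is not the authors' implementation. The inverse-variance weights, the random-subsample ensemble, and the names `weighted_stlsq`, `ensemble_stlsq`, `threshold`, `n_models`, and `frac` are hypothetical choices made for illustration. Here `theta` is the library matrix (one column per functional, one row per timepoint) and `dxdt` is the estimated derivative of one state variable.

```python
import numpy as np


def weighted_stlsq(theta, dxdt, weights, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares with per-timepoint weights.

    theta:   (n_samples, n_functionals) library evaluated at each timepoint
    dxdt:    (n_samples,) estimated derivative of one state variable
    weights: (n_samples,) nonnegative weights; larger means "trust this timepoint more"
    """
    sw = np.sqrt(weights)
    A = theta * sw[:, None]  # row scaling turns ordinary into weighted least squares
    b = dxdt * sw
    xi = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(xi) < threshold  # cull functionals with small coefficients
        xi[small] = 0.0
        active = ~small
        if not active.any():
            break
        # Refit using only the surviving functionals.
        xi[active] = np.linalg.lstsq(A[:, active], b, rcond=None)[0]
    return xi


def ensemble_stlsq(theta, dxdt, weights, n_models=50, frac=0.8, seed=0, **kwargs):
    """Run weighted STLSQ on random subsets of timepoints; return median coefficients."""
    rng = np.random.default_rng(seed)
    n = theta.shape[0]
    m = int(frac * n)
    fits = []
    for _ in range(n_models):
        idx = rng.choice(n, size=m, replace=False)  # subsample timepoints without replacement
        fits.append(weighted_stlsq(theta[idx], dxdt[idx], weights[idx], **kwargs))
    return np.median(np.array(fits), axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = np.linspace(-2.0, 2.0, 400)
    theta = np.column_stack([np.ones_like(x), x, x**2, x**3])  # library: 1, x, x^2, x^3
    sigma = 0.05 * (1.0 + np.abs(x))  # noise level assumed known in this toy example
    dxdt = 2.0 * x - 0.5 * x**3 + sigma * rng.standard_normal(x.size)
    print(ensemble_stlsq(theta, dxdt, weights=1.0 / sigma**2, threshold=0.1))
```

On this synthetic example ($\dot{x} = 2x - 0.5x^3$ with noise that grows with $|x|$), the constant and $x^2$ coefficients should be culled to zero and the remaining two should land close to 2 and −0.5. Taking the median rather than the mean keeps a functional at exactly zero whenever most ensemble members cull it.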
These innovations can extract sparse governing equations and coefficients from high-noise time-series data (e.g., 300% added noise). For example, the method discovers the correct sparse libraries in the Lorenz system, with median coefficient estimate errors of 1%–3% (for 50% noise), 6%–8% (for 100% noise), and 23%–25% (for 300% noise). The enabling modules in the toolkit are combined into a single method, but the individual modules can be tactically applied in other equation discovery methods (SINDy or not) to improve results on high-noise data. Second, we propose a technique, applicable to any model discovery method based on $\dot{\boldsymbol{x}} = \boldsymbol{f}(\boldsymbol{x})$, to assess the accuracy of a discovered model in the context of non-unique solutions due to noisy data. Currently, this non-uniqueness can obscure a discovered model’s accuracy and thus a discovery method’s effectiveness. We describe a technique that uses linear dependencies among functionals to transform a discovered model into an equivalent form that is closest to the true model, enabling more accurate assessment of a discovered model’s correctness.
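The equivalence idea above can be illustrated in linear-algebra terms. If the library evaluated on the data satisfies $\Theta \boldsymbol{v} \approx 0$ for some vector $\boldsymbol{v}$, then coefficient vectors $\boldsymbol{\xi}$ and $\boldsymbol{\xi} + \boldsymbol{v}$ produce the same fitted values, so a discovered $\boldsymbol{\xi}$ can be shifted along the null space of $\Theta$ toward the true coefficients before its error is measured. The sketch below is a minimal version under stated assumptions: dependencies are detected by an SVD with a relative singular-value cutoff `rel_tol`, and "closest" means Euclidean distance. The function names and the cutoff are hypothetical, and the authors' actual procedure may differ.

```python
import numpy as np


def null_space_basis(theta, rel_tol=1e-8):
    """Orthonormal basis (as columns) for the numerical null space of theta."""
    _, s, vt = np.linalg.svd(theta, full_matrices=True)
    rank = int(np.sum(s > rel_tol * s[0]))
    return vt[rank:].T  # columns v satisfy theta @ v ≈ 0


def closest_equivalent(xi_found, xi_true, theta, rel_tol=1e-8):
    """Move xi_found within its equivalence class to the point nearest xi_true.

    Any xi_found + N @ c yields the same fitted values theta @ xi on this data,
    so all such vectors are equivalent models; we return the one closest to
    xi_true in the Euclidean norm.
    """
    N = null_space_basis(theta, rel_tol)
    c = N.T @ (xi_true - xi_found)  # least-squares step (N has orthonormal columns)
    return xi_found + N @ c


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y = rng.standard_normal(100), rng.standard_normal(100)
    theta = np.column_stack([x, y, x + y])  # third functional equals x + y on this data
    xi_true = np.array([1.0, 1.0, 0.0])     # true model: x + y
    xi_found = np.array([0.0, 0.0, 1.0])    # discovered model: uses the third functional
    print(closest_equivalent(xi_found, xi_true, theta))
```

In this noise-free toy case the discovered vector $[0, 0, 1]$ maps back to $[1, 1, 0]$ while leaving $\Theta\boldsymbol{\xi}$ unchanged. With noisy data the dependencies are only approximate, so `rel_tol` would have to be chosen much larger.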
Highlights
We apply an engineering lens to sparse identification of nonlinear dynamics (SINDy) to address the exigencies of noisy data, and describe a toolkit of novel, practically based techniques, including, in the Regression step, weighting timepoints based on estimated noise, using ensembles to estimate coefficients, and regressing on the Fast Fourier Transforms (FFTs) of the derivatives and library functionals.
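Regressing on FFTs works because the FFT is linear: if $\dot{x}(t) \approx \sum_k \xi_k \theta_k(\boldsymbol{x}(t))$, the same real coefficients $\xi_k$ relate the transform of the derivative to the transforms of the functionals. The following sketch fits $\boldsymbol{\xi}$ on Fourier coefficients, stacking real and imaginary parts so that $\boldsymbol{\xi}$ stays real. Keeping only the lowest `n_keep` frequency bins is an assumption added here (on the premise that high frequencies are noise-dominated). The highlights do not say how FFT coefficients are selected or weighted, and the name `fft_regression` is hypothetical.

```python
import numpy as np


def fft_regression(theta, dxdt, n_keep=None):
    """Least-squares fit of dxdt ≈ theta @ xi carried out on FFT coefficients.

    theta:  (n_samples, n_functionals) library on a uniform time grid
    dxdt:   (n_samples,) derivative estimate for one state variable
    n_keep: if given, use only the n_keep lowest-frequency bins
    """
    F_theta = np.fft.rfft(theta, axis=0)  # transform each functional over time
    F_dx = np.fft.rfft(dxdt)
    if n_keep is not None:
        F_theta, F_dx = F_theta[:n_keep], F_dx[:n_keep]
    # Linearity: FFT(theta @ xi) = FFT(theta) @ xi for real xi, so stack
    # real and imaginary parts into one real-valued least-squares problem.
    A = np.vstack([F_theta.real, F_theta.imag])
    b = np.concatenate([F_dx.real, F_dx.imag])
    return np.linalg.lstsq(A, b, rcond=None)[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
    x = np.sin(t) + 0.5 * np.cos(2.0 * t)
    theta = np.column_stack([np.ones_like(x), x, x**2])  # library: 1, x, x^2
    xi_true = np.array([0.3, -1.0, 0.5])
    dxdt = theta @ xi_true + 0.01 * rng.standard_normal(t.size)
    print(fft_regression(theta, dxdt, n_keep=20))
```

The demo only checks the regression mechanics on a synthetic target built directly from the library; it is not a simulated dynamical system. The recovered coefficients should be close to `[0.3, -1.0, 0.5]`.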
We offer two main contributions, both applicable to discovery methods generally, not just to SINDy
We have presented a toolkit of methods to address noisy data, for data-driven discovery of governing equations
Summary
The derivation of governing equations for physical systems has dominated the physical and engineering sciences for centuries. It is the dominant paradigm for the modeling and characterization of physical processes, engendering rapid and diverse technological developments in every application area of the sciences. Since the mid-20th century, governing equations have become even more influential due to the rise of computers and scientific computing. The rapid evolution of sensor technologies and data-acquisition software/hardware, broadly defined, has opened new fields of exploration where governing equations are difficult to derive. For instance, come to mind as application areas where first-principles derivations are difficult to achieve, yet data is becoming abundant and of