High-noise Regime Research Articles

The DNA storage channel is considered, in which the $M$ deoxyribonucleic acid (DNA) molecules comprising each codeword are stored without order, sampled $N$ times with replacement, and then sequenced over a discrete memoryless channel. For a constant coverage depth $M/N$ and molecule length scaling $\Theta(\log M)$, lower (achievability) and upper (converse) bounds on the capacity of the channel are provided, as well as a lower (achievability) bound on its reliability function. Both capacity bounds generalize a bound that was previously known to hold only for the binary symmetric sequencing channel, and only under certain restrictions on the molecule length scaling and the crossover probability parameters. When specialized to the binary symmetric sequencing channel, these restrictions are removed entirely for the lower bound and are significantly relaxed for the upper bound in the high-noise regime. The lower bound on the reliability function is achieved by a universal decoder and reveals that the dominant error event is outage: the event in which the capacity of the channel induced by the DNA molecule sampling operation does not support the target rate.
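
A toy simulation can make the channel model in this abstract concrete. The sketch below, in plain Python, stores $M$ random binary molecules without order, draws $N$ reads with replacement, and passes each read through a binary symmetric sequencing channel. The molecule length $L = \lceil 2 \log_2 M \rceil$, the crossover probability, the sizes, and all function names are illustrative assumptions; the sketch does not implement the paper's coding scheme or decoder. It also reports the fraction of molecules that are never read, whose expectation under sampling with replacement is $(1 - 1/M)^N \approx e^{-N/M}$, as one simple illustration of how the sampling step alone loses information.

```python
import math
import random


def sequence_bsc(molecule, crossover, rng):
    """Binary symmetric sequencing channel: flip each bit independently
    with probability `crossover`."""
    return [bit ^ int(rng.random() < crossover) for bit in molecule]


def dna_storage_channel(codeword, num_reads, crossover, rng):
    """Toy DNA storage channel (illustrative sketch only).

    codeword:  list of M molecules (lists of bits), stored without order.
    num_reads: N, the number of molecules sampled with replacement.
    Returns the N noisy reads in random order, plus the sampled indices;
    the indices are kept for analysis only and are not seen by a decoder.
    """
    M = len(codeword)
    # Sampling with replacement: some molecules are read several times,
    # others are never read at all.
    indices = [rng.randrange(M) for _ in range(num_reads)]
    reads = [sequence_bsc(codeword[i], crossover, rng) for i in indices]
    # Storage is unordered, so the reads carry no positional information.
    rng.shuffle(reads)
    return reads, indices


def expected_unread_fraction(M, N):
    """Expected fraction of the M molecules never sampled in N draws."""
    return (1.0 - 1.0 / M) ** N


if __name__ == "__main__":
    rng = random.Random(0)
    M, N = 1000, 2000                   # hypothetical sizes
    L = math.ceil(2 * math.log2(M))     # hypothetical length, Theta(log M)
    codeword = [[rng.randrange(2) for _ in range(L)] for _ in range(M)]
    reads, indices = dna_storage_channel(codeword, N, crossover=0.1, rng=rng)
    simulated = 1 - len(set(indices)) / M
    print(f"{len(reads)} reads of length {L}")
    print(f"unread fraction: simulated {simulated:.3f}, "
          f"expected {expected_unread_fraction(M, N):.3f}, "
          f"e^(-N/M) = {math.exp(-N / M):.3f}")
```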

We consider the data-driven discovery of governing equations from time-series data in the high-noise limit. The algorithms developed here form a toolkit of methods for mitigating the deleterious effects of noise within the sparse identification of nonlinear dynamics (SINDy) framework. We offer two primary contributions, both focused on noisy data acquired from a system $\dot{\boldsymbol{x}} = \boldsymbol{f}(\boldsymbol{x})$. First, for use in high-noise settings, we propose a set of critical enabling extensions to the SINDy regression method that progressively cull functionals from an over-complete library and yield a set of sparse equations that regress to the derivative $\dot{\boldsymbol{x}}$. In the regression step, the toolkit weights time points according to their estimated noise, uses ensembles to estimate coefficients, and regresses using FFTs; in the culling step, it leverages linear dependence among functionals and restores and protects culled functionals based on Figures of Merit (FoMs). In a novel Assessment step, we define FoMs that compare model predictions to the original time series (i.e., $\boldsymbol{x}(t)$ rather than $\dot{\boldsymbol{x}}(t)$).
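
As background for the regression and culling steps, the baseline SINDy regression is usually sequentially thresholded least squares (STLSQ): regress $\dot{\boldsymbol{x}}$ onto a library of candidate functionals, zero out coefficients whose magnitude falls below a threshold, and refit on the survivors. The sketch below implements that baseline for one state variable, plus a simple bootstrap ensemble that reports the median of each coefficient across subsamples, which is one common way to use ensembles to estimate coefficients. It is not the authors' toolkit: noise-based weighting, FFT regression, and FoM-based restoration are omitted, and the threshold, ensemble size, test system, and function names are assumptions. It uses only the Python standard library, so the least-squares solve is a small normal-equations routine rather than a numerically robust QR or SVD solver.

```python
import random
import statistics


def solve_least_squares(rows, y):
    """Least squares via the normal equations (A^T A) w = A^T y, solved by
    Gaussian elimination with partial pivoting. Assumes A^T A is nonsingular."""
    k = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, k):
            factor = aug[i][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[i][c] -= factor * aug[col][c]
    w = [0.0] * k
    for i in reversed(range(k)):
        w[i] = (aug[i][k] - sum(aug[i][j] * w[j] for j in range(i + 1, k))) / aug[i][i]
    return w


def stlsq(theta, dxdt, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares for one state variable.
    theta[t] holds the library functionals evaluated at time point t;
    dxdt[t] is the (noisy) derivative estimate at the same time point."""
    k = len(theta[0])
    active = list(range(k))
    xi = [0.0] * k
    for _ in range(max_iter):
        xi = [0.0] * k
        if not active:
            break
        w = solve_least_squares([[row[j] for j in active] for row in theta], dxdt)
        for j, wj in zip(active, w):
            xi[j] = wj
        keep = [j for j in active if abs(xi[j]) >= threshold]
        if keep == active:
            break  # every surviving coefficient is above threshold
        active = keep  # cull small terms, then refit on the survivors
    return xi


def ensemble_stlsq(theta, dxdt, n_models=20, frac=0.8, threshold=0.1, seed=0):
    """Bootstrap ensemble: fit STLSQ on random subsamples of time points
    (drawn with replacement) and return the median of each coefficient."""
    rng = random.Random(seed)
    n = len(theta)
    fits = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(int(frac * n))]
        fits.append(stlsq([theta[i] for i in idx], [dxdt[i] for i in idx], threshold))
    return [statistics.median(f[j] for f in fits) for j in range(len(theta[0]))]


if __name__ == "__main__":
    # Hypothetical 1-D test system: dx/dt = -2 x + 0.5 x^3, plus derivative noise.
    rng = random.Random(1)
    xs = [rng.uniform(-2, 2) for _ in range(200)]
    dxdt = [-2 * x + 0.5 * x**3 + rng.gauss(0, 0.05) for x in xs]
    theta = [[1.0, x, x**2, x**3] for x in xs]  # library: 1, x, x^2, x^3
    print(ensemble_stlsq(theta, dxdt))
```

With this hypothetical system and only a small amount of derivative noise, the ensemble keeps just the $x$ and $x^3$ terms; at high noise levels this plain baseline is what the extensions described above are designed to improve on.
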
These innovations can extract sparse governing equations and coefficients from high-noise time-series data (e.g., 300% added noise). For example, the method discovers the correct sparse libraries in the Lorenz system, with median coefficient estimate errors of 1% to 3% (for 50% noise), 6% to 8% (for 100% noise), and 23% to 25% (for 300% noise). The enabling modules in the toolkit are combined into a single method, but the individual modules can also be applied tactically in other equation discovery methods (SINDy or not) to improve results on high-noise data. Second, we propose a technique, applicable to any model discovery method based on $\dot{\boldsymbol{x}} = \boldsymbol{f}(\boldsymbol{x})$, to assess the accuracy of a discovered model when noisy data admit non-unique solutions. Currently, this non-uniqueness can obscure a discovered model's accuracy and thus a discovery method's effectiveness. We describe a technique that uses linear dependencies among functionals to transform a discovered model into an equivalent form that is closest to the true model, enabling a more accurate assessment of the discovered model's correctness.
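
The second contribution rests on an observation that a short projection can illustrate. If the data satisfy a linear dependency $\Theta v \approx 0$ among the library functionals (with $\Theta$ denoting the library evaluated on the data, a notation assumed here), then coefficient vectors $\xi$ and $\xi + c\,v$ give the same right-hand side on the data, so one can choose the member of that family nearest to a reference model. The sketch below assumes the dependency vectors are already known (in practice they would have to be estimated from the data, for example from near-null singular vectors of $\Theta$) and measures closeness by Euclidean distance between coefficient vectors; both choices, the function names, and the unit-circle example are assumptions rather than the paper's procedure.

```python
import math


def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            proj = sum(a * b for a, b in zip(w, q))
            w = [a - proj * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm > tol:                      # skip dependent or zero vectors
            basis.append([a / norm for a in w])
    return basis


def closest_equivalent(xi, xi_ref, dependencies):
    """Move coefficient vector `xi` within the span of the given linear
    dependencies (vectors v with Theta @ v = 0 on the data) to the point
    nearest, in Euclidean norm, to the reference coefficients `xi_ref`.
    Every step lies in that span, so Theta @ xi is unchanged on the data."""
    shifted = list(xi)
    for q in orthonormalize(dependencies):
        step = sum(qi * (r - s) for qi, r, s in zip(q, xi_ref, shifted))
        shifted = [s + step * qi for s, qi in zip(shifted, q)]
    return shifted


def predict(theta, xi):
    """Right-hand side Theta @ xi evaluated at every time point."""
    return [sum(f * c for f, c in zip(row, xi)) for row in theta]


if __name__ == "__main__":
    # Hypothetical data on the unit circle, where 1 - x^2 - y^2 = 0.
    pts = [(math.cos(0.1 * k), math.sin(0.1 * k)) for k in range(63)]
    theta = [[1.0, x * x, y * y] for x, y in pts]    # library: 1, x^2, y^2
    dependency = [1.0, -1.0, -1.0]                    # Theta @ dependency = 0
    xi_found = [1.0, 0.0, 0.0]                        # discovered model: "1"
    xi_ref = [0.0, 1.0, 1.0]                          # reference: "x^2 + y^2"
    xi_equiv = closest_equivalent(xi_found, xi_ref, [dependency])
    gap = max(abs(a - b) for a, b in
              zip(predict(theta, xi_found), predict(theta, xi_equiv)))
    print("equivalent form:", [round(c, 6) for c in xi_equiv])
    print("max change in predictions:", gap)
```

On the unit circle the functionals $1$, $x^2$, $y^2$ satisfy $1 - x^2 - y^2 = 0$, so a discovered term $1$ and the reference terms $x^2 + y^2$ give identical right-hand sides on the data; the projection maps the former onto the latter while leaving the predictions unchanged.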

Articles published on High-noise Regime

Denoising Drug Discovery Data for Improved Absorption, Distribution, Metabolism, Excretion, and Toxicity Property Prediction

Purifying Photon Indistinguishability through Quantum Interference

Bayesian autoencoders for data-driven discovery of coordinates, governing equations and fundamental constants

Linear inverse problems with nonnegativity constraints: Singularity of optimisers

Deep-blur: Blind identification and deblurring with convolutional neural networks

Meta Derivative Identity for the Conditional Expectation

Derivative-based SINDy (DSINDy): Addressing the challenge of discovering governing equations from noisy data

The DNA Storage Channel: Capacity and Error Probability Bounds

Sparse Multi-Reference Alignment: Phase Retrieval, Uniform Uncertainty Principles and the Beltway Problem

Learning mixtures of permutations: Groups of pairwise comparisons and combinatorial method of moments

Dihedral Multi-Reference Alignment

An approximate expectation-maximization for two-dimensional multi-target detection

Two-Dimensional Multi-Target Detection: An Autocorrelation Analysis Approach

A Toolkit for Data-Driven Discovery of Governing Equations in High-Noise Regimes

Robust Localization With Bounded Noise: Creating a Superset of the Possible Target Positions via Linear-Fractional Representations

Comparative Study of Sampling-Based Simulation Costs of Noisy Quantum Circuits

Hyperentanglement-enhanced quantum illumination

Device-independent quantum key distribution with random key basis

Synaptic plasticity as Bayesian inference

The generalized orthogonal Procrustes problem in the high noise regime
