Compound Poisson Approximation Research Articles

The proposal and study of dependent Bayesian nonparametric models has been one of the most active research lines in the last two decades, with random vectors of measures representing a natural and popular tool to define them. Nonetheless, a principled approach to understand and quantify the associated dependence structure is still missing. We devise a general, and not model-specific, framework to achieve this task for random measure based models, which consists of: (a) quantifying dependence of a random vector of probabilities in terms of closeness to exchangeability, which corresponds to the maximally dependent coupling with the same marginal distributions, that is, the comonotonic vector; (b) recasting the problem in terms of the underlying random measures (in the same Fréchet class) and quantifying the closeness to comonotonicity; (c) defining a distance based on the Wasserstein metric, which is ideally suited for spaces of measures, to measure the dependence in a principled way. Several results, which represent the very first in the area, are obtained. In particular, useful bounds in terms of the underlying Lévy intensities are derived relying on compound Poisson approximations. These are then specialized to popular models in the Bayesian literature, leading to interesting insights.


Background: Identification of motifs and quantification of their occurrences are important for the study of genetic diseases, gene evolution, transcription sites, and other biological mechanisms. Exact formulae for estimating count distributions of motifs under Markovian assumptions have high computational complexity and are impractical to use on large motif sets. Approximate formulae, e.g. based on compound Poisson, are faster, but reliable p-value calculation remains challenging. Here, we introduce ‘motif_prob’, a fast implementation of an exact formula for motif count distribution through progressive approximation with arbitrary precision. Our implementation speeds up the exact calculation, which is usually impractical, making it feasible and a candidate to replace currently employed heuristics.

Results: We implement motif_prob in both Perl and C++, using an efficient error-bound iterative process for the exact formula, and provide a comparison with state-of-the-art tools (e.g. MoSDi) in terms of precision and run time benchmarks, along with a real-world use case on bacterial motif characterization. Our software is able to process a million motifs (13–31 bases) over genome lengths of 5 million bases within a minute on a regular laptop, and the run times for both the Perl and C++ code are several orders of magnitude smaller (50–1000× faster) than MoSDi, even when using their fast compound Poisson approximation (60–120× faster). In the real-world use cases, we first show the consistency of motif_prob with MoSDi, and then how p-value quantification is crucial for enrichment quantification when bacteria have different GC content, using motifs found in antimicrobial resistance genes. The software and the code sources are available under the MIT license at https://github.com/DataIntellSystLab/motif_prob.

Conclusions: The motif_prob software is a multi-platform and efficient open source solution for calculating exact frequency distributions of motifs. It can be integrated with motif discovery/characterization tools for quantifying enrichment and deviation from expected frequency ranges with exact p-values, without loss in data processing efficiency.
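For readers unfamiliar with the compound Poisson approximation mentioned in the abstract above, the following is a minimal illustrative sketch, not the method used by motif_prob or MoSDi. It assumes that occurrences arrive in clumps, that the number of clumps is Poisson with rate `lam`, and that clump sizes are independent draws from a given distribution; the count distribution is then evaluated with the standard Panjer recursion. The function names and the example parameters are hypothetical.

```python
import math


def compound_poisson_pmf(lam, clump_pmf, n_max):
    """Return [P(S=0), ..., P(S=n_max)] for a compound Poisson sum S.

    lam       -- expected number of clumps (Poisson rate); assumed input
    clump_pmf -- clump_pmf[j-1] = P(clump size = j) for j = 1, 2, ...
    Uses the Panjer recursion: g(n) = (lam / n) * sum_j j * f(j) * g(n - j).
    """
    g = [math.exp(-lam)]  # P(S = 0): no clumps at all
    for n in range(1, n_max + 1):
        total = 0.0
        for j in range(1, min(n, len(clump_pmf)) + 1):
            total += j * clump_pmf[j - 1] * g[n - j]
        g.append(lam * total / n)
    return g


def upper_tail(pmf, k):
    """Approximate P(S >= k) from a pmf list indexed from 0."""
    return max(0.0, 1.0 - sum(pmf[:k]))


# Hypothetical example: 2 expected clumps, each of size 1 or 2 with equal odds.
pmf = compound_poisson_pmf(2.0, [0.5, 0.5], 80)
p_value = upper_tail(pmf, 8)  # chance of observing 8 or more occurrences
```

With all clumps of size 1 the recursion reduces to the ordinary Poisson distribution, which gives a quick sanity check for the sketch.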


Articles published on Compound Poisson Approximation

Total variation distance and compound Poisson approximations for random sums

On Poisson Approximation

Compound Poisson Approximations to Sums of Extrema of Bernoulli Variables

Compound Poisson approximation

Malliavin calculus for marked binomial processes and applications

Spectral-free estimation of Lévy densities in high-frequency regime

Measuring dependence in the Wasserstein distance for Bayesian nonparametric models

Fast and exact quantification of motif occurrences in biological sequences

Average‐tempered stable subordinators with applications

Compound Poisson approximation for regularly varying fields with application to sequence alignment

Compound Poisson Approximations in $\ell_p$-norm for Sums of Weakly Dependent Vectors

DNA Motif Match Statistics Without Poisson Approximation

Asymptotics for the sum of three state Markov dependent random variables

Compound Poisson approximation of subgraph counts in stochastic block models with multiple edges

COM-negative binomial distribution: modeling overdispersion and ultrahigh zero-inflated count data

Asymptotic results for the multiple scan statistic

On magic factors in Stein’s method for compound Poisson approximation

Refined total variation bounds in the multivariate and compound Poisson approximation

Phương pháp hàm đặc trưng cho một số định lí giới hạn trong xác suất (The characteristic function method for some limit theorems in probability)

Approximation of Symmetric Three-State Markov Chain by Compound Poisson Law
