A Rényi-Type Limit Theorem on Random Sums and the Accuracy of Likelihood-Based Classification of Random Sequences with Application to Genomics

Leonid Hanin, Lyudmila Pavlova

doi:10.3390/math11204254

Abstract

We study classification of random sequences of characters selected from a given alphabet into two classes characterized by distinct character selection probabilities and length distributions. The classification is based on the sign of the log-likelihood score (LLS) consisting of a random sum and a random term depending on the length distributions for the two classes. For long sequences selected from a large alphabet, computing misclassification error rates is not feasible either theoretically or computationally. To mitigate this problem, we computed limiting distributions for two versions of the normalized LLS applicable to long sequences whose class-specific length follows a translated negative binomial distribution (TNBD). The two limiting distributions turned out to be plain or transformed Erlang distributions. This allowed us to establish the asymptotic accuracy of the likelihood-based classification of random sequences with TNBD length distributions. Our limit theorem generalizes a classic theorem on geometric random sums due to Rényi and is closely related to the published results of V. Korolev and coworkers on negative binomial random sums. As an illustration, we applied our limit theorem to the classification of DNA sequences contained in the genome of the bacterium Bacillus subtilis into two classes: protein-coding genes and standard noncoding open reading frames. We found that TNBDs provide an excellent fit to the length distributions for both classes and that the limiting distributions capture essential features of the normalized empirical LLS fairly well.
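The sign-based rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the TNBD is taken to be a negative binomial count (failures before the r-th success, success probability q) shifted by a fixed integer, characters are treated as independent draws, and a positive score is assigned to class 1. The function names and all parameter values are hypothetical.

```python
import math


def tnbd_logpmf(n, shift, r, q):
    """Log-pmf at n of shift + NegBin(r, q) (assumed TNBD parameterization).

    NegBin(r, q) counts failures before the r-th success with success
    probability q; r may be any positive real number.
    """
    k = n - shift
    if k < 0:
        return float("-inf")  # lengths below the shift have probability zero
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(q) + k * math.log(1.0 - q))


def log_likelihood_score(seq, probs1, probs2, length1, length2):
    """LLS = random sum of per-character log-ratios + length log-ratio.

    probs1/probs2 map each character to its selection probability in
    class 1/class 2; length1/length2 are (shift, r, q) tuples.
    The sequence length must be attainable under at least one class.
    """
    random_sum = sum(math.log(probs1[c] / probs2[c]) for c in seq)
    n = len(seq)
    length_term = tnbd_logpmf(n, *length1) - tnbd_logpmf(n, *length2)
    return random_sum + length_term


def classify(seq, probs1, probs2, length1, length2):
    """Return 1 if the LLS is positive, otherwise 2 (assumed convention)."""
    score = log_likelihood_score(seq, probs1, probs2, length1, length2)
    return 1 if score > 0 else 2


# Hypothetical parameters, chosen only for illustration.
at_rich = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
same_length = (3, 2.0, 0.5)  # identical length laws: the length term is zero

label_at = classify("ATATAT", at_rich, uniform, same_length, same_length)
label_cg = classify("CGCGCG", at_rich, uniform, same_length, same_length)
```

With identical length distributions the length term cancels, so the score reduces to the random sum alone; with different length distributions the length term can shift the decision even when the character frequencies are uninformative.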

Journal: Mathematics	Publication Date: Oct 11, 2023
License type: CC BY 4.0

Similar Papers

On the Order of Approximation in Limit Theorems for Negative–Binomial Sums of Strictly Stationary m-Dependent Random Variables
Tran Loc Hung ... Phan Tri Kien
Acta Mathematica Vietnamica | VOL. 46
25 Jan 2021

Bounds for the Rate of Convergence in the Generalized Rényi Theorem
Victor Korolev
Mathematics | VOL. 10
14 Nov 2022

Convergence Rate Estimates in the Global CLT for Compound Mixed Poisson Distributions
I G Shevtsova
Theory of Probability & Its Applications | VOL. 63
01 Jan 2018

Limit Theorems for Randomly Selected Partial Sums
Patrick Billingsley
The Annals of Mathematical Statistics | VOL. 33
01 Mar 1962
