Abstract
The evenness score (E) in next-generation sequencing (NGS) quantifies the homogeneity in coverage of the NGS targets. Here I clarify the mathematical description of E, which is 1 minus the integral from 0 to 1 over the cumulative distribution function F(x) of the normalized coverage x, where normalization means division by the mean, and derive a computationally more efficient formula; that is, 1 minus the integral from 0 to 1 over the probability density distribution f(x) times 1-x. An analogous formula for empirical coverage data is provided as well as fast R command line scripts. This new formula allows for a general comparison of E with the coefficient of variation (=standard deviation σ of normalized data) which is the conventional measure of the relative width of a distribution. For symmetrical distributions, including the Gaussian, E can be predicted closely as 1-σ(2)/2⩾E⩾1-σ/2 with σ⩽1 owing to normalization and symmetry. In case of the log-normal distribution as a typical representative of positively skewed biological data, the analysis yields E≈exp(-σ*/2) with σ*(2)=ln(σ(2)+1) up to large σ (⩽3), and E≈1-F(exp(-1)) for very large σ (⩾2.5). In the latter kind of rather uneven coverage, E can provide direct information on the fraction of well-covered targets that is not immediately delivered by the normalized σ. Otherwise, E does not appear to have major advantages over σ or over a simple score exp(-σ) based on it. Actually, exp(-σ) exploits a much larger part of its range for the evaluation of realistic NGS outputs.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.