Abstract

Each mutation in a population sample of DNA sequences can be classified by the number of sequences that inherit the mutant nucleotide; the resulting frequencies are known as mutations of different sizes, or the site frequency spectrum. Many summary statistics can be defined as linear functions of these frequencies. This paper explores analytically a flexible class of such linear summary statistics, which includes several well-known quantities such as the number of segregating sites and the mean number of nucleotide differences between two sequences. Some asymptotic variances and covariances are obtained, and analytical formulas are derived for the variances and covariances of nine such linear summary statistics, most of which were previously unknown. This study not only provides theoretical foundations for exploring linear summary statistics, but also provides some new linear summary statistics that may be used for analyzing sample polymorphism. Furthermore, it is shown that a newly developed linear summary statistic has a smaller variance than Watterson's estimator almost uniformly, and that a class of linear summary statistics that puts too much weight on mutations of smaller sizes has an asymptotically non-zero variance.
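
To make the idea of a linear summary statistic concrete, the following is a minimal sketch, not taken from the paper. It assumes the unfolded site frequency spectrum of a sample of n sequences is stored as a list whose entry i-1 is the number of mutations of size i (i = 1, ..., n-1). The function names are illustrative. The two weight vectors shown are the standard ones for Watterson's estimator and for the mean number of pairwise nucleotide differences; the new statistics derived in the paper are not reproduced here.

```python
from math import comb


def linear_statistic(sfs, weights):
    """Return sum_i weights[i] * sfs[i], a linear function of the spectrum."""
    if len(sfs) != len(weights):
        raise ValueError("sfs and weights must have the same length")
    return sum(w * x for w, x in zip(weights, sfs))


def watterson_weights(n):
    """Weights giving Watterson's estimator: every size gets 1 / a_n,
    where a_n = 1 + 1/2 + ... + 1/(n-1)."""
    a_n = sum(1.0 / i for i in range(1, n))
    return [1.0 / a_n] * (n - 1)


def pairwise_weights(n):
    """Weights giving the mean number of pairwise differences:
    a mutation of size i separates i * (n - i) of the C(n, 2) pairs."""
    pairs = comb(n, 2)
    return [i * (n - i) / pairs for i in range(1, n)]


def segregating_sites(sfs):
    """Number of segregating sites: the linear statistic with all weights 1."""
    return linear_statistic(sfs, [1] * len(sfs))


if __name__ == "__main__":
    # Hypothetical spectrum for n = 4 sequences: sizes 1, 2, 3.
    n = 4
    sfs = [3, 1, 1]
    print("S       =", segregating_sites(sfs))
    print("theta_W =", linear_statistic(sfs, watterson_weights(n)))
    print("pi      =", linear_statistic(sfs, pairwise_weights(n)))
```

Expressing each statistic as a weight vector keeps the comparison between statistics down to a comparison of weights, which is the viewpoint the abstract takes when it discusses statistics that weight small mutation sizes too heavily.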
