Abstract

Features such as mutations or structural characteristics can be non-randomly or non-uniformly distributed within a genome. Until now, statistical inference on the distribution of sequence motifs has required computer simulations. Here, we show that these analyses are possible using an analytical, mathematical approach. To assess non-randomness, our calculations require only information such as genome size, number of (sampled) sequence motifs and distance parameters. We have developed computer programs that evaluate our analytical formulas to determine expected values and p-values in real time. This approach permits a flexible cluster definition that can be chosen to identify non-random or non-uniform sequence motif distributions most effectively. As an example, we show the effectiveness and reliability of our mathematical approach for the distribution of clinical retroviral vector integration sites.
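To make the idea of an analytical expected value concrete, here is a minimal sketch and not the authors' formulas. It assumes the genome is a single linear segment, that sites are placed independently and uniformly, and that the number of close pairs is roughly Poisson distributed. The function names (`pair_within_prob`, `expected_close_pairs`, `poisson_upper_tail`) are hypothetical.

```python
import math

def pair_within_prob(genome_size, window):
    """P(|X - Y| <= window) for X, Y independent and uniform on [0, genome_size]."""
    r = min(window / genome_size, 1.0)
    return 2 * r - r * r

def expected_close_pairs(n_sites, genome_size, window):
    """Expected number of site pairs lying within `window` of each other
    under the uniform null (assumption: one linear genome, no chromosome ends)."""
    return math.comb(n_sites, 2) * pair_within_prob(genome_size, window)

def poisson_upper_tail(observed, mean):
    """Approximate p-value P(K >= observed) for K ~ Poisson(mean).
    The Poisson model is an assumption of this sketch, not taken from the paper."""
    if observed <= 0:
        return 1.0
    term = math.exp(-mean)      # P(K = 0)
    cdf = 0.0
    for k in range(observed):   # accumulate P(K = 0) .. P(K = observed - 1)
        cdf += term
        term *= mean / (k + 1)
    return max(0.0, 1.0 - cdf)

# Illustrative inputs only: 500 sites, a 3e9 bp genome, a 30 kb window.
mean_pairs = expected_close_pairs(500, 3_000_000_000, 30_000)
p_value = poisson_upper_tail(10, mean_pairs)
```

The only inputs are the genome size, the number of sites and a distance parameter, which mirrors the inputs listed in the abstract. A real analysis would also need to handle chromosome boundaries and higher-order clusters.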

Highlights

  • With the sequences of complete genomes available [1,2,3,4], and accelerating technologies for high-throughput sequencing [5], genome-wide sequence analyses of individual samples will soon become reality

  • Interest in integration site analyses has grown since 3 patients cured of X-linked severe combined immunodeficiency (X-SCID) developed a retroviral vector-induced lymphoproliferative disease that was triggered by insertional activation of the proto-oncogene LMO2 [16,17]

  • We considered 2, 3 or 4 insertions as common integration sites (CIS) of 2nd, 3rd or 4th order if they fell within a 30 kb, 50 kb or 100 kb window of genomic sequence from each other, respectively
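The CIS definition above can be sketched in code. This is an illustrative reading, not the authors' program: it assumes all positions lie on one chromosome and treats k insertions as a CIS of order k when the span from the first to the last of k neighbouring insertions is at most the window size. The names `CIS_WINDOWS` and `find_cis` are hypothetical.

```python
# Window sizes in base pairs for CIS of 2nd, 3rd and 4th order, as given in the text.
CIS_WINDOWS = {2: 30_000, 3: 50_000, 4: 100_000}

def find_cis(positions, order, window=None):
    """Return every group of `order` neighbouring insertions whose total span
    is at most `window` base pairs (assumption: one chromosome, bp coordinates)."""
    if window is None:
        window = CIS_WINDOWS[order]
    pos = sorted(positions)
    hits = []
    for i in range(len(pos) - order + 1):
        group = pos[i:i + order]
        # On a line, all pairwise distances fit in the window
        # exactly when the outermost two positions do.
        if group[-1] - group[0] <= window:
            hits.append(tuple(group))
    return hits

# Hypothetical insertion coordinates on a single chromosome.
sites = [100_000, 120_000, 145_000, 900_000]
second_order = find_cis(sites, 2)
third_order = find_cis(sites, 3)
```

Overlapping groups are reported separately, so one insertion can appear in several CIS. Whether such overlaps should be merged is a design choice that the text above does not settle.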


Summary

Introduction

With the sequences of complete genomes available [1,2,3,4], and accelerating technologies for high-throughput sequencing [5], genome-wide sequence analyses of individual samples will soon become reality. Even where the null hypothesis of random uniform allocation is not adequate, as is known for retroviral vector integration [31], our calculations can address segments of the genome located between sites of predilection for virus integration and can be extended to address non-uniform sequence motif distributions.

