Abstract

Recent developments in extracting and processing biological and clinical data are enabling quantitative approaches to the study of living systems. High-throughput sequencing, expression profiles, proteomics, and electronic health records are some examples of such technologies. Extracting meaningful information from them requires careful analysis of the large volumes of data they produce. In this note, we present a set of distributions that commonly appear in the analysis of such data. These distributions have some interesting features: they are discontinuous at the rational numbers but continuous at the irrational numbers, and they possess a certain self-similar (fractal-like) structure. The first set of examples presented here is drawn from a high-throughput sequencing experiment, where the self-similar distributions appear in the evaluation of the error rate of the sequencing technology and in the identification of tumorigenic genomic alterations. The other examples are obtained from risk factor evaluation and from the analysis of relative disease prevalence and comorbidity as these appear in electronic clinical data. The distributions are also relevant to the identification of subclonal populations in tumors and to the study of the evolution of infectious diseases, more precisely the study of quasi-species and intrahost diversity of viral populations.

Highlights

  • The large volumes of data obtained by recent technological developments, such as high-throughput sequencing and expression profiles, are providing novel and complementary ways to study biological systems

  • It is believed that most tumors are due to somatic mutations that lead to uncontrolled cell growth

  • The ratios of interest can be seen as sampled from a distribution over the rational numbers in the unit interval

Summary

Introduction

The large volumes of data obtained by recent technological developments, such as high-throughput sequencing and expression profiles, are providing novel and complementary ways to study biological systems. We point out some interesting properties of the ratios of natural numbers obtained in a biological or clinical setting.
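
To make the idea of a distribution over ratios of natural numbers concrete, the following is a minimal sketch under a toy model that is an assumption of this example, not the model used in the paper: the denominator n (for instance, a read depth) is uniform on 1..max_depth, and the numerator k (for instance, the number of reads carrying a variant) is uniform on 0..n. The function name `ratio_distribution` and the parameter `max_depth` are hypothetical. The sketch computes the exact probability mass of each ratio k/n in lowest terms.

```python
from collections import defaultdict
from fractions import Fraction


def ratio_distribution(max_depth):
    """Exact probability mass of each ratio k/n, reduced to lowest terms.

    Assumed toy model (not taken from the paper): n is uniform on
    1..max_depth and, given n, k is uniform on 0..n.
    """
    mass = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for n in range(1, max_depth + 1):
        # Each (k, n) pair has probability 1/max_depth * 1/(n + 1).
        weight = Fraction(1, max_depth * (n + 1))
        for k in range(n + 1):
            # Fraction reduces k/n, so 1/2, 2/4, 3/6, ... share one key.
            mass[Fraction(k, n)] += weight
    return dict(mass)


dist = ratio_distribution(60)
for q in (2, 3, 4, 5, 7):
    print(f"P(ratio = 1/{q}) = {float(dist[Fraction(1, q)]):.4f}")
```

In this toy model a ratio with a small reduced denominator, such as 1/2, collects mass from every depth that is a multiple of that denominator, so it receives more probability than nearby ratios with large denominators. This is the kind of spiky, self-similar pattern over the rationals that the note describes.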
