Abstract

BackgroundWe are interested in understanding the locational distribution of genes and their functions in genomes, as this distribution has both functional and evolutionary significance. Gene locational distribution is known to be affected by various evolutionary processes, with tandem duplication thought to be the main process producing clustering of homologous sequences. Recent research has found clustering of protein structural families in the human genome, even when genes identified as tandem duplicates have been removed from the data. However, this previous research was hindered as they were unable to analyse small sample sizes. This is a challenge for bioinformatics as more specific functional classes have fewer examples and conventional statistical analyses of these small data sets often produces unsatisfactory results.ResultsWe have developed a novel bioinformatics method based on Monte Carlo methods and Greenwood's spacing statistic for the computational analysis of the distribution of individual functional classes of genes (from GO). We used this to make the first comprehensive statistical analysis of the relationship between gene functional class and location on a genome. Analysis of the distribution of all genes except tandem duplicates on the five chromosomes of A. thaliana reveals that the distribution on chromosomes I, II, IV and V is clustered at P = 0.001. Many functional classes are clustered, with the degree of clustering within an individual class generally consistent across all five chromosomes. A novel and surprising result was that the locational distribution of some functional classes were significantly more evenly spaced than would be expected by chance.ConclusionAnalysis of the A. thaliana genome reveals evidence of unexplained order in the locational distribution of genes. The same general analysis method can be applied to any genome, and indeed any sequential data involving classes.

Highlights

  • We are interested in understanding the locational distribution of genes and their functions in genomes, as this distribution has both functional and evolutionary significance

  • Distributions of all genes The results for the distribution of all genes without tandem duplicates are briefly summarized in table 1, which shows that the genes on all five chromosomes of A. thaliana are significantly more clustered than would be expected from a locationally independent distribution

  • We can use the standard deviation as a measure of clustering, as explained later in the methods section, and we can use the standard error as a measure of the significance of the result

Read more

Summary

Introduction

We are interested in understanding the locational distribution of genes and their functions in genomes, as this distribution has both functional and evolutionary significance. Gene locational distribution is known to be affected by various evolutionary processes, with tandem duplication thought to be the main process producing clustering of homologous sequences. Recent research has found clustering of protein structural families in the human genome, even when genes identified as tandem duplicates have been removed from the data. This previous research was hindered as they were unable to analyse small sample sizes. This is a challenge for bioinformatics as more specific functional classes have fewer examples and conventional statistical analyses of these small data sets often produces unsatisfactory results. Protein synthesis and the structure of the genome in eukaryotes is altogether very different from prokaryotes and the mechanisms affecting gene location in eukaryotes are likely to be very different

Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.