Abstract

Despite the large amount of detailed information available in protein databases, it is not possible to automatically download a list of proteins ordered by the position of their encoding genes. This order becomes crucial when analyzing common features of proteins produced by loci or other specific regions of human chromosomes. In this study, we developed a new procedure that queries two human databases (one genomic, one protein) and produces a novel dataset in which proteins are ordered according to the map position of the corresponding genes. We validated the procedure and implemented it as a user-friendly web application. We then used this dataset to evaluate the distribution of critical amino acid content in the proteins encoded by a human chromosome. For this purpose, we designed a new methodological approach, called chromosome walking, which scans the whole chromosome and finds the regions producing proteins enriched in a selected amino acid. As an example of biomedical application, we investigated human chromosome 15, which contains the DYX1 locus linked to developmental dyslexia, and we found three additional putative gene clusters whose expression could be driven by the environmental availability of glutamate. The novel data mining procedure and analysis could be exploited in the study of several human pathologies.

Highlights

  • Genomic databases are widely used to extract information about genes, and they also provide the gene position on chromosomes, which is useful when entire regions of chromosomes are analyzed

  • Because the Canonical table provides the gene/protein position on each chromosome as well as the amino acid (AA) composition, we can investigate the content of single amino acids in proteins translated along the whole chromosome; we devised a new procedure, called the walking procedure, that defines the distribution of a chosen amino acid

  • We demonstrated that the amino acid composition of proteins reflects the amino acid availability imposed by the cellular context [8], and we showed that when the tissue has a local utilization of AA, the Glu/Gln ratio is proportional to oxygenation
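The walking procedure mentioned in the highlights can be pictured as a sliding window over proteins that are already sorted by gene position. The following is a minimal sketch of that idea, not the authors' implementation: the function names, the window size, and the toy sequences are all assumptions made for illustration.

```python
def aa_fraction(sequence, amino_acid):
    """Fraction of residues in `sequence` equal to `amino_acid` (one-letter code)."""
    return sequence.count(amino_acid) / len(sequence) if sequence else 0.0


def walk_chromosome(ordered_proteins, amino_acid, window=3):
    """Slide a fixed-size window over proteins sorted by gene position.

    `ordered_proteins` is a list of (gene_name, protein_sequence) tuples,
    assumed to be already ordered along the chromosome.
    Returns a list of (start_index, mean_fraction) tuples, one per window.
    The window size is a hypothetical parameter, not taken from the paper.
    """
    fractions = [aa_fraction(seq, amino_acid) for _, seq in ordered_proteins]
    results = []
    for start in range(len(fractions) - window + 1):
        chunk = fractions[start:start + window]
        results.append((start, sum(chunk) / window))
    return results


# Toy data: hypothetical gene names and made-up sequences.
proteins = [("geneA", "MEEK"), ("geneB", "MEEE"), ("geneC", "MKKK"), ("geneD", "MAAA")]
windows = walk_chromosome(proteins, "E", window=2)
for start, mean_fraction in windows:
    print(start, mean_fraction)
```

Windows with a high mean fraction mark candidate regions enriched in the chosen amino acid; how the enrichment threshold is chosen is left open here.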


Summary

Introduction

Genomic databases are widely used to extract information about genes, and they also provide the gene position on chromosomes, which is useful when entire regions of chromosomes are analyzed. Other examples of analysis that consider large regions of chromosomes are genome-wide association studies (GWAS) and linkage disequilibrium studies, which have identified loci related to disorders or pathologies [4,5,6,7]. The most widely used protein database, the UniProt Knowledgebase (UniProt), gives information on human proteins but does not order them by gene position. Without an ordered list of the proteins encoded by gene clusters, loci, or other specific regions, it is difficult (and impossible on a large scale) to analyze common features of gene products, such as amino acid content. We have recently demonstrated that the proteins encoded by two gene clusters show a significantly lower glutamate/glutamine ratio than those from the nearby regions of the same chromosome [8].
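To make the missing step concrete, the sketch below joins gene coordinates with protein sequences, sorts the result by gene start position, and computes a glutamate/glutamine (E/Q) ratio per protein. It is only an illustration under stated assumptions: the record layout, the function names, and the toy data are hypothetical and do not reflect the actual schema of any genomic database or of UniProt.

```python
def glu_gln_ratio(sequence):
    """Ratio of glutamate (E) to glutamine (Q) residues; None if there is no Q."""
    glu = sequence.count("E")
    gln = sequence.count("Q")
    return glu / gln if gln else None


def order_by_gene_position(gene_records, protein_sequences):
    """Return (gene, start, sequence) tuples sorted by chromosome and start.

    `gene_records` is a list of dicts with hypothetical keys 'gene',
    'chrom', and 'start'; `protein_sequences` maps gene name to sequence.
    Genes without a matching protein sequence are skipped.
    """
    ordered = sorted(gene_records, key=lambda r: (r["chrom"], r["start"]))
    return [
        (r["gene"], r["start"], protein_sequences[r["gene"]])
        for r in ordered
        if r["gene"] in protein_sequences
    ]


# Toy data: made-up gene names, coordinates, and sequences.
genes = [
    {"gene": "g2", "chrom": "15", "start": 500},
    {"gene": "g1", "chrom": "15", "start": 100},
    {"gene": "g3", "chrom": "15", "start": 900},
]
sequences = {"g1": "MEEQ", "g2": "MEQQ", "g3": "MEEE"}

ordered = order_by_gene_position(genes, sequences)
ratios = [(gene, glu_gln_ratio(seq)) for gene, _, seq in ordered]
print(ratios)
```

With the proteins in chromosomal order, per-protein values such as this ratio can be compared between a gene cluster and its neighboring regions.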
