Abstract

High-throughput biology technologies have yielded complete genome sequences and functional genomics data for several organisms, including important microbial pathogens of humans, animals and plants. However, up to 50% of the genes within a genome are often labeled “unknown”, “uncharacterized” or “hypothetical”, which limits our understanding of the virulence and pathogenicity of these organisms. Although the biological functions of the proteins encoded by these genes are not known, many of them have been predicted to be involved in key processes in these organisms. In particular, for Mycobacterium tuberculosis, some of these “hypothetical” proteins, for example those belonging to the Pro-Glu or Pro-Pro-Glu (PE/PPE) family, are suspected to play a crucial role in the intracellular lifestyle of this pathogen and may contribute to its survival in different environments. We have generated a functional interaction network for Mycobacterium tuberculosis proteins and used it to predict functions for many of its hypothetical proteins. Here we performed functional enrichment analysis of these proteins, based on their predicted biological functions, to identify statistically significant annotations, and we compared the network properties of hypothetical proteins with those of known proteins. From the statistically significant annotations and the network information, we sought to derive biologically meaningful annotations related to infection and disease. This quantitative analysis provides an overview of the functional contributions of Mycobacterium tuberculosis “hypothetical” proteins to many basic cellular functions, including the pathogen's adaptability in the host system and its ability to evade the host immune response.

Highlights

  • Despite ever-increasing amounts of biological data, including primary data, such as genomic sequences, and functional genomic data from high-throughput experiments, there is a deficiency in functional annotation for many newly sequenced proteins

  • To investigate the role of Mycobacterium tuberculosis (MTB) hypothetical proteins in the molecular biology of the system, we statistically compared the topological properties of these proteins with network-wide values and with those of other proteins in the MTB protein-protein functional network

  • We evaluated the quality of the annotation predictions and analyzed the biological relevance of the hypothetical proteins using their predicted functional classes and Gene Ontology (GO) biological process terms
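
As an illustration of what a functional enrichment test on GO terms can look like, the sketch below computes a one-sided hypergeometric p-value for a single term. This is a common choice for enrichment analysis, but it is an assumption here: the excerpt does not state which test or multiple-testing correction the authors used. The function name, parameter names and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(population_size, annotated_in_population,
                                sample_size, annotated_in_sample):
    """One-sided hypergeometric p-value, P(X >= annotated_in_sample).

    population_size         -- all proteins considered (assumed background set)
    annotated_in_population -- background proteins carrying the GO term
    sample_size             -- proteins in the set of interest (e.g. hypotheticals)
    annotated_in_sample     -- proteins in that set carrying the GO term
    """
    upper = min(annotated_in_population, sample_size)
    total = comb(population_size, sample_size)
    # Sum the probability of seeing at least the observed number of hits.
    tail = sum(
        comb(annotated_in_population, i)
        * comb(population_size - annotated_in_population, sample_size - i)
        for i in range(annotated_in_sample, upper + 1)
    )
    return tail / total


# Toy numbers, purely illustrative: 10 proteins, 5 carry the term,
# and all 5 proteins in the set of interest carry it.
p_value = hypergeom_enrichment_pvalue(10, 5, 5, 5)  # 1/252, about 0.004
```

In practice the test would be repeated for every candidate term, and the resulting p-values would need a multiple-testing correction before any term is called significant.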

Introduction

Despite ever-increasing amounts of biological data, including primary data, such as genomic sequences, and functional genomic data from high-throughput experiments, functional annotation is still lacking for many newly sequenced proteins. About half of the genes in the Mycobacterium tuberculosis genome encode proteins of unknown function. One of the major tasks in the post-genomic era is genome annotation, that is, assigning functions to gene products, mostly on the basis of amino acid sequence, in order to capitalize on the knowledge gained through these sequencing efforts [2]. To this end, controlled vocabularies and well-defined schemes of protein function relationships arose to represent the annotations of known genes and proteins, and to predict functional annotations for those that have been identified but remain uncharacterized. The terms used to describe a function should have definitions and be placed within a structure of relationships in an ontology [3].
