Abstract

Background: A large number of papers have been published on the analysis of microarray data, with particular emphasis on normalization of data, detection of differentially expressed genes, clustering of genes, and regulatory networks. On the other hand, only a few studies have used expression data to examine the relation between expression level and the composition of nucleotide or protein sequences. There is a need to understand why particular genes/proteins are expressed more in particular conditions. In this study, we analyze 3468 genes of Saccharomyces cerevisiae obtained from Holstege et al. (1998) to understand the relationship between expression level and amino acid composition.

Results: We compute the correlation between the expression of a gene and the amino acid composition of its protein. Some residues (such as Ala, Gly, Arg and Val) show a significant positive correlation (r > 0.20) with the expression of genes, and some other residues (such as Asp, Leu, Asn and Ser) show a negative correlation (r < -0.15). A significant negative correlation (r = -0.18) was also found between length and gene expression. These observations indicate a relationship between percent composition and gene expression level. We therefore developed a Support Vector Machine (SVM) based method for predicting the expression level of a gene from its protein sequence. In this method, the SVM is trained with proteins whose gene expression data are known in a given condition. The trained SVM is then used to predict the gene expression of other proteins of the same organism in the same condition. A correlation coefficient of r = 0.70 was obtained between predicted and experimentally determined expression of genes, which improves to r = 0.72 when dipeptide composition is used instead of residue composition. The method was evaluated using a 5-fold cross-validation test. We also demonstrate that amino acid composition information, together with gene expression data, can be used to improve the functional classification of proteins.

Conclusion: There is a correlation between gene expression and amino acid composition that can be used to predict the expression level of genes to a certain extent. A web server based on the above strategy has been developed for calculating the correlation between amino acid composition and gene expression and for predicting expression level. This server will allow users to study evolution from expression data.
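The composition features and the correlation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy sequences and the expression values are all invented for illustration, and only the Python standard library is used.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Percent composition of each standard residue (20 features)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    return {aa: 100.0 * residues.count(aa) / len(residues) for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Percent composition of each ordered residue pair (400 features)."""
    s = seq.upper()
    # Keep only adjacent pairs in which both residues are standard.
    pairs = [s[i:i + 2] for i in range(len(s) - 1)]
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not pairs:
        raise ValueError("sequence has no valid dipeptides")
    keys = [a + b for a in AMINO_ACIDS for b in AMINO_ACIDS]
    return {k: 100.0 * pairs.count(k) / len(pairs) for k in keys}


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical toy data: (protein sequence, expression level). Not real yeast data.
proteins = {
    "geneA": ("AAGGAVRK", 9.1),
    "geneB": ("DLNSSLDN", 0.8),
    "geneC": ("AGVRLDNS", 3.2),
    "geneD": ("GAVAARGL", 7.5),
}

ala_percent = [composition(seq)["A"] for seq, _ in proteins.values()]
expression = [level for _, level in proteins.values()]
print("r(Ala, expression) =", round(pearson(ala_percent, expression), 3))
```

Repeating the last step for each of the 20 residues would give one correlation coefficient per residue, which is the kind of table the reported r values summarize.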

Highlights

  • A large number of papers have been published on the analysis of microarray data, with particular emphasis on normalization of data, detection of differentially expressed genes, clustering of genes, and regulatory networks

  • We developed a Support Vector Machine (SVM) based method for recognizing genes belonging to cytoplasmic ribosomes (one of the classes used by Brown et al., 2000) using (i) gene expression data (79 features), (ii) amino acid composition of proteins (20 features), and (iii) a combination of the two

  • Length correlation: We examined the correlation between the length of a gene and its expression level
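As a rough sketch of how the three feature sets in the second highlight might be assembled, and how a 5-fold cross-validation split could be drawn, consider the following. The helper names are hypothetical, the SVM itself is left out, and simple concatenation of the two feature vectors is an assumption about how the combination was formed, not a detail stated in the text.

```python
import random


def combine_features(expression_features, composition_features):
    """Concatenate expression and composition features into one vector."""
    return list(expression_features) + list(composition_features)


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# 79 expression features + 20 composition features = 99 combined features.
combined = combine_features([0.0] * 79, [0.0] * 20)
print(len(combined))

for train, test in k_fold_indices(10, k=5):
    print(len(train), len(test))
```

Each gene appears in exactly one test fold, so every prediction used for evaluation comes from a model that did not see that gene during training.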

Summary

Introduction

A large number of papers have been published on the analysis of microarray data, with particular emphasis on normalization of data, detection of differentially expressed genes, clustering of genes, and regulatory networks. Jansen et al. (2003) [11] studied two commonly used numerical indices for measuring the expression of genes: (i) the 'codon adaptation index' (CAI) and (ii) 'codon usage' (CU). They improved the performance of the two indices using genome-wide yeast expression data (15), achieving correlations with gene expression level of r = 0.63 to 0.70 for CAI and r = 0.63 to 0.71 for CU. These studies indicate that it is possible to predict the expression of a gene with reasonable accuracy from its nucleotide sequence. The question arises: if such a correlation exists, can we use this knowledge to predict the expression level of genes from the amino acid sequence of their proteins, as is done from the nucleotide sequence?
