Abstract

Background: The tissue-specific UniGene sets derived from more than one million expressed sequence tags (ESTs) in the NCBI GenBank database offer a platform for identifying significantly and differentially expressed tissue-specific genes by in-silico methods. Digital differential display (DDD) rapidly creates transcription profiles based on EST comparisons and numerically calculates, as a fraction of the pool of ESTs, the relative sequence abundance of known and novel genes. However, the process of identifying the most likely tissue for a specific disease in which to search for candidate genes from the pool of differentially expressed genes remains difficult. Therefore, we used the ‘Gene Ontology semantic similarity score’ to measure the GO similarity between gene products of lung tissue-specific candidate genes from control (normal) and disease (cancer) sets. The resulting semantic similarity score matrix was subjected to hierarchical clustering and represented as a dendrogram. The stability of the dendrogram clusters was assessed by multiple bootstrapping, which also computes a p-value for each cluster and corrects the bias of the bootstrap probability.

Results: Subsequent hierarchical clustering by the multiple bootstrapping method (α = 0.95) identified seven clusters. The comparative, as well as subtractive, approach revealed a set of 38 biomarkers comprising four distinct lung cancer signature biomarker clusters (panels 1–4). Further gene enrichment analysis of the four panels revealed that they represent, respectively, lung cancer-linked metastasis diagnostic biomarkers (panel 1), chemotherapy/drug resistance biomarkers (panel 2), hypoxia-regulated biomarkers (panel 3) and lung extracellular matrix biomarkers (panel 4).

Conclusions: Expression analysis reveals that the hypoxia-induced lung cancer-related biomarkers (panel 3), HIF and its modulating proteins (TGM2, CSNK1A1, CTNNA1, NAMPT/Visfatin, TNFRSF1A, ETS1, SRC-1, FN1, APLP2, DMBT1/SAG, AIB1 and AZIN1), are significantly downregulated. All of the downregulated genes in this panel are highly upregulated in most other types of cancer. These panels of proteins may represent signature biomarkers for lung cancer and may aid in lung cancer diagnosis and disease monitoring as well as in the prediction of responses to therapeutics.
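To make the clustering step more concrete, the following is a minimal sketch of how a pairwise semantic similarity matrix can be turned into clusters. It assumes the distance is taken as 1 minus the similarity and uses average linkage; neither choice is stated in the abstract. The gene names and similarity values are made-up toy data, and the multiscale bootstrap assessment of cluster stability is not reproduced here.

```python
from itertools import combinations


def average_linkage(names, sim, n_clusters):
    """Agglomerative clustering on distance = 1 - similarity (an assumption).

    names: list of item labels; sim: symmetric matrix of similarities in [0, 1].
    Merges the two closest clusters until n_clusters remain.
    """
    clusters = [[i] for i in range(len(names))]

    def dist(a, b):
        # Average pairwise distance between members of clusters a and b.
        total = sum(1.0 - sim[i][j] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)

    return [sorted(names[i] for i in c) for c in clusters]


# Hypothetical labels and toy similarity scores, for illustration only.
toy_names = ["gene_a", "gene_b", "gene_c", "gene_d"]
toy_sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
toy_clusters = average_linkage(toy_names, toy_sim, n_clusters=2)
```

In this toy matrix the first two and the last two items are highly similar to each other, so cutting at two clusters groups them accordingly. A bootstrap assessment would repeat the clustering on resampled data and count how often each cluster reappears.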

Highlights

  • The tissue-specific UniGene sets derived from more than one million expressed sequence tags (ESTs) in the NCBI GenBank database offer a platform for identifying significantly and differentially expressed tissue-specific genes by in-silico methods

  • In DDD1, UniGene pool (A), representing 39 human normal tissues excluding normal lung tissue, and UniGene pool (B), representing 11 counterpart normal lung tissues, were employed for analysis (Table 1)

  • Comparison of DDD1 with DDD2 revealed that a total of 76 genes from DDD1 were differentially expressed in DDD2 (see Additional file 2)


Summary

Introduction

The tissue-specific UniGene sets derived from more than one million expressed sequence tags (ESTs) in the NCBI GenBank database offer a platform for identifying significantly and differentially expressed tissue-specific genes by in-silico methods. We used the ‘Gene Ontology semantic similarity score’ to measure the GO similarity between gene products of lung tissue-specific candidate genes from control (normal) and disease (cancer) sets. The resulting semantic similarity score matrix was subjected to hierarchical clustering and represented as a dendrogram. Gene expression analysis in the post-genomic era, through high-throughput genomic studies, has led to the identification of an enormous number of candidate genes related to pathophysiological conditions or altered signal transduction. One freely available high-throughput database is ‘UniGene’ (http://www.ncbi.nlm.nih.gov/Unigene/). The UniGene libraries of interest, with varying treatment conditions, can be digitally ‘pooled’ and compared (control versus treatment) using Digital Differential Display (DDD). DDD enables the identification of numerical differences in transcript frequency between individual or pooled UniGene libraries from the various treatment conditions and multiple cDNA libraries. However, the process of identifying the most likely tissue-specific disease candidate genes from the pool of differentially expressed genes has remained difficult [1].
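The kind of comparison DDD performs can be sketched as follows: for each gene, the EST count is expressed as a fraction of its pool, and the counts in two pools are compared with a significance test. The text above does not name the test, so Fisher's exact test on a 2×2 table is assumed here as a common choice for count data. The function names and the example counts are hypothetical.

```python
from math import comb


def est_fraction(gene_ests, pool_size):
    """Relative abundance of a gene: its ESTs as a fraction of the pool."""
    return gene_ests / pool_size


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a, b: ESTs for the gene / all other ESTs in pool A
    c, d: ESTs for the gene / all other ESTs in pool B
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x gene ESTs in pool A.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum all tables at least as extreme (no more probable) than the observed one.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-9))


# Made-up counts: 10 of 1000 ESTs in pool A vs 30 of 1000 ESTs in pool B.
frac_a = est_fraction(10, 1000)
frac_b = est_fraction(30, 1000)
p_value = fisher_exact_two_sided(10, 990, 30, 970)
```

A gene would then be reported as differentially expressed between the pools when its p-value falls below a chosen threshold; the threshold itself is an analysis choice and is not specified in this section.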
