Abstract

Publicly available human genomic sequence data provide an unprecedented opportunity for researchers to decode the functionality of the human genome. Such information is extremely valuable in cancer prevention, diagnosis and treatment. The Cancer Genome Anatomy Project (CGAP) and Gene Expression Omnibus (GEO) are two bioinformatic infrastructures for studying functional genomics. The goal of this study is to explore the feasibility of using Internet-available bioinformatic databases to discover human breast cancer-related genes. Several tools, including the Gene Finder, Virtual Northern (vNorthern) and SAGE Digital Gene Expression Displayer (DGED), were used to analyze differential gene expression between benign and malignant breast tissues. A pilot study was performed using both EST and SAGE vNorthern to analyze the expression of a panel of known genes, including the high-abundance genes beta-actin and G3PDH, the low-abundance genes BRCA1 and p53, the tissue-specific genes CEA and PSA, and two breast cancer-related genes, Her2/neu and MUC1. We found high expression of beta-actin and G3PDH and low expression of BRCA1 and p53 across different tissue types, as well as tissue-specific expression of CEA in colon and PSA in prostate. A further vNorthern analysis of 30 known breast cancer-related genes in breast cancer tissues demonstrated high expression of oncogenes and low expression of tumor suppressor genes. An open-ended analysis of two pools of breast cancer and benign breast tissue libraries by SAGE DGED produced 53 differentially expressed genes according to the screening criteria of a greater than five-fold difference and p < 0.01. Further analysis by EST vNorthern and virtual microarray analysis reduced the candidate genes to six: four genes down-regulated in breast cancer (ANXA1, CAV1, KRT5 and MMP7) and two up-regulated (ERBB2 and G1P3). These findings were validated by real-time RT-PCR analysis in eight paired human breast cancer tissue samples.
We conclude that combining multiple high-throughput analyses is an effective data mining strategy for cancer gene identification. This approach may make better use of publicly available genomic data.
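
The screening criteria stated above (a greater than five-fold difference and p < 0.01) can be illustrated with a minimal sketch. This is not the SAGE DGED implementation: the record layout, the pseudocount, the gene names and all counts below are hypothetical, and the p-values are taken as given rather than computed by any particular statistical test.

```python
from dataclasses import dataclass


@dataclass
class TagResult:
    """One gene's pooled tag counts (hypothetical record layout)."""
    gene: str
    tumor_count: int    # tags observed in the pooled breast cancer libraries
    benign_count: int   # tags observed in the pooled benign breast libraries
    tumor_total: int    # total tags sequenced in the cancer pool
    benign_total: int   # total tags sequenced in the benign pool
    p_value: float      # supplied by an upstream test, not computed here


def fold_change(r: TagResult, pseudocount: float = 0.5) -> float:
    """Ratio of normalized tag frequencies, cancer over benign.

    The pseudocount (an assumption of this sketch) avoids division by zero
    when a tag is absent from one pool.
    """
    tumor_freq = (r.tumor_count + pseudocount) / r.tumor_total
    benign_freq = (r.benign_count + pseudocount) / r.benign_total
    return tumor_freq / benign_freq


def screen(results, min_fold: float = 5.0, max_p: float = 0.01):
    """Keep genes with more than min_fold change in either direction and p < max_p."""
    selected = []
    for r in results:
        fc = fold_change(r)
        if r.p_value < max_p and (fc > min_fold or fc < 1.0 / min_fold):
            selected.append((r.gene, "up" if fc > 1.0 else "down"))
    return selected


# Hypothetical input, for illustration only; these are not data from the study.
EXAMPLE = [
    TagResult("GENE_A", 60, 5, 100_000, 100_000, 0.001),   # large increase
    TagResult("GENE_B", 4, 50, 100_000, 100_000, 0.002),   # large decrease
    TagResult("GENE_C", 30, 10, 100_000, 100_000, 0.001),  # below five-fold
    TagResult("GENE_D", 12, 1, 100_000, 100_000, 0.05),    # fails p < 0.01
]

if __name__ == "__main__":
    for gene, direction in screen(EXAMPLE):
        print(gene, direction)
```

Applying both thresholds together keeps only genes whose change is both large and statistically supported, which is how a list of candidates can be narrowed before cross-checking against other data sources.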
