Abstract

Large amounts of microarray experimental data are available in public repositories. Although a variety of tools have been developed to make use of these data, the number of tools that detect marker genes is limited. Identification of marker genes associated with a specific tissue/cell type is a fundamental challenge in genetic and genomic research. Along with other genes, marker genes are of great importance for understanding gene function and the molecular mechanisms underlying complex diseases, and they may lead to the development of new drug targets. We have previously developed a Bioconductor R package (MGFM) for marker gene detection from microarray data. The tool is freely available from the Bioconductor web site (https://www.bioconductor.org/packages/release/bioc/html/MGFM.html), and it is also provided as an online application integrated into the CellFinder platform (http://cellfinder.org/analysis/marker). In this work, we applied our tool to a public microarray data set from NCBI’s Gene Expression Omnibus (GEO) public repository encompassing samples for 12 human tissues. We compared the set of predicted marker genes to a set of tissue-specific genes obtained from the Tissue-specific Gene Expression and Regulation (TiGER) database. Furthermore, we tested the performance of the tool using two normalization methods, RMA and YuGene. YuGene performed slightly better than RMA: our tool identified 38.4% of the TiGER-derived tissue-specific genes using YuGene and 37.9% using RMA.

Highlights

  • The amount of microarray expression data available in public repositories has increased tremendously

  • We have previously developed an R package (MGFM) [1] to predict marker genes associated with tissues or cell types using microarray data

  • To validate the potential marker sets, only gold-standard marker genes that were found on the microarray were considered

Summary

INTRODUCTION

The amount of microarray expression data available in public repositories has increased tremendously. Identification of marker genes associated with a specific tissue/cell type is a fundamental challenge in genetic and genomic research. We have previously developed an R package (MGFM) [1] to predict marker genes associated with tissues or cell types using microarray data. Here we present a benchmark of our tool using a microarray data set from the GEO database [2]. To validate the set of predicted marker genes, tissue-specific genes (gold-standard marker genes) for the 12 examined human tissues were collected from the Tissue-specific Gene Expression and Regulation (TiGER) [3] database. TiGER is a database that provides comprehensive information about human tissue-specific gene regulation, including both expression and regulatory data. The database contains tissue-specific gene expression profiles or expressed sequence tag (EST) data, cis-regulatory module (CRM) data, and combinatorial gene regulation data.
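The validation described above compares predicted markers with gold-standard genes, counting only gold-standard genes that are present on the microarray. The following Python sketch illustrates one way such a comparison could be computed. It is not the MGFM implementation: the function name `recall_per_tissue`, the data layout (dictionaries of gene-symbol sets keyed by tissue), and the toy gene assignments are all assumptions made for illustration and are not taken from the study.

```python
from typing import Dict, Optional, Set


def recall_per_tissue(
    predicted: Dict[str, Set[str]],
    gold_standard: Dict[str, Set[str]],
    array_genes: Set[str],
) -> Dict[str, Optional[float]]:
    """Fraction of usable gold-standard genes recovered for each tissue.

    A gold-standard gene is "usable" only if it is represented on the
    microarray. Tissues with no usable gold-standard genes map to None.
    (Hypothetical helper, not part of MGFM.)
    """
    results: Dict[str, Optional[float]] = {}
    for tissue, gold_genes in gold_standard.items():
        # Keep only gold-standard genes that can be measured on the array.
        usable = gold_genes & array_genes
        if not usable:
            results[tissue] = None
            continue
        # Gold-standard genes that were also predicted as markers.
        hits = usable & predicted.get(tissue, set())
        results[tissue] = len(hits) / len(usable)
    return results


# Toy inputs for illustration only; these are not data from the benchmark.
array_genes = {"ALB", "INS", "MYH7", "GCG", "APOA1"}
gold_standard = {
    "liver": {"ALB", "APOA1", "CYP3A4"},  # CYP3A4 is absent from the toy array
    "pancreas": {"INS", "GCG"},
}
predicted = {
    "liver": {"ALB", "MYH7"},
    "pancreas": {"INS", "GCG"},
}

recall = recall_per_tissue(predicted, gold_standard, array_genes)
for tissue, value in recall.items():
    print(tissue, value)
```

In this toy example the liver gene missing from the array is excluded from the denominator, so it does not count against the tool. A percentage of recovered tissue-specific genes such as the one reported in the abstract could be obtained from a comparison of this kind, although the exact aggregation used in the study is not specified here.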

