Abstract

Tissue-specific gene expression has long been recognized as key to understanding tissue development and function. Efforts have been made in the past decade to identify tissue-specific expression profiles, such as the Human Protein Atlas and FANTOM5. However, these studies mainly focused on “qualitatively tissue-specific expressed genes”, which are highly enriched in one tissue or a group of tissues, and paid less attention to “quantitatively tissue-specific expressed genes”, which are expressed in all or most tissues but at different levels. In this study, we applied machine learning algorithms to build a computational method for identifying “quantitatively tissue-specific expressed genes” capable of distinguishing 25 human tissues by their expression patterns. Our results identified the expression of 432 genes as the optimal features for tissue classification; with these features, a support vector machine (SVM) achieved a Matthews Correlation Coefficient (MCC) of more than 0.99. This model outperformed an SVM model built on tissue-enriched genes and achieved an MCC of 0.985 on an independent test dataset, indicating good generalization ability. These 432 genes were shown to be widely expressed in multiple tissues, and a literature review of the top 23 genes found support for the discriminating power of most of them. As a complement to previous studies, our discovery of these quantitatively tissue-specific genes provides insight into tissue development and function.

Highlights

  • A biological tissue is an ensemble of similar cells residing in the same location and performing specific biological functions in multicellular organisms

  • To avoid the unbalanced data sizes among the different tissues, we collected the transcriptome data from 8436 samples originating from 25 tissues in the Genotype-Tissue Expression (GTEx) project [9]; tissues with sample sizes smaller than 80 were excluded from this study

  • 10-fold cross-validation (10-CV) was adopted to evaluate the performance of each classification model; the predicted results included the prediction accuracies of the 25 tissues, the overall accuracy (TACC) and Matthews Correlation Coefficient (MCC) (Supplementary Material S2)
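The evaluation measures named in the highlights can be illustrated with a minimal sketch. It assumes that the per-tissue accuracies, the overall accuracy (TACC) and the MCC are computed from the pooled true and predicted labels of all cross-validation folds, and that the multiclass MCC follows Gorodkin's generalization; this excerpt does not state which formula the authors used. The function names and the toy tissue labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def per_class_accuracy(y_true, y_pred):
    """Fraction of samples of each true class that were predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples predicted correctly (TACC in the text)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def multiclass_mcc(y_true, y_pred):
    """Multiclass MCC (Gorodkin's generalization; an assumption here)."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    # Covariance-style terms built from the confusion-matrix marginals.
    cov_tp = correct * n - sum(true_counts[k] * pred_counts[k] for k in labels)
    cov_tt = n * n - sum(true_counts[k] ** 2 for k in labels)
    cov_pp = n * n - sum(pred_counts[k] ** 2 for k in labels)
    denom = sqrt(cov_tt * cov_pp)
    # Return 0.0 when the score is undefined (all labels in a single class).
    return cov_tp / denom if denom else 0.0


# Toy labels standing in for pooled 10-fold cross-validation predictions.
y_true = ["liver", "liver", "lung", "lung", "heart", "heart"]
y_pred = ["liver", "liver", "lung", "heart", "heart", "heart"]

print(per_class_accuracy(y_true, y_pred))
print(overall_accuracy(y_true, y_pred))
print(multiclass_mcc(y_true, y_pred))
```

Unlike overall accuracy, the MCC takes the class marginals into account, which makes it a more conservative summary when tissue sample sizes differ.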



Introduction

Genes 2018, 9, 449

A biological tissue is an ensemble of similar cells residing in the same location and performing specific biological functions in multicellular organisms. As the bridge between single cells and functional organs, tissues are elementary units with both phenotypical and functional contributions to biological identity [1]. All biological functions are regulated and manipulated directly or indirectly by proteins, which can be further attributed to gene expression patterns measured by messenger RNA (mRNA). Expression patterns and a full picture of how genes are expressed in different tissues will help to unveil the molecular mechanisms involved in tissue development and function. Two milestone projects for identifying tissue-specific gene expression were completed shortly after the Human Genome Project, building tissue-specific gene expression profiles at the protein and RNA levels [3,4]. The protein distribution in human tissues was explored using 718 antibodies corresponding to 650 human protein-coding genes in the Human
