Abstract

Transcription is the central process of gene regulation. In higher eukaryotes, the transcription of a gene is usually regulated by multiple cis-regulatory regions (CRRs). In different tissues, different transcription factors bind to their cis-regulatory motifs in these CRRs to drive tissue-specific expression patterns of their target genes. By combining genome-wide gene expression data with genomic sequence data, we propose a multiple-instance scoring (MIS) method to predict tissue-specific motifs and the corresponding CRRs. The method rests mainly on the assumption that only a subset of the CRRs of an expressed gene function in the studied tissue. In tests on simulated datasets and a fly muscle dataset, MIS identifies true motifs when noise is high and shows higher specificity in predicting the tissue-specific functions of CRRs.

Highlights

  • In higher eukaryotes, the transcription of a gene is usually regulated by multiple cis-regulatory regions (CRRs)

  • Different transcription factors bind to the cis-regulatory motifs in these CRRs and lead to specific expression patterns of the gene in different tissues

  • Given the following inputs: 1) in the studied tissue, genes are labeled positive or negative according to whether they are expressed; 2) each gene has one or more candidate CRRs, and functional CRRs should be enriched among the positive genes, although it is unknown which CRRs are functional; and 3) the enrichment of each motif has been calculated in each CRR, the proposed method should identify the enriched tissue-specific motifs and CRRs by analyzing the motifs’ enrichment in the CRRs of positive genes

Summary

INTRODUCTION

The transcription of a gene is usually regulated by multiple cis-regulatory regions (CRRs). Zhang et al. proposed a multiple-instance learning method, multiple-instance learning via embedded instance selection (MILES), to identify motifs [14]. They treated the motifs directly as instances, but the method did not address the problem of multiple CRRs for each gene. We propose a more appropriate multiple-instance learning formulation of the above problem: 1) define the genes as bags, with the labels of genes in the studied tissue given as supervised information; 2) define the candidate CRRs as instances, each assigned to a unique bag (gene); and 3) define the feature vector of each instance as the scores or enrichments of the candidate motifs. In tests on simulated data and real fly data, MIS shows higher power for identifying motifs and achieves higher specificity in predicting tissue-specific CRRs.
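The bag/instance formulation above can be illustrated with a minimal Python sketch. It is not the MIS scoring itself, which this summary does not specify. It assumes the standard multiple-instance convention that a bag is scored by its best instance, so a gene's score for a motif is the maximum enrichment over its candidate CRRs. The names `Gene`, `bag_scores`, `top_crr`, and `motif_contrast`, and all numeric values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Gene:
    """A bag: one gene, its label in the studied tissue, and its candidate CRRs."""
    name: str
    expressed: bool          # bag label (positive = expressed in the tissue)
    crrs: List[List[float]]  # one motif-enrichment vector per candidate CRR (instance)


def bag_scores(gene: Gene) -> List[float]:
    """Per-motif bag score: maximum enrichment over the gene's CRRs.

    Assumption: standard max-over-instances rule, not the MIS formula.
    """
    return [max(column) for column in zip(*gene.crrs)]


def top_crr(gene: Gene, motif_index: int) -> int:
    """Index of the CRR with the highest enrichment for one motif."""
    values = [crr[motif_index] for crr in gene.crrs]
    return values.index(max(values))


def motif_contrast(genes: List[Gene], motifs: List[str]) -> Dict[str, float]:
    """Mean bag score in positive genes minus mean bag score in negative genes."""
    pos = [bag_scores(g) for g in genes if g.expressed]
    neg = [bag_scores(g) for g in genes if not g.expressed]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    contrast = {}
    for k, motif in enumerate(motifs):
        pos_mean = sum(s[k] for s in pos) / len(pos)
        neg_mean = sum(s[k] for s in neg) / len(neg)
        contrast[motif] = pos_mean - neg_mean
    return contrast


# Toy data (hypothetical): two motifs, four genes with one or two CRRs each.
MOTIFS = ["motif_A", "motif_B"]
GENES = [
    Gene("gene1", True, [[0.0, 0.25], [2.0, 0.5]]),
    Gene("gene2", True, [[1.5, 0.0]]),
    Gene("gene3", False, [[0.25, 0.5], [0.0, 0.25]]),
    Gene("gene4", False, [[0.25, 0.5]]),
]
CONTRAST = motif_contrast(GENES, MOTIFS)
```

In this toy example `motif_A` has a positive contrast and `motif_B` a negative one, and `top_crr(GENES[0], 0)` points to the second CRR of `gene1` as the candidate carrying the `motif_A` signal. A real analysis would replace the simple mean difference with a proper statistical test.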

MATERIALS AND METHODS
Evaluation of MIS method’s performance
Results on simulated datasets
Identifications of muscle-specific motifs and CRRs