Well-annotated Datasets Research Articles

Background
Transcription factors are key proteins in the regulation of gene transcription. An important step in this process is the opening of chromatin to make genomic regions available for transcription. Data on DNase I hypersensitivity have previously been used to label a subset of transcription factors as Pioneers, Settlers and Migrants to describe their potential role in this process. These labels represent an interesting hypothesis on gene regulation and possibly a useful approach for data analysis, so we wanted to expand the set of labeled transcription factors to include as many known factors as possible. We used a well-annotated dataset of 1175 transcription factors as input to supervised machine learning methods, with the subset that already had labels as the training set. We then used the final classifier to label the remaining transcription factors according to their potential role as Pioneers, Settlers or Migrants. The full set of labeled transcription factors was used to investigate the properties and functions associated with each class, including an analysis of transcription factor interaction data based on DNA co-binding and protein-protein interactions. We also used the assigned labels to analyze a previously published set of gene lists from a time course experiment on cell differentiation.

Results
The analysis showed that the classification of transcription factors with respect to their potential role in chromatin opening was largely determined by how they bind to DNA. Each subclass was enriched for properties that seemed to characterize its role in gene regulation: Pioneers were associated with very general functions, whereas Migrants were to a larger extent associated with specific processes.
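The abstract does not say which supervised method produced the final classifier. Purely to illustrate the workflow of training on the already-labeled transcription factors and then labeling the rest, the sketch below swaps in a simple nearest-centroid classifier. The feature vectors, the TF names (`TF_A`, `TF_X`, ...) and the helper names `class_centroids` and `label_unlabeled` are hypothetical toy assumptions, not data or code from the study.

```python
import math
from statistics import mean

# Hypothetical setup: each TF is a numeric feature vector (e.g. encoded
# DNA-binding properties). Names, features and labels are toy values only.
Vector = list[float]


def class_centroids(train: dict[str, tuple[Vector, str]]) -> dict[str, Vector]:
    """Average the feature vectors of the already-labeled TFs, one per class."""
    by_class: dict[str, list[Vector]] = {}
    for vec, label in train.values():
        by_class.setdefault(label, []).append(vec)
    return {
        label: [mean(column) for column in zip(*vecs)]
        for label, vecs in by_class.items()
    }


def label_unlabeled(train, unlabeled: dict[str, Vector]) -> dict[str, str]:
    """Give each unlabeled TF the class whose centroid is nearest (Euclidean)."""
    centroids = class_centroids(train)
    return {
        tf: min(centroids, key=lambda label: math.dist(vec, centroids[label]))
        for tf, vec in unlabeled.items()
    }


# Toy training set (hypothetical) and TFs still lacking a label.
train = {
    "TF_A": ([0.9, 0.1], "Pioneer"),
    "TF_B": ([0.8, 0.2], "Pioneer"),
    "TF_C": ([0.1, 0.9], "Migrant"),
    "TF_D": ([0.2, 0.8], "Migrant"),
    "TF_E": ([0.5, 0.5], "Settler"),
}
unlabeled = {"TF_X": [0.85, 0.15], "TF_Y": [0.15, 0.85], "TF_Z": [0.5, 0.45]}
print(label_unlabeled(train, unlabeled))
```

A nearest-centroid rule is used here only because it is short; with real features one would normally compare several classifiers by cross-validation on the labeled subset before labeling the remaining factors.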
Further analysis showed that the expanded classification is a useful resource for analyzing other datasets on transcription factors with respect to their potential role in gene regulation. The interaction data revealed complementary differences between the subclasses: transcription factors labeled as Pioneers often interact with other transcription factors through DNA co-binding, whereas Migrants rely to a larger extent on protein-protein interactions. The time course data on cell differentiation indicated a shift in the regulatory program associated with Pioneer-like transcription factors during differentiation.

Conclusions
The expanded classification is an interesting resource for analyzing data on gene regulation, as illustrated here with transcription factor interaction data and data from a time course experiment. The potential regulatory function of a transcription factor seems to be determined largely by how it binds DNA, but it is also influenced by how transcription factors interact with each other through cooperativity and protein-protein interactions.

Electronic supplementary material
The online version of this article (doi:10.1186/s12859-016-1349-2) contains supplementary material, which is available to authorized users.
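The abstract does not name the statistical test behind its enrichment and time course analyses. One common way to ask whether a gene list (for example, the TFs at one differentiation time point) contains more Pioneers than expected by chance is a one-sided hypergeometric test. The sketch below implements that test with the standard library only; the function name `enrichment_p_value` and all counts are hypothetical illustrations, not values from the study.

```python
from math import comb


def enrichment_p_value(k: int, N: int, K: int, n: int) -> float:
    """One-sided hypergeometric test: P(X >= k).

    N: all labeled TFs (background), K: TFs in the class of interest
    (e.g. Pioneers), n: TFs in the gene list, k: class members in the list.
    """
    # Count the ways to draw at least k class members among n TFs,
    # using exact integer binomial coefficients.
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    )
    return favourable / comb(N, n)


# Hypothetical counts: 100 labeled TFs, 20 of them Pioneers; one time-point
# gene list contains 10 TFs, 6 of which are Pioneers.
p = enrichment_p_value(6, N=100, K=20, n=10)
print(f"P(at least 6 Pioneers by chance) = {p:.3g}")
```

Working with exact integer counts from `math.comb` avoids rounding problems in the tail sum; when many lists or classes are tested, the resulting p-values would still need a multiple-testing correction.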

Background
Maize (Zea mays ssp. mays L.) is an important model for basic and applied plant research. The sequencing of the B73 maize genome in 2009, using a clone-by-clone strategy, was a major step forward; however, functional annotation and gene classification of the maize genome are still limited. Well-annotated datasets and an informative database will therefore be important for further research discoveries. Signal transduction is a fundamental biological process in living cells, and many protein families participate in sensing, amplifying and responding to various extracellular or internal stimuli. Integrating information on the maize functional genes involved in signal transduction is therefore a good starting point.

Results
Here we introduce a comprehensive database, 'ProFITS' (Protein Families Involved in the Transduction of Signalling), which endeavours to identify and classify protein kinases/phosphatases, transcription factors and ubiquitin-proteasome-system related genes in the B73 maize genome. Users can explore gene models, corresponding transcripts and full-length cDNAs (FLcDNAs) through these three hierarchical protein categories, and visualize them using an AJAX-based genome browser (JBrowse) or the Generic Genome Browser (GBrowse). Functional annotations such as GO annotations, protein signatures and best protein hits in the Arabidopsis and rice genomes are provided. In addition, pre-calculated transcription factor binding sites are available for each gene, and mutant information is incorporated into ProFITS. In short, ProFITS provides a user-friendly web interface for studying signal transduction in maize.

Conclusion
ProFITS, which utilizes both the B73 maize genome and full-length cDNA (FLcDNA) datasets, provides users with a comprehensive platform for maize annotation, with a specific focus on the categorization of families involved in the signal transduction process. ProFITS is designed as a user-friendly web interface and is valuable for experimental researchers. It is freely available to all users at http://bioinfo.cau.edu.cn/ProFITS.

Articles published on Well-annotated Datasets

Robust 3-D Human Detection in Complex Environments With a Depth Camera

Supervised Segmentation of Un-Annotated Retinal Fundus Images by Synthesis.

Comprehensive review and empirical analysis of hallmarks of DNA-, RNA- and protein-binding residues in protein chains.

A semantic relatedness-based solution for reducing missing problem in TBIR

Feature-based classification of human transcription factors into hypothetical sub-classes related to regulatory function.

A comparative study for multiple visual concepts detection in images and videos

A property-based analysis of human transcription factors.

Arpeggio: harmonic compression of ChIP-seq data reveals protein-chromatin interaction signatures

Validation of Genetic Sequence Variants as Prognostic Factors in Early-Stage Head and Neck Squamous Cell Cancer Survival

Predicting B cell epitope residues with network topology based amino acid indices.

ProFITS of maize: a database of protein families involved in the transduction of signalling in the maize genome

Linking publication, gene and protein data

Full-genome RNAi profiling of early embryogenesis in Caenorhabditis elegans
