Abstract

Background
Transcription factors (TFs) are the upstream regulators that orchestrate gene expression and are therefore a centrepiece of bioinformatics studies. A core strategy for understanding the biological context of genes and proteins is annotation enrichment analysis, such as Gene Ontology term enrichment, but these methods are not well suited for analysing groups of TFs. Such methods do not aim to include downstream processes, so for a given set of TFs the expected top ontologies would revolve around transcription processes.

Results
We present the TFTenricher, a Python toolbox that focuses specifically on identifying gene ontology terms, cellular pathways, and diseases that are over-represented among genes downstream of user-defined sets of human TFs. We evaluated the inference of downstream gene targets with respect to false positive annotations and found that inference based on co-expression best predicted downstream processes. Based on these downstream genes, the TFTenricher uses some of the most common databases for gene functionalities, including GO, KEGG and Reactome, to calculate functional enrichments. By applying the TFTenricher to differential expression of TFs in 21 diseases, we found significant terms associated with disease mechanisms, whereas gene set enrichment analysis on the same dataset predominantly identified processes related to transcription.

Conclusions and availability
The TFTenricher package enables users to search for biological context in any set of TFs and their downstream genes. The TFTenricher is available as a Python 3 toolbox at https://github.com/rasma774/Tftenricher, under a GNU GPL license and with minimal dependencies.
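To make the enrichment step concrete, the snippet below sketches a one-sided hypergeometric over-representation test for a single annotation term (for example, one GO term) among the genes downstream of a TF set. This is an assumption about the statistics: the abstract does not state which test the TFTenricher uses, and `hypergeom_sf` and `term_enrichment` are hypothetical helpers, not part of the package API.

```python
from __future__ import annotations

from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K term genes, n drawn genes)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def term_enrichment(downstream: set[str], term_genes: set[str],
                    universe: set[str]) -> tuple[int, float]:
    """Overlap size and one-sided p-value of a term among the downstream genes."""
    # Restrict both gene sets to the background universe before counting.
    downstream = downstream & universe
    term_genes = term_genes & universe
    overlap = len(downstream & term_genes)
    p = hypergeom_sf(overlap, len(universe), len(term_genes), len(downstream))
    return overlap, p
```

For a universe of 20 genes, a term annotating 5 of them and 5 downstream genes of which 4 carry the term, `term_enrichment` returns an overlap of 4 and p ≈ 0.0049. If SciPy is available, the same tail probability is `scipy.stats.hypergeom.sf(k - 1, N, K, n)`.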

Highlights

  • Transcription factors (TFs) are the upstream regulators that orchestrate gene expression, and a centrepiece in bioinformatics studies

  • The TFTenricher increases power in TF-oriented annotation analyses. We analysed performance by randomly drawing transcription factors (TFs) from the Human Transcription Factors database [9], which annotates TFs based on a broad selection of popular databases

  • We found that the TFTenricher identified a median of 54 terms at a false discovery rate of 0.05, whereas applying the TFTenricher to the TFs only resulted in a median of 12 identified terms per dataset (Wilcoxon signed-rank test, p < 0.006)
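The highlight counts terms at a false discovery rate of 0.05 without naming the correction procedure. Assuming the common Benjamini-Hochberg procedure, the sketch below shows how such a count could be obtained from per-term p-values; `benjamini_hochberg` and `count_significant_terms` are hypothetical helper names.

```python
from __future__ import annotations


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_significant_terms(pvalues: list[float], fdr: float = 0.05) -> int:
    """Number of terms whose adjusted p-value is at or below the FDR threshold."""
    return sum(q <= fdr for q in benjamini_hochberg(pvalues))
```

For example, `count_significant_terms([0.01, 0.04, 0.03, 0.005])` returns 4, since the adjusted values are 0.02, 0.04, 0.04 and 0.02.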


Summary

Results

The TFTenricher increases power in TF-oriented annotation analyses

We analysed performance by randomly drawing transcription factors (TFs) from the Human Transcription Factors database [9], which annotates TFs based on a broad selection of popular databases.

Correlation-based inference of downstream processes minimises false positive identifications

To date, there is no complete interaction map between human TFs and their target genes, and multiple approaches are available to infer such interactions [10]. Whereas most such approaches infer binding from specific datasets, we sought to include dataset-independent TF-target interaction maps. By applying the TFTenricher to 100 sets of random TFs, we found that the co-expression-based TF-target inference method resulted in considerably fewer false positive identifications, with on average 2.16 GO terms (Additional file 3). The majority of these GO terms were related to transcription, with the terms "mRNA splicing, via spliceosome" and "mRNA processing" accounting for 57% of all identified terms. We speculate that these identifications arise because the correlation-based target gene inference, by its nature, identifies genes that are involved in transcription without being TFs themselves. All TF-target inference methods have drawbacks, and we therefore built the TFTenricher to accept independent TF-target mappings supplied by the user.
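The evaluation above combines two ideas: inferring downstream genes from co-expression, and estimating false positives by running the analysis on random TF sets. The sketch below illustrates both under stated assumptions. The selection rule (keep the `top_n` most positively correlated non-TF genes per TF) is hypothetical, as are all function names, and `count_terms` stands in for whatever enrichment-plus-FDR step is applied to each random set.

```python
from __future__ import annotations

import random
from math import sqrt
from typing import Callable


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def infer_targets(expr: dict[str, list[float]], tfs: set[str],
                  top_n: int = 2) -> dict[str, set[str]]:
    """Assumed rule: map each TF to its top_n most co-expressed non-TF genes."""
    mapping = {}
    for tf in tfs:
        ranked = sorted(
            (gene for gene in expr if gene not in tfs),
            key=lambda gene: pearson(expr[tf], expr[gene]),
            reverse=True,
        )
        mapping[tf] = set(ranked[:top_n])
    return mapping


def downstream_genes(mapping: dict[str, set[str]], tf_set: set[str]) -> set[str]:
    """Union of targets for the TFs in tf_set; a user-supplied mapping works the same way."""
    return set().union(*(mapping.get(tf, set()) for tf in tf_set))


def mean_terms_on_random_sets(all_tfs: set[str], set_size: int,
                              count_terms: Callable[[set[str]], int],
                              n_draws: int = 100, seed: int = 0) -> float:
    """Average number of terms called on random TF sets (a false-positive estimate)."""
    rng = random.Random(seed)
    pool = sorted(all_tfs)  # sorted so the seeded draws are reproducible
    counts = [count_terms(set(rng.sample(pool, set_size))) for _ in range(n_draws)]
    return sum(counts) / n_draws
```

Because `downstream_genes` only consumes a TF-to-targets dictionary, a mapping from another inference method can be passed in place of the output of `infer_targets`, mirroring the option for user-supplied TF-target mappings described above.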
