Transcription factor motif quality assessment requires systematic comparative analysis

Caleb Kipkurui Kibet, Philip Machanick

doi:10.12688/f1000research.7408.1

Open Access

Journal: F1000Research	Publication Date: Dec 11, 2015
Citations: 10	License type: CC BY 4.0

Affiliation: Rhodes University

Abstract

Transcription factor (TF) binding site prediction remains a challenge in gene regulatory research because binding sites in the genome are degenerate and potentially variable. Dozens of algorithms designed to learn binding models (motifs) have generated many motifs that are available in research papers, with a subset making it into databases such as JASPAR, UniPROBE and TRANSFAC. The presence of many versions of motifs for a single TF across the various databases, together with the lack of a standardized assessment technique, makes it difficult for biologists to choose an appropriate binding model and for algorithm developers to benchmark, test and improve their models. In this study, we review and evaluate the approaches in use, highlight differences and demonstrate the difficulty of defining a standardized motif assessment approach. We review scoring functions, motif length, test data and the type of performance metrics used in prior studies as some of the factors that influence the outcome of a motif assessment. We show that the scoring functions and statistics used in motif assessment influence the ranking of motifs in a TF-specific manner. We also show that TF binding specificity can vary by source of genomic binding data. Finally, we demonstrate that the information content of a motif is not, in isolation, a measure of motif quality but is influenced by TF binding behaviour. We conclude that there is a need for an easy-to-use tool that presents all available evidence for a comparative analysis.

Highlights

Understanding gene regulation remains a long-standing problem in biological research.
We focus on transcription factor (TF) binding models represented as position weight matrices (PWMs) and aim to determine how the choice and length of benchmark sequences, the scoring functions and the statistics used influence motif assessment.
We describe a comparative analysis of the effect of scoring functions, chromatin immunoprecipitation (ChIP)-seq test data processing and statistics on motif assessment.
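The interplay between scoring function and statistic mentioned in these highlights can be illustrated with a minimal Python sketch. It is not the study's implementation: the two scoring functions (`max_log_odds` and `sum_occupancy`), the rank-based `auc` helper and the toy PWM are all hypothetical names and values chosen for illustration, and the PWM is assumed to already hold log2-odds scores.

```python
# Illustrative sketch only: names, scoring functions and data are assumptions,
# not taken from the study.

def window_scores(pwm, sequence):
    """Log-odds score of every window of the sequence that the PWM fits into.

    Assumes the sequence is at least as long as the PWM and contains only
    A, C, G and T.
    """
    width = len(pwm)
    return [
        sum(pwm[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    ]

def max_log_odds(pwm, sequence):
    """Score a sequence by its single best-matching window."""
    return max(window_scores(pwm, sequence))

def sum_occupancy(pwm, sequence):
    """Score a sequence by summing 2**score over all windows."""
    return sum(2.0 ** score for score in window_scores(pwm, sequence))

def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Toy two-column log-odds PWM that favours the dinucleotide "AC".
toy_pwm = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
]
bound = ["GACG", "TACT"]      # hypothetical bound (positive) sequences
unbound = ["GGGG", "TTGT"]    # hypothetical background (negative) sequences

for scorer in (max_log_odds, sum_occupancy):
    positives = [scorer(toy_pwm, seq) for seq in bound]
    negatives = [scorer(toy_pwm, seq) for seq in unbound]
    print(scorer.__name__, auc(positives, negatives))
```

Swapping the scorer or the statistic while holding the motif and the test sequences fixed is the kind of comparison the highlights describe; on real data the two scorers need not rank motifs identically.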

Summary

Introduction

Understanding gene regulation remains a long-standing problem in biological research. The main players, transcription factors (TFs), are proteins that bind to short and potentially degenerate sequence patterns (motifs) at gene regulatory sites to promote or repress expression of target genes. The search for a code to predict binding sites and model the binding affinity of TFs has led to the development of several experimental techniques and motif discovery algorithms (Figure 1). In addition to providing high-resolution data for motif discovery, these techniques are a useful resource for testing the quality of the available motifs since they are TF specific. A position weight matrix (PWM) is the common form for representing TF binding specificity. Motifs can be found using a variety of methods, including algorithms that perform de novo motif discovery from sequences containing binding sites and in vitro methods such as protein binding microarrays (PBM) and high-throughput systematic evolution of ligands by exponential enrichment (HT-SELEX).
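As a rough illustration of what a PWM encodes, the following Python sketch converts per-position base counts into probabilities, a log2-odds matrix and a total information content in bits. The function names, the pseudocount of 1, the uniform background of 0.25 and the toy counts are all assumptions made for this example; they are not taken from the paper.

```python
import math

BASES = "ACGT"

def counts_to_probabilities(counts, pseudocount=1.0):
    """Turn per-position base counts into probabilities.

    The pseudocount (an assumed value) keeps unseen bases from getting
    probability zero, which would break the logarithm below.
    """
    probabilities = []
    for column in counts:
        total = sum(column[base] for base in BASES) + 4 * pseudocount
        probabilities.append(
            {base: (column[base] + pseudocount) / total for base in BASES}
        )
    return probabilities

def log_odds_pwm(probabilities, background=0.25):
    """Log2-odds of each base against a uniform background (an assumption)."""
    return [
        {base: math.log2(column[base] / background) for base in BASES}
        for column in probabilities
    ]

def information_content(probabilities, background=0.25):
    """Total information content in bits, summed over all columns."""
    return sum(
        p * math.log2(p / background)
        for column in probabilities
        for p in column.values()
        if p > 0
    )

# Toy counts for a three-position motif: one strongly conserved column,
# one uninformative column and one fully conserved column.
toy_counts = [
    {"A": 18, "C": 1, "G": 0, "T": 1},
    {"A": 5, "C": 5, "G": 5, "T": 5},
    {"A": 0, "C": 0, "G": 20, "T": 0},
]
toy_probabilities = counts_to_probabilities(toy_counts)
toy_pwm = log_odds_pwm(toy_probabilities)
toy_ic = information_content(toy_probabilities)
```

In this sketch the uninformative middle column contributes zero bits, while the conserved columns contribute most of the total. That total describes how specific a motif is; as the abstract notes, it is not by itself a measure of motif quality.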
