Abstract

Background: Positional weight matrix (PWM) is a de facto standard model to describe transcription factor (TF) DNA binding specificities. PWMs inferred from in vivo or in vitro data are stored in many databases and used in a plethora of biological applications. This calls for comprehensive benchmarking of public PWM models with large experimental reference sets.

Results: Here we report results from all-against-all benchmarking of PWM models for DNA binding sites of human TFs on a large compilation of in vitro (HT-SELEX, PBM) and in vivo (ChIP-seq) binding data. We observe that the best performing PWM for a given TF often belongs to another TF, usually from the same family. Occasionally, binding specificity is correlated with the structural class of the DNA binding domain, indicated by good cross-family performance measures. Benchmarking-based selection of family-representative motifs is more effective than motif clustering-based approaches. Overall, there is good agreement between in vitro and in vivo performance measures. However, for some in vivo experiments, the best performing PWM is assigned to an unrelated TF, indicating a binding mode involving protein-protein cooperativity.

Conclusions: In an all-against-all setting, we compute more than 18 million performance measure values for different PWM-experiment combinations and offer these results as a public resource to the research community. The benchmarking protocols are provided via a web interface and as docker images. The methods and results from this study may help others make better use of public TF specificity models, as well as public TF binding data sets.

Highlights

  • Positional weight matrix (PWM) is a de facto standard model to describe transcription factor (TF) DNA binding specificities

  • Even though the PWM model has been criticized for its simplicity and intrinsic limitations, it is likely to remain the community standard for many years to come, as many popular DNA sequence analysis platforms use it and are unlikely to support a new type of model in the near future

  • The purpose of the study was to compare PWM models derived with two in vitro technologies, HT-SELEX and protein-binding microarrays (PBM), and to assess their capacity to predict in vivo TF binding sites, using ChIP-seq data as the ground truth

Summary

Introduction

A positional weight matrix (PWM) assigns a score to each potential target sequence; this score is related to the binding energy of a transcription factor (TF) for that particular stretch of nucleotides. The PWM is a de facto standard model for describing TF DNA binding specificities, and PWMs inferred from in vivo or in vitro data are stored in many databases and used in a plethora of biological applications. This calls for comprehensive benchmarking of public PWM models with large experimental reference sets. Even though the PWM model has been criticized for its simplicity and intrinsic limitations, it is likely to remain the community standard for many years to come, as many popular DNA sequence analysis platforms use it and are unlikely to support a new type of model in the near future.
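To make the scoring idea concrete, the following is a minimal sketch of how a PWM can assign a score to a candidate site. It is not the benchmarking protocol of the study. The count matrix, the pseudocount, the uniform background frequency, and the function names `build_pwm`, `score_window`, and `best_hit` are all assumptions introduced for illustration, and the sketch assumes sequences containing only A, C, G, and T.

```python
import math

# Hypothetical 4-position count matrix (one list per base); not data from the study.
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [0, 0, 9, 1],
    "G": [1, 10, 0, 1],
    "T": [1, 0, 1, 7],
}


def build_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert base counts into log2-odds weights against a uniform background.

    The pseudocount and background values are illustrative assumptions.
    """
    width = len(next(iter(counts.values())))
    pwm = []
    for pos in range(width):
        total = sum(counts[base][pos] for base in "ACGT") + 4 * pseudocount
        pwm.append({
            base: math.log2((counts[base][pos] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def score_window(pwm, window):
    """Sum the per-position weights for a window as long as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (score, offset) of the highest-scoring window on one strand."""
    width = len(pwm)
    hits = [
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(hits)


pwm = build_pwm(COUNTS)
score, offset = best_hit(pwm, "TTAGCTAA")
print(f"best window starts at offset {offset} with score {score:.2f}")
```

A full scan would normally also score the reverse-complement strand; that step is omitted here to keep the sketch short.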
