LASAGNA: a novel algorithm for transcription factor binding site alignment.

Chih Lee, Chun-Hsi Huang

doi:10.1186/1471-2105-14-108

Open Access

https://doi.org/10.1186/1471-2105-14-108

Journal: BMC Bioinformatics	Publication Date: Mar 24, 2013
Citations: 58	License type: CC BY 2.0

Affiliation: University of Connecticut

Abstract

Background

Scientists routinely scan DNA sequences for transcription factor (TF) binding sites (TFBSs). Most of the available tools rely on position-specific scoring matrices (PSSMs) constructed from aligned binding sites. Because of the limited resolution of the assays used to obtain TFBSs, databases such as TRANSFAC, ORegAnno and PAZAR store unaligned, variable-length DNA segments containing binding sites of a TF. These DNA segments need to be aligned before a PSSM can be built. While the TRANSFAC database provides scoring matrices for TFs, nearly 78% of the TFs in the public release do not have matrices available. Because work on TFBS alignment algorithms has been limited, an alignment algorithm tailored to TFBSs is highly desirable.

Results

We designed a novel algorithm named LASAGNA, which is aware of the lengths of the input TFBSs and exploits position dependence. Results on 189 TFs from 5 species in the TRANSFAC database showed that our method significantly outperformed ClustalW2 and MEME. We further compared a PSSM method that relies on LASAGNA alignments to an alignment-free TFBS search method. Results on 89 TFs whose binding sites can be located in genomes showed that our method is significantly more precise at fixed recall rates. Finally, we describe LASAGNA-ChIP, a more sophisticated version for ChIP (chromatin immunoprecipitation) experiments. Under the one-per-sequence model, it showed performance comparable to MEME in discovering motifs in ChIP-seq peak sequences.

Conclusions

We conclude that the LASAGNA algorithm is simple and effective in aligning variable-length binding sites. It has been integrated into a user-friendly webtool for TFBS search and visualization called LASAGNA-Search. The tool currently stores precomputed PSSM models for 189 TFs built from TFBSs in the TRANSFAC Public database (release 7.0) and for 133 TFs built from TFBSs in the ORegAnno database (08Nov10 dump). The webtool is available at http://biogrid.engr.uconn.edu/lasagna_search/.
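The abstract does not spell out how LASAGNA places sites relative to one another, so the following is not LASAGNA. It is a minimal, hypothetical sketch of the problem it solves: putting unaligned, variable-length sites into one ungapped coordinate frame so that a PSSM can be built column by column. It assumes a simple greedy strategy (longest site first, each later site placed at the offset that agrees best with the nucleotides already placed), forward strand only, and a minimum overlap; `align_sites` and `min_overlap` are illustrative names, not part of any published tool.

```python
from collections import Counter


def align_sites(sites, min_overlap=4):
    """Greedy ungapped alignment of variable-length sites (hypothetical sketch).

    Returns the sites in input order, padded with '-' so every row spans the
    same coordinate frame and the rows can be read as alignment columns.
    """
    if not sites:
        return []
    # The longest site is placed first and defines frame position 0.
    order = sorted(range(len(sites)), key=lambda i: -len(sites[i]))
    offsets = {order[0]: 0}
    columns = {}  # frame position -> Counter of nucleotides placed there

    def place(site, off):
        for k, base in enumerate(site):
            columns.setdefault(off + k, Counter())[base] += 1

    place(sites[order[0]], 0)
    for i in order[1:]:
        site = sites[i]
        lo, hi = min(columns), max(columns)
        # Require at least `need` positions of overlap with the current frame.
        need = min(min_overlap, len(site), hi - lo + 1)
        best_off, best_score = None, -1
        for off in range(lo - len(site) + need, hi - need + 2):
            # Score = number of already-placed nucleotides this site agrees with.
            score = sum(columns.get(off + k, Counter())[base]
                        for k, base in enumerate(site))
            if score > best_score:  # ties keep the leftmost offset
                best_off, best_score = off, score
        offsets[i] = best_off
        place(site, best_off)

    start = min(offsets.values())
    end = max(offsets[i] + len(s) for i, s in enumerate(sites))
    return ["-" * (offsets[i] - start) + s + "-" * (end - offsets[i] - len(s))
            for i, s in enumerate(sites)]
```

For the sites TTGACGTCA, GACGTC and ATGACG, this sketch returns the rows TTGACGTCA, --GACGTC- and ATGACG---. A real TFBS aligner would also need to consider the reverse strand and a principled scoring model, which this greedy version deliberately omits.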

Highlights

Scientists routinely scan DNA sequences for transcription factor (TF) binding sites (TFBSs)
We proposed LASAGNA, a novel alignment algorithm designed for aligning variable-length transcription factor binding sites
Cross-validation results on 189 TFs and 4771 transcription factor binding sites (TFBSs) indicated that LASAGNA significantly outperformed ClustalW2 (p-value: 1.22 × 10⁻¹⁵) and MEME (p-value: 3.55 × 10⁻¹⁵)

Summary

Introduction

Scientists routinely scan DNA sequences for transcription factor (TF) binding sites (TFBSs). Because of the limited resolution of the assays used to obtain TFBSs, databases such as TRANSFAC, ORegAnno and PAZAR store unaligned, variable-length DNA segments containing binding sites of a TF. These DNA segments need to be aligned before a position-specific scoring matrix (PSSM) can be built. A TFBS search algorithm takes binding site sequences of a TF as input. While the two approaches are tightly connected, we focus on the TFBS search problem and assume that a TF has known binding sites available.
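Once sites are aligned, a PSSM can be estimated from the alignment columns and slid along a DNA sequence. The sketch below is a generic log-odds PSSM, not the scoring model used by LASAGNA or LASAGNA-Search; the pseudocount of 1, the uniform 0.25 background, log base 2, ignoring '-' padding when counting, uppercase-only input, and the helper names `build_pssm` and `scan` are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_pssm(aligned, pseudocount=1.0, background=0.25):
    """Log2-odds PSSM from equal-length aligned rows; '-' padding is not counted."""
    pssm = []
    for j in range(len(aligned[0])):
        column = [row[j] for row in aligned if row[j] in BASES]
        total = len(column) + 4 * pseudocount
        # Smoothed base frequency divided by the assumed uniform background.
        pssm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                     for b in BASES})
    return pssm


def scan(seq, pssm):
    """Best-scoring window on either strand as (score, forward_start, strand)."""
    width = len(pssm)
    best = None
    rc = seq[::-1].translate(COMPLEMENT)  # reverse complement (uppercase input assumed)
    for strand, s in (("+", seq), ("-", rc)):
        for start in range(len(s) - width + 1):
            window = s[start:start + width]
            if any(b not in BASES for b in window):
                continue  # skip windows with ambiguous bases such as N
            score = sum(col[b] for col, b in zip(pssm, window))
            # Report reverse-strand hits in forward-strand coordinates.
            pos = start if strand == "+" else len(seq) - start - width
            if best is None or score > best[0]:
                best = (score, pos, strand)
    return best
```

With a PSSM built from the rows TGACG, TGACG and TGACC, scanning AAATGACGAAA reports the forward-strand window starting at position 3. Real search tools usually report every window above a score threshold rather than only the single best hit; returning one hit keeps the sketch short.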
