Abstract

Background

Detecting conserved noncoding sequences (CNSs) across species helps to identify functional elements. Alignment procedures combined with computational prediction of transcription factor binding sites (TFBSs) can narrow down key regulatory elements. Repeat masking is often performed before alignment to mask insertion sequences such as transposable elements (TEs). However, TEs have recently been reported to influence the evolution of gene regulatory networks. An alignment approach that is robust to TE insertions is therefore valuable for finding novel conserved TFBSs within TEs.

Results

We constructed a web server, ReAlignerV, for the complex alignment of genomic sequences. ReAlignerV returns ladder-like schematic alignments that integrate predicted TFBSs and the locations of TEs. It also provides pairwise alignments in which the predicted TFBSs and their names are shown alongside each sequence. Furthermore, we evaluated falsely aligned sites by focusing on species-specific TEs (SSTEs) and found that, for sequences with more than 20% TE content, ReAlignerV has higher specificity and greater robustness to insertions than LAGAN, AVID, MAVID and BLASTZ.

Conclusion

ReAlignerV can be applied successfully to TE-insertion-rich sequences without prior repeat masking. This increases the chance of finding regulatory sequences hidden in TEs, which are an important source of regulatory network evolution. ReAlignerV can be accessed through and downloaded from .
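As a rough illustration of the two quantities the abstract relies on, the sketch below computes the TE content of a sequence from TE intervals (for example, parsed from RepeatMasker output) and a specificity score that treats any aligned base inside an SSTE as a false positive, on the reasoning that a TE present in only one species has no true ortholog. The function names, the interval convention and this exact definition of specificity are assumptions made for illustration; the paper's own evaluation procedure may differ.

```python
# Sketch only: the input formats (0-based, half-open [start, end) intervals and
# lists of aligned positions) are assumptions, not ReAlignerV's actual I/O.
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent [start, end) intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def te_content(seq_len: int, te_intervals: Iterable[Interval]) -> float:
    """Fraction of a sequence covered by TE annotations (overlaps counted once)."""
    covered = sum(end - start for start, end in merge_intervals(te_intervals))
    return covered / seq_len


def sste_specificity(aligned_positions: Iterable[int],
                     sste_intervals: Iterable[Interval]) -> float:
    """Specificity = TN / (TN + FP) over the SSTE bases of one species.

    An SSTE base that the aligner pairs with the other species is counted as
    a false positive (it has no true ortholog); an unaligned SSTE base is a
    true negative.
    """
    merged = merge_intervals(sste_intervals)
    total = sum(end - start for start, end in merged)
    if total == 0:
        raise ValueError("no SSTE bases to evaluate")
    aligned = set(aligned_positions)
    false_pos = sum(1 for start, end in merged
                    for pos in range(start, end) if pos in aligned)
    return 1 - false_pos / total


# Example: test a sequence against the 20% TE-content threshold.
print(te_content(100, [(10, 30), (20, 40), (50, 60)]) > 0.20)  # True
```

Merging the intervals first keeps overlapping TE annotations from being counted twice, which would otherwise inflate the TE content.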

Highlights

  • Repeat masking, which is often performed as a preprocessing step, can hide important functional elements embedded in transposable element (TE) insertion sequences

Summary

Results

Orthologous sequence set for comparison of aligners

We downloaded the reference genome sequences (RefSeq) and annotation files [23,24] of human (NCBI Build 36.1), mouse (NCBI Build 36.1) and rat (RGSC v3.4) from NCBI, and surveyed the features of all nuclear protein-coding genes in the three species.

ReAlignerV uses a straightforward strategy: it sets a single anchor at the last nucleotide of each query sequence and performs no recursive alignment after seed extension. Despite this simple approach, ReAlignerV achieves a good balance between specificity and ACR, as indicated by its high significance index on the dataset used.

The 8-kb genomic sequences upstream of the translation start sites of the human and mouse CCL3 genes were aligned, and ReAlignerV integrated the alignment with the results of RepeatMasker and TRANSFAC Match searches. In this prediction, only CRE sites were searched for with TRANSFAC Match (A), and two conserved CRE sites, shown in red squares (B), were found to be functional in an experimental study. These CRE sites could be SSTE-mediated TFBS candidates that merit further investigation of whether they affect lineage-specific transcriptional control.
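The CRE search described above used TRANSFAC Match, a position-weight-matrix tool. As a much simpler, hypothetical stand-in, the sketch below scans a sequence for exact matches to the CRE consensus TGACGTCA and flags matches lying wholly inside TE intervals, which is the kind of check that singles out TE-embedded TFBS candidates. The function names and interval convention are assumptions, and an exact consensus scan will miss the degenerate sites that a PWM search would report.

```python
# Sketch only: an exact-consensus scan swapped in for a PWM search such as
# TRANSFAC Match. Intervals are assumed 0-based, half-open [start, end).
import re
from typing import List, Tuple

Interval = Tuple[int, int]

CRE_CONSENSUS = "TGACGTCA"  # palindromic, so scanning one strand suffices


def find_sites(seq: str, motif: str = CRE_CONSENSUS) -> List[int]:
    """0-based start positions of (possibly overlapping) exact motif matches."""
    # A zero-width lookahead lets finditer report overlapping matches.
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [m.start() for m in pattern.finditer(seq.upper())]


def sites_in_tes(seq: str, te_intervals: List[Interval],
                 motif: str = CRE_CONSENSUS) -> List[Tuple[int, int, bool]]:
    """Return (start, end, inside_te) for each match; a match counts as inside
    a TE only if it lies wholly within a single TE interval."""
    k = len(motif)
    hits = []
    for s in find_sites(seq, motif):
        inside = any(ts <= s and s + k <= te for ts, te in te_intervals)
        hits.append((s, s + k, inside))
    return hits


seq = "AAAA" + "TGACGTCA" + "CCCCCCCC" + "tgacgtca" + "GG"
print(sites_in_tes(seq, [(15, 30)]))  # [(4, 12, False), (20, 28, True)]
```

In practice, the TE intervals would come from RepeatMasker annotations, and a hit would be of interest only if the corresponding TE is species-specific and the site is conserved in the alignment.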
