Abstract

Background: Biologists are often interested in performing a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases do not provide straightforward or readily available query tools to perform simple searches, such as identifying transcription binding sites, protein motifs, or repetitive DNA sequences. However, in many cases simple pattern-matching searches can reveal a wealth of information. We present in this paper a regular expression pattern-matching tool that was used to identify short repetitive DNA sequences in human coding regions for the purpose of identifying potential mutation sites in mismatch repair deficient cells.

Results: Kangaroo is a web-based regular expression pattern-matching program that can search for patterns in DNA, protein, or coding region sequences in ten different organisms. The program is implemented to facilitate a wide range of queries with no restriction on the length or complexity of the query expression. The program is accessible on the web at http://bioinfo.mshri.on.ca/kangaroo/ and the source code is freely distributed at http://sourceforge.net/projects/slritools/.

Conclusion: A low-level simple pattern-matching application can prove to be a useful tool in many research settings. For example, Kangaroo was used to identify potential genetic targets in a human colorectal cancer variant that is characterized by a high frequency of mutations in coding regions containing mononucleotide repeats.

Highlights

  • Biologists are often interested in performing a simple database search to identify proteins or genes that contain a well-defined sequence pattern

  • In this work we present a new web-based pattern-matching program that identifies protein or DNA records containing patterns of interest in a number of model organisms

  • A number of genes containing mononucleotide repeats were implicated in a distinct type of colorectal cancer (CRC), which is characterized by increased rates of mutations in those repeat units [7]

Introduction

Biologists are often interested in performing a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases do not provide straightforward or readily available query tools to perform simple searches, such as identifying transcription binding sites, protein motifs, or repetitive DNA sequences. Other matching programs are specific to one organism or search through a specific subset of sequence data. In spite of these advanced search techniques, there is still a need for a simple, unassuming low-level pattern-matching program when looking for very specific motifs in DNA or protein records. Such motifs may be novel protein binding signatures, repetitive sequences, transcription factor binding sites, protein ...
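To make the idea of a low-level regular expression search concrete, the following is a minimal sketch of how mononucleotide repeats could be located in a DNA sequence. It is not Kangaroo's implementation: the function name `find_mononucleotide_repeats`, the `min_len` threshold of 8, and the example sequence are all assumptions chosen for illustration, and the sketch uses only Python's standard `re` module.

```python
import re


def find_mononucleotide_repeats(sequence, min_len=8):
    """Return (start, base, length) for each run of one repeated base.

    Hypothetical helper for illustration only; min_len is an assumed
    threshold, not a value taken from the paper.
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # ([ACGT]) captures one base; \1{n,} requires the same base to
    # follow at least n more times, giving a run of min_len or longer.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        hits.append((match.start(), match.group(1), len(match.group(0))))
    return hits


# Made-up example sequence: an A run of 8 and a T run of 10.
example = "ATG" + "A" * 8 + "GGC" + "T" * 10 + "CA"
print(find_mononucleotide_repeats(example))
# prints [(3, 'A', 8), (14, 'T', 10)]
```

The backreference `\1` is what restricts each match to a single repeated base. The short `GG` run in the example is ignored because it falls below the assumed threshold.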
