Abstract

BackgroundThe identification of patterns in biological sequences is a key challenge in genome analysis and in proteomics. Frequently such patterns are complex and highly variable, especially in protein sequences. They are frequently described using terms of regular expressions (RegEx) because of the user-friendly terminology. Limitations arise for queries with the increasing complexity of patterns and are accompanied by requirements for enhanced capabilities. This is especially true for patterns containing ambiguous characters and positions and/or length ambiguities.ResultsWe have implemented the 3of5 web application in order to enable complex pattern matching in protein sequences. 3of5 is named after a special use of its main feature, the novel n-of-m pattern type. This feature allows for an extensive specification of variable patterns where the individual elements may vary in their position, order, and content within a defined stretch of sequence. The number of distinct elements can be constrained by operators, and individual characters may be excluded. The n-of-m pattern type can be combined with common regular expression terms and thus also allows for a comprehensive description of complex patterns. 3of5 increases the fidelity of pattern matching and finds ALL possible solutions in protein sequences in cases of length-ambiguous patterns instead of simply reporting the longest or shortest hits. Grouping and combined search for patterns provides a hierarchical arrangement of larger patterns sets. The algorithm is implemented as internet application and freely accessible. The application is available at .ConclusionThe 3of5 application offers an extended vocabulary for the definition of search patterns and thus allows the user to comprehensively specify and identify peptide patterns with variable elements. The n-of-m pattern type offers an improved accuracy for pattern matching in combination with the ability to find all solutions, without compromising the user friendliness of regular expression terms.

Highlights

  • The identification of patterns in biological sequences is a key challenge in genome analysis and in proteomics

  • The n-of-m pattern type offers an improved accuracy for pattern matching in combination with the ability to find all solutions, without compromising the user friendliness of regular expression terms

  • The pattern matching can be constrained to the two ends of the sequence: a preceding "^" symbol limits the pattern to matches at the N-terminus of a protein sequence, a succeeding "$" symbol constrains it to the end at the C-terminus

Read more

Summary

Results

We have implemented the 3of web application in order to enable complex pattern matching in protein sequences. 3of is named after a special use of its main feature, the novel n-ofm pattern type. 3of is named after a special use of its main feature, the novel n-ofm pattern type. We have implemented the 3of web application in order to enable complex pattern matching in protein sequences. This feature allows for an extensive specification of variable patterns where the individual elements may vary in their position, order, and content within a defined stretch of sequence. The n-of-m pattern type can be combined with common regular expression terms and allows for a comprehensive description of complex patterns. 3of increases the fidelity of pattern matching and finds ALL possible solutions in protein sequences in cases of length-ambiguous patterns instead of reporting the longest or shortest hits.

Conclusion
Background
Results and discussion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call