Abstract

Similarity searching is an important tool for many biological scientists. Scientists use various computer implementations (BLAST, FASTA, Smith-Waterman) to compare their query sequences against large databases such as GenBank and to find identities (perfect matches) or similarities (statistically significant matches). Search engines currently return brief annotations and alignments ranked by statistical significance or raw similarity score. However, the top-scoring similarities are frequently not the ones that bring important new information to the investigating scientist; what matters is the content of the annotation for similarity hits at any significant score. The Gene Alert algorithm applies additional filtering and a user-weighted keyword search to the BLAST output, parsing it into a form customized to the user. The Gene Alert implementation as it currently operates has three components: an organized file structure, a BLAST engine, and a parser written in the Perl scripting language. The file structure was designed to place code and database components in logical positions and to facilitate future complete automation of the Gene Alert and similarity search system. Shown here is the file structure within the UNIX environment.
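
The filtering and user-weighted keyword search described above can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the Perl used by the actual parser, and it is not the Gene Alert code. The `Hit` record, the `keyword_score` and `rank_hits` functions, the E-value cutoff of 1e-5, and the example hits and weights are all assumptions made for illustration; the abstract does not specify how weights are combined or which threshold is used.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One simplified similarity hit (hypothetical record layout)."""
    accession: str
    annotation: str
    evalue: float


def keyword_score(annotation, weights):
    """Sum the weights of every user keyword found in the annotation text."""
    text = annotation.lower()
    return sum(w for kw, w in weights.items() if kw.lower() in text)


def rank_hits(hits, weights, max_evalue=1e-5):
    """Keep significant hits, then rank them by keyword score, not alignment score.

    The cutoff and the additive scoring are assumptions for this sketch.
    """
    significant = [h for h in hits if h.evalue <= max_evalue]
    scored = [(keyword_score(h.annotation, weights), h) for h in significant]
    # Drop hits whose annotation matches none of the user's keywords.
    scored = [(s, h) for s, h in scored if s > 0]
    # Highest keyword score first; ties broken by the smaller E-value.
    scored.sort(key=lambda pair: (-pair[0], pair[1].evalue))
    return scored


# Made-up hits, listed in the order a search engine would rank them.
hits = [
    Hit("A1", "hypothetical protein", 1e-80),
    Hit("B2", "serine/threonine protein kinase", 1e-20),
    Hit("C3", "putative kinase, tumor suppressor", 1e-8),
    Hit("D4", "tumor suppressor homolog", 0.5),
]
weights = {"kinase": 2.0, "tumor suppressor": 3.0}

ranked = rank_hits(hits, weights)
for score, hit in ranked:
    print(f"{hit.accession}\t{score}\t{hit.evalue:g}\t{hit.annotation}")
```

In this made-up example the best-scoring alignment (A1) is dropped because its annotation matches no keyword, and a weaker but still significant hit (C3) rises to the top, which is the reordering the abstract argues for.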
