Abstract

BLAST (Basic Local Alignment Search Tool; [1,2]) is the key bioinformatic tool for sequence comparison and retrieval from databases. BLAST is often the first step in using sequence-based information to design experiments and contextualize experimental results. Its speed and ease of use help account for this: an experiment requires simply submitting a sequence of interest (the query sequence) and waiting a few seconds. At the same time, because it makes thinking seem unnecessary, BLAST is often used suboptimally: many experienced researchers simply use the default parameters because they do not know how to manipulate them, or accept results with little understanding of their full meaning (or lack thereof). Recognition of the importance of BLAST to modern life sciences has led to its use in many biology courses, even at the high school level, to introduce students to bioinformatics applications in the life sciences. Concepts of molecular evolution (e.g., gene duplication and divergence; orthologs versus paralogs) are quite abstract and are best communicated with examples (Box 1). It is possible to use case studies from the literature, but the abundance of sequence data present in public databases raises the far more attractive possibility of using searches tailored to a particular course, or, better yet, allowing the students to choose their own examples.

Box 1. Concepts at a Glance

Leads into biological sciences

  • Molecular evolution (e.g., homologs, paralogs, orthologs)

  • Gene alignment

  • Domain structure of proteins

  • Biochemical nature of amino acids: “frequent” and “infrequent” substitutions

  • Conserved versus divergent regions of genes

Leads into math

  • Amino acid identity matrices

  • Quantification of sequence identity and similarity

Leads into information technology

  • Database queries and their tradeoffs (speed versus completeness)

  • Large dataset management

Less obviously, another benefit of teaching students how the BLAST algorithm works is that it provides an opportunity to illustrate how mathematics functions as a language of biology. For example, given that BLAST has been designed to retrieve homologs, there are several steps in the BLAST program that incorporate molecular evolution concepts to maximize the possibility of finding sequences with a shared evolutionary history. More basically, understanding the steps in the calculation of an E-value provides an opportunity to show the relationship between how the algorithm works and fundamental principles of biochemistry and evolution. Here, we provide an approach to teaching the basics of BLAST to students in order to emphasize how the algorithm translates fundamental biological principles into numerical terms culminating in an E-value. Acquiring a feel for the algorithm and exploring genomic sequence data have the potential to inform a student's grasp of biomedical, biochemical, and biogeochemical concepts, presenting an excellent opportunity for multidisciplinary integration.
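To make the final step of that calculation concrete, the following is a minimal sketch of how a raw alignment score could be turned into a bit score and an E-value, assuming the standard Karlin-Altschul form E = K·m·n·e^(−λS). The function names, the values of `LAMBDA` and `K`, and the query and database lengths are illustrative assumptions, not values taken from this article; in practice λ and K depend on the substitution matrix and gap penalties, and BLAST uses length-corrected (effective) values of m and n rather than the raw lengths used here.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalize a raw score S into a bit score S' = (lam*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least raw_score.

    Uses E = K * m * n * exp(-lam * S), with m the query length and
    n the total database length (no effective-length correction).
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Assumed, illustrative statistical parameters (not taken from the article).
LAMBDA = 0.267
K = 0.041

m = 250          # hypothetical query length in residues
n = 50_000_000   # hypothetical database size in residues

for s in (40, 80, 160):
    print(s, round(bit_score(s, LAMBDA, K), 1), e_value(s, m, n, LAMBDA, K))
```

Two points are worth drawing out with students: the E-value falls exponentially as the score rises, and it scales linearly with the size of the search space, so the same alignment looks less significant in a larger database.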

Highlights

  • BLAST (Basic Local Alignment Search Tool; [1,2]) is the key bioinformatic tool for sequence comparison and retrieval from databases

  • Recognition of the importance of BLAST to modern life sciences has led to its use in many biology courses, even at the high school level, to introduce students to bioinformatics applications in the life sciences

  • To begin a BLAST search of a database, the user provides a query sequence, which is a nucleotide or amino acid sequence for which they are interested in finding homologs


Summary

Introduction

BLAST (Basic Local Alignment Search Tool; [1,2]) is the key bioinformatic tool for sequence comparison and retrieval from databases. It is possible to use case studies from the literature, but the abundance of sequence data present in public databases raises the far more attractive possibility of using searches tailored to a particular course, or, better yet, allowing the students to choose their own examples. Less obviously, another benefit of teaching students how the BLAST algorithm works is that it provides an opportunity to illustrate how mathematics functions as a language of biology. Acquiring a feel for the algorithm and exploring genomic sequence data have the potential to inform a student’s grasp of biomedical, biochemical, and biogeochemical concepts, presenting an excellent opportunity for multidisciplinary integration.

Explaining the Relationship between Aligning Sequences and Evolutionary Biology
Alignments are extended position by position with concomitant scoring until
Substitution Matrices and Protein Biochemistry
Online resources
Questions to ask after the lesson
Meaning and Results