Abstract

In Chapter 5, we assumed that a reasonable multiple sequence alignment was already known and provided the starting point for constructing a profile HMM. We now look at what a ‘reasonable’ multiple alignment is, and at ways to construct one automatically from unaligned sequences. Multiple alignments must usually be inferred from primary sequences alone. Biologists produce high quality multiple sequence alignments by hand using expert knowledge of protein sequence evolution. This knowledge comes from experience. Important factors include: specific sorts of columns in alignments, such as highly conserved residues or buried hydrophobic residues; the influence of secondary and tertiary structure, such as the alternation of hydrophobic and hydrophilic columns in exposed beta sheet; and expected patterns of insertions and deletions, that tend to alternate with blocks of conserved sequence. Furthermore, the phylogenetic relationships between sequences dictate constraints on the changes that occur in columns and in the patterns of gaps. RNA alignments involve similar knowledge but additionally they are often strongly constrained by a secondary structure model that in many cases has also been inferred from primary sequence data (Chapter 10). Manual multiple alignment is tedious. Automatic multiple sequence alignment methods are a topic of extensive research in computational biology. In general, an automatic method must have a way to assign a score so that better multiple alignments get better scores. We should carefully distinguish the problem of scoring a multiple alignment from the problem of searching over possible multiple alignments to find the best one.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call