Abstract
Background: Analysis of large sets of biological sequence data from related strains or organisms is complicated by superficial redundancy in the set, which may contain many members that are identical except at one or two positions. Thus a new method, based on deriving physicochemical property (PCP)-consensus sequences, was tested for its ability to generate reference sequences and distinguish functionally significant changes from background variability.
Methods: The PCP consensus program was used to automatically derive consensus sequences starting from sequence alignments of proteins from Flaviviruses (from the Flavitrack database) and human enteroviruses, using a five-dimensional set of eigenvectors that summarize over 200 different scalar values for the PCPs of the amino acids. A PCP-consensus protein of a Dengue virus envelope protein was produced recombinantly and tested for its ability to bind antibodies to strains using ELISA.
Results: PCP-consensus sequences of the flavivirus family could be used to classify them into five discrete groups and distinguish areas of the envelope proteins that correlate with host specificity and disease type. A multivalent Dengue virus antigen was designed and shown to bind antibodies against all four DENV types. A consensus enteroviral VPg protein had the same distinctive high pKa as wild type proteins and was recognized by two different polymerases.
Conclusions: The process for deriving PCP-consensus sequences for any group of aligned similar sequences has been validated for sequences with up to 50% diversity. Ongoing projects have shown that the method identifies residues that significantly alter PCPs at a given position, and might thus cause changes in function or immunogenicity. Other potential applications include deriving target proteins for drug design and diagnostic kits.
Highlights
Analysis of large sets of biological sequence data from related strains or organisms is complicated by superficial redundancy in the set, which may contain many members that are identical except at one or two positions.
We will show some applications based on data stored in our Flavitrack database, which is a compendium of annotated Flavivirus sequences [9,10].
When catalogued, the strains appear redundant from a mathematical standpoint, with interstrain diversity occurring at fewer than 1% of positions. While much of this variation is neutral for phenotype, even a single point mutation can greatly alter the immunogenicity of the envelope protein or alter virus entry [38,40,45,46,47]. Recognizing such function-altering amino acid substitutions is important for designing entry inhibitors and vaccines that will protect against many Flaviviruses simultaneously.
Summary
Analysis of large sets of biological sequence data from related strains or organisms is complicated by superficial redundancy in the set, which may contain many members that are identical except at one or two positions. Very large alignments present intrinsic problems in discriminating residue conservation or patterns of variance, and require special software even to view them. Another problem in dealing with biological datasets, such as the many Flavivirus sequences we have collected within the Flavitrack database [9,10], is that they often have a pronounced bias due to unequal distribution, which can arise from non-uniform sampling [11]. Conventional methods for calculating consensus sequences assume an unbiased data set, and typically calculate only the most common amino acid in a column [12]. An example of such a consensus (Figure 1A) shows that while it provides useful information on the degree of conservation of the amino acids in aligned sequences, it cannot suggest a rational choice of amino acid at highly variant positions. Profiling methods [13,14,15] based on amino acid scoring matrices can be used to obtain a consensus sequence, but these are primarily designed to detect distantly related members of a set of proteins.
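To make the limitation of the conventional approach concrete, the following minimal Python sketch computes a majority-rule consensus, that is, the most common amino acid in each alignment column, together with a simple per-column conservation fraction. It is an illustration only and is not the PCP-consensus program; the function name `majority_consensus` and the four-sequence toy alignment are hypothetical and do not come from Flavitrack.

```python
from collections import Counter


def majority_consensus(alignment, gap="-"):
    """Majority-rule consensus of an alignment (illustrative sketch only).

    Returns the consensus string and, for each column, the fraction of
    sequences that carry the chosen residue.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("expected a non-empty alignment of equal-length sequences")

    consensus = []
    conservation = []
    for column in zip(*alignment):
        # Count residues in this column, ignoring gap characters.
        counts = Counter(res for res in column if res != gap)
        if not counts:
            # Column contains only gaps.
            consensus.append(gap)
            conservation.append(0.0)
            continue
        # most_common breaks ties by first occurrence, so at a fully
        # variable column the "winner" depends only on sequence order.
        residue, count = counts.most_common(1)[0]
        consensus.append(residue)
        conservation.append(count / len(column))
    return "".join(consensus), conservation


# Hypothetical toy alignment: column 3 is different in every sequence.
toy_alignment = ["MKTL", "MKSL", "MRAL", "MKVL"]
seq, cons = majority_consensus(toy_alignment)
print(seq)   # MKTL
print(cons)  # [1.0, 0.75, 0.25, 1.0]
```

In the third column every residue occurs once, so the majority rule returns whichever residue happens to appear first. The conservation value flags the position as variable, but the choice of amino acid is arbitrary. Because each sequence counts equally, an oversampled strain would also dominate the column counts.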