Abstract

Background: The high variability in the envelope regions of some viruses, such as HIV, allows the virus to establish infection and to escape subsequent immune surveillance. This variability, as well as increasing incorporation of N-linked glycosylation sites, is fundamental to this evasion. It also creates difficulties for the multiple sequence alignment (MSA) methods that provide the first step in the analysis of these sequences. Existing MSA tools often fail to properly align highly variable HIV envelope sequences, requiring extensive manual editing that is impractical with even a moderate number of these variable sequences.

Results: We developed an automated library-building tool, NGlyAlign, that organizes similar N-linked glycosylation sites as block constraints and statistically conserved global sites as single-site constraints to automatically enforce partial columns in consistency-based MSA methods such as Dialign. This combined method accurately aligns variable HIV-1 envelope sequences. We tested the method on two datasets: a set of 156 founder and chronic gp160 HIV-1 subtype B sequences, and a set of reference sequences of gp120 in the highly variable region 1. On measures such as entropy scores, sum-of-pairs scores, column scores, and similarity heat maps, NGlyAlign+Dialign proved superior to methods such as T-Coffee, ClustalOmega, ClustalW, Praline, HIValign and Muscle. The method is scalable to large sequence sets, producing accurate alignments without requiring manual editing. Beyond this application to HIV, our method can be used for other highly variable glycoproteins, such as the hepatitis C virus envelope.

Conclusions: NGlyAlign is an automated tool for mapping and building glycosylation motif libraries to accurately align highly variable regions in HIV sequences. It can provide the basis for many studies reliant on single robust alignments. NGlyAlign has been developed as an open-source tool and is freely available at https://github.com/UNSW-Mathematical-Biology/NGlyAlign_v1.0

Highlights

  • The high variability in the envelope regions of some viruses, such as Human Immunodeficiency Viruses (HIV), allows the virus to establish infection and to escape subsequent immune surveillance

  • Alignment comparisons of founder and chronic HIV-1B envelope: a wide selection of global alignment methods (Muscle [21], ClustalW [20], ClustalOmega [22], HIVAlign [14] and T-Coffee [23]) were executed with their default settings on a test set of 156 subtype B HIV-1 gp160 sequences, obtained from individuals at both the founder and chronic stages of infection

  • There is no ‘gold standard’ that we can use for this comparison, as true alignments are unknown, and there is no straightforward measure to assess the quality of an alignment
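
In the absence of a gold standard, the abstract names entropy scores and sum-of-pairs scores among the measures used for comparison. The sketch below shows one common way such reference-free measures can be computed; it is an illustration only, not the scoring used by the authors. The function names, the treatment of the gap character as an ordinary symbol in the entropy, and the identity-based pair score (1 for identical residues, 0 otherwise, gaps never scoring) are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import combinations

def column_entropy(column):
    """Shannon entropy (bits) of one alignment column.
    Assumption: the gap character '-' is counted as an ordinary symbol."""
    n = len(column)
    counts = Counter(column)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

def alignment_entropy(aligned):
    """Per-column entropies for a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*aligned)]

def sum_of_pairs(aligned):
    """Identity-based sum-of-pairs score (hypothetical scoring scheme):
    each pair of identical, non-gap residues in a column contributes 1."""
    total = 0
    for col in zip(*aligned):
        for a, b in combinations(col, 2):
            if a == b and a != "-":
                total += 1
    return total

# Toy alignment of four short fragments (made-up data, not from the paper)
toy = ["NAT", "NGT", "N-S", "NAS"]
entropies = alignment_entropy(toy)
score = sum_of_pairs(toy)
```

Lower column entropy and a higher sum-of-pairs score both indicate more consistent columns, which is why measures of this kind can rank alignments of the same sequences without a reference alignment.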


Summary

Introduction

The high variability in the envelope regions of some viruses, such as HIV, allows the virus to establish infection and to escape subsequent immune surveillance. This variability, as well as increasing incorporation of N-linked glycosylation sites, is fundamental to this evasion. It creates difficulties for the multiple sequence alignment (MSA) methods that provide the first step in the analysis of these sequences. Alignments fail in the hypervariable regions of the envelope gene, to the extent that these regions are often omitted when attempting to describe properties across individuals [4, 5]. Yet it is these regions that portray modes of viral evolution in response to immune pressure, via the incorporation of insertions or deletions (indels) and N-linked glycosylation [6]. Improved methods are required to properly align these highly variable sequences and so capture information on how they have functionally evolved or diverged due to immune pressure.
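
To make the notion of an N-linked glycosylation site concrete, the sketch below locates potential sites using the standard sequon definition N-X-S/T, where X is any residue except proline. It illustrates the motif only; the source does not describe how NGlyAlign itself detects or groups these sites, and the function name `find_sequons` and the example sequence are hypothetical.

```python
import re

# N-X-[S/T] sequon, where X is any residue except proline.
# The lookahead makes the match zero-width so overlapping sequons are all found.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(seq):
    """Return 0-based start positions of potential N-linked glycosylation
    sequons in an ungapped protein sequence (assumption: gaps already removed)."""
    return [m.start() for m in SEQUON.finditer(seq.upper())]

# Made-up fragment: NGS matches, NPT is excluded (proline), NNT and NTS overlap
positions = find_sequons("ANGSNPTNNTS")
```

Because insertions and deletions shift these positions between sequences, the same functional site can sit at different indices in different variants, which is part of what makes the hypervariable regions hard to align.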

