Abstract
Background: Endogenous murine leukemia retroviruses (MLVs) are high copy number proviral elements difficult to comprehensively characterize using standard low throughput sequencing approaches. However, high throughput approaches generate data that is challenging to process, interpret and present.

Results: Next generation sequencing (NGS) data was generated for MLVs from two wild caught Mus musculus domesticus (from mainland France and Corsica) and for inbred laboratory mouse strains C3H, LP/J and SJL. Sequence reads were grouped using a novel sequence clustering approach as applied to retroviral sequences. A Markov cluster algorithm was employed, and the sequence reads were queried for matches to specific xenotropic (Xmv), polytropic (Pmv) and modified polytropic (Mpmv) viral reference sequences.

Conclusions: Various MLV subtypes were more widespread than expected among the mice, which may be due to the higher coverage of NGS, or to the presence of similar sequence across many different proviral loci. The results did not correlate with variation in the major MLV receptor Xpr1, which can restrict exogenous MLVs, suggesting that endogenous MLV distribution may reflect gene flow more than past resistance to infection.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-1766-z) contains supplementary material, which is available to authorized users.
Highlights
Endogenous murine leukemia retroviruses (MLVs) are high copy number proviral elements difficult to comprehensively characterize using standard low throughput sequencing approaches
Our analyses show that various MLV subtypes are more widespread than expected among the mice, which may be due to the higher coverage of next generation sequencing (NGS), or to the presence of similar sequence across many proviral loci
For each of the xenotropic MLV (Xmv), polytropic MLV (Pmv) and modified polytropic MLV (Mpmv) reference sequences reported previously [4, 7], we identified the sequence read in each sample that had the highest pairwise match to that reference sequence
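The best-match step described in the last highlight can be illustrated with a minimal sketch. It is not the authors' pipeline: the scoring function here is Python's `difflib.SequenceMatcher`, used only as a crude stand-in for a real pairwise alignment score, and the names `similarity`, `best_read_per_reference`, `Xmv_ref`, `Pmv_ref` and the toy sequences are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(read_seq: str, ref_seq: str) -> float:
    """Crude similarity in [0, 1]; an assumed stand-in for a pairwise alignment score."""
    matcher = SequenceMatcher(None, read_seq.upper(), ref_seq.upper(), autojunk=False)
    return matcher.ratio()


def best_read_per_reference(reads: dict, references: dict) -> dict:
    """For each reference name, return (read_id, score) of the most similar read."""
    best = {}
    for ref_name, ref_seq in references.items():
        # Score every read in the sample against this reference and keep the top one.
        scored = [(similarity(seq, ref_seq), read_id) for read_id, seq in reads.items()]
        score, read_id = max(scored)
        best[ref_name] = (read_id, score)
    return best


# Toy data (hypothetical sequences, far shorter than real reads).
references = {"Xmv_ref": "ACGTACGTAC", "Pmv_ref": "TTGGCCAATT"}
reads = {"read1": "ACGTACGTAC", "read2": "TTGGCCTATT", "read3": "GGGGGGGGGG"}

result = best_read_per_reference(reads, references)
for ref_name, (read_id, score) in result.items():
    print(f"{ref_name}: best read {read_id} (similarity {score:.2f})")
```

In a real analysis the score would more plausibly come from an alignment tool, and the loop would run once per sample so that each sample reports its own best read for every reference.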
Summary
Endogenous murine leukemia retroviruses (MLVs) are high copy number proviral elements difficult to comprehensively characterize using standard low throughput sequencing approaches. Because endogenous MLVs are highly variable in sequence and present in the genome at high copy number, a comprehensive analysis of their presence and distribution has generally been difficult: low throughput data sets generated by Sanger sequencing may reveal only a small proportion of the diversity. High throughput sequencing can capture more of MLV diversity [2, 3], but these data sets are often exceptionally complex, consisting of tens of thousands to many millions of sequence reads. They are not amenable to standard phylogenetic analysis, because computing, evaluating, and visualizing alignments and phylogenies for such large data sets poses substantial challenges. We performed detailed sequence comparisons to determine the presence of specific viral reference sequences in these mice.
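The abstract notes that a Markov cluster algorithm was used to group reads. As a rough illustration of how that family of algorithms works, the sketch below runs the generic expansion and inflation loop on a small similarity matrix. The source does not give parameters or an implementation, so the function name `markov_cluster`, the inflation value of 2.0, and the toy matrix are assumptions made for this example only.

```python
import numpy as np


def markov_cluster(similarity, inflation=2.0, expansion=2, max_iter=100, tol=1e-6):
    """Generic Markov clustering sketch; parameter values are assumptions."""
    m = np.array(similarity, dtype=float)
    np.fill_diagonal(m, 1.0)                # add self-loops
    m /= m.sum(axis=0, keepdims=True)       # make each column a probability vector
    for _ in range(max_iter):
        previous = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: spread flow along paths
        m = m ** inflation                        # inflation: strengthen strong flows
        m /= m.sum(axis=0, keepdims=True)         # renormalize columns
        if np.allclose(m, previous, atol=tol):
            break
    # After convergence, the non-zero entries of each row list the members of a cluster.
    found = set()
    for row in m:
        members = frozenset(np.flatnonzero(row > 1e-6).tolist())
        if members:
            found.add(members)
    return sorted(sorted(c) for c in found)


# Toy pairwise similarity between five reads: reads 0-2 resemble each other,
# reads 3-4 resemble each other, and the two groups share no similarity.
toy = [
    [0.0, 0.9, 0.9, 0.0, 0.0],
    [0.9, 0.0, 0.9, 0.0, 0.0],
    [0.9, 0.9, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.9, 0.0],
]
clusters = markov_cluster(toy)
print(clusters)
```

With tens of thousands of reads, a dense matrix like this one would be impractical; a sparse representation and a pruning step would be needed, which this sketch leaves out.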