Abstract

Current genome sequencing projects are revealing megabases of unknown genomic sequence. About 1% of these sequences can be expected to be of retroviral origin. These are often severely deleted or mutated, so identifying their retroviral origin can be very difficult in the absence of convincing overall sequence similarity. There are also many copies of solo LTRs (long terminal repeats) distributed throughout genomic sequences. LTR and envelope sequences in general are among the most divergent parts of the retroviral genome and are thus especially hard to detect in mutated endogenous sequences. We took advantage of the fact that these retroviral sections contain short, highly conserved sequence regions that provide retroviral hallmarks even after loss of overall similarity. We defined several sequence elements and peptide motifs within LTR and Env sequences and used these elements to construct models for LTRs and Env proteins of mammalian C-type retroviruses. We then used this strategy to successfully identify the hitherto missing LTRs and an env-like region in the S71 human retroviral sequence. Our approach provides a new strategy for identifying remotely related retroviral sequences in genomic DNA (especially human DNA), of potential significance for the interpretation of genomic sequences obtained from the current large-scale sequencing projects.
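The general idea of detecting a region by an ordered series of short conserved elements, rather than by overall similarity, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two patterns (a TATA-box-like motif and the canonical AATAAA polyadenylation signal), the gap limit, and the names `MODEL` and `find_model_hits` are hypothetical, and simple regular-expression chaining stands in for whatever modelling method the authors actually used.

```python
import re
from typing import List, Tuple

# Hypothetical model: (element name, regex, max gap in bp after the previous element).
# These patterns and the gap limit are illustrative, NOT the elements defined in the paper.
MODEL: List[Tuple[str, str, int]] = [
    ("TATA_box", r"TATA[AT]A", 0),    # gap value is unused for the first element
    ("polyA_signal", r"AATAAA", 40),
]

Hit = List[Tuple[str, int, int]]  # (element name, start, end) for each element in order


def find_model_hits(seq: str, model: List[Tuple[str, str, int]] = MODEL) -> List[Hit]:
    """Report every place where all elements occur in order within their gap limits."""
    seq = seq.upper()
    hits: List[Hit] = []
    first_name, first_pat, _ = model[0]
    for first in re.finditer(first_pat, seq):
        chain: Hit = [(first_name, first.start(), first.end())]
        pos = first.end()
        complete = True
        for name, pat, max_gap in model[1:]:
            # Leftmost occurrence of the next element at or after the current position.
            nxt = re.compile(pat).search(seq, pos)
            if nxt is None or nxt.start() - pos > max_gap:
                complete = False
                break
            chain.append((name, nxt.start(), nxt.end()))
            pos = nxt.end()
        if complete:
            hits.append(chain)
    return hits


if __name__ == "__main__":
    example = "GGGC" + "TATAAA" + "GCGCGCGCGC" + "AATAAA" + "CCC"
    for hit in find_model_hits(example):
        print(hit)
```

Because only the short elements and their spacing are checked, the sequence between them is free to diverge, which is the property the abstract relies on. A realistic model would need more elements, tolerance for mismatches, and scanning of both strands.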
