Abstract
Transposable elements (TEs) are dynamic components of genomes that often vary in copy number among members of the same species. With the advent of next-generation sequencing, TE insertion-site polymorphism can be examined at an unprecedented level of detail when combined with easy-to-use bioinformatics software. Here we report a new tool, RelocaTE, that rapidly identifies specific TE insertions that are either polymorphic or shared between a reference and unassembled next-generation sequencing reads. Furthermore, a novel companion tool, CharacTErizer, exploits the depth of coverage to classify genotypes of nonreference insertions as homozygous, heterozygous or, when analyzing an active TE family, as rare somatic insertion or excision events. It does this by comparing the number of RelocaTE-aligned reads with the number of reads that map to the same genomic position without the TE. Although RelocaTE and CharacTErizer can be used for any TE, they were developed to analyze the very active mPing element, which is undergoing massive amplification in specific strains of Oryza sativa (rice). Three individuals of one of these strains, A123, were resequenced and analyzed for mPing insertion-site polymorphisms. The majority of mPing insertions found (~97%) are not present in the reference, and two siblings from a self-cross of this strain were found to share only ~90% of their insertions. Private insertions are primarily heterozygous but include both homozygous and predicted somatic insertions. The reliability of the predicted genotypes was validated by polymerase chain reaction.
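The read-count comparison described in the abstract can be illustrated with a minimal sketch. This is not CharacTErizer's actual implementation: the function name `classify_insertion`, the two cutoffs (`homozygous_min`, `somatic_max`) and the output labels are all hypothetical, chosen only to show how the fraction of TE-supporting reads at a site could separate the genotype classes. Somatic excision events are not modeled here.

```python
def classify_insertion(te_reads: int, empty_site_reads: int,
                       homozygous_min: float = 0.9,
                       somatic_max: float = 0.1) -> str:
    """Classify a nonreference insertion site from two read counts.

    te_reads:         reads spanning the TE/flank junction (insertion present)
    empty_site_reads: reads spanning the same position without the TE
    The cutoffs are illustrative assumptions, not values from the paper.
    """
    total = te_reads + empty_site_reads
    if total == 0:
        # No reads cover the site, so no genotype call is possible.
        return "no_coverage"

    te_fraction = te_reads / total
    if te_fraction >= homozygous_min:
        # Nearly all reads carry the insertion.
        return "homozygous"
    if te_fraction <= somatic_max:
        # Only a small minority of reads carry the insertion.
        return "putative_somatic_insertion"
    # A mix of insertion and empty-site reads.
    return "heterozygous"


if __name__ == "__main__":
    for counts in [(20, 0), (10, 10), (1, 30)]:
        print(counts, classify_insertion(*counts))
```

In practice the cutoffs would have to be chosen with sequencing depth in mind, because at low coverage a heterozygous site can yield reads from only one allele by chance.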
Highlights
Transposable elements (TEs) are fragments of DNA that often increase their copy number as they move from one genomic location to another.
To facilitate comparative analysis of mPing insertion sites, we report the development of RelocaTE and a companion tool, CharacTErizer.
CharacTErizer predicts the genotype of nonreference insertions. We show that these tools are of general use in identifying any TE insertion site in unassembled next-generation sequencing (NGS) reads, provided that a reference genome, the TE sequence and the target site duplication (TSD) information are available.
Summary
Transposable elements (TEs) are fragments of DNA that often increase their copy number as they move from one genomic location to another. To understand how TEs increase in copy number without killing their host, we are characterizing the amplification of an extremely active element called mPing in rice (Oryza sativa) (Jiang et al 2003). Although there are 51 copies of the 430-bp mPing element in the reference Nipponbare (NB) genome (Naito et al 2006), several rice strains have been identified with hundreds of copies. In a previous study we used vectorette polymerase chain reaction (PCR) coupled with 454 sequencing to characterize almost 1700 insertion sites of mPing in a small population of strain HEG4 and determined that the element had a preference for insertion into promoter regions (Naito et al 2009). We identified mPing insertion sites in multiple progeny by comparing unassembled next-generation sequencing (NGS) reads to the reference NB genome. A few tools have been created that use unassembled NGS reads to locate TE insertions (Iskow et al 2010; Witherspoon et al 2010; Linheiro and Bergman 2012; Tian et al 2012), but they cannot characterize the genotype of insertion sites.