Long-read data is a great tool to discover new active transposable elements (TEs). However, no ready-to-use tools were available to gather this information from low coverage ONT datasets. Here, we developed a novel pipeline, nanotei, that allows detection of TE-contained structural variants, including individual TE transpositions. We exploited this pipeline to identify TE insertion in the Arabidopsis thaliana genome. Using nanotei, we identified tens of TE copies, including ones for the well-characterized ONSEN retrotransposon family that were hidden in genome assembly gaps. The results demonstrate that some TEs are inaccessible for analysis with the current A. thaliana (TAIR10.1) genome assembly. We further explored the mobilome of the ddm1 mutant with elevated TE activity. Nanotei captured all TEs previously known to be active in ddm1 and also identified transposition of non-autonomous TEs. Of them, one non-autonomous TE derived from (AT5TE33540) belongs to TR-GAG retrotransposons with a single open reading frame (ORF) encoding the GAG protein. These results provide the first direct evidence that TR-GAGs and other non-autonomous LTR retrotransposons can transpose in the plant genome, albeit in the absence of most of the encoded proteins. In summary, nanotei is a useful tool to detect active TEs and their insertions in plant genomes using low-coverage data from Nanopore genome sequencing.
Read full abstract