Abstract

Miniature inverted repeat transposable elements (MITEs) are prevalent in eukaryotic genomes, including plants and animals. Classified as a type of non-autonomous DNA transposable element, they play important roles in genome organization and evolution. Comprehensive and accurate genome-wide detection of MITEs in various eukaryotic genomes can improve our understanding of their origins, transposition processes, regulatory mechanisms, and biological relevance with regard to gene structures, expression, and regulation. In this paper, we present a new MATLAB-based program called detectMITE that employs a novel numeric calculation algorithm to replace conventional string matching algorithms in MITE detection, adopts the Lempel-Ziv complexity algorithm to filter out MITE candidates with low complexity, and utilizes the powerful clustering program CD-HIT to cluster similar MITEs into MITE families. Using the rice genome as test data, we found that detectMITE can detect MITEs on a genome-wide scale more accurately, comprehensively, and efficiently than other popular MITE detection tools. Through comparison with the potential MITEs annotated in Repbase, the widely used eukaryotic repeat database, detectMITE has been shown to find known and novel MITEs with a complete structure and full-length copies in the genome. detectMITE is an open-source tool (https://sourceforge.net/projects/detectmite).
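
The low-complexity filter mentioned in the abstract can be illustrated with a minimal Python sketch. It counts the phrases in the Lempel-Ziv (1976) exhaustive parsing of a sequence and normalizes the count for a four-letter alphabet. The function names, the log-base-4 normalization, and the 0.5 cutoff are assumptions made for illustration; the abstract does not state which variant or threshold detectMITE (a MATLAB program) uses.

```python
import math


def lz76_complexity(seq: str) -> int:
    """Count phrases in the Lempel-Ziv (1976) exhaustive parsing of seq."""
    n = len(seq)
    i = 0
    phrases = 0
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from an occurrence
        # that starts before position i (overlapping copies are allowed).
        while i + length <= n and seq.find(seq[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_complexity(seq: str) -> float:
    """Phrase count scaled by n / log4(n); the scaling is an assumed choice."""
    n = len(seq)
    if n < 2:
        return 0.0
    return lz76_complexity(seq) * math.log(n, 4) / n


def is_low_complexity(seq: str, cutoff: float = 0.5) -> bool:
    """Flag a candidate as low complexity; the cutoff is hypothetical."""
    return normalized_complexity(seq) < cutoff
```

A simple repeat such as "AT" repeated 50 times parses into only three phrases, so it scores far below any reasonable cutoff, whereas sequences with little internal repetition parse into many short phrases.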

Highlights

  • [...] found to have higher expression than those adjacent to Miniature inverted repeat transposable elements (MITEs) or containing MITEs[13]

  • The major bioinformatics methods in TE identification can be classified into three groups: de novo, structure-based, and homology-based methods[21,22]. De novo methods focus on the innate characteristics of TEs to discover hidden TEs in genomes without any prior information. They are suitable for identifying both known and novel TEs, but their results often contain a mixture of different TE types and non-TE repeats, which necessitates further classification and filtering

  • After retrieving all putative MITE candidates, MITE Uncovering SysTem (MUST) groups them into MITE families based on the sequence similarity of the internal sequences between terminal inverted repeat (TIR) pairs[27]
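
The family-grouping step in the last highlight can be sketched as a greedy clustering of internal sequences. This is only an illustration: it uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure, and the function name `greedy_cluster` and the 0.8 threshold are hypothetical. It is not the similarity measure or procedure used by MUST or CD-HIT.

```python
from difflib import SequenceMatcher


def greedy_cluster(seqs, min_ratio=0.8):
    """Group sequences into families by similarity to a representative.

    Sequences are visited longest first. Each one joins the first family
    whose representative (the family's first member) is similar enough;
    otherwise it starts a new family. Returns lists of input indices.
    """
    order = sorted(range(len(seqs)), key=lambda i: -len(seqs[i]))
    families = []
    for i in order:
        for family in families:
            representative = seqs[family[0]]
            matcher = SequenceMatcher(None, representative, seqs[i], autojunk=False)
            if matcher.ratio() >= min_ratio:
                family.append(i)
                break
        else:
            # No existing representative was close enough.
            families.append([i])
    return families
```

Comparing each candidate only against family representatives, rather than against every other candidate, keeps the number of pairwise comparisons manageable when a genome yields many candidates.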

Summary

Introduction

[...] found to have higher expression than those adjacent to MITEs or containing MITEs[13]. Comparative analysis of MITEs in Brassica rapa, Brassica oleracea, and Arabidopsis thaliana demonstrated that MITEs play dynamic roles in the genome evolution of Brassica[18].

Using programs such as BLAST[23], RepeatMasker[24], and HMMER3[25], homology-based methods utilize sequence similarities between putative and known TEs to detect TEs hidden in genomes. They are good at detecting real TEs, even those with a single copy in a genome.

All putative MITE sequences meeting these requirements will be retained, except those whose TIRs have high A/T or C/G content or include simple repeats[26]. Another structure-based method, MITE Uncovering SysTem (MUST)[27], uses a string matching algorithm to detect sequences with a TIR pair within a window ≤ 500 nt and retains those sequences flanked by TSDs. After retrieving all putative MITE candidates, MUST groups them into MITE families based on the sequence similarity of the internal sequences between TIR pairs[27].

MITE Digger has shown a significant improvement in detection efficiency, as demonstrated for the rice genome (i.e., ~15 hours). Both MITE-Hunter and MITE Digger use a mixture of de novo and structure-based methods in MITE detection.

BrassicaTED is a specialized database for Brassica species, which contains MITEs, TRIMs (Terminal Repeat Retrotransposons in Miniature), and SINEs (Short Interspersed Elements).
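
The structure-based search described above for MUST (a TIR pair within a window of at most 500 nt, flanked by TSDs) can be illustrated with a minimal Python sketch. Only the 500 nt window comes from the text. The function names, the 10 nt TIR length, the 2 nt TSD length, and the requirement of perfect matches are simplifying assumptions; this is not MUST's actual implementation.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_mite_candidates(genome, tir_len=10, tsd_len=2, max_len=500):
    """Return (start, end) spans that begin and end with a perfect TIR pair
    and are flanked on both sides by identical TSDs.

    tir_len and tsd_len are hypothetical defaults; max_len is the 500 nt
    window mentioned in the text.
    """
    hits = []
    n = len(genome)
    for start in range(tsd_len, n - 2 * tir_len - tsd_len + 1):
        left_tsd = genome[start - tsd_len:start]
        # The right TIR must be the reverse complement of the left TIR.
        right_tir = reverse_complement(genome[start:start + tir_len])
        # Keep the element within max_len and leave room for the right TSD.
        search_end = min(n - tsd_len, start + max_len)
        pos = genome.find(right_tir, start + tir_len, search_end)
        while pos != -1:
            end = pos + tir_len
            if genome[end:end + tsd_len] == left_tsd:
                hits.append((start, end))
            pos = genome.find(right_tir, pos + 1, search_end)
    return hits
```

Real detectors typically tolerate a few mismatches in the TIRs and accept a range of TSD lengths, so this exact-match version would miss many true elements; it is meant only to make the structural criteria concrete.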
