Abstract
Transposable elements (TEs) are genomic units able to move within the genome of virtually all organisms. Due to their repetitive nature and high structural diversity, the identification and classification of TEs in sequenced genomes remain a challenge. Although TEs were initially regarded as “junk DNA”, it has been demonstrated that they play key roles in chromosome structure, gene expression and regulation, as well as adaptation and evolution. A highly reliable annotation of these elements is, therefore, crucial to better understand genome functions and their evolution. To date, many bioinformatics tools have been developed to address TE detection and classification, but many problematic aspects remain, such as the reliability, precision, and speed of the analyses. Machine learning and deep learning are families of algorithms that can make automatic predictions and decisions in a wide variety of scientific applications. They have been tested in bioinformatics and, more specifically, for TE classification, with encouraging results. In this review, we discuss important aspects of TEs, such as their structure, their importance in the evolution and architecture of the host, and their current classifications and nomenclatures. We also address current methods for identifying and classifying TEs, along with their limitations.
Highlights
Transposable elements (TEs) are genomic units able to move within and among the genomes of virtually all organisms [1]
Machine learning (ML) and deep learning (DL) may represent the new generation of bioinformatics approaches, especially for TEs [214]
Both techniques have been tested in many genomic areas with very high levels of success, yet their application to TEs remains limited
Summary
Transposable elements (TEs) are genomic units able to move within and among the genomes of virtually all organisms [1]. TEs represent the most repetitive sequences [5]. They are able to move within genomes, generate mutations, and amplify their copy number [6]. TEs that move via an RNA intermediate, called retrotransposons, fall into Class I, while elements that move via a DNA intermediate, called transposons, fall into Class II [8]. Retrotransposons represent the vast majority of TEs found in plant genomes due to their mobility mechanism. Several methods have been developed to identify and annotate TEs in sequenced genomes. These are classified into four categories: de novo, structure-based, comparative genomics, and homology-based [17]. For the reasons mentioned above, we focus on these elements in this review.