miWords: transformer-based composite deep learning for highly accurate discovery of pre-miRNA regions across plant genomes

Sagar Gupta, Ravi Shankar

doi:10.1093/bib/bbad088

Abstract

Discovering precursor microRNAs (pre-miRNAs) is the core of microRNA (miRNA) discovery. Many tools that use traditional sequence and structural features have been published to discover miRNAs. However, in practical applications such as genome annotation, their actual performance has been very low. The problem is more severe in plants, where, unlike in animals, pre-miRNAs are much more complex and difficult to identify. A huge gap exists between animals and plants in the available software for miRNA discovery and in species-specific miRNA information. Here, we present miWords, a composite deep learning system of transformers and convolutional neural networks that treats the genome as a pool of sentences made of words with specific occurrence preferences and contexts, to accurately identify pre-miRNA regions across plant genomes. Comprehensive benchmarking was carried out against >10 software tools representing different genres, using many experimentally validated datasets. miWords emerged as the best performer, exceeding 98% accuracy with a performance lead of ~10%. miWords was also evaluated across the Arabidopsis genome, where it again outperformed the compared tools. As a demonstration, miWords was run across the tea genome and reported 803 pre-miRNA regions, all validated by small RNA-seq reads from multiple samples, and most of them functionally supported by degradome sequencing data. miWords is freely available as stand-alone source code at https://scbb.ihbt.res.in/miWords/index.php.
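
The idea of reading a genome as sentences made of words can be illustrated with a small sketch. It is not the miWords implementation: the abstract does not state the word length, the overlap, or the vocabulary scheme, so the overlapping 3-mers, the stride of 1, the `<pad>`/`<unk>` tokens, and the function names `sequence_to_words`, `build_vocabulary`, and `encode` are all assumptions chosen for illustration. The sketch only shows how a nucleotide sequence might be split into words and mapped to integer IDs of the kind a transformer embedding layer would consume.

```python
from typing import Dict, Iterable, List


def sequence_to_words(seq: str, k: int = 3, stride: int = 1) -> List[str]:
    """Split a nucleotide sequence into k-mer "words".

    k and stride are illustrative assumptions, not values from the paper.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    seq = seq.upper()
    # Slide a window of width k over the sequence, moving `stride` bases each step.
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocabulary(sentences: Iterable[List[str]]) -> Dict[str, int]:
    """Assign an integer ID to each distinct word, in order of first appearance.

    The <pad> and <unk> entries are hypothetical special tokens.
    """
    vocab: Dict[str, int] = {"<pad>": 0, "<unk>": 1}
    for words in sentences:
        for word in words:
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(words: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map words to IDs, sending unseen words to the <unk> ID."""
    return [vocab.get(word, vocab["<unk>"]) for word in words]


if __name__ == "__main__":
    sentence = sequence_to_words("ACGTAC", k=3)
    vocab = build_vocabulary([sentence])
    print(sentence)
    print(encode(sentence, vocab))
```

Overlapping windows keep every local context in the sentence at the cost of a longer token sequence; a larger stride would shorten the sequence but discard some contexts.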

Journal: Briefings in Bioinformatics
Publication Date: Mar 15, 2023
Citations: 5

Similar Papers

Plants meet machines: Prospects in machine learning for plant biology
Pamela S Soltis ... Emily K Meineke
Applications in Plant Sciences | VOL. 8
01 Jun 2020

MiRNA Digger: a comprehensive pipeline for genome-wide novel miRNA mining
Lan Yu ... Ming Chen
Scientific Reports | VOL. 6
06 Jan 2016

Comparative Evaluation of Intron Prediction Methods and Detection of Plant Genome Annotation Using Intron Length Distributions
Long Yang ... Hwan-Gue Cho
Genomics & Informatics | VOL. 10
01 Jan 2012

Features of Arabidopsis Genes and Genome Discovered using Full-length cDNAs
Nickolai N Alexandrov ... Maxim E Troukhan
Plant Molecular Biology | VOL. 60
01 Jan 2006
