Abstract

BackgroundRNA editing is a widespread co-/post-transcriptional mechanism that alters primary RNA sequences through the modification of specific nucleotides and it can increase both the transcriptome and proteome diversity. The automatic detection of RNA-editing from RNA-seq data is computational intensive and limited to small data sets, thus preventing a reliable genome-wide characterisation of such process.ResultsIn this work we introduce HPC-REDItools, an upgraded tool for accurate RNA-editing events discovery from large dataset repositories. Availability: https://github.com/BioinfoUNIBA/REDItools2.ConclusionsHPC-REDItools is dramatically faster than the previous version, REDItools, enabling big-data analysis by means of a MPI-based implementation and scaling almost linearly with the number of available cores.

Highlights

  • RNA editing is a widespread co-/post-transcriptional mechanism that alters primary RNA sequences through the modification of specific nucleotides and it can increase both the transcriptome and proteome diversity

  • Data loading optimization To experimentally test the speed improvement between High Performance Computing (HPC)-REDItools and REDItools, we created a dataset consisting of 10 RNA-seq samples randomly selected from the Genotype-Tissue Expression project (GTEx) project

  • HPC-REDItools are on average 8 times faster than REDItools. This finding is quite interesting because enables the use of HPC-REDItools to users with no access to HPC infrastructures and greatly speeds up the genome wide RNA editing detection

Read more

Summary

Conclusions

RNA editing is a co-/post-transcriptional phenomenon occurring in many organisms including animals and plants and has relevant biological implications. It can be detected employing RNA-seq data generated by high throughput sequencing technologies. As data volume increases, more powerful tools are required to analyse large number of samples in a time affordable way. In the present work we described HPC-REDItools, a HPC-aware tool for efficiently detect high-quality RNA-editing events from big data repositories on a HPC cluster. HPC-REDItools introduce at least three main algorithmic improvements over the previous version: i) high parallelism to employ the computational power available at High Performance Computing infrastructures; ii) optimised data loading that dramatically reduces computing time per genomic interval; iii) Dynamic Interval Analysis approach to improve workload balance across parallel processes.

Background
Results

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.