Abstract

rfaRm is an R package providing a client-side interface for the Rfam database of non-coding RNA and other structured RNA elements. The package facilitates the search of the Rfam database by keywords or sequences, as well as the retrieval of all available information about specific Rfam families, such as member sequences, multiple sequence alignments, secondary structures and covariance models. By providing such programmatic access to the Rfam database, rfaRm enables genomic workflows to incorporate information about non-coding RNA, whose potential cannot be fully exploited just through interactive access to the database. The features of rfaRm are demonstrated by using it to analyze the SARS-CoV-2 genome as an example case.

Highlights

  • The Rfam database [1] is a collection of families of non-coding RNA and other structured RNA elements

  • Different pieces of information can be retrieved for each RNA family, including a descriptive summary, secondary structure information and consensus sequence, amongst many others

  • We present a client-side interface to the Rfam database, enabling its programmatic access and expanding the scope of the genomic analysis that can be carried out with the information provided by the Rfam database


Summary

Introduction

The Rfam database [1] is a collection of families of non-coding RNA and other structured RNA elements.

Example of usage

    ## Save an SVG file with a diagram of the secondary structure
    ## of the Rfam family with accession RF00005 (tRNA), colored
    ## by sequence conservation
    rfamSecondaryStructureXMLSVG(rfamFamily = "RF00005", filename = "test.svg", plotType = "cons")

Example of usage

    ## Obtain the seed alignment of the Rfam family with
    ## accession RF00005 (tRNA) in the Stockholm format and
    ## save it to a file
    rfamSeedAlignment(rfamFamily = "RF00005", filename = "test.stk", format = "stockholm")

Example of usage

    ## Obtain the phylogenetic tree of seed alignment of the
    ## Rfam family with accession RF00005 (tRNA) and save it
    ## to a file
    rfamSeedTree(rfamFamily = "RF00005", filename = "test.nhx")

Example of usage

    ## Plot the phylogenetic tree of seed alignment of the
    ## Rfam family with accession RF00005 (tRNA) labelled with
    ## species names
    rfamSeedTreeImage(rfamFamily = "RF00005", label = "species")

All identified families were non-coding RNA elements typically found in the genome of beta-coronaviruses:

1. bCoV-5UTR: 5’ untranslated region comprising 150–200 nucleotides found in betacoronaviruses

Sarbecovirus-5UTR
Sarbecovirus-3UTR
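
The abstract mentions searching Rfam by keyword and by sequence, for which this excerpt shows no call. The following is a minimal sketch of how such searches might look, not the authors' actual analysis. It assumes the rfaRm package is installed and that Rfam is reachable over the network. The function names rfamTextSearchFamilyAccession, rfamFamilySummary and rfamSequenceSearch are taken from the package documentation as best recalled and should be checked against the installed version. The query sequence (unmodified yeast tRNA-Phe) and the keyword "coronavirus" are illustrative choices that do not appear in the original text.

```r
## Illustrative query: the unmodified yeast tRNA-Phe sequence (76 nt).
## This is an assumption for demonstration, not data from the paper.
query <- paste0(
  "GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUC",
  "UGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA"
)

## Only attempt the searches if rfaRm is available; the calls below
## contact the Rfam servers, so failures are caught and reported.
if (requireNamespace("rfaRm", quietly = TRUE)) {
  result <- tryCatch({
    ## Keyword search: accessions of families matching a free-text query
    hits <- rfaRm::rfamTextSearchFamilyAccession("coronavirus")

    ## Descriptive summary of the first matching family, if any
    if (length(hits) > 0) {
      str(rfaRm::rfamFamilySummary(hits[1]))
    }

    ## Sequence search: Rfam families with hits in the query sequence
    rfaRm::rfamSequenceSearch(query)
  }, error = function(e) {
    message("Rfam query failed: ", conditionMessage(e))
    NULL
  })
} else {
  message("rfaRm is not installed; skipping the Rfam queries.")
}
```

In this sketch, a failed or skipped query simply leaves `result` undefined or NULL rather than stopping the script, which is one way to keep a longer genomic workflow running when the Rfam service is unavailable.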
