Abstract

Transcriptome sequencing has opened the field of genomics to a wide variety of researchers, owing to its efficiency, applicability across species and ability to quantify gene expression. The resulting datasets are a rich source of information that can be mined for many years into the future, with each dataset providing a unique angle on a specific context in biology. Keeping this accumulating data accessible is a considerable challenge for researchers. The primary focus of conventional genomics databases is the storage, navigation and interpretation of sequence data, which is typically classified down to the level of a species or individual. The addition of expression data adds a new dimension to this paradigm – the sampling context. Does gene expression describe different tissues, a temporal distribution or an experimental treatment? These data describe not only an individual but also the biological context surrounding that individual. The structure and utility of a transcriptome database must therefore reflect these attributes. We present an online database which has been designed to maximise the accessibility of crustacean transcriptome data by providing intuitive navigation within and between datasets and instant visualization of gene expression and protein structure. The site is accessible at https://crustybase.org and currently holds 10 datasets from a range of crustacean species. It also accepts uploads of novel transcriptome datasets through a simple web interface, allowing the research community to contribute their own data to a pool of shared knowledge.

Highlights

  • In recent years, the advancement of next-generation sequencing (NGS) technologies has provided new and exciting opportunities for biologists in a variety of disciplines

  • Total RNA sequencing, commonly known as RNA-seq or transcriptome sequencing, has been an effective tool for curating and characterising genes across an expanding range of species in the past five years. This is well reflected in the gene expression data repositories held by the National Center for Biotechnology Information (NCBI) [2], where sequencing-based experiments cover over nine times the species diversity of array-based experiments

  • A typical RNA-seq pipeline further augments these data by matching transcript sequences to known genes to provide “annotations” which can be directly queried by keyword search

Summary

Introduction

The advancement of next-generation sequencing (NGS) technologies has provided new and exciting opportunities for biologists in a variety of disciplines. A typical RNA-seq pipeline further augments these data by matching transcript sequences to known genes to provide “annotations” which can be directly queried by keyword search. A conventional sequence-oriented platform, however, is far from adequate when it comes to accessing expression data; the best solution offered by NCBI allows researchers to upload spreadsheets of expression data as supplementary files to a corresponding Gene Expression Omnibus (GEO) record.

