Abstract

Background: Several large public repositories of microarray datasets and RNA-seq data are available. Two prominent examples are ArrayExpress and NCBI GEO. Unfortunately, there is no easy way to import and manipulate data from such resources: the data is stored in large files, which require high bandwidth to download and special-purpose data manipulation tools to extract the subsets relevant to a specific analysis.

Results: TACITuS is a web-based system that supports rapid query access to high-throughput microarray and NGS repositories. The system is equipped with modules capable of managing large files, storing them in a cloud environment, and extracting subsets of data in an easy and efficient way. The system can also import data into Galaxy for further analysis.

Conclusions: TACITuS automates most of the pre-processing needed to analyze high-throughput microarray and NGS data from large publicly available repositories. The system implements several modules to manage large files in an easy and efficient way. Furthermore, it can interface with the Galaxy environment, allowing users to analyze data through a user-friendly interface.

Highlights

  • Several large public repositories of microarray datasets and RNA-seq data are available

  • Studies have produced a huge amount of data stored in databases such as the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) or ArrayExpress

  • Each dataset is provided by a user, either by submitting it directly or by importing it from other databases (e.g. Gene Expression Omnibus, GEO)



Introduction

Several large public repositories of microarray datasets and RNA-seq data are available. Submitted datasets are manually curated according to the MIAME (Minimum Information About a Microarray Experiment) standard [2] for microarray data and the MINSEQE (Minimum Information about a high-throughput nucleotide SEQuencing Experiment) standard for NGS data. These information standards support the sharing and reuse of scientific data. The MIAME standard was introduced in 2001 to simplify storing and exchanging gene expression experiments. Its specifications require recording all the information needed to interpret the results of an experiment unambiguously and to reproduce the experiment. MINSEQE defines a standard to allow the unambiguous interpretation and reproducibility of sequencing experiments.
