SMITH: a LIMS for handling next-generation sequencing workflows.

Heiko Muller, Arnaud Ceol, Yuriy Vaskin, Francesco Venco

doi:10.1186/1471-2105-15-s14-s3

Journal: BMC Bioinformatics	Publication Date: Nov 27, 2014
Citations: 26	License type: cc-by

Affiliation: Politecnico di Milano

Abstract

Background
Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying bio-macromolecules and their interactions. Array-based methods for measuring gene expression or protein-DNA interactions are being replaced by RNA-Seq and ChIP-Seq. Sequencing is generally performed by specialized facilities that have to keep track of sequencing requests, trace samples, ensure quality and make data available according to predefined privileges. An integrated tool helps to troubleshoot problems, maintain a high quality standard, and reduce time and costs. Commercial and non-commercial tools called LIMS (Laboratory Information Management Systems) are available for this purpose. However, they often come at prohibitive cost and/or lack the flexibility and scalability needed to adjust seamlessly to the frequently changing protocols employed. In order to manage the flow of sequencing data produced at the Genomic Unit of the Italian Institute of Technology (IIT), we developed SMITH (Sequencing Machine Information Tracking and Handling).

Methods
SMITH is a web application with a MySQL server at the backend. It was developed by wet-lab scientists of the Centre for Genomic Science and database experts from the Politecnico of Milan in the context of a Genomic Data Model Project. The database schema stores all the information of an NGS experiment, including the descriptions of all protocols and algorithms used in the process. Notably, an attribute-value table allows an unconstrained textual description to be associated with each sample and with all the data produced afterwards. This method permits the creation of metadata that can be used to search the database for specific files as well as for statistical analyses.

Results
SMITH runs automatically and limits direct human interaction mainly to administrative tasks. SMITH data-delivery procedures were standardized, making it easier for biologists and analysts to navigate the data. Automation also helps save time. The workflows are available through an API provided by the workflow management system. The parameters and input data are passed to the workflow engine, which performs de-multiplexing, quality control, alignment and other processing steps.

Conclusions
SMITH standardizes, automates, and speeds up sequencing workflows. Annotation of data with key-value pairs facilitates meta-analysis.
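The attribute-value annotation described in the abstract can be illustrated with a minimal sketch. Everything in it is hypothetical and does not reproduce SMITH's actual MySQL schema: the table and column names (`sample`, `sample_attribute`, `attribute`, `value`) and the helpers `annotate`, `find_samples` and `count_by_value`. SQLite is used only so the example is self-contained. Only samples are annotated here, whereas the abstract states that SMITH also annotates the data produced afterwards.

```python
import sqlite3

# Hypothetical attribute-value (key-value) schema, for illustration only;
# SMITH's real MySQL schema is not reproduced here.
SCHEMA = """
CREATE TABLE sample (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE sample_attribute (
    sample_id INTEGER NOT NULL REFERENCES sample(id),
    attribute TEXT NOT NULL,
    value     TEXT NOT NULL
);
"""


def build_db():
    """Create an in-memory database holding the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def annotate(conn, name, pairs):
    """Insert a sample and attach free-form attribute-value pairs to it."""
    cur = conn.execute("INSERT INTO sample (name) VALUES (?)", (name,))
    sample_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO sample_attribute (sample_id, attribute, value) VALUES (?, ?, ?)",
        [(sample_id, key, val) for key, val in pairs.items()],
    )
    return sample_id


def find_samples(conn, attribute, value):
    """Search: names of samples carrying a given attribute-value pair."""
    rows = conn.execute(
        "SELECT s.name FROM sample s "
        "JOIN sample_attribute a ON a.sample_id = s.id "
        "WHERE a.attribute = ? AND a.value = ? ORDER BY s.name",
        (attribute, value),
    )
    return [row[0] for row in rows]


def count_by_value(conn, attribute):
    """Simple statistics: number of samples sharing each value of an attribute."""
    rows = conn.execute(
        "SELECT value, COUNT(*) FROM sample_attribute "
        "WHERE attribute = ? GROUP BY value ORDER BY value",
        (attribute,),
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    db = build_db()
    annotate(db, "S1", {"antibody": "H3K4me3", "cell_line": "HeLa"})
    annotate(db, "S2", {"antibody": "H3K27ac", "cell_line": "HeLa"})
    annotate(db, "S3", {"antibody": "H3K4me3", "cell_line": "K562"})
    print(find_samples(db, "antibody", "H3K4me3"))  # ['S1', 'S3']
    print(count_by_value(db, "cell_line"))          # {'HeLa': 2, 'K562': 1}
```

No columns are fixed for experiment-specific properties. New keys such as a new antibody or protocol parameter can therefore be added without a schema change, which matches the flexibility argument made for SMITH. The cost is that every query goes through a join on the attribute table.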

Highlights

Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying biomacromolecules and their interactions
The Model is composed of Java Server Faces (JSF) Managed Beans that communicate with the information system tier, which relies on the Hibernate object/relational mapping [21] to communicate with a MySQL database
Infrastructure around SMITH at the Center for Genomic Science: before presenting SMITH in detail, we briefly describe the infrastructure that SMITH operates in (Figure 1A)

Summary

Introduction

Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying biomacromolecules and their interactions. A sequencing facility is confronted with multiple problems. It must handle sequencing requests, process the samples according to the specified application, combine multiplexed samples to be run on the same lane such that de-multiplexing is not compromised, and track the state of each sample as it passes through the sequencing pipeline. It must also ensure quality, keep track of the reagent barcodes used for each sample, deliver the results to the proper user following de-multiplexing, archive the results, and support users when troubleshooting becomes necessary. Given the central importance of sequencing data, a sequencing facility has to meet these demands under constant pressure to produce results as quickly as possible.
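One requirement listed above is combining multiplexed samples on the same lane without compromising de-multiplexing. A minimal sketch of such a check compares the index barcodes of all samples assigned to a lane. It assumes single, equal-length index sequences and flags pairs whose Hamming distance falls below a threshold. The function names, the lane representation and the default threshold of 3 are assumptions for illustration and are not taken from SMITH. A distance of at least 3 means a read with one sequencing error still lies closest to exactly one barcode.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def conflicting_pairs(lane: dict, min_distance: int = 3) -> list:
    """Return (sample, sample) pairs whose barcodes are closer than min_distance.

    `lane` maps hypothetical sample identifiers to their index sequences.
    An empty result means the samples can share the lane under this rule.
    """
    conflicts = []
    # Sorting by sample name makes the output order deterministic.
    for (s1, b1), (s2, b2) in combinations(sorted(lane.items()), 2):
        if hamming(b1, b2) < min_distance:
            conflicts.append((s1, s2))
    return conflicts


if __name__ == "__main__":
    lane = {"S1": "ACGTAC", "S2": "ACGTAA", "S3": "TTGCAG"}
    print(conflicting_pairs(lane))  # [('S1', 'S2')]
```

A LIMS could run a check of this kind when a lane is being assembled and reject or reassign the offending sample before the run starts. Dual indexes or barcodes of different lengths would need a more elaborate comparison than this sketch provides.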

Methods

Results

Conclusion