Abstract
The NIH-funded LINCS Consortium is creating an extensive reference library of cell-based perturbation response signatures and sophisticated informatics tools incorporating a large number of perturbagens, model systems, and assays. To date, more than 350 datasets have been generated, including transcriptomics, proteomics, epigenomics, cell phenotype, and competitive binding profiling assays. The large volume and variety of data necessitate rigorous data standards and effective data management, including modular data processing pipelines and end-user interfaces, to facilitate accurate and reliable data exchange, curation, validation, standardization, aggregation, integration, and end-user access. Deep metadata annotations and the use of qualified data standards enable integration with many external resources. Here we describe the end-to-end data processing and management at the Data Coordination and Integration Center (DCIC) to generate a high-quality and persistent product. Our data management and stewardship solutions enable a functioning Consortium and make LINCS a valuable scientific resource that aligns with big data initiatives such as the BD2K NIH Program and accords with emerging data science best practices, including the findable, accessible, interoperable, and reusable (FAIR) principles.
Highlights
The Library of Integrated Network-based Cellular Signatures (LINCS) project[1] is a multi-center NIH-funded program that is creating a comprehensive library of molecular signatures describing the effect of various perturbagens on normal cellular functions, while developing data integration, modeling, and analysis methodologies.
The Data Coordination and Integration Center (DCIC), in close collaboration with the Data Signature Generation Centers (DSGCs), developed four categories of metadata specifications to capture essential attributes of the biological experiments: the critical reagents used (Reagent Metadata Specifications), experimental conditions (Experimental Metadata Specifications), reagent-independent assay parameters (Assay Metadata Specifications), and important dataset annotations (Dataset Metadata Specifications). These metadata categories are further divided into sub-categories and describe LINCS data generation, assays, reagents, and resulting datasets in detail (Fig. 1). The development of these standards followed the same process as in Phase I of the LINCS Project[10]: a Data Working Group (DWG) was formed with members from the DCIC and the DSGCs, and metadata use cases were created across all Centers.
To develop a comprehensive set of reagent metadata, the six existing metadata categories from Phase I were expanded to a total of 11. These 11 key categories (Small molecules, Cell lines, Primary cells, Embryonic stem cells, Differentiated cells, induced pluripotent stem cells (iPSCs), Nucleic acid reagents, Proteins, Antibody reagents, Unclassified perturbagens, and Other reagents) primarily cover the perturbation-type reagents and model systems used in the LINCS assays, and also include reagents used to detect analytes and quantify molecular changes.
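As a rough illustration of how the categories listed above could be represented in software, the following Python sketch encodes the four metadata specification categories and the 11 reagent categories as enumerations. The class names, member names, and the `parse_reagent_category` helper are hypothetical and chosen only for this example; they are not the DCIC's actual schema or tooling.

```python
from enum import Enum


class MetadataSpec(Enum):
    """The four metadata specification categories named in the text."""
    REAGENT = "Reagent Metadata Specifications"
    EXPERIMENTAL = "Experimental Metadata Specifications"
    ASSAY = "Assay Metadata Specifications"
    DATASET = "Dataset Metadata Specifications"


class ReagentCategory(Enum):
    """The 11 reagent metadata categories named in the text."""
    SMALL_MOLECULE = "Small molecules"
    CELL_LINE = "Cell lines"
    PRIMARY_CELL = "Primary cells"
    EMBRYONIC_STEM_CELL = "Embryonic stem cells"
    DIFFERENTIATED_CELL = "Differentiated cells"
    IPSC = "iPSCs"
    NUCLEIC_ACID = "Nucleic acid reagents"
    PROTEIN = "Proteins"
    ANTIBODY = "Antibody reagents"
    UNCLASSIFIED_PERTURBAGEN = "Unclassified perturbagens"
    OTHER = "Other reagents"


def parse_reagent_category(label: str) -> ReagentCategory:
    """Map a free-text label to a category (hypothetical helper).

    Matching ignores case and surrounding whitespace; unknown labels
    raise ValueError so that typos are caught early during curation.
    """
    normalized = label.strip().lower()
    for category in ReagentCategory:
        if category.value.lower() == normalized:
            return category
    raise ValueError(f"Unknown reagent category: {label!r}")
```

For example, `parse_reagent_category(" cell lines ")` returns `ReagentCategory.CELL_LINE`, while a label that is not one of the 11 categories raises `ValueError`. Using a closed enumeration rather than free text is one simple way to keep annotations consistent across centers, although a real registry would likely carry many more fields per category.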
Summary
The Library of Integrated Network-based Cellular Signatures (LINCS) project[1] is a multi-center NIH-funded program that is creating a comprehensive library of molecular signatures describing the effect of various perturbagens (e.g., small molecules, shRNAs, antibodies) on normal cellular functions, while developing data integration, modeling, and analysis methodologies. This extensive reference library of cellular responses is critical for understanding complex human diseases such as cancer, and could be further utilized to uncover new approaches for their treatment. The Broad Institute's LINCS Center for Transcriptomics (BroadT LINCS) is employing the L1000 assay[6] to measure the effect of more than 25,000 chemical and genetic perturbations at the transcription level in over 50 human cell lines.