Abstract Genome profiling is a critical pillar of clinical, translational, and basic research. Hospitals, core facilities, and research enterprises invest significant resources to generate genomic data sets. Yet data management and analysis are frequently manual, which demands significant operator time and often leaves resources siloed, rendering them single-use assets. Centralizing this genomic capital in a framework that enables automated processing, metadata integration, and continuous interrogation maximizes return on investment and serves as a catalyst for research innovation, clinical translation, and reproducible research. We developed Isabl, a plug-and-play infrastructure for scalable bioinformatics operations. Isabl provides solutions for databasing, asset management, tracking, and automated, reproducible data processing, and it enables dynamic reporting and meta-analysis across data assets. Isabl is built on four main components. First, an individual-centric and extensible relational database with tracking support for samples (temporal, spatial, aliquot), experimental data (assays, platforms, sequencing runs), cohorts (clinical trials, research projects), and versioned bioinformatics applications (assembly aware, tools, results). Second, the database is exposed through a fully featured RESTful API that enables horizontal integration with information systems such as sequencing core LIMS, variant visualization platforms like cBioPortal, and, where applicable, clinical and biospecimen institutional databases. Third, a Software Development Kit (SDK) built for Next Generation Sequencing asset management. The SDK enables automated execution of data import and language-agnostic bioinformatics applications (alignment, variant calling, post-processing) with support for cohort- and individual-level reporting features. 
Furthermore, the SDK facilitates dynamic retrieval of results using vertical and horizontal queries (individual and cohort level, respectively). Lastly, Isabl comes with a Single Page Web Application that supports multidisciplinary teams (i.e., researchers, project coordinators, engineers, clinicians) in tracking analyses, visualizing results, and running dynamic queries. Isabl currently supports the Memorial Sloan Kettering Genome Pediatrics Precision Medicine Initiative, a prototype platform that delivers integrated, real-time automated reporting of clinical targeted gene re-sequencing and research whole-genome and transcriptome profiling data, as well as linked data from pre-clinical models (i.e., PDX) and single-cell studies. As an open-source tool, Isabl democratizes access to a purpose-built, automated, scalable, and fully integrable bioinformatics architecture. Isabl will be available at https://github.com/isabl-io.
Citation Format: Juan S. Medina-Martínez, Juan E. Arango-Ossa, Gunes Gundem, Max F. Levine, Minal Patel, Noushin R. Farnoud, Venkata D. Yellapantula, Gao Teng, Joseph G. Mccarter, Elsa Bernard, Franck Rapaport, Dominik Glodzik, Ross L. Levine, Andrew Kung, Elli Papaemmanuil. A plug-and-play infrastructure for scalable bioinformatics operations [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 5105.