Abstract
The successful application of Next Generation Sequencing (NGS) to drug discovery requires systems to manage and document each step of the sequencing process, from sample receipt through data generation and data processing. We combined Benchling™, a solution for tracking NGS lab processes, with FONDA (Framework Of Next generation sequencing Data Analysis), an internally developed data processing platform, to support multiple types of NGS data generation and processing. Benchling combines a digital notebook and a laboratory information management system (LIMS). The system documents and automates steps in the NGS process, including sample registration, nucleic acid extraction, library construction, flow cell construction, sequencer sample sheet generation and BCL2FASTQ conversion. This lets wet lab scientists easily retrieve the appropriate protocol for each sample and sequencing library type. We connected our sequencers to Benchling to monitor each sequencing run and to keep track of the quality of NGS data. In addition, Benchling generates an “analysis ready sample sheet” (containing project and study information, FASTQ file locations, sample species and library type) and uploads it to designated S3 buckets for data processing. Benchling dashboards provide overviews of NGS sample preparation, data generation and quality control. In summary, Benchling links the original sample, the labels, the barcodes, the cDNA/DNA, the library, and all the QC results.
We process NGS data using pipelines implemented in FONDA on a dockerized Amazon Web Services (AWS) cloud platform. Analyses can be configured automatically from information exported by Benchling or launched manually. After data processing is complete, output files such as gene expression counts or variant calls are deposited into project-specific folders, ready for secondary analysis.
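The hand-off from Benchling to FONDA through the analysis ready sample sheet can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract only lists the kinds of information the sheet carries (project, study, FASTQ location, species, library type), so the column names, the `example-bucket` S3 prefixes, the library-type keys in `PIPELINES` and the `to_csv`/`route` helpers are all hypothetical. Rows whose library type has no matching pipeline return `None`, mirroring the option to launch an analysis manually.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class SampleSheetRow:
    """One sample in a hypothetical analysis ready sample sheet."""
    project: str
    study: str
    sample_id: str
    fastq_location: str  # hypothetical S3 prefix holding the sample's FASTQ files
    species: str
    library_type: str


# Hypothetical library-type keys mapped to the pipeline families named in the
# abstract; FONDA's real configuration names are not given there.
PIPELINES = {
    "bulk_rnaseq": "bulk RNA-seq",
    "cite_seq": "CITE-seq",
    "sc_immune_profiling": "single-cell immune profiling",
}


def to_csv(rows: list[SampleSheetRow]) -> str:
    """Serialize sample sheet rows to CSV text with a header line."""
    buf = io.StringIO()
    columns = [f.name for f in fields(SampleSheetRow)]
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()


def route(row: SampleSheetRow) -> Optional[str]:
    """Return the pipeline for a row, or None if it needs manual configuration."""
    return PIPELINES.get(row.library_type)


if __name__ == "__main__":
    rows = [
        SampleSheetRow("P001", "S01", "sample_A",
                       "s3://example-bucket/P001/sample_A/", "human", "bulk_rnaseq"),
        SampleSheetRow("P001", "S01", "sample_B",
                       "s3://example-bucket/P001/sample_B/", "mouse", "atac_seq"),
    ]
    print(to_csv(rows), end="")
    for row in rows:
        print(row.sample_id, "->", route(row) or "manual launch")
```

Uploading the resulting CSV text to an S3 bucket (for instance with boto3's `upload_file`) is left out so the sketch runs offline; keeping the sheet as plain CSV means the downstream pipeline only has to parse a header and one row per sample.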
In the current FONDA version (as of November 2020), we have developed pipelines for single cell multi-omics (CITE-seq and single-cell immune profiling) and bulk RNA-seq. The modular design of FONDA makes it easier to develop and update pipelines and to extend them to new sequencing technologies. Together, Benchling and FONDA enable high-quality sample and NGS data flows from the lab to support target identification, understanding of mechanism of action, patient stratification and biomarker discovery.
Availability and implementation: FONDA is implemented in Java and released under the Apache License 2.0. FONDA can be downloaded from GitHub at https://github.com/epam/fonda.
Citation Format: Chandra Sekhar Pedamallu, Joon Sang Lee, Shu Yan, Adalis Maisonet, Aleksandr Sidoruk, Tengui Chen, Yulia Kamyshova, Mariia Zueva, Mark Magid, Quan Wan, Jeffrey Thompson, Valerie Zebrouck, Immanuel Gadaczek, Mikhail Alperovich, Brian McNatt, Alexei Protopopov, Donald Jackson, Jack Pollard. A comprehensive sample tracking and data processing workflow for next generation sequencing [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 2280.