Abstract

As the availability of genomic data and analysis tools from large-scale cancer initiatives continues to increase, with single-cell studies adding new dimensions to the potential scientific insights, the need has become more urgent for a software environment that supports the rapid pace of cancer data science. The electronic analysis notebook has recently emerged as a versatile tool for this purpose, allowing scientists to combine the scientific exposition with the code that runs the analysis, creating a single “research narrative” document. The Jupyter Notebook system has become the de facto standard notebook environment in data science and genomic analysis. However, the Jupyter environment requires familiarity with programming to run analyses, and even text must be formatted using a programming-style language.

To extend notebook capabilities to researchers at all levels of programming expertise, we developed the GenePattern Notebook environment, which integrates Jupyter with the hundreds of genomic tools available through the GenePattern platform. This tool allows scientists to develop, share, collaborate on, and publish their notebooks, requiring only a web browser. Investigators can design their in-silico experiments, refine workflows, launch compute-intensive analyses on cloud-based and high-performance compute resources, and publish results that others can adopt to reproduce the original analyses and modify for their own work.

GenePattern Notebook provides:

(1) Access to a wide range of genomic analyses within a notebook. Hundreds of analyses are available, from machine learning techniques such as clustering, classification, and dimension reduction, to omic-specific methods for gene expression analysis, proteomics, flow cytometry, sequence variation analysis, pathway analysis, and others.

(2) A library of featured genomic analysis notebooks, including templates for common analysis tasks as well as cancer-specific research scenarios and compute-intensive methods. Scientists can easily copy these notebooks, use them as is, or adapt them for their research purposes. To support the growing role of single-cell analysis, we have recently released single-cell RNA-seq preprocessing, cluster harmonization, pseudotime, and RNA velocity notebooks.

(3) Notebook enhancements. A rich text editor allows scientists to format text as they would in a word processor. A user interface-building tool allows notebook developers to wrap code so it is displayed as a web form, with only the necessary inputs exposed.

(4) Publication and collaborative editing. Authors can publish their notebooks, which are then made available in the community section of the notebook workspace, where other scientists may copy, run, and edit their own versions.

The GenePattern Notebook environment is freely available at http://genepattern-notebook.org.

Citation Format: Michael M. Reich, Thorin Tabor, Edwin Juarez, Alexander Wenzel, Ted Liefeld, Barbara Hill, David Eby, Forrest Kim, Helga Thorvaldsdóttir, Pablo Tamayo, Jill P. Mesirov. GenePattern Notebook: An integrative analytical environment for cancer research [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 1903.