Abstract

As the availability of genetic and genomic data and analysis tools from large-scale cancer initiatives continues to increase, and as single-cell studies add new dimensions to the potential scientific insights, the need for a software environment that supports the rapid pace of cancer data science has become more urgent. The electronic analysis notebook has recently emerged as an effective and versatile tool for this purpose. It allows scientists to combine the scientific exposition (text, images, and multimedia) with the actual code that runs the analysis, creating a single “research narrative” document. The Jupyter Notebook system has become the de facto standard notebook environment in data science and genomic analysis. However, running analyses in Jupyter requires familiarity with a programming language, and even text must be formatted in a markup language (Markdown). To extend notebook capabilities to researchers at all levels of programming expertise, we developed the GenePattern Notebook environment, which integrates Jupyter's capabilities with the hundreds of genomic tools available through the GenePattern platform. It allows scientists to develop, share, collaborate on, and publish their notebooks using only a web browser. In this environment, investigators can design in silico experiments, perform and refine analyses, launch compute-intensive analyses on cloud-based and high-performance computing resources, and publish their results as electronic notebooks that other scientists can use to reproduce the original analyses and adapt for their own work. GenePattern Notebook provides:

(1) Access to a wide range of genomic analyses within a notebook. Hundreds of analyses are available, from machine learning techniques such as clustering, classification, and dimension reduction, to omics-specific methods for gene expression analysis, proteomics, flow cytometry, sequence variation analysis, pathway analysis, and others.

(2) A library of featured genomic analysis notebooks. These include templates for common analysis tasks as well as cancer-specific research scenarios and compute-intensive methods. Scientists can copy these notebooks and use them as is or adapt them for their own research.

(3) Notebook enhancements. A rich text editor allows scientists to enter and format text as they would in a word processor. A user interface-building tool allows notebook developers to wrap their code so that it is displayed as a web form exposing only the necessary inputs. Users of the notebook see a simplified display that lets them run the analyses without interacting with the underlying code.

(4) Publication and collaborative editing. To make a notebook public, an author selects the “publish” feature and adds descriptive information. The notebook is then listed in the community section of the notebook workspace. An author can include a web link to a public notebook in a publication; readers who follow the link see a read-only version of the notebook, with the option to log in to the workspace, where they can run, copy, and edit their own version.

The GenePattern Notebook environment is freely available at http://genepattern-notebook.org.

Citation Format: Michael M. Reich, Thorin Tabor, Ted Liefeld, Edwin Juarez, Barbara Hill, Helga Thorvaldsdottir, Pablo Tamayo, Jill P. Mesirov. GenePattern Notebook: An integrative analytical environment for cancer research [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 3207.