Abstract
The Human Tumor Atlas Network (HTAN) is a National Cancer Institute (NCI) Cancer Moonshot Initiative to generate three-dimensional molecular and spatial atlases of diverse human tumors and to characterize crucial transitions in cancer progression and treatment. A series of manuscripts describing an extensive array of genomics, transcriptomics, proteomics, and imaging datasets is emerging, but careful curation is required to maximize the data’s utility. HTAN centers have generated data from over 30 assay types, including more than a dozen imaging methods (e.g., fluorescence microscopy, metal-tagged imaging, digital pathology, spatial transcriptomics, and electron microscopy). The HTAN Data Coordinating Center (DCC) develops infrastructure and tools to ingest, curate, explore, and share these data in a findable, accessible, interoperable, and reusable (FAIR) manner. As of November 2021, the DCC had ingested over 150 TB of data, including nearly 150,000 imaging data files. HTAN metadata schemas were developed through a community-driven Request for Comments process and include schemas for raw, processed, and QC-checked imaging data, segmentation masks, and feature arrays. This work led to a set of Minimum Information for Highly Multiplexed Tissue Imaging (MITI) guidelines, which are complemented by detailed metadata on participants and biospecimens. The Schematic Python package provides a user-friendly interface for defining data-model schemas, generating (meta)data submission spreadsheets, and interfacing with asset stores on various cloud platforms. Across assay types, the HTAN schema currently encompasses over 700 attributes from 35 components. The DCC manages the HTAN Data Portal, which provides community access to the atlases, enables filtering of released atlas data based on metadata fields, and provides pointers to data availability.
To enable visualization and exploration of imaging data directly from the portal, centers can submit Minerva stories, which provide guided storytelling for multiplexed tissue images. We provide methods to automatically generate default stories with visually appropriate overlays of sequential four-channel groups. We developed Miniature to generate unsupervised and interpretable multiplexed image thumbnails that provide rapid contextual information when browsing image collections. Dimensionality reduction is used to reduce a multiplexed image pyramid to three dimensions. Pixels are then recolored according to their coordinates in this low-dimensional space, so that pixels with similar marker expression are assigned similar colors. Public data sharing requires careful consideration of data egress costs and de-identification of images and their metadata. The DCC is working closely with the NCI Cancer Research Data Commons to ensure the long-term legacy and reuse of HTAN data.

Citation Format: Adam J. Taylor, Milen Nikolov, Ino de Brujin, Jeremy Muhlich, Mialy De Felice, Artem Sokolov, Denis Schapiro, Peter Sorger, Julie Bletz, Nikolaus Schultz, Vésteinn Thorsson, James Eddy, Ethan Cerami. Curating cartography: Enabling the harmonisation, visualisation, and reuse of diverse multiplexed imaging data through the Human Tumor Atlas Network Data Coordinating Center [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 2131.
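
Illustrative sketch: the thumbnail idea described in the abstract (reduce a multiplexed image to three dimensions, then color each pixel by its low-dimensional coordinates) might look roughly like the following. This is a minimal sketch under stated assumptions, not the Miniature implementation. PCA is used purely as a stand-in, since the abstract does not name the dimensionality-reduction method, and the function name `thumbnail_from_multiplexed` and the `(channels, height, width)` array layout are hypothetical.

```python
import numpy as np


def thumbnail_from_multiplexed(image: np.ndarray) -> np.ndarray:
    """Map a (channels, height, width) image to a (height, width, 3) RGB array.

    PCA (computed via SVD) is an assumed stand-in for the
    dimensionality-reduction step; any 3-D embedding could be swapped in.
    """
    n_channels, height, width = image.shape
    if n_channels < 3:
        raise ValueError("need at least three channels to build a 3-D embedding")

    # One row per pixel, one column per marker channel.
    pixels = image.reshape(n_channels, -1).T.astype(np.float64)
    # Centre each channel so the SVD yields principal components.
    pixels -= pixels.mean(axis=0)

    # Project every pixel onto the first three principal components.
    _, _, vt = np.linalg.svd(pixels, full_matrices=False)
    embedding = pixels @ vt[:3].T

    # Rescale each embedding axis to [0, 1] so it can act as a color channel.
    lo = embedding.min(axis=0)
    span = embedding.max(axis=0) - lo
    span[span == 0] = 1.0  # guard against constant axes
    rgb = (embedding - lo) / span
    return rgb.reshape(height, width, 3)


# Synthetic 8-channel image standing in for a low-resolution pyramid level.
rng = np.random.default_rng(0)
demo = rng.random((8, 32, 32))
thumb = thumbnail_from_multiplexed(demo)
print(thumb.shape)  # (32, 32, 3)
```

Because the color of a pixel depends only on its position in the embedding, pixels with identical marker profiles receive identical colors. In practice such a step would presumably be run on a low-resolution level of the image pyramid to keep the computation small.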