Pilot Dataset Research Articles

Abstract

Background: The High Throughput Truthing (HTT) project is assessing agreement among pathologists' estimates of stromal tumor-infiltrating lymphocyte (sTIL) density in hematoxylin and eosin (H&E) stained breast cancer biopsy slides. The HTT project will create a validation dataset for artificial intelligence and machine learning (AI/ML) algorithms in digital pathology that is fit for training, proficiency testing, and regulatory purposes.

Methods: The pilot study crowdsourced pathologists to estimate sTIL density in 640 regions of interest (ROIs) across 64 slides via two modalities: an optical microscope (eeDAP) and two digital platforms (caMicroscope and PathPresenter). eeDAP is a hardware-software interface that presents the observer with pre-defined fields of view on H&E slides that correspond to the ROIs on a whole slide image. The PathPresenter and caMicroscope web applications replicate the eeDAP workflow on the whole slide image without microscope hardware. In the workflow, pathologists evaluated the eligibility of an ROI for sTILs content and then estimated the densities of tumor-associated stroma and sTILs in the ROI. Inter-pathologist agreement within ROIs was characterized with the root mean-squared difference. Using 72 of the highest-variability ROIs selected from the pilot study, seven practicing pathologists participated in a subsequent focus group to improve the clinical training and data-collection workflows.

Results: The pilot study collected 7,373 sTIL density estimates from 35 pathologists between February 2020 and May 2021. The focus group provided an additional 411 evaluations on 72 ROIs and in-depth discussions to identify pitfalls, gaps in training, and workflow improvements. Installation of eeDAP for physical data collection guided improvements in documentation and operation capabilities. Updated training materials refine the definition of tumor-associated stroma, provide reference images to differentiate sTILs from other cell types, and provide feedback during training. Digital and microscope platforms benefitted from enforcing registration and training, standardizing workflows, and accelerating eeDAP slide-image registration.

Conclusions: The slides, images, and annotations provided by volunteer collaborators and participants for our pilot study led to improvements in data collection tools and crowdsourcing workflows to ensure consistency and minimize annotation variability. Our pilot dataset and analysis methods are available on a public HTT GitHub repository to allow open access to our methodology and feedback from the digital pathology and statistics communities. These data-collection and analysis methods are applicable to other quantitative biomarkers for validation of AI/ML algorithms. The lessons learned from this work will be applied to the HTT pivotal study and inform future methods for collecting high-quality pathologist annotations.

Citation Format: Katherine N. Elfer, Kim Blenman, Sarah N. Dudgeon, Victor Garcia, Anna Ehinger, Xiaoxian Li, Amy Ly, Dieter Peeters, Bruce Werness, Matthew Hanna, Roberto Salgado. Tools for collecting pathologist annotations and understanding interobserver variability [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 460.
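
The abstract above names the root mean-squared difference as its agreement measure but does not spell out the exact formula. As a rough illustration only, the sketch below assumes one plausible form: pool the squared differences over every pair of pathologist estimates within the same ROI, average them, and take the square root. The function name, ROI labels, and scores are hypothetical and are not taken from the HTT dataset or repository.

```python
from itertools import combinations
from math import sqrt


def rms_difference(estimates_by_roi):
    """Root mean-squared difference over all within-ROI pairs of estimates.

    estimates_by_roi maps an ROI label to the list of sTIL density
    estimates (in percent) given by different pathologists for that ROI.
    This pairwise pooling is an assumed definition, not the study's own.
    """
    squared_diffs = []
    for scores in estimates_by_roi.values():
        # Compare every pair of readers who scored the same ROI.
        for a, b in combinations(scores, 2):
            squared_diffs.append((a - b) ** 2)
    if not squared_diffs:
        raise ValueError("need at least one ROI with two or more estimates")
    return sqrt(sum(squared_diffs) / len(squared_diffs))


# Hypothetical scores: three readers on one ROI, two readers on another.
example = {"roi_1": [10, 20, 30], "roi_2": [5, 5]}
result = rms_difference(example)
print(f"RMS difference: {result:.2f} percentage points")
```

A larger value means readers disagree more on the same ROI. Ranking ROIs by a per-ROI version of this quantity would be one way to pick high-variability regions like the 72 used in the focus group, although the abstract does not say how those were selected.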


Abstract. We use optimal estimation (OE) to quantify methane fluxes based on total column CH4 data from the Greenhouse Gases Observing Satellite (GOSAT) and the GEOS-Chem global chemistry transport model. We then project these fluxes to emissions by sector at 1° resolution and then to each country using a new Bayesian algorithm that accounts for prior and posterior uncertainties in the methane emissions. These estimates are intended as a pilot dataset for the global stocktake in support of the Paris Agreement. However, differences between the emissions reported here and widely used bottom-up inventories should be used as a starting point for further research because of potential systematic errors in these satellite-based emissions estimates. We find that agricultural and waste emissions are approximately 263 ± 24 Tg CH4 yr−1, anthropogenic fossil emissions are 82 ± 12 Tg CH4 yr−1, and natural wetland/aquatic emissions are 180 ± 10 Tg CH4 yr−1. These estimates are consistent with previous inversions based on GOSAT data and the GEOS-Chem model. In addition, anthropogenic fossil estimates are consistent with those reported to the United Nations Framework Convention on Climate Change (80.4 Tg CH4 yr−1 for 2019). Alternative priors can be easily tested with our new Bayesian approach (also known as prior swapping) to determine their impact on posterior emissions estimates. We use this approach by swapping to priors that include much larger aquatic emissions and fossil emissions (based on isotopic evidence) and find little impact on our posterior fluxes. This indicates that these alternative inventories are inconsistent with our remote sensing estimates and also that the posteriors reported here are due to the observing and flux inversion system and not to uncertainties in the prior inventories.

We find that total emissions for approximately 57 countries can be resolved with this observing system based on the degrees-of-freedom-for-signal metric (DOFS > 1.0) that can be calculated with our Bayesian flux estimation approach. Below a DOFS of 0.5, estimates of country total emissions are weighted more heavily toward our choice of prior inventories. The top five emitting countries (Brazil, China, India, Russia, USA) emit about half of the global anthropogenic budget, similar to our choice of prior emissions but with the posterior emissions shifted towards the agricultural sector and away from fossil emissions, consistent with our global posterior results. Our results suggest remote-sensing-based estimates of methane emissions can be substantially different (although within uncertainty) from bottom-up inventories, isotopic evidence, or estimates based on sparse in situ data, indicating a need for further studies reconciling these different approaches for quantifying the methane budget. Higher-resolution fluxes calculated from upcoming satellite or aircraft data such as the Tropospheric Monitoring Instrument (TROPOMI) and those in formulation such as the Copernicus CO2M, MethaneSAT, or Carbon Mapper can be incorporated into our Bayesian estimation framework for the purpose of reducing uncertainty and improving the spatial resolution and sectoral attribution of subsequent methane emissions estimates.
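
To make the DOFS metric in the abstract above more concrete, here is a minimal sketch of the textbook linear Gaussian optimal estimation update. It is not the authors' sector and country projection algorithm and does not implement prior swapping; it only shows how a posterior, an averaging kernel, and the DOFS (the trace of the averaging kernel) follow from a prior, a Jacobian, and observation errors. The three-element state, the identity Jacobian, and every number are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def optimal_estimate(x_a, S_a, K, y, S_o):
    """Textbook linear optimal estimation update.

    x_a: prior state, S_a: prior error covariance, K: Jacobian mapping
    state to observations, y: observations, S_o: observation error
    covariance. Returns the posterior state, posterior covariance, and
    averaging kernel.
    """
    # Gain matrix G = S_a K^T (K S_a K^T + S_o)^-1
    G = S_a @ K.T @ np.linalg.inv(K @ S_a @ K.T + S_o)
    # Posterior state: prior plus gain-weighted observation mismatch.
    x_hat = x_a + G @ (y - K @ x_a)
    # Averaging kernel: sensitivity of the estimate to the true state.
    A = G @ K
    # Posterior error covariance for the linear Gaussian case.
    S_hat = (np.eye(len(x_a)) - A) @ S_a
    return x_hat, S_hat, A


# Hypothetical three-element flux state observed directly (K = identity).
x_a = np.array([10.0, 5.0, 2.0])
S_a = np.diag([25.0, 4.0, 1.0])
K = np.eye(3)
S_o = np.diag([1.0, 1.0, 1.0])
y = np.array([14.0, 5.5, 2.0])

x_hat, S_hat, A = optimal_estimate(x_a, S_a, K, y, S_o)
dofs = float(np.trace(A))
print("posterior:", x_hat)
print("DOFS:", dofs)
```

In this toy setup, elements with a large prior variance relative to the observation error contribute close to one degree of freedom each, while well-constrained priors contribute less. That is the sense in which a low DOFS means the estimate stays weighted toward the prior inventory.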



Articles published on Pilot Dataset

An optimized decomposition integration framework for carbon price prediction based on multi-factor two-stage feature dimension reduction.

Evaluating response-adaptive randomization procedures for recurrent events and terminal event data using a composite endpoint.

Abstract 460: Tools for collecting pathologist annotations and understanding interobserver variability

The 2019 methane budget and uncertainties at 1° resolution and each country through Bayesian integration of GOSAT total column methane data and a priori inventory estimates

Multisite MRI reproducibility of lateral ventricular volume using the NAIMS cooperative pilot dataset.

Risk assessment of potentially toxic trace elements via consumption of dairy products sold in the city of Yerevan, Armenia

A Multidimensional Bioinformatic Platform for the Study of Human Response to Surgery.

FAIR Data Reuse in Traumatic Brain Injury: Exploring Inflammation and Age as Moderators of Recovery in the TRACK-TBI Pilot.

A multidimensional stability framework enhances interpretation and comparison of carbon cycling response to disturbance

Semantic segmentation of gonio-photographs via adaptive ROI localisation and uncertainty estimation

Correcting for Superficial Bias in 7T Gradient Echo fMRI.

Operational optimization of closed-circuit reverse osmosis (CCRO) pilot to recover concentrate at an advanced water purification facility for potable reuse

Advances in surface-wave tomography for near-surface applications

Early Detection of Health Changes in the Elderly Using In-Home Multi-Sensor Data Streams

A Simple Optimization Workflow to Enable Precise and Accurate Imputation of Missing Values in Proteomic Data Sets.

Synthetic single cell RNA sequencing data from small pilot studies using deep generative models

Development of basic calculation tools for water loss management and analysis of basic water loss components

Region-of-Interest-Based Cardiac Image Segmentation with Deep Learning

SSP: an R package to estimate sampling effort in studies of ecological communities

Development of a Comparison Framework for Evaluating Environmental Contours of Extreme Sea States
