Characterizing the effects of missing data and evaluating imputation methods for chemical prioritization applications using ToxPi

Kimberly T To, David M Reif, Rebecca C Fry

doi:10.1186/s13040-018-0169-5

Open Access


Abstract

Background: The Toxicological Priority Index (ToxPi) is a method for prioritization and profiling of chemicals that integrates data from diverse sources. However, individual data sources (“assays”), such as in vitro bioassays or in vivo study endpoints, often feature sections of missing data, wherein subsets of chemicals have not been tested in all assays. In order to investigate the effects of missing data and recommend solutions, we designed simulation studies around high-throughput screening data generated by the ToxCast and Tox21 programs on chemicals highlighted by the Agency for Toxic Substances and Disease Registry’s (ATSDR) Substance Priority List (SPL), which helps prioritize environmental research and remediation resources.

Results: Our simulations explored a wide range of scenarios concerning data (0-80% assay data missing per chemical), modeling (ToxPi models containing from 160-700 different assays), and imputation method (k-Nearest-Neighbor, Max, Mean, Min, Binomial, Local Least Squares, and Singular Value Decomposition). We find that most imputation methods result in significant changes to ToxPi score, except for datasets with a small number of assays. If we consider rank change conditional on these significant changes to ToxPi score, we find that ranks of chemicals in the minimum value imputation, SVD imputation, and kNN imputation sets are more sensitive to the score changes.

Conclusions: We found that the choice of imputation strategy exerted significant influence over both scores and associated ranks, and the most sensitive scenarios were those involving fewer assays plus higher proportions of missing data. By characterizing the effects of missing data and the relative benefit of imputation approaches across real-world data scenarios, we can augment confidence in the robustness of decisions regarding the health and ecological effects of environmental chemicals.

Highlights

The Toxicological Priority Index (ToxPi) is a method for prioritization and profiling of chemicals that integrates data from diverse sources
Minimum value imputation scores appear to be more sensitive in cases where there are fewer assays, whereas mean and k-nearest neighbors (kNN) imputation showed nonsignificance only in the smallest dataset (5 slices, 1 assay per slice)
Because simulated datasets are generated from a varying number of randomly sampled assays from the original dataset, variability is expected between chemical ToxPi scores from the minimum value imputed simulated datasets and scores from the standardly imputed original dataset
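
To make the comparison of imputation strategies concrete, the following is a minimal sketch, not the authors' pipeline or the ToxPi software. It assumes a toy chemicals-by-assays matrix with missing entries (NaN), fills each assay column with its observed minimum, mean, or max, and then computes a simplified, equal-weight score (each assay scaled to [0, 1] and averaged per chemical) as a stand-in for a ToxPi score. The function names `impute`, `simple_score`, and `ranks`, and the toy data, are hypothetical and chosen only for illustration.

```python
import numpy as np

def impute(matrix, method):
    """Fill NaNs in each assay column with that column's observed min, mean, or max."""
    fill = {"min": np.nanmin, "mean": np.nanmean, "max": np.nanmax}[method]
    filled = matrix.copy()
    col_values = fill(matrix, axis=0)           # one fill value per assay (column)
    rows, cols = np.where(np.isnan(matrix))     # positions of missing entries
    filled[rows, cols] = col_values[cols]
    return filled

def simple_score(matrix):
    """Simplified stand-in for a ToxPi score (assumption): scale each assay
    to [0, 1], then take the equal-weight average across assays per chemical."""
    lo = matrix.min(axis=0)
    span = matrix.max(axis=0) - lo
    span[span == 0] = 1.0                       # constant columns: avoid dividing by zero
    return ((matrix - lo) / span).mean(axis=1)

def ranks(scores):
    """Rank chemicals so that rank 1 is the highest score."""
    order = np.argsort(-scores, kind="stable")
    result = np.empty(len(scores), dtype=int)
    result[order] = np.arange(1, len(scores) + 1)
    return result

# Hypothetical toy data: 4 chemicals (rows) x 3 assays (columns), NaN = not tested.
data = np.array([
    [0.9, 0.8, np.nan],
    [0.1, np.nan, 0.2],
    [0.5, 0.4, 0.6],
    [np.nan, 0.1, 0.9],
])

results = {m: simple_score(impute(data, m)) for m in ("min", "mean", "max")}
for method, scores in results.items():
    print(method, np.round(scores, 3), ranks(scores))
```

Comparing the printed scores and ranks across the three methods gives a small-scale view of how the choice of fill value can shift both a chemical's score and its position in the ranking, which is the kind of sensitivity the simulations examine at much larger scale.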

Summary

Introduction

The Toxicological Priority Index (ToxPi) is a method for prioritization and profiling of chemicals that integrates data from diverse sources. In order to investigate the effects of missing data and recommend solutions, we designed simulation studies around high-throughput screening data generated by the ToxCast and Tox21 programs on chemicals highlighted by the Agency for Toxic Substances and Disease Registry’s (ATSDR) Substance Priority List (SPL), which helps prioritize environmental research and remediation resources. Given realistic limitations on time and resources for testing, methods for prioritizing and profiling the risk-relevant activity (both observed and predicted) of chemicals are needed for diverse application areas. The Agency for Toxic Substances and Disease Registry (ATSDR) was established to “effectuate and implement the health related authorities” of the Superfund Act. The Superfund Amendments and Reauthorization Act of 1986 requires that ATSDR release a list of chemicals commonly found at Superfund sites listed on the National Priorities List, prioritized for further study.



Journal: BioData Mining	Publication Date: Jun 13, 2018
Citations: 10	License type: open-access
