Abstract

We propose the use of a conjecturing machine that suggests feature relationships in the form of bounds involving nonlinear terms for numerical features and Boolean expressions for categorical features. The proposed Conjecturing framework recovers known nonlinear and Boolean relationships among features from data; in both settings, the true underlying relationships are revealed. We then compare the method with a previously proposed symbolic regression framework on the ability to recover equations that are satisfied among features in a data set. The framework is then applied to patient-level data on COVID-19 outcomes to suggest possible risk factors that are confirmed in the medical literature. Discovering patterns in data is a first step toward establishing causal relationships, which can be the basis for effective decision making.

Data Ethics & Reproducibility Note: Code and data to reproduce results are available here: https://github.com/jpbrooks/conjecturing . COVID-19 synthetic patient data were obtained as part of the Veterans Health Administration (VHA) Innovation Ecosystem and precisionFDA COVID-19 Risk Factor Modeling Challenge and are used here with permission from the Food and Drug Administration (FDA). The e-companion is available at https://doi.org/10.1287/ijds.2021.0043 .

History: Olivia Sheng served as the senior editor for this article.
