Abstract

Introduction: Research based on observational designs often includes examination of outcome assessments across sub-populations. These sub-populations can be small and may therefore not be representative of the target population. Exact, approximate, and propensity score matching are popular methods to address this issue, but they become inefficient for large data sets.

Hypothesis: Data Nuggets will produce matching results similar to those of conventional methods while performing the matching orders of magnitude faster.

Methods: Data Nuggets is a novel data reduction technique that preserves the structure of the data by creating a collection of representative data points (nuggets), each of which carries the center, scale, and weight of the group of observations it represents. Observational data and simulations were used to show that matching data nuggets instead of individual patients is more efficient, because the nugget data set is orders of magnitude smaller than the original, and that it corrects for bias. We tested a Data Nuggets matching algorithm in a perinatal database of over 350,000 records, varying the number of nuggets (between 100 and 800), against conventional matching and full model fitting. We fit models to examine the association of preeclampsia with gestational duration after adjusting for or matching on age, race, body mass index, and infant sex. All variables were scaled before fitting the models.

Results: Models with a few hundred data nuggets produced results similar to those using conventional matching. The estimated coefficient for preeclampsia as a predictor of gestational age ranged from -0.73 weeks (100 nuggets) to -0.71 weeks (1,200 nuggets), compared with -0.72 (SE = 0.01) with approximate matching and -0.80 (SE = 0.01) in the full (unmatched) data model (figure).

Conclusions: Data Nuggets matching achieved results similar to those produced by the conventional matching method and is preferable for large data sets because of its speed and efficiency.
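
The idea of matching on reduced data can be illustrated with a minimal sketch. It is not the Data Nuggets algorithm itself: plain k-means stands in for nugget creation, the data are simulated, and the function names (`make_nuggets`, `matched_effect`), the number of nuggets, and the simulated effect of -0.7 are all assumptions chosen for illustration. Each stand-in nugget keeps a center, a scale, a weight, and the mean outcome of its members; treated nuggets are then matched to the nearest control nugget and the outcome differences are averaged using the nugget weights.

```python
import numpy as np

def make_nuggets(X, y, k, rng, n_iter=20):
    """Reduce (X, y) to at most k summary points.

    Assumption: k-means (Lloyd's algorithm) is used here only as a
    stand-in for the Data Nuggets reduction described in the abstract.
    """
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every observation to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    # Final assignment, then summarize each non-empty group.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    out_c, out_s, out_w, out_y = [], [], [], []
    for j in range(k):
        mask = labels == j
        if mask.any():
            out_c.append(X[mask].mean(axis=0))       # center
            out_s.append(X[mask].var(axis=0).mean())  # scale (mean variance)
            out_w.append(mask.sum())                  # weight (group size)
            out_y.append(y[mask].mean())              # mean outcome
    return np.array(out_c), np.array(out_s), np.array(out_w), np.array(out_y)

def matched_effect(treated, control):
    """Match each treated nugget to its nearest control nugget
    (with replacement) and return the weighted mean outcome difference."""
    ct, _, wt, yt = treated
    cc, _, _, yc = control
    d2 = ((ct[:, None, :] - cc[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    return np.average(yt - yc[nearest], weights=wt)

# Simulated data (hypothetical): two scaled covariates, a binary exposure
# whose probability depends on the first covariate, and an outcome with a
# true exposure effect of -0.7.
rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 2))
t = rng.random(n) < 1.0 / (1.0 + np.exp(-0.5 * X[:, 0]))
y = -0.7 * t + 0.5 * X[:, 0] + rng.normal(scale=0.5, size=n)

treated_nuggets = make_nuggets(X[t], y[t], k=25, rng=rng)
control_nuggets = make_nuggets(X[~t], y[~t], k=25, rng=rng)

naive = y[t].mean() - y[~t].mean()
effect = matched_effect(treated_nuggets, control_nuggets)
print(f"naive difference:       {naive:.3f}")
print(f"nugget-matched estimate: {effect:.3f}")
```

The matching step here compares 25 treated summaries with 25 control summaries rather than thousands of individual records, which is the source of the speed-up the abstract describes. The scale values are computed but not used in this sketch.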
