Abstract

Objectives: Reference intervals (RI) play a decisive role in the interpretation of medical laboratory results. An important step in the determination of RI is age- and sex-specific partitioning, which is usually based on an empirical approach using graphical representation. In this study, we evaluate an automated machine learning approach.

Methods: This study uses pediatric data from the CALIPER RI (Canadian Laboratory Initiative on Pediatric Reference Intervals) study. Potential partitions are calculated using a regression tree model from the rpart package of the statistical programming language R. The Harris & Boyd method is used to compare the corresponding partitions suggested by rpart and CALIPER. For better comparability, the reference ranges of the partitions of both approaches are then calculated using reflimR.

Results: Most of the partitions suggested by rpart or CALIPER show sufficient heterogeneity among themselves to justify age- and/or sex-specific RI partitioning. With only a few individual exceptions, both methods yield comparable results. The partitions of both approaches for albumin and γ-glutamyltransferase are very similar to each other. For creatinine, rpart suggests a slightly earlier distinction between the sexes. Alkaline phosphatase shows the most pronounced differences: in addition to a considerably earlier sex split, rpart suggests different age intervals for the two sexes, resulting in three partitions for females and four partitions for males.

Conclusions: Our findings indicate that the automated analysis provided by rpart yields results that are comparable to traditional methods. Nevertheless, the medical plausibility of the automatic suggestions needs to be validated by human experts.
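To make the comparison step more concrete, the following is a minimal sketch of the Harris & Boyd criterion in its commonly cited form: a z statistic for the difference in subgroup means, compared against a sample-size-dependent critical value of 3·sqrt((n1 + n2)/240), together with a check of the standard deviation ratio against 1.5. The study itself works in R; Python is used here only for illustration. The function name `harris_boyd`, its parameters, and the sample values are hypothetical and are not taken from the study, whose exact implementation may differ.

```python
import math
from statistics import mean, stdev


def harris_boyd(a, b, sd_ratio_limit=1.5):
    """Illustrative Harris & Boyd check for two subgroups (hypothetical helper).

    Assumes both groups have at least two values and non-zero spread.
    """
    n1, n2 = len(a), len(b)
    m1, m2 = mean(a), mean(b)
    s1, s2 = stdev(a), stdev(b)  # sample standard deviations

    # z statistic for the difference between the subgroup means
    z = abs(m1 - m2) / math.sqrt(s1**2 / n1 + s2**2 / n2)

    # Critical value scaled to sample size (equals 3 when each group has n = 120)
    z_crit = 3.0 * math.sqrt((n1 + n2) / 240.0)

    # Ratio of the larger to the smaller standard deviation
    sd_ratio = max(s1, s2) / min(s1, s2)

    return {
        "z": z,
        "z_crit": z_crit,
        "sd_ratio": sd_ratio,
        # Separate reference intervals are suggested if either criterion is met
        "partition": z > z_crit or sd_ratio > sd_ratio_limit,
    }


# Synthetic, made-up values for two age groups (not CALIPER data)
younger = [10 + (i % 5) for i in range(120)]
older = [20 + (i % 5) for i in range(120)]
result = harris_boyd(younger, older)
print(result["partition"])
```

In this sketch, a partition is flagged when either the means differ strongly relative to their standard errors or the spreads of the two groups differ markedly, which corresponds to the notion of "sufficient heterogeneity" referred to in the Results.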