Purpose
Bootstrapping is a modern technique widely used in statistics to evaluate the performance of model parameters. The purpose of this study was to develop a strategy to identify and eliminate outliers in a dataset used for optimizing formula constants for lens power calculation.

Methods
In a dataset of N = 888 clinical cases treated with a monofocal aspherical intraocular lens (XC1/XY1, Hoya), constants for the SRKT, Haigis, and Castrop formulae were optimized and the prediction error (PE) was calculated. The PE was bootstrapped NB = 1000 times, and the mean and trimmed mean of the bootstrapped PE were derived to generate the Bootlier plot, which shows the probability density function of the mean minus the trimmed mean. In the presence of outliers the Bootlier plot shows some multimodality, and a Bootlier Index was extracted as a measure of this multimodality. Outliers were removed from the tails of the PE distribution in a stepwise fashion until the Bootlier Index fell below a threshold of 0.001.

Results
With the entire dataset, the mean/SD/median/mean absolute/root mean squared PE using the optimized formula constants were -0.0045/0.44415/0.0134/0.3406/0.4412 dpt with SRKT, 0.0065/0.3711/-0.0056/0.2830/0.3710 dpt with Haigis, and 0.0034/0.3452/0.0023/0.2683/0.3451 dpt with the Castrop formula. After identifying and removing outliers, the respective metrics for the PE were -0.0036/0.4028/0.0134/0.3205/0.4026 dpt with SRKT (13 cases removed), 0.0050/0.3375/-0.0056/0.2656/0.3373 dpt with Haigis (11 cases removed), and 0.0035/0.3168/0.0023/0.2531/0.3166 dpt with Castrop (11 cases removed). For trimming both tails of the PE distribution by ⅛, ¼, ½, 1, 2.5, and 5%, respectively, the multimodality in the Bootlier plots was reduced from 0/0.1567/0.0587/0.0258/0.0007/0 with SRKT, 0/0.0981/0.0261/0.0202/0.0003/0 with Haigis, and 0.0006/0.0006/0.0161/0.0191/0.0005/0 with Castrop for the entire dataset to values below 1e-3 after outlier removal.

Conclusion
We were able to show that bootstrapping with outlier identification based on Bootlier plots and the Bootlier Index is a powerful tool for cleaning a dataset of outliers before formula constant optimization.
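The bootstrap step described in the Methods can be illustrated with a minimal sketch. It assumes the PE values are available as a plain list of floats and that the trim percentage is applied to each tail. The function names (`trimmed_mean`, `bootlier_samples`, `pe_metrics`) and their parameters are hypothetical and not taken from the study. The Bootlier Index itself is not defined in the abstract, so the sketch only produces the bootstrapped mean minus trimmed mean values from which a Bootlier plot would be drawn, plus the five PE summary metrics reported in the Results.

```python
import random
import statistics


def trimmed_mean(values, trim_fraction):
    """Mean after dropping trim_fraction of the values from EACH tail (assumption)."""
    ordered = sorted(values)
    k = int(trim_fraction * len(ordered))  # number of values cut per tail
    kept = ordered[k:len(ordered) - k]
    return statistics.fmean(kept)


def bootlier_samples(pe, n_boot=1000, trim_fraction=0.05, seed=0):
    """Bootstrap distribution of (mean - trimmed mean) of the prediction error."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(pe)
    diffs = []
    for _ in range(n_boot):
        resample = rng.choices(pe, k=n)  # draw n values with replacement
        diffs.append(statistics.fmean(resample) - trimmed_mean(resample, trim_fraction))
    return diffs


def pe_metrics(pe):
    """Mean, SD, median, mean absolute and root mean squared prediction error."""
    return {
        "mean": statistics.fmean(pe),
        "sd": statistics.stdev(pe),
        "median": statistics.median(pe),
        "mean_abs": statistics.fmean([abs(x) for x in pe]),
        "rms": statistics.fmean([x * x for x in pe]) ** 0.5,
    }
```

A density estimate (for example a histogram) of the values returned by `bootlier_samples` would play the role of the Bootlier plot. With outliers in the tails, resamples that include them shift the mean but not the trimmed mean, which is what produces the additional modes.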