The aim of the study was to evaluate how the number of surfaces (N<sub>surf</sub>) and the number of observers (N<sub>obs</sub>) influence the statistical power of a study comparing the diagnostic accuracy of radiographic systems used to detect approximal caries lesions. The available data set consisted of 338 surfaces examined by 10 independent observers with each of four radiographic systems. The presence of a caries lesion was rated on a 5-point confidence scale, and the true lesion status was established by histological validation. Areas under the ROC curve (A<sub>z</sub>s) expressed the diagnostic accuracy of each observer with each radiographic system. Assuming that the A<sub>z</sub>s were tested by a two-way analysis of variance, we performed a simulation study to evaluate how the power of this analysis depended on N<sub>surf</sub> and N<sub>obs</sub>. As a measure of statistical power we used the standard error of the difference between the expected A<sub>z</sub>s of two systems, a smaller standard error corresponding to higher power. The simulations covered N<sub>surf</sub> from 25 to 338 and N<sub>obs</sub> from 2 to 10. They showed that power increased with the total number of evaluations per system (N<sub>surf</sub> × N<sub>obs</sub>), whereas the way this total was divided between surfaces and observers had only a marginal influence on power. Thus, from a statistical point of view, and provided that the data are analyzed by a two-way analysis of variance, study designs for comparing the accuracy of several systems can be composed freely with respect to the number of surfaces and observers, as long as the total number of evaluations per system is the same.
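To make the power measure concrete, the following is a minimal sketch, not the authors' simulation code. It assumes an additive observer × system two-way ANOVA without replication (one A<sub>z</sub> per observer and system), in which the standard error of the difference between two system means is sqrt(2 · MS<sub>residual</sub> / N<sub>obs</sub>). As a stand-in for the binormal A<sub>z</sub> used in the study, it computes the nonparametric (Wilcoxon-Mann-Whitney) ROC area from the 5-point ratings. The function names `empirical_auc` and `se_of_system_difference` are hypothetical.

```python
import math
from typing import Sequence


def empirical_auc(ratings_diseased: Sequence[int], ratings_sound: Sequence[int]) -> float:
    """Nonparametric ROC area (Wilcoxon-Mann-Whitney) from confidence ratings.

    Substitute for the binormal A_z: the probability that a lesion surface
    is rated higher than a sound surface, with ties counted as 1/2.
    """
    wins = 0.0
    for d in ratings_diseased:
        for s in ratings_sound:
            if d > s:
                wins += 1.0
            elif d == s:
                wins += 0.5
    return wins / (len(ratings_diseased) * len(ratings_sound))


def se_of_system_difference(az: Sequence[Sequence[float]]) -> float:
    """SE of the difference between two system mean ROC areas.

    az[i][j] is the ROC area of observer i with system j. Assumes an additive
    two-way ANOVA without replication (observer x system), so the residual
    mean square has (n_obs - 1) * (n_sys - 1) degrees of freedom.
    """
    n_obs, n_sys = len(az), len(az[0])
    grand = sum(sum(row) for row in az) / (n_obs * n_sys)
    obs_means = [sum(row) / n_sys for row in az]
    sys_means = [sum(az[i][j] for i in range(n_obs)) / n_obs for j in range(n_sys)]
    # Residual = observation minus observer and system effects.
    ss_res = sum(
        (az[i][j] - obs_means[i] - sys_means[j] + grand) ** 2
        for i in range(n_obs)
        for j in range(n_sys)
    )
    ms_res = ss_res / ((n_obs - 1) * (n_sys - 1))
    # Each system mean averages n_obs values; the difference of two means doubles the variance.
    return math.sqrt(2.0 * ms_res / n_obs)
```

In this sketch N<sub>obs</sub> enters directly through the 1/N<sub>obs</sub> factor, while N<sub>surf</sub> enters indirectly: fewer surfaces make each A<sub>z</sub> estimate noisier, which inflates MS<sub>residual</sub>. This is one way to see why the total N<sub>surf</sub> × N<sub>obs</sub> can matter more than how it is split.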