Abstract

We aimed to estimate intraobserver repeatability and interobserver agreement in assessing the presence of papillary projections in adnexal masses and in classifying adnexal masses using the International Ovarian Tumor Analysis (IOTA) terminology, for ultrasound examiners with different levels of experience. We also aimed to identify ultrasound findings that cause confusion and might be interpreted differently by different observers, and to determine whether repeatability and agreement change after consensus has been reached on how to interpret 'problematic' ultrasound images.

Digital clips (two to eight per adnexal mass) containing gray-scale and color/power Doppler information on 83 adnexal masses in 80 patients were evaluated independently four times, twice before and twice after a consensus meeting, by four experienced and three less experienced ultrasound observers. The variables analyzed were tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid) and the presence of papillary projections. Intraobserver repeatability was evaluated for each observer (percentage agreement, Cohen's kappa), and interobserver agreement was estimated for all seven observers (percentage agreement, Fleiss kappa, Cohen's kappa).

There was uncertainty about how to define a solid component and a papillary projection, but consensus was reached at the consensus meeting. Interobserver agreement for tumor type was good both before and after the consensus meeting, with no clear improvement after the meeting: mean percentage agreement was 76.0% (Fleiss kappa, 0.695) before the meeting and 75.4% (Fleiss kappa, 0.682) after it. Interobserver agreement with regard to papillary projections was moderate both before and after the consensus meeting, again with no clear improvement after the meeting: mean percentage agreement was 86.6% (Fleiss kappa, 0.536) before the meeting and 82.7% (Fleiss kappa, 0.487) after it. There was substantial variability in pairwise agreement for papillary projections (Cohen's kappa, 0.148-0.787). Intraobserver repeatability with regard to tumor type was very good and similar before and after the consensus meeting (agreement 87-95%; kappa, 0.83-0.94). Intraobserver repeatability with regard to papillary projections was good or very good both before and after the consensus meeting (agreement 88-100%; kappa, 0.64-1.0).

Despite uncertainty about how to define solid components, interobserver agreement was good for tumor type. Interobserver agreement for papillary projections was moderate but varied considerably between observer pairs, so the term 'papillary projection' might need a more precise definition. The consensus meeting did not change inter- or intraobserver agreement.
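The agreement statistics used in the abstract (percentage agreement and Cohen's kappa for the two readings of a single observer, Fleiss kappa across all seven observers) can be computed with standard libraries. The sketch below is illustrative only, assuming randomly generated ratings, a hypothetical integer coding of the five tumor types, and an arbitrary repeat-agreement rate; none of these are study data.

```python
# Minimal sketch of the agreement statistics named in the abstract,
# using made-up ratings (NOT the study's data).
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Assumed coding of tumor type: 0 = unilocular, 1 = unilocular solid,
# 2 = multilocular, 3 = multilocular solid, 4 = solid.
rng = np.random.default_rng(0)
n_masses, n_observers = 83, 7
ratings = rng.integers(0, 5, size=(n_masses, n_observers))  # masses x observers

# Intraobserver repeatability: one observer's first vs. second reading,
# summarized as percentage agreement and Cohen's kappa.
first_reading = ratings[:, 0]
second_reading = np.where(rng.random(n_masses) < 0.9, first_reading,
                          rng.integers(0, 5, size=n_masses))
pct_agreement = np.mean(first_reading == second_reading) * 100
kappa_intra = cohen_kappa_score(first_reading, second_reading)

# Interobserver agreement: all seven observers within one reading round,
# summarized with Fleiss kappa on the masses x categories count table.
table, _ = aggregate_raters(ratings)  # counts of observers per category
kappa_inter = fleiss_kappa(table, method='fleiss')

print(f"Intraobserver: {pct_agreement:.1f}% agreement, Cohen's kappa {kappa_intra:.3f}")
print(f"Interobserver: Fleiss kappa {kappa_inter:.3f}")
```

Fleiss kappa generalizes the two-rater Cohen's kappa to an arbitrary number of raters, which is why it serves as the interobserver summary while Cohen's kappa is used for paired readings by the same observer or observer pair.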
