Abstract

Thyroid nodules have a prevalence of 19% to 68% in the general population, with only 7% to 15% harboring thyroid cancer.1, 2 Ultrasound (US) is now the primary radiologic tool for evaluating thyroid nodules, with several US features being predictive of malignancy risk and therefore able to guide when a fine needle aspiration (FNA) biopsy may be advisable. Multiple medical specialty societies across the world have developed US-based risk stratification guidelines for thyroid nodules, in part to curb the significant testing and possible overtreatment of millions of benign thyroid nodules. A key design feature of the guidelines is to identify nodules with a low risk of malignancy whose cytologic assessment with FNA can be safely deferred.2, 3 Differences among the guidelines include the US lexicon, risk stratification categories, quantitative versus qualitative grouping, and nodule size thresholds for FNA.1-5 Few studies, however, have compared the different guidelines to determine which is best at identifying high-risk nodules or thyroid cancers while minimizing the number of “unnecessary” thyroid biopsies.

We chose to compare studies that assessed large cohorts of thyroid nodules with known outcomes, had a blinded review of the US, and then applied various classification systems to see which performed best. Within these investigations, “unnecessary” thyroid biopsies are defined as biopsies that would have been indicated by the particular classification system but ultimately turned out to be benign, either by cytology or histology. In this review, we focused on the negative predictive value (NPV) of the guidelines, representing the probability that a nodule not selected for FNA is benign. We also looked at the rate of unnecessary FNAs and at the false negative rate (FNR), the probability that a nodule not selected for FNA was malignant. We sought to clarify which classification performed best with respect to the NPV, unnecessary biopsy rate, and FNR.
Best performance on these measures would indicate that a system was superior at identifying those nodules that did not require biopsy while simultaneously avoiding misclassifying malignancies as benign. Since most guidelines have a size threshold of 1 cm for FNA, this review applied only to thyroid nodules ≥1 cm.

Five recently published articles were selected for review (Table S1). The nodules included in the studies were designated as benign or malignant based on one or multiple biopsies (FNA or core biopsy) or surgical pathology.

Yoon et al. retrospectively compared the performance of the Kwak Thyroid Image Reporting and Data System (K-TIRADS), American College of Radiology Thyroid Image Reporting and Data System (ACR-TIRADS, see Table I for details of this rating system), and European Thyroid Association Thyroid Image Reporting and Data System (EU-TIRADS) in the diagnosis of 2274 thyroid nodules ≥1 cm. They found similar NPVs among the three guidelines (97.3%, 95.2%, and 94.7%, respectively). However, the percentage of unnecessary FNAs was much lower with the ACR-TIRADS guidelines (28% for ACR-TIRADS vs. 66.3% for K-TIRADS and 52.7% for EU-TIRADS).4 One can theoretically increase the NPV by lowering the threshold to perform FNA, although that would lead to an increase in unnecessary biopsies.

Gao et al. retrospectively compared the performance of K-TIRADS, ACR-TIRADS, and the American Thyroid Association (ATA) guidelines in the diagnosis of 1427 surgically resected thyroid nodules >1 cm. The ACR- and K-TIRADS had lower NPVs than the ATA guidelines (82.5%, 86.5%, and 94.5%, respectively), but slightly higher positive predictive values (PPVs). However, this study was unique, and unlike an individual surgeon's practice, in that the malignancy rate was 66%, which would alter the performance of all diagnostic modalities.1

Grani et al.
evaluated the performance of five guidelines in the diagnosis of 502 thyroid nodules ≥1 cm referred for FNA: the ACR-, EU-, and K-TIRADS, the ATA guidelines, and the American Association of Clinical Endocrinologists/American College of Endocrinology/Associazione Medici Endocrinologi (AACE/ACE/AME) guidelines. They found NPVs ranging between 95.9% and 97.8% across all guidelines. However, following the ACR-TIRADS would have averted FNA in the most patients, avoiding 53.4% of biopsies requested for less evidence-based reasons, compared with 17.1% to 43.8% for the other guidelines. This system also resulted in the lowest FNR (2.2% vs. 2.9%–4.1% for the other guidelines).2

Ha et al. retrospectively compared the performance of seven guidelines in the diagnosis of 2000 thyroid nodules ≥1 cm: the ACR- and K-TIRADS, the ATA and AACE/ACE/AME guidelines, the National Comprehensive Cancer Network (NCCN) guidelines, the French Society of Endocrinology (FSE) guidelines, and the Society of Radiologists in Ultrasound (SRU) guidelines. The ACR-TIRADS demonstrated the lowest rate of unnecessary FNAs (25.3% vs. 29.1%–56.9%), despite similar NPVs ranging from 82.9% to 94.2% across the guidelines.3

In a separate study on a different patient cohort, Ha et al. retrospectively compared the performance of the ATA guidelines and the ACR- and K-TIRADS in the diagnosis of 902 thyroid nodules >5 mm, recapitulating the results of the larger study. The NPVs were similar across the guidelines (94.4%–100%), and the rate of unnecessary FNAs was again much lower for the ACR-TIRADS (25.8% vs. 51.2%–59.4%).5

Although all of the guidelines reviewed have relatively high NPVs, the ACR-TIRADS performed best at identifying nodules ≥1 cm that could safely avoid biopsy while demonstrating the lowest number of unnecessary FNAs.
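The three measures compared above can be made concrete with a short sketch. It is illustrative only: the `GuidelineCounts` container, its field names, and every number in it are hypothetical and are not taken from any of the reviewed studies. The FNR follows the definition used in this review (the probability that a nodule not selected for FNA is malignant), and the denominator of the unnecessary FNA rate is an assumption, since the studies may define it differently. The last function applies the standard Bayes relationship between sensitivity, specificity, and prevalence to show why a cohort with a 66% malignancy rate would be expected to yield lower NPVs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuidelineCounts:
    """Hypothetical container: nodule counts after applying one guideline to a cohort."""
    fna_malignant: int     # FNA indicated, nodule malignant
    fna_benign: int        # FNA indicated, nodule benign (an "unnecessary" FNA)
    no_fna_malignant: int  # FNA not indicated, nodule malignant (a missed cancer)
    no_fna_benign: int     # FNA not indicated, nodule benign

    @property
    def total(self) -> int:
        return (self.fna_malignant + self.fna_benign
                + self.no_fna_malignant + self.no_fna_benign)


def npv(c: GuidelineCounts) -> float:
    """Probability that a nodule not selected for FNA is benign."""
    return c.no_fna_benign / (c.no_fna_benign + c.no_fna_malignant)


def fnr(c: GuidelineCounts) -> float:
    """FNR as defined in this review: probability that a nodule not
    selected for FNA is malignant (equal to 1 - NPV)."""
    return c.no_fna_malignant / (c.no_fna_benign + c.no_fna_malignant)


def unnecessary_fna_rate(c: GuidelineCounts) -> float:
    """Share of nodules with an indicated FNA that proved benign.
    ASSUMPTION: the denominator is all nodules in the cohort."""
    return c.fna_benign / c.total


def npv_at_prevalence(sensitivity: float, specificity: float,
                      prevalence: float) -> float:
    """NPV implied by a fixed sensitivity and specificity at a given
    malignancy prevalence (Bayes' rule)."""
    true_negative = specificity * (1.0 - prevalence)
    false_negative = (1.0 - sensitivity) * prevalence
    return true_negative / (true_negative + false_negative)


if __name__ == "__main__":
    # Hypothetical cohort of 1000 nodules, for illustration only.
    cohort = GuidelineCounts(fna_malignant=90, fna_benign=300,
                             no_fna_malignant=10, no_fna_benign=600)
    print(f"NPV: {npv(cohort):.3f}")
    print(f"FNR: {fnr(cohort):.3f}")
    print(f"Unnecessary FNA rate: {unnecessary_fna_rate(cohort):.3f}")

    # Same hypothetical test characteristics, two malignancy rates.
    for prevalence in (0.10, 0.66):
        value = npv_at_prevalence(0.90, 0.50, prevalence)
        print(f"Prevalence {prevalence:.2f}: NPV {value:.3f}")
```

With sensitivity and specificity held fixed, the NPV in this sketch falls as the malignancy rate rises. That is one reason NPVs from a surgically resected cohort are hard to compare directly with those from cohorts referred for FNA.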
In addition, the ACR-TIRADS is unique among the guidelines in that it is a point-based system rather than a pattern-based system, favoring a synoptic report that makes the decision easier for the individual physician. Therefore, universal application of this system could potentially result in less variability in the US interpretation of thyroid nodules across providers and institutions. We recognize that these studies are limited by the application of the classification systems to retrospective datasets, which introduces the possibility of selection bias. However, the blinded nature of the sonographic assessment does help to minimize this issue. The use of such classification schemes can help introduce evidence-based decision making in the management of thyroid nodules to reduce personal bias and potential overtreatment. Certainly, other issues will always be at play, including patient preference. Future studies will be needed to determine consistency in performance across different interpreting physicians and to validate the results in a prospective manner.

All five articles are level IV evidence (case series).

Table S1. Summary of selected diagnostic performance of various ultrasound classification systems for thyroid nodules.
