Background. Thyroid cancer is one of the most common cancers and is especially common in young patients. Therefore, effective recognition and treatment of thyroid cancer are essential for patient survival.
Objectives. To compare the effectiveness of standard guidelines for predicting thyroid malignancy. To do so, thyroid nodules were classified according to the categories of the American Thyroid Association (ATA) and Thyroid Imaging Reporting and Data System (TI-RADS) guidelines, and compared with fine-needle aspiration biopsy (FNAB) results.
Materials and methods. The study included 1741 thyroid nodules with a final diagnosis in 1121 consecutive patients. FNAB was recommended for all patients according to the ATA guidelines and subsequently performed. The nodules were then reclassified according to the TI-RADS guidelines.
Results. When nodules classified by ultrasonography (US) features according to the ATA and TI-RADS guidelines were compared with their cytological diagnoses under the Bethesda System for Reporting Thyroid Cytopathology, 37.6% of the nodules in the ATA high-risk category had malignant cytology, 10.4% were suspicious for malignancy, 4% were non-diagnostic, 9.6% were indeterminate, and 38.4% were benign. In the TI-RADS high-suspicion category, 50% of the nodules were malignant, 13.3% were suspicious for malignancy and 36.7% were benign. For the TI-RADS guidelines, the best cutoff value for differentiating benign and malignant nodules was 4.5 (area under the curve (AUC) = 0.962, 95% CI = 0.943–0.981, p < 0.001). For the ATA guidelines, the best cutoff value was also 4.5 (AUC = 0.917, 95% CI = 0.875–0.959, p < 0.001). The diagnostic performance of the TI-RADS and ATA systems was evaluated using high-suspicion nodules. Sensitivity and specificity were high for both: 80% and 96.3%, respectively, for the ATA classification, and 76% and 97.5%, respectively, for the TI-RADS classification. The positive predictive value, however, was low for both (63.3% for TI-RADS compared to 55.5% for ATA).
Conclusions. Both the ATA and TI-RADS classifications can effectively predict the malignancy risk of thyroid nodules and may thus reduce unnecessary FNAB.
Key words: thyroid nodule, ultrasonography, risk of malignancy, fine-needle aspiration
The detection of thyroid nodules has increased in recent years, largely due to the widespread use of ultrasonography (US). While the prevalence of thyroid nodules is 4% on palpation, it ranges from 190 to 347 per 1000 cases when thyroid US is used; in autopsy series, where nodules are most clearly evaluated, the prevalence ranges from 82 to 650 per 1000 autopsies.1 Thyroid nodules warrant medical attention because of the possibility of cancer. The incidence of thyroid cancer is increasing worldwide.2 According to 2020 cancer statistics for the USA, thyroid cancer is most commonly reported in people aged 15–39. Among all cancers that develop between the ages of 30 and 39, thyroid cancer is the most common type in men and the 2nd most common in women.2
Because thyroid cancer is one of the most common cancers and occurs in young patients, effective recognition and treatment are very important for patient survival. Several evidence-based guidelines have been developed for the evaluation of patients presenting with thyroid nodules. The American Thyroid Association (ATA) recommends thyroid US along with cervical lymph node examination in patients with suspected thyroid nodules.3 Similarly, the National Comprehensive Cancer Network (NCCN) recommends evaluating the lateral neck compartment lymph nodes along with thyroid US in all patients with an incidentally detected neck mass.4 When performing thyroid US, the clinical aim is to detect nodules with a high risk of thyroid cancer. Findings such as microcalcifications, irregular margins and marked hypoechogenicity indicate a higher risk of malignancy. Existing guidelines classify thyroid nodules into risk categories according to these suspicious features and make biopsy recommendations accordingly. In the ATA guidelines, fine-needle aspiration biopsy (FNAB) is recommended at 1 cm and above for high- or moderate-suspicion nodules, 1.5 cm and above for low-suspicion nodules and 2 cm and above for very-low-suspicion nodules.3 In the Thyroid Imaging Reporting and Data System (TI-RADS) developed by the American College of Radiology (ACR), FNAB is recommended at 1 cm and above for nodules in the high-suspicion category, 1.5 cm and above in the moderate-suspicion category and 2.5 cm and above in the mild-suspicion category.5
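The size thresholds summarized above can be expressed as a small lookup. The following Python sketch is illustrative only: the function name, dictionary names and category labels are hypothetical, and the thresholds are copied from the paragraph above rather than from the full guideline texts, which contain additional criteria.

```python
# Minimum nodule size (cm) at which FNAB is recommended, per the summary in the text.
# Category labels are illustrative names chosen for this sketch.
ATA_FNAB_THRESHOLD_CM = {
    "high": 1.0,
    "moderate": 1.0,
    "low": 1.5,
    "very_low": 2.0,
}

TIRADS_FNAB_THRESHOLD_CM = {
    "high": 1.0,
    "moderate": 1.5,
    "mild": 2.5,
}

_THRESHOLDS = {
    "ATA": ATA_FNAB_THRESHOLD_CM,
    "TI-RADS": TIRADS_FNAB_THRESHOLD_CM,
}


def fnab_recommended(guideline: str, category: str, size_cm: float) -> bool:
    """Return True if the nodule size meets the threshold for its category."""
    if guideline not in _THRESHOLDS:
        raise ValueError(f"unknown guideline: {guideline!r}")
    table = _THRESHOLDS[guideline]
    if category not in table:
        # The text lists no threshold for this category, so the sketch refuses to guess.
        raise ValueError(f"no threshold listed for category: {category!r}")
    return size_cm >= table[category]
```

For example, under these assumptions a 1.2 cm nodule meets the threshold in the ATA moderate-suspicion category but not in the ATA low-suspicion category.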
The efficacy of FNAB recommendations based on the ATA and TI-RADS guidelines in predicting malignancy of thyroid nodules has been reported in previous studies. Both ATA and TI-RADS guidelines classify nodules into risk groups. Although there are some similarities, the classifications also differ in some aspects. Currently, it is not clear whether these differences in the classifications may result in differences in predicting malignancy. In the present study, we aimed to classify biopsied thyroid nodules according to the risk categories of both guidelines and then evaluate whether there is a difference between the guidelines in predicting malignancy.
Materials and methods
The Ethics Committee of Kahramanmaraş Sütçü İmam University (KSU), Kahramanmaraş, Turkey, approved this retrospective, cross-sectional study (approval No. 22, decision date March 6, 2019).
This study included a total of 1741 thyroid nodules (this number was determined using power analysis), with final diagnosis in 1121 consecutive patients (age: 51.54 ±13.53 years). Routine US-guided FNAB (USg-FNAB) was performed according to the 2015 ATA guidelines.3
US examination and image analysis
Ultrasound machines (General Electric Logic P5; General Electric, Schenectady, USA) equipped with a 12-MHz linear probe were used for analysis. According to the ATA guidelines, USg-FNAB is generally recommended for hypoechoic solid nodules ≥1 cm in diameter, isoechoic solid nodules ≥1.5 cm, mixed cystic–solid nodules and spongiform nodules ≥2 cm, and nodules ≥5 mm in patients with a high-risk history. Microcalcification, taller-than-wide shape, irregular margins, and pronounced hypoechogenicity are considered suspicious characteristics. Pure cystic nodules were not biopsied because they are considered benign.
The nodules were then reclassified in accordance with the ACR TI-RADS guidelines and evaluated in terms of echogenic foci, margin irregularity, taller-than-wide shape, and calcification and microcalcification. The risk category of the nodules was scored according to these features. Nodules were classified as benign (TR1, 0 points), very low suspicion (TR2, 2 points), low suspicion (TR3, 3 points), intermediate suspicion (TR4, 4–6 points), and high suspicion (TR5, ≥7 points).5
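The mapping from total points to TI-RADS category described above can be sketched as follows. This is a minimal Python illustration of the scoring bands exactly as stated in this section; the function name is hypothetical, and a total of 1 point is returned as uncategorized because the bands listed here do not cover it.

```python
from typing import Optional


def tirads_category(points: int) -> Optional[str]:
    """Map a total TI-RADS score to a category, using the bands given in the text.

    Returns None for a score of 1, which the listed bands do not assign to any category.
    """
    if points < 0:
        raise ValueError("points must be non-negative")
    if points == 0:
        return "TR1"  # benign
    if points == 2:
        return "TR2"  # very low suspicion
    if points == 3:
        return "TR3"  # low suspicion
    if 4 <= points <= 6:
        return "TR4"  # intermediate suspicion
    if points >= 7:
        return "TR5"  # high suspicion
    return None  # points == 1: outside the listed bands
```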
FNAB was performed by endocrinologists under US guidance using 23–27-gauge needles, with the patient lying supine and the neck extended. During the procedure, samples were taken from all sides of the nodule. In mixed echogenic nodules, samples were taken from the solid parts.
Cytopathological interpretation of FNAB samples was performed using the Bethesda System for Reporting Thyroid Cytopathology.6 Retrospective reclassification of all nodules according to the TI-RADS system was performed blinded to the FNAB results.5
Statistical analyses were performed using IBM SPSS v. 22.0 for Windows (IBM Corp., Armonk, USA). Continuous data are presented as mean ± standard deviation (SD). Nominal data are given as number of cases and percentage. Categorical variables were evaluated using McNemar's test and Pearson's χ2 test. The independent two-sample t-test was used to compare 2 groups in terms of age and thyroid stimulating hormone (TSH) level. We calculated the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy of both guidelines for the diagnosis of malignant thyroid nodules. The malignancy risk of the TI-RADS scores and groups and of the ATA risk stratification grades was determined on the basis of the cytopathological findings. The diagnostic performance of the TI-RADS and ATA score systems was evaluated using receiver operating characteristic (ROC) curve analysis based on high-suspicion nodules. Differences were considered statistically significant when p < 0.05.
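For reference, the diagnostic measures named above follow directly from a 2 × 2 table of test result against cytological diagnosis. The Python sketch below uses the standard definitions; the class and function names are hypothetical, the counts in the usage note are invented for illustration and are not study data, and the analyses in this study were run in SPSS, not with this code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # high-suspicion nodules with malignant cytology
    fp: int  # high-suspicion nodules with benign cytology
    fn: int  # lower-suspicion nodules with malignant cytology
    tn: int  # lower-suspicion nodules with benign cytology


def _ratio(numerator: int, denominator: int) -> float:
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Compute standard diagnostic accuracy measures from a 2 x 2 table."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
        "accuracy": _ratio(c.tp + c.tn, total),
    }
```

With the invented counts `ConfusionCounts(tp=8, fp=2, fn=2, tn=88)`, sensitivity and PPV are both 0.8 and accuracy is 0.96.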
Demographic and laboratory findings
The malignancy rate was 5.0% (n = 36) in women and 4.9% (n = 8) in men, and the difference was not statistically significant (p = 0.512). The mean age of patients with benign cytology results was 51.68 ±13.39 years and for malignant cases mean age was 50.84 ±16.15 years; the difference was not statistically significant (p = 0.517). Additionally, when patients with malignant and benign cytology results were compared in terms of TSH levels (1.88 ±0.44 compared to 65 ±0.52 mIU/L, respectively), no significant difference was found (p = 0.526).
Thyroid US and USg-FNAB
The US findings for the thyroid nodules are shown in Table 1. Ultrasonography-guided FNAB and cytological analysis were performed on 1741 nodules in 1121 patients. When the first USg-FNAB yielded non-diagnostic cytology, FNAB was repeated. Of the 1741 nodules, 1327 (76.2%) were benign, 148 (8.5%) were non-diagnostic, 191 (11.0%) were indeterminate and/or follicular/Hürthle cell neoplasm, 23 (1.3%) were suspicious for malignancy, and 52 (3.0%) were malignant. In the Bethesda classification, indeterminate cytology includes atypia of undetermined significance (AUS) and follicular lesion of undetermined significance (FLUS).
Figure 1 shows the percentage distribution of US features of nodules with malignant/suspicious and benign cytology findings. When malignant/suspicious and benign nodules were compared, hypoechogenicity (36% compared to 23.2%), marked hypoechogenicity (2.7% compared to 0.0%), microcalcification (10.7% compared to 0.0%), solid composition (82.7% compared to 39.8%), taller-than-wide shape (21.3% compared to 1.1%), and margin irregularity (44.0% compared to 3.0%) were significantly more common in malignant/suspicious nodules (p < 0.001). Separate tests were performed for each feature and all p-values were <0.0001. The p-value of each test is shown in Table 1.
Comparison of ATA and TI-RADS
When we compared the nodules classified by US features according to ATA and TI-RADS with their cytological diagnoses under the Bethesda System for Reporting Thyroid Cytopathology, 37.6% of the nodules in the ATA high-risk category had malignant cytology, 10.4% were suspicious for malignancy, 4% were non-diagnostic, 9.6% were indeterminate, and 38.4% were benign. In the TI-RADS high-suspicion category, 50% of the nodules were malignant, 13.3% were suspicious for malignancy and 36.7% were benign. Of note, 12.9% and 0.5% (226 of 1741 and 10 of 1741) of the nodules did not fit any category in the ATA and TI-RADS guidelines, respectively. There was no malignancy in any of these nodules (Table 2).
As suggested by the ROC curve analysis (Figure 2), for the TI-RADS classification, the best cutoff value in differentiating benign and malignant nodules was found to be 4.5. Accordingly, a nodule with TR5 is likely malignant, and a nodule with TR4 or below is likely benign. The most reliable diagnosis based on TI-RADS was obtained using this cutoff value (area under the curve (AUC) = 0.962, 95% confidence interval (95% CI) = 0.943–0.981, p < 0.001). Using the ATA guidelines, the best cutoff value for differentiating benign and malignant nodules was 4.5. This means that if a nodule is in the high-suspicion category, it is likely malignant; if it is in the intermediate or lower-suspicion category, it is likely benign. The most reliable diagnosis based on the ATA guidelines was obtained using this cutoff value (AUC = 0.917, 95% CI = 0.875–0.959, p < 0.001).
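To illustrate how a cutoff such as 4.5 can arise from ordinal categories, the Python sketch below computes the AUC with the rank-based (Mann-Whitney) formulation and selects a cutoff by maximizing Youden's J over midpoints between adjacent observed scores. Both choices are assumptions made for illustration: this section does not state which criterion was used to pick the best cutoff, the function names are hypothetical, and the data in the usage note are invented, not study data.

```python
from typing import List, Tuple


def auc_from_scores(scores: List[float], labels: List[int]) -> float:
    """AUC as the probability that a malignant nodule outscores a benign one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_cutoff_youden(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    Candidate cutoffs are midpoints between adjacent distinct scores,
    so ordinal categories 4 and 5 yield a candidate at 4.5.
    """
    levels = sorted(set(scores))
    if len(levels) < 2:
        raise ValueError("need at least two distinct score values")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one malignant and one benign case")
    best_cut, best_j = levels[0], float("-inf")
    for low, high in zip(levels, levels[1:]):
        cut = (low + high) / 2
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j
```

With the invented scores `[1, 2, 3, 3, 4, 4, 5, 5, 5, 4]` and labels `[0, 0, 0, 0, 0, 0, 1, 1, 1, 1]`, the selected cutoff is 4.5, meaning that category 5 is read as likely malignant and category 4 or below as likely benign.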
The diagnostic performance of the TI-RADS and ATA score systems was evaluated based on high-suspicion nodules. Both sensitivity and specificity were high for high-suspicion nodules (sensitivity: 76% for TI-RADS compared to 80% for ATA; specificity: 97.5% compared to 96.3%, respectively), but PPV was low (63.3% compared to 55.5%, respectively; Table 3).
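A low PPV alongside high specificity is expected when malignancy is uncommon in the sample, because PPV depends on prevalence. The Python sketch below applies Bayes' theorem to show this dependence; the function name is hypothetical and any prevalence passed to it is an illustrative input. It is not intended to reproduce the values in Table 3, which were derived from the observed counts.

```python
def ppv_from_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' theorem.

    PPV = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0.0:
        raise ValueError("no positive test results are possible with these inputs")
    return true_pos / (true_pos + false_pos)
```

Holding sensitivity and specificity fixed at the TI-RADS values reported above (0.76 and 0.975), the function returns a lower PPV at a hypothetical prevalence of 5% than at 50%, which is the pattern described in the paragraph.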
Thyroid cancers are observed more frequently in women. In a large-scale study, approx. 75% of thyroid cancer cases occurred in women.7 In our study, when the sexes were evaluated separately, the rate of malignancy on FNAB was similar (4.9% in men, 5% in women). However, among all patients who underwent FNAB, the majority of patients with malignancy were women. This finding is consistent with previous reports that the incidence of thyroid cancer is higher in women.
In this study, the Bethesda classification was used for pathological evaluation of FNAB performed on the thyroid nodules. The malignancy risk of each category in the Bethesda system has been demonstrated in prior studies: while the risk of malignancy in the benign category is approx. 0–3%, it is 97–99% in the malignant category, which means that the Bethesda system can accurately estimate the risk of malignancy.6 In the Bethesda system, the non-diagnostic category is applied when the sample is not large enough to reach a conclusion. In a previous study, this category was applied to 16% of first biopsies; when a second biopsy was performed in patients with non-diagnostic results, sufficient samples were taken from most of these patients.8 In our study, the rate of the non-diagnostic category was 8.5%. It has been suggested that the category of atypia of undetermined significance (AUS)/follicular lesion of undetermined significance (FLUS) in the Bethesda system should be below 7%, if possible, in all thyroid FNAB results, although a value of 10% is more reasonable.6 In our study, the AUS/FLUS category was 11% of all FNAB results.
The ATA classifies nodules into risk categories and makes FNAB recommendations accordingly. The ATA high-risk category is reported to have an approx. 70–90% malignancy risk. In a study evaluating the malignancy risk of nodules according to the ATA classification, malignancy was detected in approx. 67.5% of the nodules considered high-risk ones.9 Another study reported a malignancy risk of approx. 83% for high-risk nodules classified using ATA guidelines.10 In our study, 48% of the nodules classified as high-risk according to the ATA risk category were found to have malignant/suspicious cytology results.
In the TI-RADS classification, points are assigned according to the characteristics of the nodule, and the risk category is determined from the total score. In a study investigating the malignancy risk of the TI-RADS classification, the frequency of malignancy was approx. 20.6% for nodules in the high-suspicion category, from 5.9% to 12.8% for nodules in the moderate-suspicion category, and 4.8% in the mild-suspicion category.11 In another study comparing the FNAB results of nodules classified according to TI-RADS, the frequency of malignancy/suspected malignancy in TR5 nodules was 60%. The proportion of benign FNAB results was 100% in the TR2 category, 66% in the TR3 category, 33% in the TR4 category, and 40% in the TR5 category.12 In another study, malignancy was detected histopathologically in 97.1% of nodules in the TR5 category and 33.3% of nodules in the TR3 category.13 In our study, similar to the literature, 63.3% of nodules in the TI-RADS TR5 category were cytologically malignant or suspicious for malignancy.
Malignancy risk is significantly higher in high-suspicion nodules compared with other nodules in both the ATA and TI-RADS classifications. Gao et al. reported a malignancy rate of 88% in the TI-RADS TR5 category and the best cutoff value in predicting malignancy according to the ROC curve was TR5. For the ATA risk classification, the risk of malignancy in highly suspicious nodules was found to be 87.3% and the best cutoff value in predicting malignancy according to the ROC curve was the category of highly suspicious nodules.14 In another study comparing the ATA and TI-RADS risk classifications, the rate of malignancy was found to be 65% in nodules in the ATA high-risk group and 73.6% in nodules in the TI-RADS TR5 category. The authors also found that TI-RADS TR5 and the ATA high-suspicion nodule category were best at distinguishing between benign and malignant nodules. When the sensitivity and specificity of both diagnostic classifications were evaluated, it was observed that the ATA classification was more sensitive and the TI-RADS classification was more specific.15 In a study conducted by Koc et al., 45 nodules were observed as malignant, 34 of which had FNAB indication for TI-RADS, while 38 nodules had FNAB indication according to the ATA classification. In the same study, the sensitivity of TI-RADS was found to be 48.8% and the specificity was 59.9%, while the sensitivity of the ATA classification was 82.2% and the specificity was 53.47%.16 A study evaluating the TI-RADS classification found 81.4% malignancy in the TR5 category, 40.1% in the TR4 category, 7.5% in the TR3 category, 2.3% in the TR2 category, and no malignant cytology in the TR1 category. 
In addition, the study found that the sensitivity of TI-RADS was 96.6%, the specificity was 52.9% and the cutoff category for predicting malignancy was TR4.17 In a study by Huang et al., when the ATA and TI-RADS classifications were compared, the sensitivity of the ATA classification was 92% and the specificity was 10%, while the sensitivity of the TI-RADS classification was 74% and the specificity was 47%. As a result, the ATA classification was found to be more sensitive, while the TI-RADS classification was more specific.18 In the present study, when we evaluated the power of both classifications to recognize malignancy, we found that the sensitivity of the ATA risk classification was 80% and the specificity was 96.3%, while the sensitivity of the TI-RADS classification was 76% and the specificity was 97.5%. According to these results, ATA and TI-RADS have high specificity and sensitivity. We also found that the best category for distinguishing malignant and benign nodules was TR5 for TI-RADS and the high-suspicion category for ATA.
While almost all nodules fall into a category in the ATA and TI-RADS classifications, some nodules fall outside of any category. In the ACR TI-RADS classification, the TR1 category includes nodules scoring 0 and the TR2 category includes nodules scoring 2 points; thus, nodules scoring 1 point do not fall into any category. The same is true of the ATA classification, in which the high-risk category includes hypoechoic nodules with suspicious features, the moderate-risk category includes hypoechoic nodules without suspicious features, and the low-risk category includes isoechoic nodules without suspicious features. Therefore, isoechoic nodules with suspicious features fall outside of a category. In a prior study, the frequency of nodules that could not be classified according to the ATA categories was 3.4% and the frequency of malignancy in these nodules was 18.2%.19 In a study by Lauria Pantano et al., 54 of 1077 nodules did not fit any category in the ATA classification, and 9 of them were cytologically high-risk.20 In our study, 12.9% of the nodules fell outside of a category according to the ATA classification and 0.5% according to the TI-RADS classification. However, no malignant or suspicious cytology result was observed in any of the nodules that fell outside of a category in either classification.
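The gap in the ATA scheme described above can be made explicit with a small decision function. This Python sketch encodes only the simplified three-category description given in this paragraph, not the full ATA guideline, which also considers composition and other sonographic patterns; the function and argument names are hypothetical.

```python
from typing import Optional


def ata_category_simplified(echogenicity: str, has_suspicious_features: bool) -> Optional[str]:
    """Assign an ATA risk category using the simplified rules stated in the text.

    Returns None for isoechoic nodules with suspicious features, which the
    description above leaves outside of any category.
    """
    if echogenicity == "hypoechoic":
        return "high" if has_suspicious_features else "moderate"
    if echogenicity == "isoechoic":
        return None if has_suspicious_features else "low"
    # Other echogenicity patterns are not covered by this simplified description.
    raise ValueError(f"echogenicity not covered by this sketch: {echogenicity!r}")
```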
While we compared the risk categories using both guidelines with the FNAB results, we did not compare risk categories with final pathological results after thyroidectomy because we were not able to follow up with sufficient patients.
Our findings indicate that both the ATA and TI-RADS classifications can effectively predict malignancy risk in thyroid nodules. Both are effective at detecting malignancy and may help prevent unnecessary FNAB.