Abstract

Diagnosis of a dental condition is the process of determining whether a person has the condition of interest (the target condition), usually by means of relevant diagnostic tests. This short paper aims to introduce the measures used to describe the performance of diagnostic tests. Diagnostic accuracy refers to the ability of a test to correctly detect the presence or absence of the target condition. Diagnostic tests are not perfect in discriminating between patients with and without the target condition, and therefore the diagnostic accuracy of each test should be evaluated. The evaluation is made by comparing the diagnostic test under examination, called the index test, with a reference standard. The reference standard is a test or procedure considered a reliable guide to the absence or presence of the target condition.

An index test can make 2 kinds of error: false-positive and false-negative results. A result is a false positive when the test is positive but the person does not have the target condition, and a false negative when the test is negative but the person has the condition (Table I).

Table I. Two-by-two classification table of a diagnostic test

| Index test result | With the target condition (diseased) | Without the target condition (healthy) |
|---|---|---|
| Positive | True positive (TP): the participant has the condition, and the test result is positive | False positive (FP): the participant does not have the condition, and the test result is positive |
| Negative | False negative (FN): the participant has the condition, and the test result is negative | True negative (TN): the participant does not have the condition, and the test result is negative |

Columns give the target condition status according to the reference standard.

Index tests can be based on a binary marker that directly provides a positive or negative test result, such as x-rays, which can directly reveal a root fracture.
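For a binary index test, the 4 cells of Table I can be tallied directly from paired index test and reference standard results. The following is a minimal sketch in Python; the function name `two_by_two` and the 6 participants are hypothetical and serve only as an illustration.

```python
from collections import Counter


def two_by_two(index_results, reference_results):
    """Count TP, FP, FN, and TN from paired boolean results.

    index_results[i] is True when the index test is positive;
    reference_results[i] is True when the reference standard
    indicates that the target condition is present.
    """
    if len(index_results) != len(reference_results):
        raise ValueError("both sequences must have the same length")
    counts = Counter()
    for test_positive, has_condition in zip(index_results, reference_results):
        if test_positive and has_condition:
            counts["TP"] += 1
        elif test_positive and not has_condition:
            counts["FP"] += 1
        elif not test_positive and has_condition:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    # A Counter returns 0 for cells that never occurred.
    return {cell: counts[cell] for cell in ("TP", "FP", "FN", "TN")}


# Hypothetical results for 6 participants (not data from the paper).
index_results = [True, True, False, False, True, False]
reference_results = [True, False, True, False, True, False]
table = two_by_two(index_results, reference_results)
print(table)
```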
There are also continuous index tests, such as blood tests; these tests usually measure the level of a substance or biomarker and require a cutoff (threshold) value to dichotomize the test results. A test is considered positive if the measured value exceeds the predefined threshold. For example, when testing for periodontitis by measuring C-reactive protein levels in the blood, a threshold of either 5 mg/dL or 7 mg/dL can be used. Hence, a participant with a C-reactive protein value greater than 5 (or 7) mg/dL is considered to have periodontitis. Often, a set of different thresholds can be used for a single index test; however, as can be seen in Figure 1 and Table II, lower thresholds produce more true positives and more false positives.

Table II. C-reactive protein levels in the blood to test for periodontitis at threshold 1 (5 mg/dL) and threshold 2 (7 mg/dL)

| Threshold | Data | Sensitivity | Specificity | DOR | LR+ | PPV | NPV |
|---|---|---|---|---|---|---|---|
| Threshold 1 (5 mg/dL) | TP = 238, FN = 17, FP = 104, TN = 255 | 238 / (238 + 17) = 93% | 255 / (255 + 104) = 71% | 32.5 | 3.2 | 70% | 94% |
| Threshold 2 (7 mg/dL) | TP = 116, FN = 71, FP = 51, TN = 376 | 116 / (116 + 71) = 62% | 376 / (51 + 376) = 88% | 12.0 | 5.2 | 69% | 84% |

TP, true positive; TN, true negative; FP, false positive; FN, false negative.

The diagnostic performance of index tests is commonly described using 2 basic concepts: sensitivity and specificity. The sensitivity, or true-positive rate, is the probability that an individual has a positive index test result when the target condition is present; it describes the ability of a test to correctly identify diseased patients. The specificity, or true-negative rate, is the probability that an individual has a negative index test result when the target condition is absent; it describes the ability of a test to correctly identify healthy participants. Both can be treated as proportions.
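The dichotomization rule and the 2 proportions described above can be sketched in a few lines of Python. The counts are those reported in Table II; the function names and the 4 C-reactive protein values are hypothetical and serve only as an illustration.

```python
def dichotomize(values, threshold):
    """Return True (test positive) for each value above the threshold."""
    return [value > threshold for value in values]


def sensitivity(tp, fn):
    """Proportion of participants with the condition who test positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of participants without the condition who test negative."""
    return tn / (tn + fp)


# Hypothetical C-reactive protein values (mg/dL), not data from the paper.
crp_values = [3.2, 5.0, 6.1, 7.4]
positive_at_5 = dichotomize(crp_values, 5)  # more positives at the lower threshold
positive_at_7 = dichotomize(crp_values, 7)

# Counts reported in Table II for the 2 thresholds.
table_ii = {
    "5 mg/dL": {"tp": 238, "fn": 17, "fp": 104, "tn": 255},
    "7 mg/dL": {"tp": 116, "fn": 71, "fp": 51, "tn": 376},
}
for label, c in table_ii.items():
    sens = sensitivity(c["tp"], c["fn"])
    spec = specificity(c["tn"], c["fp"])
    print(f"{label}: sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

A value exactly equal to the threshold is counted as negative here because the text defines a positive result as a value that exceeds the threshold.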
An ideal test would have both sensitivity and specificity close to 100%, meaning that false negatives and false positives are close to zero. A test with high specificity and high sensitivity would be very useful, especially if it is easier to conduct than the gold standard; for example, a diagnosis from clinical examination (index test) vs magnetic resonance imaging (gold standard) for temporomandibular joint disc displacement. Unfortunately, 100% sensitivity and specificity are very uncommon in real life, and the choice between optimal sensitivity and optimal specificity can depend on the question at hand. High sensitivity is important when the cost of a false negative is high, so that a negative result can be used to rule out the target condition, whereas high specificity is important when the cost of a false positive is high, so that a positive result can be used to rule in the target condition.

When test thresholds vary, sensitivity and specificity are inversely related; with a threshold change, an increase in sensitivity leads to a decrease in specificity and vice versa (threshold effect) [1: Parikh R, Mathai A, Parikh S, Chandra Sekhar G, Thomas R. Understanding and using sensitivity, specificity, and predictive values. Indian J Ophthalmol. 2008;56:45-50]. In Figure 1, the 2 different threshold values for testing for periodontitis are displayed: a lower (5 mg/dL) and a higher (7 mg/dL). When the threshold increased from 5 to 7 mg/dL, the number of true-positive cases decreased, whereas the number of true-negative cases increased. Consequently, the test's sensitivity (ie, the ratio of true positives to patients with the target condition) decreased, whereas the specificity (ie, the ratio of true negatives to patients without the target condition) increased. In Table II, at the 5 mg/dL threshold, sensitivity is 93% and specificity is 71%.
That means that the test correctly gives a positive result for 93% of participants with periodontitis (7% of participants with the target condition were falsely classified as negative) and a negative result for 71% of participants without periodontitis (29% of participants without the target condition were falsely classified as positive). When the threshold increases to 7 mg/dL, sensitivity decreases to 62%, and specificity increases to 88%. In brief, threshold selection plays a crucial role in diagnostic test accuracy studies because a different threshold may change the patients' classification and, consequently, the diagnostic test accuracy measures.

A likelihood ratio (LR) of a diagnostic test describes how much the probability of having the target condition changes, given a test result. It is defined as the probability of a given test result in a participant with the target condition divided by the probability of the same test result in a participant without the target condition. Test results are either positive or negative. Consequently, there are 2 ratios, the positive LR (LR+) and the negative LR (LR−), which describe how many times more likely a positive (or, for LR−, a negative) test result is in the group of participants with the target condition than in the group without it [2: Šimundić AM. Measures of diagnostic accuracy: basic definitions. EJIFCC. 2009;19:203-211]. LRs range from zero to infinity and can be derived using sensitivity and specificity (Table III). The further the LR+ is above 1, the better the test is at confirming the target condition, and the closer the LR− is to zero, the better the test is at ruling out the target condition. For example, in the data provided in Table II, the LR+ at the 5 mg/dL threshold is 3.2.
This means that a positive periodontitis test result is 3.2 times more likely in participants with periodontitis than in participants without periodontitis.

Table III. Measures of diagnostic test accuracy

| Measure | Definition | Formula |
|---|---|---|
| Sensitivity | Probability that the test detects diseased patients | TP / (TP + FN) |
| Specificity | Probability that the test detects healthy patients | TN / (TN + FP) |
| LR | Positive: how many times more likely a positive test result is in participants with the target condition vs participants without the target condition. Negative: how many times more likely a negative test result is in participants with the target condition vs participants without the target condition | LR+ = Sensitivity / (1 − Specificity); LR− = (1 − Sensitivity) / Specificity |
| Diagnostic odds ratio | How many times higher the odds of a positive test result are in participants with vs participants without the target condition | (Sensitivity × Specificity) / [(1 − Sensitivity) × (1 − Specificity)] |
| Predictive values | Positive: probability of having the condition given a positive test result. Negative: probability of not having the condition given a negative test result | PPV = TP / (TP + FP); NPV = TN / (TN + FN) |
| Prevalence | The proportion of participants with the target condition | (TP + FN) / (TP + FN + TN + FP) |
| ROC curve | A plot of sensitivity against 1 − specificity, constructed to illustrate the diagnostic performance of a test. The closer the curve to the upper left corner of the ROC space, the better the test | |

The sensitivity and specificity of a test are typically reported as a pair. The diagnostic odds ratio (DOR) is a common approach to combining the 2 quantities into a single measure; it is defined as the ratio of the odds of test positivity in diseased patients to the odds of test positivity in healthy patients and can also be derived from the estimated sensitivity and specificity [2] [3: Glas AS, Lijmer JG, Prins MH, Bonsel GJ, Bossuyt PM. The diagnostic odds ratio: a single indicator of test performance. J Clin Epidemiol. 2003;56:1129-1135]. The DOR is easy to calculate but often difficult to interpret. It ranges from zero to infinity: a DOR of 1 indicates no discriminating ability, a DOR greater than 1 indicates that a positive result is more likely in participants with the target condition, and the higher the DOR, the better the test. In the data provided in Table II, the DOR is 32.5 at the 5 mg/dL threshold and 12.0 at the 7 mg/dL threshold.

Sensitivity and specificity describe the performance of a test: given the status of the medical condition, they show whether the test performs well or poorly. However, the question of interest is often the reverse: given the result of the test, what is the condition of the person? This is answered by the positive predictive value (PPV) and the negative predictive value (NPV). PPV is the probability that a participant truly has the target condition given a positive index test result [1,2]. NPV is the probability that a participant does not have the target condition given a negative index test result [1,2]. In the data provided in Table II, at the 5 mg/dL threshold, PPV is 70%, and NPV is 94%.
Hence, a patient is 70% likely to have periodontitis, given a positive test result, whereas a patient is 94% likely not to have periodontitis, given a negative test result.

The prevalence measures how common the target condition is in a defined population and is expressed as a proportion [4: Tenny S, Hoffman MR. Prevalence. StatPearls Publishing, Treasure Island; 2021]. Sensitivity and specificity are characteristics of the test and remain unaffected by changes in prevalence [2]. Consequently, because the DOR and the likelihood ratios are estimated from sensitivity and specificity, they are also robust measures, irrespective of the prevalence of the target condition. However, changes in prevalence can influence the predictive values. More specifically, as prevalence increases, PPV increases because there are fewer false positives for every true-positive test result, whereas NPV decreases. In contrast, a decrease in prevalence decreases PPV and increases NPV [1,2,4].

The receiver operating characteristic (ROC) curve is a graphical way to represent the performance of diagnostic tests. A ROC curve is created by plotting sensitivity (y-axis) against 1 − specificity (x-axis); it illustrates the trade-off between sensitivity and specificity at every threshold included [5: Akobeng AK. Understanding diagnostic tests 3: receiver operating characteristic curves. Acta Paediatr. 2007;96:644-647]. The closer the curve is to the upper left corner, the better the test; such a test would have sensitivity and specificity close to 100%. In ROC space, the diagonal line represents tests with no accuracy. A test with a ROC curve close to the diagonal line tends to be less accurate, whereas a ROC curve beneath the diagonal implies a misclassification problem (healthy participants are classified as diseased and vice versa). In Figure 2, the ROC curves of blue fluorescence (BF), violet fluorescence (VF), and orange fluorescence (OF) for diagnosing dental caries are displayed. The BF ROC curve is the closest to the top left corner, above the VF and OF ROC curves. This shows that BF is the best among the 3 tests. However, the OF ROC curve is under the no-accuracy line, which means that patients with dental caries may be wrongly classified as non-problematic patients using OF.

We have seen that both the sensitivity and the specificity of a diagnostic test depend on the threshold used and that they can be misleading when looked at without considering the prevalence of the target condition. This could lead to the well-known prosecutor's fallacy, also known as the confusion of the inverse, that is, mistaking the probability of a positive test result given the condition for the probability of the condition given a positive test result [6: Aitken C, Mavridis D. Reasoning under uncertainty. Evid Based Ment Health. 2019;22:44-48]. Sensitivity and specificity are independent of disease prevalence. However, they may vary with the disease spectrum [2].
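The contrast between prevalence-independent measures (LR+, LR−, DOR) and prevalence-dependent measures (PPV, NPV) can be illustrated with a short Python sketch. It uses the Table III formulas and the Table II counts at the 5 mg/dL threshold; the function names and the prevalence values other than 255/614 (the prevalence implied by those counts) are assumptions chosen for illustration. Because the sketch works from unrounded counts, it gives a DOR of about 34.3, whereas the 32.5 in Table II follows from the rounded sensitivity (93%) and specificity (71%).

```python
def likelihood_ratios(sens, spec):
    """Return (LR+, LR-) using the Table III formulas."""
    return sens / (1 - spec), (1 - sens) / spec


def diagnostic_odds_ratio(sens, spec):
    """Return the DOR using the Table III formula."""
    return (sens * spec) / ((1 - sens) * (1 - spec))


def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) at a given prevalence via Bayes' theorem.

    Each term below is the expected proportion of the whole
    population that falls in the corresponding cell of Table I.
    """
    tp = sens * prevalence
    fn = (1 - sens) * prevalence
    fp = (1 - spec) * (1 - prevalence)
    tn = spec * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Table II counts at the 5 mg/dL threshold.
tp, fn, fp, tn = 238, 17, 104, 255
sens = tp / (tp + fn)
spec = tn / (tn + fp)

lr_pos, lr_neg = likelihood_ratios(sens, spec)
dor = diagnostic_odds_ratio(sens, spec)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}, DOR = {dor:.1f}")

# 255/614 is the prevalence implied by Table II; the others are hypothetical.
study_prevalence = (tp + fn) / (tp + fn + fp + tn)
for prevalence in (0.05, 0.20, study_prevalence, 0.60):
    ppv, npv = predictive_values(sens, spec, prevalence)
    print(f"prevalence = {prevalence:.2f}: PPV = {ppv:.0%}, NPV = {npv:.0%}")
```

With sensitivity and specificity held fixed, the loop shows PPV rising and NPV falling as the assumed prevalence increases, while LR+, LR−, and DOR are computed once and do not change.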
