Abstract

Classification tasks are a common challenge in every field of science. To correctly interpret the results provided by a classifier, we need to know its performance indices, including its sensitivity, specificity, and, for continuous classifiers, the most appropriate cut-off value. Typically, several studies must be conducted to determine all of these indices. Herein, we show that they already exist, hidden in the distribution of the variable used to classify, and can readily be harvested. An educated guess about the distribution of the classifying variable in each class helps us decompose the frequency distribution of the variable in the population into its components: the probability density function of the variable in each class. From the harvested parameters, we can then calculate the performance indices of the classifier. As a case study, we applied the technique to the relative frequency distribution of prostate-specific antigen (PSA), a biomarker commonly used in medicine for the diagnosis of prostate cancer. We used nonlinear curve fitting to decompose the relative frequency distribution of the variable into the probability density functions of non-diseased and diseased people. These functions were then used to determine the performance indices of the classifier: sensitivity, specificity, the most appropriate cut-off value, and likelihood ratios. The reference range of the biomarker and the prevalence of prostate cancer in various age groups were also calculated. The indices obtained were in good agreement with the values reported in previous studies. All of this was done without knowledge of the true health status of the individuals studied. The method is applicable even to conditions with no definite definition (e.g., hypertension). We believe the method has a wide range of applications in many scientific fields.
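
The step from fitted component densities to performance indices can be sketched as follows. This is a minimal Python sketch, not the authors' code. It assumes the two components are normal on the Ln(PSA) scale, it uses hypothetical parameter values rather than the study's estimates, and it assumes the "most appropriate" cut-off is the one maximizing Youden's index (Se + Sp - 1), a criterion the abstract does not specify.

```python
import math
from statistics import NormalDist

# Hypothetical component densities on the Ln(PSA) scale
# (illustrative values, NOT estimates from the study).
healthy = NormalDist(mu=0.0, sigma=0.6)   # non-diseased
diseased = NormalDist(mu=2.2, sigma=0.7)  # diseased


def sensitivity(cutoff):
    """Se = P(value > cutoff | diseased): diseased mass above the cut-off."""
    return 1.0 - diseased.cdf(cutoff)


def specificity(cutoff):
    """Sp = P(value <= cutoff | non-diseased): non-diseased mass below the cut-off."""
    return healthy.cdf(cutoff)


def likelihood_ratios(cutoff):
    """Return (LR+, LR-) = (Se / (1 - Sp), (1 - Se) / Sp)."""
    se, sp = sensitivity(cutoff), specificity(cutoff)
    return se / (1.0 - sp), (1.0 - se) / sp


def youden_cutoff(lo=-3.0, hi=6.0, steps=9000):
    """Grid search for the cut-off maximizing Youden's J = Se + Sp - 1 (assumed criterion)."""
    best_x, best_j = lo, -1.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        j = sensitivity(x) + specificity(x) - 1.0
        if j > best_j:
            best_x, best_j = x, j
    return best_x, best_j


cut, j = youden_cutoff()
lr_pos, lr_neg = likelihood_ratios(cut)
print(f"cut-off: Ln(PSA) = {cut:.3f}, i.e. PSA = {math.exp(cut):.2f}")
print(f"Se = {sensitivity(cut):.3f}, Sp = {specificity(cut):.3f}, J = {j:.3f}")
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.3f}")
```

Since J(x) equals the non-diseased CDF minus the diseased CDF, its derivative is the difference of the two densities. The maximizer therefore lies where the fitted curves cross, and comparing the two `pdf` values at `cut` gives a quick sanity check on the grid search.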

Highlights

  • Classification tasks are a common challenge in every field of science

  • As an example of the application of the proposed method, we use the frequency distribution of prostate-specific antigen (PSA), an immunologic biomarker commonly used for the diagnosis of prostate cancer, and compare the results with values reported in previous studies

  • We show that, given an educated guess about the distribution of the classifying variable in each class, we can harvest the reference range for the variable; the prior probability of the condition of interest; the classifier performance indices, including its sensitivity (Se), specificity (Sp), positive predictive value (PPV), negative predictive value (NPV), and likelihood ratios (LR); the receiver operating characteristic (ROC) curve and the area under the curve (AUC); and the most appropriate cut-off value
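
Several of the quantities listed above follow in a few lines once the two component densities and the prior probability are known. The sketch below is hedged in the same way as any illustration here: the component parameters, the prevalence, and the illustrative PSA cut-off of 4 are hypothetical; the reference range is assumed to be the central 95% of the non-diseased component; and the area under the curve is computed both numerically and from the standard closed form for a binormal ROC model, Phi((mu1 - mu0) / sqrt(sigma0^2 + sigma1^2)).

```python
import math
from statistics import NormalDist

# Hypothetical inputs (illustrative only): component densities on the Ln(PSA)
# scale and a prior probability (prevalence) of the condition.
healthy = NormalDist(mu=0.0, sigma=0.6)
diseased = NormalDist(mu=2.2, sigma=0.7)
prevalence = 0.15


def predictive_values(cutoff, p=prevalence):
    """PPV and NPV at a cut-off via Bayes' theorem, given prevalence p."""
    se = 1.0 - diseased.cdf(cutoff)
    sp = healthy.cdf(cutoff)
    ppv = se * p / (se * p + (1.0 - sp) * (1.0 - p))
    npv = sp * (1.0 - p) / (sp * (1.0 - p) + (1.0 - se) * p)
    return ppv, npv


def roc_points(lo=-4.0, hi=7.0, steps=2000):
    """ROC curve as (1 - Sp, Se) pairs, sweeping the cut-off from high to low."""
    points = []
    for i in range(steps + 1):
        x = hi - (hi - lo) * i / steps
        points.append((1.0 - healthy.cdf(x), 1.0 - diseased.cdf(x)))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def auc_binormal():
    """Closed form for the binormal model: Phi((mu1 - mu0) / sqrt(s0^2 + s1^2))."""
    d = (diseased.mean - healthy.mean) / math.hypot(healthy.stdev, diseased.stdev)
    return NormalDist().cdf(d)


def reference_range(coverage=0.95):
    """Central interval of the non-diseased component, back-transformed from Ln(PSA)."""
    tail = (1.0 - coverage) / 2.0
    return math.exp(healthy.inv_cdf(tail)), math.exp(healthy.inv_cdf(1.0 - tail))


cutoff = math.log(4.0)  # illustrative PSA cut-off of 4, on the log scale
print("PPV, NPV:", predictive_values(cutoff))
print("AUC (trapezoid):", auc_trapezoid(roc_points()))
print("AUC (closed form):", auc_binormal())
print("95% reference range:", reference_range())
```

The two AUC values should agree closely; the trapezoidal version also works for component densities whose ROC curve has no closed form.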

Introduction

Classification tasks are a common challenge in every field of science. To correctly interpret the results provided by a classifier, we need to know its performance indices, including its sensitivity, specificity, and, for continuous classifiers, the most appropriate cut-off value. Several studies have so far been conducted to find all these indices. As an example of the application of the proposed method, we use the frequency distribution of prostate-specific antigen (PSA), an immunologic biomarker commonly used for the diagnosis of prostate cancer, and compare the results with values reported in previous studies. We harvest these indices from the frequency distribution of PSA measured in a group of people. With acceptable clinical accuracy, PSA is normally distributed after a logarithmic transformation in both non-diseased and diseased people; Ln(PSA) therefore has a binormal distribution in the population (consisting of both non-diseased and diseased individuals) [9].
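
The binormal decomposition itself can be illustrated on simulated data. The authors fit the relative frequency distribution by nonlinear curve fitting; as a swapped-in alternative that needs only the Python standard library, this sketch fits the same two-component normal mixture to unlabeled Ln(PSA) values by expectation-maximization (EM). The simulation parameters, sample size, starting values, and iteration count are all assumptions chosen for illustration, and `simulate_psa` and `fit_binormal_em` are hypothetical helper names.

```python
import math
import random
from statistics import NormalDist, pstdev


def simulate_psa(n, prevalence, seed=1):
    """Hypothetical PSA values whose logarithms follow the assumed binormal model."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        if rng.random() < prevalence:
            values.append(math.exp(rng.gauss(2.2, 0.7)))  # diseased (hypothetical parameters)
        else:
            values.append(math.exp(rng.gauss(0.0, 0.6)))  # non-diseased (hypothetical parameters)
    return values


def fit_binormal_em(xs, iters=200):
    """Fit (1 - w) * N(mu0, s0) + w * N(mu1, s1) to unlabeled data by EM."""
    n = len(xs)
    ordered = sorted(xs)
    # Crude start: lower component at the median, upper at the 95th percentile.
    mu0, mu1 = ordered[n // 2], ordered[int(0.95 * n)]
    s0 = s1 = pstdev(xs)
    w = 0.5
    for _ in range(iters):
        # E-step: posterior probability that each point comes from the upper (diseased) component.
        d0, d1 = NormalDist(mu0, s0), NormalDist(mu1, s1)
        r = []
        for x in xs:
            a = w * d1.pdf(x)
            b = (1.0 - w) * d0.pdf(x)
            r.append(a / (a + b))
        # M-step: responsibility-weighted mixing weight, means and standard deviations.
        n1 = sum(r)
        n0 = n - n1
        w = n1 / n
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        mu0 = sum((1.0 - ri) * x for ri, x in zip(r, xs)) / n0
        s1 = math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, xs)) / n1)
        s0 = math.sqrt(sum((1.0 - ri) * (x - mu0) ** 2 for ri, x in zip(r, xs)) / n0)
    return w, (mu0, s0), (mu1, s1)


psa = simulate_psa(4000, prevalence=0.15)
log_psa = [math.log(v) for v in psa]  # work on Ln(PSA), as the text describes
w, (mu0, s0), (mu1, s1) = fit_binormal_em(log_psa)
print(f"estimated prevalence (prior probability): {w:.3f}")
print(f"non-diseased Ln(PSA) ~ N({mu0:.2f}, {s0:.2f}); diseased ~ N({mu1:.2f}, {s1:.2f})")
```

The fitted mixing weight `w` plays the role of the estimated prevalence (prior probability of the condition), and the two (mean, SD) pairs are the component densities from which Se, Sp, likelihood ratios, and the reference range can then be computed. No individual's true health status enters the fit.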
