Abstract
Background: Linear Discriminant Analysis (LDA) is a powerful and widely used technique for classification with correlated variables. Principal Components (PCs) combine these variables into linear combinations that are mutually uncorrelated. LDA applied to these PCs may give better classification accuracy in clinical diagnostics than LDA applied to the original measurements.

Methodology: Two datasets were used for demonstration: one from a Sudden Sensorineural Hearing Loss (SSNHL) case-control study and the other from a Gall Bladder (GB) case-control study. LDA was conducted for group classification on the measured, correlated variables and on the derived principal component variables, and the classification accuracies were compared. The performance metrics assessed were Sensitivity, Specificity, Positive Predictive Value (PPV), Negative Predictive Value (NPV), Classification Accuracy, and F1 Score. A third, simulated dataset was used for validation. Additionally, LDA was performed on each dataset using the eigenvectors of the control group applied to the cases and vice versa, and agreement between the resulting classifications was measured with the kappa statistic.

Results: When LDA was applied to the actual lipid measurements in the SSNHL dataset, the classification accuracy was 57.2% and the F1 score was 39.7%. When LDA was performed on the PCs, the classification accuracy improved markedly to 99.2%, with an F1 score of 98.5%. Similarly, for the GB cancer dataset, the classification accuracy and F1 score were 77.2% and 77.3%, respectively, on the original measurements and rose to 98.4% and 98.3%, respectively, with the PCs. For the simulated dataset, both the classification accuracy and the F1 score were 99.1%. The classification accuracy and F1 score remained consistent regardless of whether the eigenvectors from the cases or from the controls were used to classify new subjects, with strong agreement between the two classifications (Kappa Statistic = 0.962, P < 0.001).

Conclusion: In group separation, using principal components markedly improves classification accuracy and the other performance metrics compared with using the original correlated predictors.
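The performance metrics and the kappa statistic named in the abstract can all be computed from classification counts. The following is a minimal sketch, not the authors' code: the `Confusion` class, the function names, and the example counts are hypothetical and chosen only for illustration, with cases treated as the positive class.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """2x2 classification counts, with cases as the positive class (assumption)."""
    tp: int  # cases classified as cases
    fp: int  # controls classified as cases
    fn: int  # cases classified as controls
    tn: int  # controls classified as controls


def metrics(c: Confusion) -> dict:
    """Return the six metrics listed in the abstract.

    Assumes every denominator is non-zero (each row and column has counts).
    """
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    ppv = c.tp / (c.tp + c.fp)
    npv = c.tn / (c.tn + c.fn)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)
    # F1 is the harmonic mean of PPV (precision) and sensitivity (recall).
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
    }


def cohen_kappa(a: list, b: list) -> float:
    """Cohen's kappa for two equal-length label lists.

    Here `a` and `b` stand for two classifications of the same subjects,
    e.g. one from case eigenvectors and one from control eigenvectors.
    Assumes chance agreement is below 1.
    """
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the marginal label proportions.
    expected = sum((a.count(k) / n) * (b.count(k) / n) for k in set(a) | set(b))
    return (observed - expected) / (1 - expected)


# Hypothetical counts and labels, not taken from the study.
example = metrics(Confusion(tp=40, fp=10, fn=10, tn=40))
agreement = cohen_kappa([1, 1, 0, 0], [1, 1, 0, 1])
```

Accuracy and F1 can diverge sharply when one class is classified poorly, which is why a low F1 can accompany a moderate accuracy, as the abstract reports for the original SSNHL measurements.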