Abstract
Objectives: This study aims to predict sepsis early using machine learning (ML) algorithms and to provide a comparative analysis of different feature selection techniques.
Methods: The study uses the dataset published on the PhysioNet website. Sepsis is categorized according to the Sepsis-3 criteria. The dataset is highly imbalanced, with only 2% of the data belonging to the sepsis class, so the SMOTETomek technique is used to handle the imbalance. Various filter, embedded, and wrapper feature selection techniques, such as tree-based feature selection, recursive feature elimination (RFE), information gain, Bhattacharyya distance, and lasso, were applied to the top-performing classification models: RandomForest, K Nearest Neighbors (KNN), and Decision Tree. We compared the impact of these selection techniques on those models.
Findings: After applying RFE, the Area Under the Receiver Operating Characteristic curve (AUROC) score of the RandomForest model increased slightly, from 0.996 to 0.9974. The KNN model with tree-based feature selection showed the highest sensitivity, 0.934, which is slightly higher than the 0.922 obtained without any feature selection. Along with the highest AUROC score, the highest specificity (0.9976), accuracy (0.9959), and F-measure (0.9062) are achieved when RFE is applied to the RandomForest model. The best selection technique for Decision Tree and KNN is tree-based selection; the best for RandomForest is RFE.
Novelty: In this research, the AUROC score is slightly increased to 0.9974, which has not been achieved previously. The number of features chosen is 20 instead of 40. This research also compares different feature selection techniques (tree-based feature selection, information gain, Bhattacharyya distance, and RFE) and analyses their impact on model performance, which has not previously been done with the same set of selection techniques.
Keywords: Sepsis, Machine Learning, Feature selection, Early prediction, Predictive Analytics
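The abstract reports sensitivity, specificity, accuracy, and F-measure on a dataset where only 2% of records are sepsis cases. The following minimal sketch, using only the Python standard library, shows how these metrics are computed from a confusion matrix and why accuracy alone can mislead at that class ratio. All names (`ConfusionCounts`, `count_outcomes`, `summarize`) and the toy counts are hypothetical illustrations, not the study's code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary task where 1 = sepsis and 0 = no sepsis."""
    tp: int
    fp: int
    tn: int
    fn: int


def count_outcomes(y_true, y_pred):
    """Tally confusion-matrix cells; assumes every label is 0 or 1."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def summarize(c):
    """Return the four metrics named in the abstract."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # recall on sepsis cases
    specificity = _ratio(c.tn, c.tn + c.fp)  # recall on non-sepsis cases
    precision = _ratio(c.tp, c.tp + c.fp)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    f_measure = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Toy data (hypothetical): 1000 records, 2% positive, mirroring the class ratio.
y_true = [1] * 20 + [0] * 980

# A trivial classifier that never predicts sepsis.
always_negative = [0] * 1000
baseline = summarize(count_outcomes(y_true, always_negative))

# A hypothetical model: finds 18 of 20 sepsis cases, raises 5 false alarms.
model_pred = [1] * 18 + [0] * 2 + [1] * 5 + [0] * 975
model = summarize(count_outcomes(y_true, model_pred))

print("baseline:", baseline)
print("model:   ", model)
```

In this toy setting the always-negative baseline reaches high accuracy while detecting no sepsis cases at all, which is one reason to read sensitivity and F-measure alongside accuracy and AUROC on imbalanced data.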