Objectives: This study aims to predict sepsis early using machine learning (ML) algorithms and to provide a comparative analysis of different feature selection techniques.
Methods: This study uses a dataset from the PhysioNet website. Sepsis is labeled according to the Sepsis-3 criteria. The dataset is highly imbalanced, with only 2% of the data belonging to the sepsis class, so the SMOTETomek technique is used to handle the imbalance. Various filter, embedded, and wrapper feature selection techniques, including tree-based feature selection, recursive feature elimination (RFE), information gain, Bhattacharyya distance, and lasso, were applied to the top-performing classification models: RandomForest, K Nearest Neighbors (KNN), and Decision Tree. We compared the impact of these selection techniques on each of these models.
Findings: After applying RFE, the Area Under the Receiver Operating Characteristic curve (AUROC) score of the RandomForest model increased slightly, from 0.996 to 0.9974. The KNN model with tree-based feature selection showed the highest sensitivity, 0.934, which is slightly higher than the 0.922 obtained without any feature selection. Along with the highest AUROC score, the highest specificity (0.9976), accuracy (0.9959), and F-measure (0.9062) are achieved when RFE is applied to the RandomForest model. The best selection technique for the Decision Tree and KNN models is tree-based selection; RFE is the best selection technique for RandomForest.
Novelty: This research raises the AUROC score slightly to 0.9974, a value that had not been achieved previously, while using 20 features instead of 40. It also compares different feature selection techniques (tree-based feature selection, information gain, Bhattacharyya distance, and RFE) and analyses their impact on model performance, which had not previously been done with this set of selection techniques.
Keywords: Sepsis, Machine Learning, Feature selection, Early prediction, Predictive Analytics
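The Findings report sensitivity, specificity, accuracy, and F-measure on a dataset where only about 2% of the data is sepsis. The following is a minimal sketch of how these metrics follow from confusion-matrix counts and why accuracy alone can mislead under such imbalance. The class name, function names, and all counts are hypothetical and chosen for illustration; they are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts, with sepsis as the positive class."""
    tp: int  # sepsis correctly flagged
    fp: int  # non-sepsis wrongly flagged
    tn: int  # non-sepsis correctly cleared
    fn: int  # sepsis missed


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 instead of raising when a metric is undefined (0/0).
    return numerator / denominator if denominator else 0.0


def sensitivity(c: ConfusionCounts) -> float:
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    return _ratio(c.tn, c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)


def f_measure(c: ConfusionCounts) -> float:
    # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.
    return _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn)


# Hypothetical test set: 10,000 records, 2% positive (200 sepsis, 9,800 not).
always_negative = ConfusionCounts(tp=0, fp=0, tn=9800, fn=200)
illustrative = ConfusionCounts(tp=180, fp=30, tn=9770, fn=20)

for name, c in (("always_negative", always_negative),
                ("illustrative", illustrative)):
    print(f"{name}: sens={sensitivity(c):.4f} spec={specificity(c):.4f} "
          f"acc={accuracy(c):.4f} f1={f_measure(c):.4f}")
```

With these made-up counts, a classifier that never predicts sepsis still reaches 98% accuracy while its sensitivity and F-measure are zero, which is why sensitivity and F-measure are worth reading alongside accuracy and AUROC on data this imbalanced.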