Abstract

Automated medical diagnosis is an important application of machine learning in healthcare. Most approaches in this area focus primarily on optimizing the accuracy of classification models. In this research, we argue that, unlike general-purpose classification problems, medical applications such as chronic kidney disease (CKD) diagnosis require special treatment. In the case of CKD, factors other than model performance, such as the cost of data acquisition, should also be taken into account to make an automated diagnosis system more applicable in practice. We propose two techniques for cost-sensitive feature ranking. Both techniques employ an ensemble of decision tree models to compute the worth of each feature in the CKD dataset. We also introduce an automatic threshold selection heuristic based on the intersection of the features' worth and their accumulated cost. A set of experiments is conducted to evaluate the efficacy of the proposed techniques on both tree-based and non-tree-based classification models. The proposed approaches are also evaluated against several comparative techniques. Furthermore, it is demonstrated that the proposed techniques select roughly one quarter of the original CKD features while reducing the cost by a factor of 7.42 relative to the original feature set. Based on extensive experimentation, it is concluded that the proposed techniques, which employ a feature-cost interaction heuristic, tend to select feature subsets that are both useful and cost-effective.

Highlights

  • Chronic kidney disease (CKD) is an ailment that affects the functionality of a kidney in the body.

  • CKD is divided into multiple stages; the later stages are termed renal failure, in which the kidney is unable to perform its functions of purifying blood and balancing minerals in the body [1].

  • The authors reported that decision tree-based models achieved the highest accuracy in a pool of candidate models that included Naïve Bayes (NB), Support Vector Machine (SVM), Artificial Neural Network (ANN), and K-Nearest Neighbor (KNN).


Summary

Introduction

Chronic kidney disease (CKD) is an ailment that affects the functionality of a kidney in the body. In a number of studies on CKD diagnosis, decision tree models consistently produced results with high predictive accuracy [8,9,12]. Most studies in the CKD domain assume that the cost of data acquisition is symmetric, i.e., that every feature has the same cost, albeit not necessarily zero; the cost factor associated with each feature is generally ignored [6,12,13]. This assumption may not hold in many real-world medical applications, where a patient is required to undergo multiple tests such as urine analysis, electrocardiogram, and blood culture, and the tests may vary in cost. The study addresses the problem of cost-sensitive feature selection for building decision tree models for CKD diagnosis. We propose two ensemble ranking techniques that use multiple decision tree-based classifiers as heterogeneous scoring functions.
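
The general idea of combining several scoring functions and then cutting the ranking where worth meets accumulated cost can be illustrated with a minimal Python sketch. It is not the authors' implementation: the averaging combiner, the stopping rule (keep features while scaled worth is at least the accumulated cost share), the function names, the feature names `f1` to `f4`, and all numeric values are assumptions made purely for illustration.

```python
from typing import Dict, List


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores so they sum to 1 (returned unchanged if the total is 0)."""
    total = sum(scores.values())
    if total == 0:
        return dict(scores)
    return {name: value / total for name, value in scores.items()}


def ensemble_worth(per_model_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """Assumed combiner: average the normalized scores of every model."""
    normalized = [normalize(scores) for scores in per_model_scores]
    features = list(normalized[0])
    return {
        f: sum(scores[f] for scores in normalized) / len(normalized)
        for f in features
    }


def select_features(worth: Dict[str, float], cost: Dict[str, float]) -> List[str]:
    """Assumed threshold rule: rank features by worth (highest first) and
    keep them until the accumulated cost share exceeds the scaled worth."""
    ranked = sorted(worth, key=lambda f: worth[f], reverse=True)
    max_worth = worth[ranked[0]]
    total_cost = sum(cost.values())
    selected: List[str] = []
    accumulated = 0.0
    for feature in ranked:
        accumulated += cost[feature]
        scaled_worth = worth[feature] / max_worth   # falls from 1 toward 0
        cost_share = accumulated / total_cost       # rises toward 1
        if scaled_worth < cost_share:               # the two curves have crossed
            break
        selected.append(feature)
    return selected


# Hypothetical importance scores from two tree-based models.
model_a = {"f1": 0.5, "f2": 0.3, "f3": 0.1, "f4": 0.1}
model_b = {"f1": 0.4, "f2": 0.2, "f3": 0.3, "f4": 0.1}
# Hypothetical acquisition cost per feature (arbitrary units).
costs = {"f1": 1.0, "f2": 2.0, "f3": 5.0, "f4": 2.0}

worth = ensemble_worth([model_a, model_b])
selected = select_features(worth, costs)
print(selected)
```

In this toy setting the expensive feature `f3` falls after the crossing point, so the rule drops it even though it is not the least useful feature, which is the kind of trade-off a cost-sensitive ranking is meant to expose.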

Literature Review
Proposed Methodology
Data Preprocessing
Classifier-Ensemble
Combiner
Feature Cost Aggregator
Threshold and Feature Subset Selector
Experimentation
Dataset Description
Experimental Setup
Method
Feature Weightage Calculation and Feature Subset Acquisition
Ensemble-1 Results
Ensemble-2 Results
Comparison with Other Similar Approaches
Conclusions
