Efficient Prediction of Liver Disease using Selected Attributes

doi:10.21015/vtse.v12i1.498

Abstract

The liver performs several crucial functions in the human body. A number of liver diseases exist, and diagnosing liver disease at an early stage is a challenging task. In recent years, several data mining techniques have been used in the medical field for prediction, but there is room for further improvement in the quick and accurate diagnosis of liver disease. In this paper, a variety of classifiers are evaluated on the Indian liver disease patients dataset, which is publicly available on Kaggle. Attribute subset selection is performed to identify significant attributes, and the resulting dataset is named the Selected Attributes Dataset (SAD). Using the Random Forest classification algorithm, SAD provides higher accuracy in less computation time, which improves the system in several respects: greater efficiency, earlier decision making, and lower time and space requirements. This research work will help predict liver disease with less data, i.e., a smaller number of attributes.

Highlights

Data mining is the activity of finding hidden knowledge and useful patterns in large datasets, data warehouses and other repositories
Bayesian Logistic Regression and SMO/Support Vector Machine (SVM) give better accuracy than the other classifiers, and Bayesian Logistic Regression also has a lower computation time. After obtaining these results, we reduce the dimensionality of the data by selecting the most relevant attributes using the Decision tree and ID3/C4.5 algorithms on the basis of high information gain
WEKA is a data mining tool that contains machine learning algorithms
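
The information-gain criterion mentioned in the highlights can be illustrated with a minimal sketch. It is not the authors' WEKA workflow: the function names, the toy records and the attribute names (`bilirubin`, `gender`) are hypothetical, and the attributes are assumed to be categorical already (continuous attributes would first need to be discretized). The sketch ranks attributes by plain information gain, as ID3 does; C4.5 uses the related gain ratio instead.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on one categorical attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, labels):
    """Return (attribute, gain) pairs sorted from most to least informative."""
    gains = {name: information_gain([row[name] for row in rows], labels)
             for name in rows[0]}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy records, NOT taken from the Kaggle dataset.
rows = [
    {"bilirubin": "high", "gender": "M"},
    {"bilirubin": "high", "gender": "F"},
    {"bilirubin": "low", "gender": "M"},
    {"bilirubin": "low", "gender": "F"},
]
labels = ["disease", "disease", "healthy", "healthy"]

ranking = rank_attributes(rows, labels)
```

In this toy data the hypothetical `bilirubin` attribute separates the two classes perfectly and `gender` carries no information, so an attribute subset built from the top of `ranking` would keep the former and drop the latter.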

Summary

Introduction

Data mining is the activity of finding hidden knowledge and useful patterns in large datasets, data warehouses and other repositories. It is widely used in the medical field for predicting disease from the data produced daily about patients, diseases, hospital resources, diagnostic methods and electronic records. Many researchers have done good work on the prediction of liver disease and have applied different data mining techniques to liver disease patient data. Different authors have applied different data mining techniques to the Indian liver disease patients dataset, which is publicly available on Kaggle, and have presented comparative results showing which algorithm/technique gives better accuracy in predicting liver disease. The problem is that they applied these techniques to smaller datasets with fewer attributes of Indian liver disease patient data (e.g. 29 datasets with 12 different attributes and 345 instances with 7 different attributes) and reported higher accuracies for the algorithms. Using the Random Forest classification algorithm, the Selected Attributes Dataset (SAD) provides higher accuracy in less computation time, which improves the system in several respects: greater efficiency, earlier decision making, and lower time and space requirements.

Methods

Results

Conclusion