Liver diseases rank among the most prevalent health issues globally, causing significant morbidity and mortality. Early detection allows for timely intervention, which can prevent progression to more severe stages such as cirrhosis or liver cancer. To this end, many machine learning models have been developed to predict liver diseases early in potential patients. However, each model has its own limitations in accuracy and performance. In this paper, we present a comprehensive comparison of three machine learning models that can be employed to enhance the prediction and management of liver diseases. We use a large dataset of 32,000 records to evaluate the performance of each model. First, we apply a preprocessing technique to rectify missing or corrupt data in the liver disease dataset, ensuring data integrity. We then compare the performance of three machine learning models: k-nearest neighbors (KNN), Gaussian naive Bayes (Gaussian NB), and random forest (RF). We conclude that the RF algorithm demonstrates superior performance in our evaluation, both in overall predictive accuracy and in correctly classifying patients with respect to the presence of liver disease. Our results show that RF outperforms the other models on several performance metrics, achieving an accuracy of 97.3%, a precision of 97%, a recall of 96%, and an F1-score of 95%.
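For readers unfamiliar with the four reported metrics, the following minimal sketch shows how accuracy, precision, recall, and F1-score are conventionally computed for a binary classifier from true and predicted labels. It is illustrative only and is not the evaluation code used in this study: the function name `binary_metrics` and the example labels are hypothetical, and the sketch assumes a single positive class (liver disease present, encoded as 1) with no averaging across classes.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))

    # Confusion-matrix counts with respect to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels for illustration only (1 = disease present, 0 = absent).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Precision penalizes false positives (healthy patients flagged as diseased), while recall penalizes false negatives (missed cases), which is why both are reported alongside accuracy in a screening setting.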