Abstract

Diagnosing liver disease is a difficult task in healthcare. By treating medical records as a dataset and applying data mining methods such as K-Nearest Neighbor (K-NN), we can analyze the records and extract knowledge automatically. K-NN has proven effective compared with other methods because it classifies new data by selecting the nearest neighbors according to the value of k. In this study, we used the Elbow method to determine the optimal value of k by measuring the error rate. The tests showed that k = 4 was optimal, giving the lowest error rate. In the third test, we achieved a training accuracy of 80.5% and a testing accuracy of 78.9%. After fine-tuning the training data, training accuracy rose to 82.2%, while testing accuracy fell to 77.1%. In the fourth test, we encountered overfitting caused by data imbalance from inappropriate resampling, which produced a model that was overly complex and sensitive to noise.
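As a rough illustration of choosing k by error rate, the sketch below scans a range of k values with a small K-NN classifier and keeps the one with the lowest held-out error. It is not the study's code: the synthetic two-class data, the Euclidean distance, the range 1 to 15, and the helper names `knn_predict`, `error_rate`, and `make_synthetic_data` are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to the query (assumed metric).
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # On a tied vote, the label seen first (the nearer neighbor's) wins.
    return votes.most_common(1)[0][0]


def error_rate(train_X, train_y, test_X, test_y, k):
    """Fraction of test points that K-NN misclassifies for a given k."""
    wrong = sum(
        knn_predict(train_X, train_y, x, k) != y for x, y in zip(test_X, test_y)
    )
    return wrong / len(test_X)


def make_synthetic_data(n_per_class, seed):
    """Two overlapping Gaussian blobs; a hypothetical stand-in for patient records."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, (0.0, 0.0)), (1, (2.0, 2.0))):
        for _ in range(n_per_class):
            X.append((rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)))
            y.append(label)
    return X, y


train_X, train_y = make_synthetic_data(100, seed=0)
test_X, test_y = make_synthetic_data(40, seed=1)

# Error rate for each candidate k; plotting these values gives the elbow curve.
errors = {k: error_rate(train_X, train_y, test_X, test_y, k) for k in range(1, 16)}
best_k = min(errors, key=errors.get)

for k, err in errors.items():
    print(f"k={k:2d}  error rate={err:.3f}")
print("lowest error at k =", best_k)
```

Taking the minimum of the error curve mirrors the abstract's description of k = 4 as the value with the lowest error rate. On real data the curve is often noisy, so the choice would usually be confirmed by inspecting the plot or by cross-validation rather than a single split.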
