Abstract
The growth in the amount of data in companies puts analysts in difficulties when extracting hidden knowledge from data. Several models have emerged that focus on the notion of distances while ignoring the notion of conditional probability density. This research study focuses on segmentation using mixture models and Bayesian networks for medical data mining. As enterprise data becomes large, there is a way to apply data mining methods to make sense of it using classification methods. We designed different models with different architectures and then applied these models to the medical database. The algorithms were implemented for the real data. The objective is to classify individuals according to the conditional probability density of random variables, in addition to identifying causalities between traits from tests of conditional independence and a correlation measure, both based on χ2. After a quick illustration of several models (decision tree, SVM, K-means, Bayes), we applied our method to data from an epidemiological study (done at the University of Kinshasa University clinics) of case-control of prostate cancer. Thus, we found after interpretation of the results followed by discussion that our model allows us to classify a new individual with an accuracy of 96%.
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have