Abstract

Hospital discharge databases store records for hundreds of thousands of patients. These datasets are usually used by health insurance companies to process claims from hospitals, but they also represent a rich source of information about patterns of medical care. Supervised descriptive rule discovery techniques can prove inefficient when target class samples represent only an extremely small fraction of all available samples. The proposed subgroup discovery method aims to improve the efficiency of detecting interpretable subgroups in such data by balancing the number of samples in the target and control groups prior to the subgroup discovery process. Additionally, we introduce improvements to an existing subgroup discovery algorithm that make the descriptive data mining process and the visualization of rules more user friendly. The instance-based subspace subgroup discovery method introduced in this paper is demonstrated on hospital discharge data with a focus on medical errors, where the number of patients with a recorded diagnosis related to a medical error is small in comparison to the number of patients for whom no medical error was recorded. We demonstrate the ability of the proposed method to produce simple, comprehensible models with a high degree of confidence, support, and predictive power. The subspace subgroup discovery process can be applied in any setting where a large number of samples contains a relatively small number of target class samples. The proposed method is implemented in the Weka machine learning environment and is available at http://ri.fzv.uni-mb.si/ssd.
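The balancing idea described above can be illustrated with a minimal sketch. It is not the implementation from the paper: it assumes plain random undersampling of the control group, whereas the paper's instance-based selection may work differently, and it assumes the common definitions support = covered target samples / all samples and confidence = covered target samples / all covered samples. The function names, the field names `error` and `long_stay`, and the synthetic records are hypothetical and serve only to make the example runnable.

```python
import random

def balance_groups(records, is_target, ratio=1.0, seed=0):
    """Keep every target record and a random sample of control records.

    Assumption: random undersampling stands in for the paper's
    instance-based selection, which is not specified in the abstract.
    """
    targets = [r for r in records if is_target(r)]
    controls = [r for r in records if not is_target(r)]
    n_controls = min(len(controls), int(round(ratio * len(targets))))
    rng = random.Random(seed)
    return targets, rng.sample(controls, n_controls)

def rule_quality(targets, controls, condition):
    """Support and confidence of the rule 'condition -> target'."""
    tp = sum(1 for r in targets if condition(r))   # covered target samples
    fp = sum(1 for r in controls if condition(r))  # covered control samples
    total = len(targets) + len(controls)
    covered = tp + fp
    support = tp / total if total else 0.0
    confidence = tp / covered if covered else 0.0
    return support, confidence

# Synthetic, hypothetical records: 10 targets among 1000 samples.
records = [
    {"id": i, "error": i % 100 == 0, "long_stay": i % 100 == 0 or i % 4 == 1}
    for i in range(1000)
]

targets, controls = balance_groups(records, lambda r: r["error"])
support, confidence = rule_quality(targets, controls, lambda r: r["long_stay"])
print(len(targets), len(controls), support, confidence)
```

With a 1:1 ratio the rule is evaluated on equally sized target and control groups, so a handful of target samples is no longer swamped by the much larger control population.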
