Clinical charge profiles prediction for patients diagnosed with chronic diseases using Multi-level Support Vector Machine

Wei Zhong,Rick Chow,Jieyue He

doi:10.1016/j.eswa.2011.08.036

Abstract

This research utilizes the national Healthcare Cost & Utilization Project (HCUP-3) databases to construct Support Vector Machine (SVM) classifiers to predict clinical charge profiles, including hospital charges and length of stay (LOS), for patients diagnosed with heart and circulatory disease, diabetes and cancer, respectively. Clinical charge profiles predictions can provides relevant clinical knowledge for healthcare policy makers to effectively manage healthcare services and costs at the national, state, and local levels. Despite its solid mathematical foundation and promising experimental results, SVM is not favorable for large-scale data mining tasks since its training time complexity is at least quadratic to the number of samples. Furthermore, traditional SVM classification algorithms cannot build an effective SVM when different data distribution patterns are intermingled in a large dataset. In order to enhance SVM training for large, complex and noisy healthcare datasets, we propose the Multi-level Support Vector Machine (MLSVM) that organizes the dataset as clusters in a tree to produce better partitions for more effective SVM classification. The MLSVM model utilizes multiple SVMs, each of which learns the local data distribution patterns in a cluster efficiently. A decision fusion algorithm is used to generate an effective global decision that incorporates local SVM decisions at different levels of the tree. Consequently, MLSVM can handle complex and often conflicting data distributions in large datasets more effectively than the single-SVM based approaches and the multiple SVM systems. Both the combined 5 × 2-fold cross validation F test and the independent test show that classification performance of MLSVM is much superior to that of a CVM, ACSVM and CSVM based on three popular performance evaluation metrics. In this work, CSVM and MLSVM are parallelized to speed up the slow SVM training process for very large and complex datasets. Running time analysis shows that MLSVM can accelerate SVM’s training process noticeably when the parallel algorithm is employed.

Full Text