This article examines whether feature extraction methods improve the accuracy of machine learning methods for predicting 30-day hospital readmissions. The study evaluates five feature extraction methods, namely principal component analysis (PCA), kernel principal component analysis (KPCA), isomap, Laplacian eigenmaps, and locality preserving projections (LPP), applied to nine prediction methods: logistic regression, Cox regression, linear discriminant analysis, k-nearest neighbor (KNN), support vector machines (SVM), bagged trees, boosted trees, random forest, and artificial neural networks. All models are developed in MATLAB and validated using the area under the curve (AUC) on two population-based data sets from partner hospitals. Laplacian eigenmaps and isomap provide the largest improvement in readmission predictive accuracy for KNN, SVM, bagged trees, boosted trees, and linear discriminant analysis. For artificial neural networks, random forest, Cox regression, and logistic regression, the results show improvement on only one of the data sets. PCA and LPP provide the best computational efficiency, followed by KPCA, Laplacian eigenmaps, and isomap. Overall, feature extraction methods can improve the predictive performance of machine learning methods for predicting readmissions, but the improvement depends on the specific choice of prediction method, feature extraction method, and the complexity of the data set features.
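The validation metric named above, area under the curve, can be illustrated with a minimal Python sketch. It assumes the curve is the ROC curve, which the abstract does not state, and it is not the study's MATLAB code. The function name `roc_auc` and all labels and scores are hypothetical toy values made up for illustration; they are not results from the study. The sketch uses the rank interpretation of AUC: the probability that a randomly chosen readmitted patient receives a higher score than a randomly chosen non-readmitted patient.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 = readmitted within 30 days, 0 = not readmitted.
    scores: predicted risk, higher means more likely to be readmitted.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up values for illustration only (not data from the study).
labels = [1, 0, 1, 0, 0, 1]
scores_raw = [0.6, 0.7, 0.8, 0.3, 0.4, 0.5]        # hypothetical model on raw features
scores_extracted = [0.7, 0.4, 0.9, 0.2, 0.5, 0.6]  # hypothetical model on extracted features

auc_raw = roc_auc(labels, scores_raw)
auc_extracted = roc_auc(labels, scores_extracted)
print(f"AUC raw: {auc_raw:.3f}, AUC extracted: {auc_extracted:.3f}")
```

Comparing the two AUC values for the same classifier, once on the raw features and once on the extracted features, mirrors the kind of before-and-after comparison the abstract describes. The pairwise loop is quadratic in the number of patients, so a sort-based rank formula would be preferable for data sets of realistic size.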