Clinical tests for diagnosing a disease can be expensive, uncomfortable, and time-consuming, and can have side effects, e.g., the barium swallow test for esophageal cancer. Although the absence of esophageal cancer can be predicted with near 100% certainty using only demographics, lifestyle, and medical history information together with a few basic clinical tests, our objective is to devise a general methodology for customizing tests according to user preferences, so that expensive or uncomfortable tests can be avoided. We propose to use classifiers trained on electronic medical records (EMRs) for the selection of tests. The key idea is to design classifiers with a 0% false normal rate (i.e., 100% sensitivity), possibly at the cost of a higher false abnormal rate. We find kernel logistic regression (LR) to be the most suitable for the task. We propose an algorithm for finding the best probability threshold for kernel LR, based on tuning test set accuracy with the help of a validation data set. Using the proposed algorithm, we describe schemes for selecting tests, which appear as features in the classification algorithm, according to users' preferences on cost and discomfort; that is, the proposed method is able to detect almost all true patients in the population even when restricted to user-preferred clinical tests. We test our methodology on EMRs collected for more than 3000 patients as part of a project carried out by a reputed hospital in Mumbai, India. We found that kernel SVM and kernel LR with a polynomial kernel of degree 3 yield an accuracy of 99.18% and a sensitivity of 100% using only demographic, lifestyle, patient history, and basic clinical test information. We demonstrate our test selection algorithm using two case studies, one using the cost of clinical tests and the other using "discomfort" values for clinical tests. We compute the test sets corresponding to the lowest number of false abnormals for each of these two criteria, using exhaustive enumeration of 12 and 15 clinical tests, respectively.
The sets turn out to be different, substantiating our claim that one can customize test sets based on user preferences.
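
The two steps described above, choosing a probability threshold that leaves no false normals on validation data and then enumerating test subsets to find the one with the fewest false abnormals, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `evaluate` callback (standing in for training kernel LR on the chosen tests and scoring a validation set), the budget-style handling of user preferences, and all toy probabilities and penalty values are assumptions made for illustration.

```python
from itertools import combinations


def zero_false_normal_threshold(probs, labels):
    """Return the largest threshold that flags every abnormal (label 1) case.

    A case is predicted abnormal when its probability is >= the threshold,
    so the smallest probability among the true abnormal cases is the largest
    cut-off that still produces no false normals on this data.
    """
    abnormal_probs = [p for p, y in zip(probs, labels) if y == 1]
    if not abnormal_probs:
        raise ValueError("validation data contains no abnormal cases")
    return min(abnormal_probs)


def count_false_abnormals(probs, labels, threshold):
    """Count normal (label 0) cases that the threshold flags as abnormal."""
    return sum(1 for p, y in zip(probs, labels) if y == 0 and p >= threshold)


def best_test_subset(tests, penalty, budget, evaluate):
    """Exhaustively search subsets of `tests` whose total penalty fits `budget`.

    `penalty` maps each test to a cost or discomfort value (assumed form of
    the user preference). `evaluate(subset)` is a hypothetical callback that
    returns (probs, labels) on validation data for a classifier using only
    the tests in `subset`. Ties on false abnormals go to the lower penalty.
    Returns (subset, threshold, false_abnormals, total_penalty).
    """
    best = None
    for size in range(len(tests) + 1):
        for subset in combinations(tests, size):
            total = sum(penalty[t] for t in subset)
            if total > budget:
                continue  # subset violates the user's preference
            probs, labels = evaluate(subset)
            threshold = zero_false_normal_threshold(probs, labels)
            false_abn = count_false_abnormals(probs, labels, threshold)
            if best is None or (false_abn, total) < (best[2], best[3]):
                best = (subset, threshold, false_abn, total)
    return best


# Toy stand-in for model training and scoring; the numbers are made up.
def toy_evaluate(subset):
    labels = [1, 1, 0, 0, 0]
    if "barium_swallow" in subset:
        probs = [0.9, 0.8, 0.1, 0.2, 0.3]
    elif "blood_panel" in subset:
        probs = [0.9, 0.5, 0.6, 0.2, 0.1]
    else:
        probs = [0.9, 0.3, 0.6, 0.4, 0.1]
    return probs, labels


toy_penalty = {"blood_panel": 1, "barium_swallow": 5}  # hypothetical values
toy_tests = ["blood_panel", "barium_swallow"]

print(best_test_subset(toy_tests, toy_penalty, 3, toy_evaluate))
print(best_test_subset(toy_tests, toy_penalty, 10, toy_evaluate))
```

A threshold chosen this way guarantees zero false normals only on the data it was tuned on, so sensitivity on unseen patients still has to be checked on a held-out set. The search visits every subset, which stays practical for the 12 to 15 tests mentioned above (at most 2^15 = 32768 subsets) but grows exponentially beyond that.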