Information Theory, Kelly Betting, Risk, Reward, Commission, and Omission: An Example Problem in Breast Cancer

Leslie W Dalton

doi:10.1109/access.2014.2363134

Abstract

In binary classification, two-way confusion matrices and their corresponding measures, such as sensitivity and specificity, have become so ubiquitous that those who review results may not realize there are other, more realistic ways to visualize data. This is particularly true when risk and reward considerations are important. The approach suggested here proposes that a classifier need not offer a conclusion on every instance within a data set. If an algorithm finds instances (e.g., patient cases in a medical data set) in which the attributes pertaining to a patient's disease offer essentially nil information, no classification should be offered. From the physician's perspective, disclosure of nil information should be welcome because it might prevent potentially harmful treatment. It follows that the developer of a classifier can provide summary results amenable to helping the consumer decide whether it is prudent to act or to pass (commission versus omission). The goal is not always to balance sensitivity and specificity across all cases, but to optimize action on some cases. The explanation is centered on John Kelly's link between gambling and Shannon information theory. In addition, Graham's margin of safety, Bernoulli's utiles, and the Hippocratic Oath are important. An example problem is provided using a Netherlands Cancer Institute breast cancer data set. Recurrence score, a popular molecular-based assay for breast cancer prognosis, was found to have an uninformative zone. The uninformative subset had been grouped with positive results to garner higher sensitivity. Yet, because of a positive result, patients might be advised to undergo potentially harmful treatment in the absence of useful information.
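
The link between Kelly betting and Shannon information mentioned above can be illustrated with a minimal sketch. It assumes the simplest textbook setting, which is not taken from the paper itself: a repeated even-money bet in which the bettor's call is correct with probability p. In that setting the Kelly fraction is 2p - 1 and the resulting growth rate equals 1 - H(p) bits per bet, where H is the binary entropy. The function names are hypothetical and chosen only for illustration.

```python
import math


def binary_entropy(p):
    """Shannon entropy, in bits, of a binary event with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def kelly_fraction(p):
    """Kelly stake for an even-money bet won with probability p.

    Assumption for this sketch: the bettor either backs the call or
    passes, so the stake is floored at zero when there is no edge.
    """
    return max(0.0, 2.0 * p - 1.0)


def growth_rate(p, f):
    """Expected log2 growth of wealth per bet when staking fraction f.

    Terms with zero probability are skipped; f must be below 1
    whenever a loss is possible (p < 1).
    """
    g = 0.0
    if p > 0.0:
        g += p * math.log2(1.0 + f)
    if p < 1.0:
        g += (1.0 - p) * math.log2(1.0 - f)
    return g


if __name__ == "__main__":
    for p in (0.5, 0.6, 0.75, 0.9):
        f = kelly_fraction(p)
        print(f"p={p:.2f}  stake={f:.2f}  "
              f"growth={growth_rate(p, f):.4f}  "
              f"1-H(p)={1.0 - binary_entropy(p):.4f}")
```

At p = 0.5 the attributes carry no information, the Kelly stake is zero, and the growth rate is zero: under these assumptions the prudent choice is to pass (omission) rather than act (commission).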

Full Text