Abstract

Suppose you are a physician with a patient whose complaint could arise from multiple diseases. To attain a specific diagnosis, you might ask yourself a series of yes/no questions depending on observed features describing the patient, such as clinical test results and reported symptoms. As some questions rule out certain diagnoses early on, each answer determines which question you ask next. With about a dozen features and extensive medical knowledge, you could create a simple flow chart to connect and order these questions. If you had observations of thousands of features instead, you would probably want to automate. Machine learning methods can learn which questions to ask about these features to classify the entity they describe. Even when we lack prior knowledge, a classifier can tell us which features are most important and how they relate to, or interact with, each other. Identifying interactions with large numbers of features poses a special challenge. In PNAS, Basu et al. (1) address this problem with a new classifier based on the widely used random forest technique. The new method, an iterative random forest algorithm (iRF), increases the robustness of random forest classifiers and provides a valuable new way to identify important feature interactions.

Random forests came into the spotlight in 2001 after their description by Breiman (2). He was largely influenced by previous work, especially the similar “randomized trees” method of Amit and Geman (3), as well as Ho’s “random decision forests” (4). Random forests have since proven useful in many fields due to their high predictive accuracy (5, 6). In biology and medicine, random forests have successfully tackled a range of problems, including predicting drug response in cancer cell lines (7), identifying DNA-binding proteins (8), and localizing cancer to particular tissues from a liquid biopsy (9). Random forests have also …

To whom correspondence should be addressed. Email: michael.hoffman{at}utoronto.ca.
