Abstract

Many machine learning algorithms learn from data by capturing its salient characteristics, and decision trees are used in many classification tasks. In some circumstances, we only want to consider fixed-depth trees; unfortunately, finding the optimal depth-d decision tree can require time exponential in d. In the first part of this dissertation, we present the OPT-NBDT algorithm, a fast way to produce a near-optimal fixed-depth decision tree under the Naive Bayes assumption. We apply this technique to real-world datasets and find that our model reduces computational cost significantly while yielding relatively high accuracy.

In the second part of this dissertation, we present two novel algorithms for active learning. In many scenarios we have access to a large pool of unlabeled data, but obtaining labels for the data is difficult. An active learning process addresses this issue by selectively requesting the labels of a few unlabeled instances, with the goal of using these newly labeled instances to produce an effective classifier. First, we focus on active learning for image segmentation, which requires producing a label for each pixel in an image. We provide an active learner (LMU) that first selects the image whose expected label will reduce the uncertainty of the other unlabeled images the most, and thereafter greedily selects the most informative image. Our experiments over real-world datasets show that training on very few informative images can produce a segmenter that is as good as one trained on the entire dataset.

Finally, we present the importance sampling algorithm (ISAL) for active learning in the standard classification framework, and we demonstrate its sample and label efficiency. On each iteration, ISAL identifies a distribution that puts large weight on instances whose labels are uncertain, then requests the label of an instance drawn from that distribution. We prove that ISAL can be more sample- and label-efficient than passive learning, with an exponential rate of convergence to the Bayes classifier on noise-free data. We also provide empirical studies showing that ISAL is more effective than many other active learning algorithms.
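To make the exponential cost of exact fixed-depth search concrete, the sketch below performs a brute-force search over all depth-bounded trees on binary features. It is an illustration of the baseline problem that OPT-NBDT is meant to avoid, not the OPT-NBDT algorithm itself; the binary-feature restriction and the training-error objective are simplifying assumptions.

    import numpy as np

    def best_tree_error(X, y, depth):
        """Minimum training error of any decision tree of at most the given
        depth, found by brute force over binary features."""
        if len(y) == 0:
            return 0
        # A leaf predicts the majority label; its error is the minority count.
        leaf_error = int(min(np.sum(y == 0), np.sum(y == 1)))
        if depth == 0:
            return leaf_error
        best = leaf_error
        for j in range(X.shape[1]):   # try every feature at this node
            mask = X[:, j] == 1
            err = (best_tree_error(X[mask], y[mask], depth - 1)
                   + best_tree_error(X[~mask], y[~mask], depth - 1))
            best = min(best, err)
        return best

Because the recursion tries every feature at every internal node, the number of subproblems grows on the order of (2m)^d for m features, which is the exponential blow-up in d noted above.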
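The two-stage LMU selection rule can be sketched as follows. The mean per-pixel entropy used as the uncertainty measure, and the hypothetical influence[i][j] matrix standing in for how much labeling image i is expected to reduce image j's uncertainty, are both illustrative assumptions; only the control flow mirrors the strategy described above.

    import numpy as np

    def pixel_entropy(p):
        """Mean binary entropy of a per-pixel foreground-probability map."""
        p = np.clip(p, 1e-6, 1 - 1e-6)
        return float(np.mean(-p * np.log(p) - (1 - p) * np.log(1 - p)))

    def lmu_select(probs, influence, first_round):
        """Return the index of the next unlabeled image to query.

        probs       -- list of per-pixel probability maps, one per image
        influence   -- influence[i][j]: hypothetical expected drop in image
                       j's uncertainty if image i were labeled
        first_round -- True for the initial, look-ahead selection stage
        """
        n = len(probs)
        if first_round:
            # Stage 1: pick the image expected to reduce the uncertainty of
            # the *other* unlabeled images the most.
            gains = [sum(influence[i][j] for j in range(n) if j != i)
                     for i in range(n)]
            return int(np.argmax(gains))
        # Stage 2: greedily pick the most informative (most uncertain) image.
        return int(np.argmax([pixel_entropy(p) for p in probs]))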
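The per-iteration sampling step described for ISAL can be sketched as follows, assuming a scikit-learn-style binary classifier that exposes predict_proba. The specific weighting, which peaks where the predicted probability is near 1/2, is an illustrative stand-in for whatever query distribution ISAL actually constructs.

    import numpy as np

    def isal_query(model, X_unlabeled, rng):
        """Draw one unlabeled instance, favoring those the model is unsure about."""
        # Predicted probability of the positive class for each unlabeled instance.
        p = model.predict_proba(X_unlabeled)[:, 1]
        # Uncertainty weight: largest near p = 0.5, zero where the model is certain.
        w = 1.0 - np.abs(2.0 * p - 1.0)
        if w.sum() == 0.0:            # certain everywhere: fall back to uniform
            w = np.ones_like(w)
        q = w / w.sum()               # the query distribution for this iteration
        return int(rng.choice(len(X_unlabeled), p=q))

In a full importance-sampling treatment, each labeled instance would also carry a weight inversely proportional to its draw probability so that downstream risk estimates remain unbiased; that bookkeeping is omitted here.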
