Abstract

Only a subset of the boundary points, the segment borders, needs to be considered when searching for the optimal multisplit of a numerical value range with respect to the most commonly used attribute evaluation functions of classification learning algorithms. Segments and their borders can be found efficiently in a linear-time preprocessing step. In this paper we extend the applicability of segment borders by showing that inspecting them alone suffices to optimize any convex evaluation function. For strictly convex evaluation functions, inspecting all segment borders is also necessary. These results follow directly from Jensen's inequality. We also study the evaluation function Training Set Error, which is not strictly convex. With that function the data can be preprocessed into an even smaller number of cut point candidates, called alternations, when striving for an optimal partition. Examining all alternations also appears necessary, since, analogously to strictly convex functions, the placement of neighboring cut points affects the optimality of an alternation. Finally, we empirically test, on real-world data, the reduction in the number of cut point candidates achievable for Training Set Error.
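To make the linear-time preprocessing idea concrete, the following is a minimal illustrative sketch, not the paper's algorithm: examples are sorted by attribute value and binned, adjacent bins whose class-label distributions coincide are merged into one segment, and only the borders between segments are kept as cut point candidates. The function name `segment_borders` and the simplified merging criterion (comparing label sets) are assumptions for illustration.

```python
from collections import Counter

def segment_borders(values, labels):
    """Illustrative sketch: return candidate cut points (segment
    borders) for one numeric attribute.  Adjacent value bins with
    the same class labels are merged; only borders between differing
    bins remain as candidates.  Simplified from the paper's exact
    definitions."""
    # Sort examples by attribute value and bin equal values together.
    pairs = sorted(zip(values, labels))
    bins = []  # list of (attribute value, Counter of class labels)
    for v, c in pairs:
        if bins and bins[-1][0] == v:
            bins[-1][1][c] += 1
        else:
            bins.append((v, Counter({c: 1})))
    # Keep a boundary only where the neighboring bins' label sets
    # differ; place the candidate cut point at the midpoint.
    borders = []
    for (v1, d1), (v2, d2) in zip(bins, bins[1:]):
        if set(d1) != set(d2):
            borders.append((v1 + v2) / 2)
    return borders
```

For example, with values `[1, 1, 2, 3, 3, 4]` and labels `a, a, a, b, b, b`, only the boundary between values 2 and 3 survives as a candidate, since the boundaries within the pure-`a` and pure-`b` runs cannot be optimal cut points.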
