Abstract

We show that only segment borders have to be taken into account as cut point candidates when searching for the optimal multisplit of a numerical value range with respect to convex attribute evaluation functions. Segment borders can be found efficiently in a linear-time preprocessing step. With training set error, which is not strictly convex, the data can be preprocessed into an even smaller number of cut point candidates, called alternations, when striving for the optimal partition. We show that no segment borders (resp. alternations) can be overlooked with strictly convex functions (resp. training set error) without risking the loss of optimality. Our experiments show that while in real-world domains a significant reduction in the number of cut point candidates can be obtained for training set error, the number of segment borders is usually not much lower than that of boundary points.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call