Abstract

To date, existing Error Correcting Output Codes (ECOC) algorithms produce coding matrices in which every class has a codeword of the same length. This paper instead proposes a variable-length-codeword ECOC (VL-ECOC), which generates longer codewords for hard classes than for easy classes. VL-ECOC consists of two phases: the overall-class phase and the hard-class phase. In the first phase, the centroids of the two hardest classes are selected as the centroids of the positive group and the negative group, respectively, and each of the other classes is assigned to the nearer group. The hard classes that still have high error rates proceed to the second phase, in which the K nearest neighbors of the misclassified samples are used to generate new columns. The codewords generated in the second phase are applied only when decoding the hard classes. Consequently, easy and hard classes have codewords of different lengths. To verify the performance of VL-ECOC, comprehensive experiments are carried out on UCI data sets and microarray data sets. The experimental results demonstrate that, owing to the additional codewords for the hard classes, our algorithm handles the class imbalance problem better and achieves higher performance in most cases.
