Abstract

The imbalance of classes in real-world datasets poses a major challenge for machine learning and classification, and traditional synthetic data generation methods often fail to address this problem effectively. A key limitation of these methods is that they separate the generation of synthetic samples from the training process, so the resulting synthetic data lack the informative characteristics needed for proper model training. We present a new synthetic data generation method that addresses this issue by combining adversarial sample generation with a triplet-based loss. This approach increases diversity within the minority class while preserving the integrity of the decision boundary. Furthermore, we show that minimizing the triplet loss is equivalent to maximizing the area under the receiver operating characteristic curve under specific conditions, providing a theoretical basis for the effectiveness of our method. In addition, we present a model training approach that further improves generalization to minority classes by supplying a diverse set of synthetic samples optimized with our proposed loss function. We evaluated our method on several imbalanced benchmark tasks and compared it to state-of-the-art techniques, demonstrating that it outperforms them and offers an effective solution to the class imbalance problem.
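
To make the role of the triplet objective concrete, the sketch below shows a standard margin-based triplet loss of the kind the abstract refers to. The abstract does not give the exact formulation, so the choice of distance, the margin value, and the pairing of real minority, synthetic minority, and majority samples as anchor, positive, and negative are illustrative assumptions, not the authors' exact method.

```python
# Minimal sketch of a margin-based triplet loss (illustrative only).
# The encoder, margin, and triplet construction are hypothetical choices.
import torch
import torch.nn.functional as F


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Pull the anchor toward the positive and push it from the negative.

    anchor, positive, negative: (batch, embed_dim) embeddings, e.g. of a real
    minority sample, a synthetic minority sample, and a majority sample.
    """
    d_pos = F.pairwise_distance(anchor, positive)  # distance to same-class sample
    d_neg = F.pairwise_distance(anchor, negative)  # distance to other-class sample
    return F.relu(d_pos - d_neg + margin).mean()   # hinge loss on the margin


if __name__ == "__main__":
    # Hypothetical usage with random embeddings in place of an encoder's output.
    torch.manual_seed(0)
    a, p, n = (torch.randn(8, 32) for _ in range(3))
    print(triplet_loss(a, p, n).item())
```

Minimizing a loss of this form encourages minority-class embeddings (including synthetic ones) to cluster together while staying separated from majority-class embeddings by at least the margin, which is consistent with the abstract's stated goal of preserving the decision boundary.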
