Abstract

In syntactic and structural pattern recognition, data represented as strings appear in several supervised classification applications. Some of these data collections have imbalanced class distributions, which typically bias the classifier toward the majority class. Some oversampling methods have been proposed to address this problem for data represented as strings, but such methods have received little study in the literature. In this paper, we therefore present an oversampling method that works in string space, balances the minority class, and achieves better classification results than state-of-the-art oversampling methods, especially on highly imbalanced problems. Furthermore, in our experiments the proposed method is much faster than those reported in the literature.
