In healthcare applications, often it is not possible to record sufficient data as required for deep learning or data-driven classification and feature detection systems due to the patient condition, various clinical or experimental limitations, or time constraints. On the other hand, data imbalance invalidates many of the test results crucial for clinical approvals. Generating synthetic (artificial or dummy) data has become a potential solution to address this issue. Such data should possess adequate information, properties, and characteristics to mimic the real-world data recorded in natural circumstances. Several methods have been proposed for this purpose, and results often show that adding surrogates improves the decision-making accuracy. This article evaluates the most recent surrogate data generation and data synthesis methods to investigate the effects of the number of surrogates on improving the classification results. It is shown that the data analysis/classification results improve with an increasing number of surrogates, but this no longer continues after a certain number of surrogates. This achievement helps in deciding on the number of surrogates for each strategy, resulting in the alleviation of the computation cost.
Read full abstract