Abstract

Synthetic datasets partly alleviate the shortage of labeled real-world data. However, they still fall short in the complexity of image backgrounds and the diversity of text. Collecting large amounts of real data requires substantial human and material resources. We therefore propose a small-batch data augmentation strategy that aims to improve performance significantly by collecting and labeling only small batches of real data. We verify the idea on a strong baseline. The results show that replacing the synthetic dataset with a real dataset significantly improves model accuracy, which indicates that real datasets train the model better than synthetic ones. We then apply different augmentation strategies to expand the small batches of real data and explore how much model performance improves under low-resource real-data conditions. Finally, we mix the augmented small batches of real data with synthetic datasets so that the model learns the image features of real scenes more effectively. The results show that the proposed strategy narrows the gap between synthetic and real datasets and improves model performance.
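
The two data-side steps described above, expanding a small real set through augmentation and then mixing it with synthetic data, can be sketched roughly as follows. This is only an illustrative sketch, not the authors' pipeline: the sample representation, the augmentation names, the function names `expand_real_set` and `mix_datasets`, and the `real_fraction` parameter are all assumptions introduced here, and a real implementation would apply image transforms rather than tag identifiers.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical sample representation: (image identifier, text label).
Sample = Tuple[str, str]


def expand_real_set(
    real: Sequence[Sample],
    augmentations: Sequence[str],
    copies_per_sample: int,
    rng: random.Random,
) -> List[Sample]:
    """Return the original real samples plus augmented copies.

    Augmentation is only simulated by tagging the identifier with the
    name of a randomly chosen transform; the label is kept unchanged.
    """
    expanded = list(real)
    for image_id, label in real:
        for _ in range(copies_per_sample):
            aug = rng.choice(augmentations)
            expanded.append((f"{image_id}|{aug}", label))
    return expanded


def mix_datasets(
    real_aug: Sequence[Sample],
    synthetic: Sequence[Sample],
    real_fraction: float,
    total: int,
    rng: random.Random,
) -> List[Sample]:
    """Draw `total` samples (with replacement) so that roughly
    `real_fraction` of them come from the augmented real set."""
    n_real = round(total * real_fraction)
    n_syn = total - n_real
    mixed = [rng.choice(real_aug) for _ in range(n_real)]
    mixed += [rng.choice(synthetic) for _ in range(n_syn)]
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    rng = random.Random(0)
    real = [("real_0", "hello"), ("real_1", "world")]
    synthetic = [(f"syn_{i}", "text") for i in range(100)]
    # Assumed augmentation names, for illustration only.
    augs = ["rotate", "blur", "color_jitter"]
    real_aug = expand_real_set(real, augs, copies_per_sample=3, rng=rng)
    train = mix_datasets(real_aug, synthetic, real_fraction=0.3, total=10, rng=rng)
    print(len(real_aug), len(train))
```

Sampling with replacement lets the small real set take up a chosen share of the training mix regardless of its size. The abstract does not state the mixing ratio or the sampling scheme used in the paper, so both are placeholders here.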
