Abstract
Data augmentation has become one of the keys to alleviating over-fitting on training data and improving generalization on testing data. Most existing data augmentation methods focus on a single modality and therefore fall short when data span multiple modalities. Some prior works interpolate with random coefficients in the latent space to generate new samples, an approach that works generically for any data modality. However, these works ignore the extra information conveyed by multimodal data. In fact, the extra information in one modality can provide semantic directions for generating more meaningful samples in another modality. This paper proposes Cross-modal Data Augmentation (CMDA), a simple yet effective data augmentation method that alleviates over-fitting and improves generalization performance. We evaluate CMDA on unsupervised and supervised tasks across different modalities, on which it consistently and significantly outperforms the baselines. For instance, CMDA improves the unsupervised anomaly detection baseline in the vision modality from AUROC scores of <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$76.46\%, 73.07\%$</tex-math></inline-formula> and <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$64.36\%$</tex-math></inline-formula> to <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$83.25\%, 76.22\%$</tex-math></inline-formula> and <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$70.57\%$</tex-math></inline-formula> on three different datasets, respectively. In addition, extensive experiments demonstrate that CMDA is applicable to various neural network architectures.
Furthermore, prior methods that interpolate in the latent space must be coupled with downstream tasks to construct that latent space. In contrast, CMDA works with or without downstream tasks, which broadens its applicability. Our source code is publicly available for non-commercial or research use at <uri xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">https://github.com/Anfeather/CMDA</uri>.
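The core idea described above, using embeddings from one modality to define semantic directions for augmenting samples in another, can be sketched roughly as follows. This is a minimal illustrative assumption of how such an augmentation might look, not the authors' exact formulation; the function name `cross_modal_augment`, the mixing coefficient `lam`, and the way the direction is derived from paired text embeddings are all hypothetical.

```python
import numpy as np

def cross_modal_augment(z, e_src, e_tgt, lam=0.5):
    """Shift a latent code `z` (e.g., of an image) along a semantic
    direction taken from a paired modality (e.g., text embeddings of
    the original and a perturbed caption).

    Hypothetical sketch: the direction (e_tgt - e_src) supplies the
    cross-modal semantics; `lam` controls the augmentation strength.
    """
    direction = e_tgt - e_src          # semantic direction from the other modality
    return z + lam * direction         # augmented latent sample

# Usage: toy 8-dimensional latent space with random stand-in embeddings.
rng = np.random.default_rng(0)
z = rng.normal(size=8)      # latent code of an image
e_src = rng.normal(size=8)  # embedding of the original caption
e_tgt = rng.normal(size=8)  # embedding of a semantically perturbed caption
z_aug = cross_modal_augment(z, e_src, e_tgt, lam=0.3)
```

Contrast this with purely random latent interpolation (e.g., mixup-style `lam * z1 + (1 - lam) * z2`), where the mixing direction carries no semantic meaning; here the second modality determines where in the latent space the new sample is placed.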