The Effectiveness of Image Augmentation in Deep Learning Networks for Detecting COVID-19: A Geometric Transformation Perspective.

Mohamed Elgendi, Qunfeng Tang, Rabab Ward, Savvas Nicolaou, David Smith, Muhammad Umer Nasir, Newton Howard, Bradley Spieler, Carlo Menon, John-Paul Grenier, William Parker, Richard Ribbon Fletcher, Catherine Batte, William Donald Leslie

doi:10.3389/fmed.2021.629134

Abstract

Chest X-ray imaging, used for the early detection and screening of COVID-19 pneumonia, is both accessible worldwide and affordable compared with other non-invasive technologies. In addition, deep learning methods have recently shown remarkable results in detecting COVID-19 on chest X-rays, making chest X-ray imaging a promising screening technology for COVID-19. Deep learning relies on large amounts of data to avoid overfitting: an overfitted model can perform perfectly on the original training dataset yet fail to achieve high accuracy on a new testing dataset. In image processing, an image augmentation step (i.e., adding more training data) is often used to reduce overfitting to the training dataset and improve prediction accuracy on the testing dataset. In this paper, we examined the impact of geometric augmentations, as implemented in several recent publications, on detecting COVID-19. We compared the performance of 17 deep learning algorithms with and without different geometric augmentations, and empirically examined the influence of augmentation with respect to detection accuracy, dataset diversity, augmentation methodology, and network size. Contrary to expectation, our results show that removing the geometric augmentation steps used in recent publications actually improved the Matthews correlation coefficient (MCC) of the 17 models. The MCC without augmentation (MCC = 0.51) was higher than that obtained with each of four recent geometric augmentations (MCC = 0.47 for Data Augmentation 1, MCC = 0.44 for Data Augmentation 2, MCC = 0.48 for Data Augmentation 3, and MCC = 0.49 for Data Augmentation 4). When we retrained a recently published deep learning model without augmentation on the same dataset, its detection accuracy increased significantly, with a and a p-value of 2.23 × 10^−37. This finding may help improve current deep learning algorithms that use geometric augmentation to detect COVID-19.
We also provide clinical perspectives on geometric augmentation that should be considered when developing a robust COVID-19 X-ray-based detector.
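The results above are reported as MCC, which, unlike plain accuracy, remains informative when the COVID-19 and non-COVID-19 classes are imbalanced. The following is a minimal Python sketch of the standard MCC formula, not the authors' evaluation code. The helper names `mcc` and `accuracy` are hypothetical, and returning 0 when the denominator vanishes is an assumed convention. scikit-learn also provides this metric as `sklearn.metrics.matthews_corrcoef`.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from a binary confusion matrix.

    Returns 0.0 when any row or column of the matrix sums to zero, where the
    formula is undefined (a convention assumed in this sketch).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correctly classified images."""
    return (tp + tn) / (tp + tn + fp + fn)


# A degenerate classifier that labels every image "non-COVID" on a 5:95 split:
# accuracy looks high, but MCC shows it carries no information.
print(accuracy(0, 95, 0, 5), mcc(0, 95, 0, 5))  # 0.95 0.0
```

MCC ranges from −1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), so differences such as 0.51 versus 0.44 reflect changes in balanced detection performance rather than in majority-class accuracy.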

Highlights

More people are being infected with COVID-19 every day [1]; there is a need for a quick and reliable technology to help with the screening and management of the virus.
Data augmentation is commonly used in binary classification when there is a large imbalance between the sizes of the two classes used to train a machine learning model.
The first local dataset was collected from Vancouver General Hospital (VGH), British Columbia, Canada, and contains 58 COVID-19 X-ray images.

Summary

Introduction

More people are being infected with COVID-19 every day [1]; there is a need for a quick and reliable technology to help with the screening and management of the virus.

Data augmentation is commonly used in binary classification when there is a large imbalance between the sizes of the two classes used to train a machine learning model. Algorithms such as SMOTE [4] are often used to augment the minority class by synthesizing new samples (interpolating between a minority-class sample and its nearest minority-class neighbors) rather than duplicating existing ones, which makes them less prone to overfitting. There are two ways to apply data augmentation: (1) class-balancing oversampling, in which synthesized images are added so that the augmented training dataset contains more images than the original; and (2) replacement, in which each training image is replaced by a synthesized version so that the number of images stays the same. The former is the most commonly used data augmentation approach [5] and serves to increase the number of images; to our knowledge, the latter has not been discussed in the literature. Here, we directly assess the impact of data augmentation with replacement.
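The difference between the two strategies can be illustrated with a minimal sketch, assuming NumPy and using a random pixel translation as a stand-in for geometric augmentation. The helpers `random_shift` and `augment_dataset` are hypothetical and do not reproduce the four augmentation pipelines evaluated in this paper. In this sketch, oversampling balances the classes by adding shifted copies of minority-class images, while replacement swaps every image for one shifted copy.

```python
from collections import Counter

import numpy as np


def random_shift(img: np.ndarray, max_shift: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical geometric transform: translate a 2-D image by a random
    (dy, dx) offset in [-max_shift, max_shift]; uncovered pixels become 0."""
    h, w = img.shape
    assert 0 <= max_shift < min(h, w), "max_shift must be smaller than the image"
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    out = np.zeros_like(img)
    # Copy the region that stays inside the frame after translation.
    out[max(0, dy):min(h, h + dy), max(0, dx):min(w, w + dx)] = \
        img[max(0, -dy):min(h, h - dy), max(0, -dx):min(w, w - dx)]
    return out


def augment_dataset(images, labels, mode, max_shift=4, seed=0):
    """Hypothetical helper contrasting the two augmentation strategies.

    mode="oversample": keep every original image and add shifted copies of
        minority-class images until every class matches the largest class.
    mode="replace": swap each image for one shifted copy, so the dataset
        size and class counts are unchanged.
    """
    rng = np.random.default_rng(seed)
    if mode == "replace":
        return [random_shift(img, max_shift, rng) for img in images], list(labels)
    if mode == "oversample":
        counts = Counter(labels)
        target = max(counts.values())
        out_images, out_labels = list(images), list(labels)
        for cls, n in counts.items():
            pool = [img for img, y in zip(images, labels) if y == cls]
            for _ in range(target - n):
                src = pool[int(rng.integers(len(pool)))]
                out_images.append(random_shift(src, max_shift, rng))
                out_labels.append(cls)
        return out_images, out_labels
    raise ValueError(f"unknown mode: {mode!r}")


# Toy imbalanced set: 2 "COVID-19" (label 1) and 6 "non-COVID" (label 0) 8x8 images.
imgs = [np.ones((8, 8)) for _ in range(8)]
lbls = [1, 1, 0, 0, 0, 0, 0, 0]

over_x, over_y = augment_dataset(imgs, lbls, "oversample")
repl_x, repl_y = augment_dataset(imgs, lbls, "replace")
print(len(over_x), Counter(over_y))  # 12 images, 6 per class
print(len(repl_x), Counter(repl_y))  # 8 images, original 2:6 ratio kept
```

The replacement branch leaves the dataset size and class ratio unchanged, so any effect it has comes from altering the images rather than from adding data. Random on-the-fly augmentation, such as that performed by Keras's `ImageDataGenerator`, arguably behaves more like replacement, since each epoch still sees one transformed copy of each training image.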

Objectives

Methods

Results

Conclusion