Automated recognition of facial expressions is a central component of systems used in an expanding array of domains. Training a model to recognize affect automatically requires large amounts of data, and collecting and labeling that data is often laborious. In recent years, researchers have applied numerous data augmentation strategies to increase the diversity of training datasets. Here, I examined the most common data augmentation strategies to determine which yield higher performance for a facial expression recognition model. I first tested each augmentation technique in isolation and compared the resulting performance. I then ran an ablation study over the augmentation strategies. Finally, I analyzed how dataset size affects the marginal contribution of data augmentation. I find that augmentation does not always improve performance: when the training dataset is small, augmentation degrades model performance, and models trained with augmentation begin to outperform models trained without it only once the training dataset size exceeds a certain threshold. These results highlight the importance of considering dataset size when applying data augmentation in computer vision.