Datasets play a crucial role in image recognition research based on machine learning. Advanced methods such as deep learning and transfer learning depend especially heavily on the datasets used to train models, and dataset quality directly affects their final performance. In crop disease image recognition, datasets remain scarce because agricultural environments are complex and crops are highly diverse. Consequently, a growing number of studies adopt transfer learning, which compensates for the lack of target-domain data by drawing on other datasets. In these methods, the choice of auxiliary-domain dataset strongly influences the performance of the target-domain model. To clarify the impact of datasets on crop disease image recognition, this study used several deep neural network architectures to compare the effects of different datasets on transfer-learning-based crop disease image recognition. The selected datasets were PlantVillage and the Image Database for Agricultural Diseases and Pests Research (IDADP), both widely used in recent studies, and the selected architectures were ResNet50, InceptionV3, and EfficientNet. In the proposed method, the datasets were first preprocessed, for example by data augmentation. After the data were split into auxiliary and target domains, each selected architecture was pre-trained on the auxiliary-domain dataset. Finally, parameter-based transfer learning was used to build the corresponding crop disease recognition model in the target domain. The experiments tested and compared multiple datasets and models. The results show that when the test samples come from the same scenario as the training samples, all network architectures generally achieve high recognition accuracy on the various test sets.
When the test and training scenarios differ, none of the network models achieves satisfactory recognition results on the test sets. For crop disease images collected in actual cultivation environments, models built on the IDADP dataset perform better and therefore have greater practical value for real-world crop disease image recognition.
Keywords: crop diseases, datasets, transfer learning, deep learning, image recognition
DOI: 10.25165/j.ijabe.20221505.7005
Citation: Yuan Y, Chen L, Ren Y C, Wang S M, Li Y. Impact of dataset on the study for crop disease image recognition. Int J Agric & Biol Eng, 2022; 15(5): 181–186.
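The pre-train/fine-tune pipeline summarized in the abstract can be illustrated with a deliberately tiny, standard-library-only Python sketch. It is not the authors' implementation: the study used ResNet50, InceptionV3, and EfficientNet on real image datasets, whereas here a logistic-regression classifier on synthetic 2x2 "images" stands in for the network. The helper names (`make_domain`, `augment`, `train`), the brightness `shift` used to mimic a change of acquisition scenario, and all hyperparameters are hypothetical. The sketch only shows the three steps in order: data augmentation (horizontal flips), pre-training on a large auxiliary domain, and parameter-based transfer, i.e. initializing the target model with the pre-trained parameters before fine-tuning on a small target set.

```python
import math
import random


def make_domain(n, shift, rng):
    """Synthetic toy domain (hypothetical): each 'image' is a 2x2 grid of
    intensities; label 1 ('diseased') images are brighter than label 0.
    `shift` adds a constant brightness offset to mimic a different
    acquisition scenario (e.g. laboratory vs. field)."""
    data = []
    for i in range(n):
        label = i % 2  # alternate labels so classes are balanced
        base = 0.8 if label else 0.2
        img = [[base + shift + rng.uniform(-0.1, 0.1) for _ in range(2)]
               for _ in range(2)]
        data.append((img, label))
    return data


def augment(samples):
    """Data augmentation: keep each image and add a horizontally flipped copy."""
    out = []
    for img, label in samples:
        out.append((img, label))
        out.append(([list(reversed(row)) for row in img], label))
    return out


def flatten(samples):
    """Turn each 2x2 image into a length-4 feature vector."""
    return [([p for row in img for p in row], label) for img, label in samples]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(w, b, samples, epochs, lr):
    """Stochastic gradient descent on the logistic loss, starting from the
    given parameters (w, b) rather than from scratch."""
    w = list(w)  # copy so the caller's parameters are not modified
    for _ in range(epochs):
        for x, y in samples:
            g = predict(w, b, x) - y  # dLoss/dz for the logistic loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def accuracy(w, b, samples):
    hits = sum((predict(w, b, x) >= 0.5) == (y == 1) for x, y in samples)
    return hits / len(samples)


rng = random.Random(0)
auxiliary = flatten(augment(make_domain(200, 0.0, rng)))   # large auxiliary domain
target_train = flatten(augment(make_domain(6, 0.1, rng)))  # few target samples
target_test = flatten(make_domain(50, 0.1, rng))

# Step 1: pre-train on the auxiliary domain from a zero initialization.
w_src, b_src = train([0.0] * 4, 0.0, auxiliary, epochs=20, lr=0.5)

# Step 2: parameter-based transfer, i.e. initialize the target model with the
# pre-trained parameters and fine-tune on the small target-domain set.
w_tgt, b_tgt = train(w_src, b_src, target_train, epochs=50, lr=0.1)

print(f"auxiliary accuracy: {accuracy(w_src, b_src, auxiliary):.2f}")
print(f"target accuracy after transfer: {accuracy(w_tgt, b_tgt, target_test):.2f}")
```

With real deep networks, the same idea usually means loading pre-trained weights into a backbone such as ResNet50 and then either freezing early layers or fine-tuning all layers with a small learning rate; the toy keeps only the core of "reuse the learned parameters, then continue training on the target domain".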