Image classification is a common real-world task, and one that convolutional neural networks (CNNs) addressed early on. With the emergence of R-CNN (Region-based CNN), research moved from simple classification towards object detection, and subsequently to various segmentation tasks. Building on Faster R-CNN [4], Mask R-CNN [3] performs instance segmentation while detecting objects. The model must distinguish the individual target objects in an image and learn a large number of features to represent the details of each one. In images of dogs, breeds both differ from and resemble one another in features such as fur color, texture, and body shape. In this article, the author therefore selected and created two dog datasets, representing targets with significantly different features and targets with similar features, respectively. These two datasets are used to compare and analyze the performance of Mask R-CNN on object detection and instance segmentation.