Objective. In this work, we propose a convolutional neural network (CNN)-based multi-slice ideal model observer using transfer learning (TL-CNN) to reduce the required number of training samples.

Approach. To train the model observers, we generate simulated breast CT image volumes reconstructed using the Feldkamp-Davis-Kress algorithm with a ramp filter and a Hanning-weighted ramp filter. The observer performance is evaluated on the background-known-statistically (BKS)/signal-known-exactly (SKE) task with a spherical signal, and on the BKS/signal-known-statistically (SKS) task with a random signal generated by the stochastic growth method. We compare the detectability of the CNN-based model observer with that of conventional linear model observers for multi-slice images (i.e. a multi-slice channelized Hotelling observer (CHO) and a volumetric CHO). We also analyze the detectability of the TL-CNN for different numbers of training samples to examine its robustness to a limited number of training samples. To further analyze the effectiveness of transfer learning, we calculate the correlation coefficients of the filter weights in the CNN-based multi-slice model observer.

Main results. When transfer learning is used for the CNN-based multi-slice ideal model observer, the TL-CNN achieves the same performance as the network trained without transfer learning while using 91.7% fewer training samples. Moreover, compared to the conventional linear model observers, the proposed CNN-based multi-slice model observers achieve 45% higher detectability in the SKS detection tasks and 13% higher detectability in the SKE detection tasks. The correlation coefficient analysis shows that the filter weights in most layers are highly correlated, demonstrating the effectiveness of transfer learning for multi-slice model observer training.

Significance. Deep learning-based model observers require large numbers of training samples, and the required number of training samples increases with the image dimensionality (i.e. the number of slices). By applying transfer learning, the required number of training samples is significantly reduced without a drop in performance.
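As a rough illustration of the transfer-learning step summarized above, the sketch below pretrains-then-fine-tunes a small 3D CNN observer: the convolutional filters learned on a source detection task are frozen, and only the final layer is retrained on a limited number of target-task samples. The architecture, layer sizes, helper names (MultiSliceCNNObserver, fine_tune), and hyperparameters are illustrative assumptions, not the configuration used in this work, and the actual fine-tuning scheme here may update more layers than shown.

```python
# Minimal sketch of transfer-learning fine-tuning for a 3D CNN model observer.
# All layer sizes and training settings are assumptions for illustration only.
import torch
import torch.nn as nn

class MultiSliceCNNObserver(nn.Module):
    """Toy 3D CNN mapping a multi-slice ROI to a scalar test statistic."""
    def __init__(self, n_slices=8, roi=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
        )
        feat_dim = 16 * (n_slices // 4) * (roi // 4) * (roi // 4)
        self.classifier = nn.Linear(feat_dim, 1)

    def forward(self, x):                 # x: (batch, 1, slices, H, W)
        return self.classifier(self.features(x).flatten(1))

def fine_tune(model, loader, epochs=10, lr=1e-4):
    """Freeze the convolutional filters learned on the source task and
    retrain only the final layer on a small target-task sample set."""
    for p in model.features.parameters():
        p.requires_grad = False
    optimizer = torch.optim.Adam(model.classifier.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for rois, labels in loader:       # labels: 1 = signal-present, 0 = absent
            optimizer.zero_grad()
            loss = loss_fn(model(rois).squeeze(1), labels.float())
            loss.backward()
            optimizer.step()
    return model
```

Detectability could then be estimated in the usual way from the scalar outputs on held-out signal-present and signal-absent ROIs, for example via the ROC area under the curve.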