Video quality assessment (VQA) is essential to the video compression technology used in video transmission and storage. However, the development of VQA methods is impeded by small-scale video quality databases with imbalanced samples and by low-level feature representations of distorted videos. In this paper, we propose a full-reference (FR) VQA metric that integrates transfer learning with a convolutional neural network (CNN). First, following a feature-based transfer learning framework, we use distorted images as the related domain, which enriches the set of distorted samples. Second, to extract high-level spatiotemporal features of the distorted videos, a six-layer CNN is pretrained on the common features of distorted image blocks (IBs) and then fine-tuned on those of distorted video blocks (VBs). Notably, the labels of the distorted IBs and VBs are predicted by classic FR metrics. Finally, in a pooling stage based on saliency maps and an entropy function, we obtain the quality score of each distorted video by weighting the block-level scores predicted by the trained CNN. In particular, we introduce preprocessing and postprocessing steps to reduce the impact of inaccurate labels predicted by the FR-VQA metric. Because the proposed framework relies on feature learning, we carry out two experimental schemes: iterative train-test procedures on a single database, and cross-database tests in which the model is trained on other databases and tested on the remaining one. The experimental results demonstrate that the proposed method generalizes well across databases and is on a par with several state-of-the-art VQA metrics on two widely used VQA databases containing various compression distortions.
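The pooling stage can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the form used here (a weight equal to the product of a block's saliency and its histogram entropy, followed by a normalized weighted mean) is an assumption for illustration only, and the names `block_entropy` and `pool_video_score` are hypothetical.

```python
import numpy as np


def block_entropy(block, bins=256):
    """Shannon entropy (in bits) of the intensity histogram of one block.

    Assumes intensities lie in [0, 255].
    """
    hist, _ = np.histogram(block, bins=bins, range=(0, 256))
    p = hist[hist > 0] / hist.sum()  # keep non-empty bins only
    return float(-(p * np.log2(p)).sum())


def pool_video_score(block_scores, saliency, entropy, eps=1e-8):
    """Pool block-level scores into one video-level score.

    Assumed weighting (not taken from the paper): w_i = saliency_i * entropy_i,
    and the video score is the normalized weighted mean of the block scores.
    """
    s = np.asarray(block_scores, dtype=float)
    w = np.asarray(saliency, dtype=float) * np.asarray(entropy, dtype=float)
    return float((w * s).sum() / (w.sum() + eps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    blocks = rng.integers(0, 256, size=(4, 16, 16))  # four toy 16x16 blocks
    scores = [0.9, 0.7, 0.4, 0.8]                    # stand-ins for CNN outputs
    saliency = [0.2, 0.5, 0.1, 0.9]                  # stand-ins for mean block saliency
    entropy = [block_entropy(b) for b in blocks]
    print(pool_video_score(scores, saliency, entropy))
```

Under this assumed weighting, blocks that are both salient and rich in texture dominate the video-level score, while flat or non-salient blocks contribute little.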