To improve vehicle detection accuracy and address the difficulty of detecting small vehicles, we propose an adaptive multi-scale feature fusion network (AMFFN) that handles the multi-scale problem, and we achieve better performance by applying it to You Only Look Once (YOLO) v4. To improve the representation capability of the features, a spatial pyramid pooling module is applied to each feature map. The proposed AMFFN fuses features of multiple scales across layers and assigns a learnable weight to each scale. To better capture detailed information, we select the dynamic rectified linear unit as the activation function, whose parameters adapt to the input. AMFFN can be treated as a reusable module: stacking it fuses the features repeatedly and yields more refined features. To limit the parameter growth caused by the more complex network, depthwise separable convolution replaces standard convolution, which also increases detection speed. Experimental results show that the proposed method achieves higher detection accuracy and faster detection speed than YOLO v5, the most recent version of the YOLO series at the time of writing.
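Two of the ideas above, learnable per-scale fusion weights and the parameter saving from depthwise separable convolution, can be illustrated with a minimal sketch. It is not the AMFFN implementation. The softmax normalization of the weights, the assumption that the feature maps have already been resized to a common shape, the omission of bias terms, and all function names (`fuse`, `standard_conv_params`, `depthwise_separable_params`) are assumptions made here for illustration only.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(features, scores):
    """Weighted sum of feature vectors taken from different scales.

    Assumption: every entry of `features` is a flat list of equal length,
    i.e. the maps were already resized to a common shape. `scores` stands in
    for the learnable per-scale parameters; softmax normalization is one
    plausible choice, not necessarily the one used in AMFFN.
    """
    weights = softmax(scores)
    length = len(features[0])
    return [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(length)
    ]


def standard_conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weight count of a k x k depthwise convolution followed by a
    1 x 1 pointwise convolution (bias omitted)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Two toy "feature maps" with equal scores, so each gets weight 0.5.
    print(fuse([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))

    # Parameter counts for a hypothetical 3 x 3 layer, 256 -> 512 channels.
    std = standard_conv_params(256, 512, 3)
    sep = depthwise_separable_params(256, 512, 3)
    print(std, sep, round(std / sep, 2))
```

For the hypothetical layer in the usage example, the standard convolution has 1,179,648 weights and the depthwise separable version has 133,376, roughly a ninefold reduction. This is the kind of saving that motivates the replacement, although the actual layer sizes in the proposed network are not given here.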