Identifying the type of vehicle on the road is a challenging task, especially in the natural environment with all its complexities, such that the traditional architecture for object detection requires an excessively large amount of computation. Such lightweight networks as MobileNet are fast but cannot satisfy the performance-related requirements of this task. Improving the detection-related performance of small networks is, thus, an outstanding challenge. In this paper, we use YOLOv5s as the backbone network to propose a large-scale convolutional fusion module called the ghost cross-stage partial network (G_CSP), which can integrate large-scale information from different feature maps to identify vehicles on the road. We use the convolutional triplet attention network (C_TA) module to extract attention-based information from different dimensions. We also optimize the original spatial pyramid pooling fast (SPPF) module and use the dilated convolution to increase the capability of the network to extract information. The optimized module is called the DSPPF. The results of extensive experiments on the bdd100K, VOC2012 + 2007, and VOC2019 datasets showed that the improved YOLOv5s network performs well and can be used on mobile devices in real time.