Abstract
Road multi-object detection is one of the core algorithms of machine-vision-based intelligent road-cleaning robots. Most existing object detection algorithms analyse every region of an image before computing the category and location of each object. On a road surface, however, the background changes little and the objects are few, so analysing the whole image wastes much of the computation. Analysing targeted local regions instead of the entire image should therefore improve detection efficiency. This paper proposes a multi-object detection method based on a binocular camera and a convolutional neural network (CNN) that reduces these invalid calculations and improves detection efficiency. In the proposed method, the binocular images acquired by the camera are stereo-matched and equalized, and linear regression and a coordinate transformation remove the angle of the camera pair relative to the road surface. The coordinates of the regions of interest (ROIs) are then calculated in the left image, and the features within each ROI are extracted from the corresponding region of the CNN feature map. Next, ROI pooling resizes these extracted feature maps, which differ in size, to a common size, and the pooled features are fed to fully connected layers that output the detection results. The proposed binocular network and Faster R-CNN (VGG16) were trained and tested on a dataset of 1,000 road waste images. The experimental results show that, compared with Faster R-CNN (VGG16), the binocular network improves detection accuracy and speed by 28.56% and 78.39%, respectively, providing a reliable basis for machine-vision-based intelligent road-cleaning robots.
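ROI pooling lets ROIs of different sizes feed the same fully connected layers. The following is a minimal NumPy sketch of standard Fast R-CNN-style ROI max pooling. It is not the authors' implementation. The function names `project_roi` and `roi_max_pool` are hypothetical, and the feature-map stride of 16 (as in VGG16's conv5 output), the 512 channels and the 7×7 output grid are illustrative assumptions. The sketch first projects an ROI from left-image pixel coordinates onto the feature map and then max-pools it into a fixed grid.

```python
import numpy as np


def project_roi(roi_img, stride, fmap_h, fmap_w):
    """Map an (x1, y1, x2, y2) ROI in image pixels to feature-map cells.

    Assumption: the feature map is the image downsampled by `stride`
    (16 is typical for VGG16 conv5). The result is clipped to the map and
    is always at least one cell wide and tall; x2/y2 are exclusive.
    """
    x1, y1, x2, y2 = roi_img
    fx1 = min(max(int(x1 // stride), 0), fmap_w - 1)
    fy1 = min(max(int(y1 // stride), 0), fmap_h - 1)
    fx2 = min(max(int(np.ceil(x2 / stride)), fx1 + 1), fmap_w)
    fy2 = min(max(int(np.ceil(y2 / stride)), fy1 + 1), fmap_h)
    return fx1, fy1, fx2, fy2


def roi_max_pool(feature_map, roi, output_size=(7, 7)):
    """Max-pool one ROI of a (C, H, W) feature map to (C, out_h, out_w).

    `roi` is (x1, y1, x2, y2) in feature-map cells, x2/y2 exclusive.
    """
    c, h, w = feature_map.shape
    x1, y1, x2, y2 = roi
    if not (0 <= x1 < x2 <= w and 0 <= y1 < y2 <= h):
        raise ValueError("ROI must be non-empty and lie inside the feature map")
    out_h, out_w = output_size
    region = feature_map[:, y1:y2, x1:x2]
    rh, rw = region.shape[1:]
    # Split the ROI into an out_h x out_w grid of roughly equal bins.
    ys = np.linspace(0, rh, out_h + 1)
    xs = np.linspace(0, rw, out_w + 1)
    pooled = np.empty((c, out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        y_lo = int(np.floor(ys[i]))
        y_hi = max(int(np.ceil(ys[i + 1])), y_lo + 1)  # keep every bin non-empty
        for j in range(out_w):
            x_lo = int(np.floor(xs[j]))
            x_hi = max(int(np.ceil(xs[j + 1])), x_lo + 1)
            # Each output cell is the per-channel maximum over its bin.
            pooled[:, i, j] = region[:, y_lo:y_hi, x_lo:x_hi].max(axis=(1, 2))
    return pooled


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fmap = rng.standard_normal((512, 38, 50)).astype(np.float32)
    small = roi_max_pool(fmap, project_roi((48, 64, 150, 170), 16, 38, 50))
    large = roi_max_pool(fmap, (10, 5, 40, 30))
    print(small.shape, large.shape)  # (512, 7, 7) (512, 7, 7)
```

Both ROIs produce the same output shape even though they cover different numbers of feature-map cells. This fixed shape is what allows a single set of fully connected layers to process every ROI. Max pooling is the choice made in the original Fast R-CNN ROI pooling layer. Variants such as ROI Align, which interpolates instead of quantizing bin edges, would be a drop-in alternative under the same interface.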