Slender urban road facilities (SURFs), such as street lamps, traffic lights, and monitors, have extreme aspect ratios, are essential components of the road network, and are found throughout urban scenes. Information about them supports road safety, traffic control, urban planning, and city management. Oblique aerial images can be captured at low cost and with high efficiency over large areas, which makes them well suited to surveying SURFs in urban areas automatically. Instance segmentation methods, which predict the bounding boxes and masks of objects simultaneously, are widely applied, yet few are designed specifically for SURFs. This research proposes an instance segmentation method that automatically extracts SURFs from oblique aerial imagery and improves both the predicted bounding box and the segmented binary mask. First, to improve the predicted bounding box, we use statistical analysis to design a dense set of anchor ratios with an IoU-balanced sampling strategy (DASS) for proposal generation. This strategy covers the shape of slender proposals better and alleviates the imbalance between target objects and background. Second, because typical instance segmentation architectures segment the binary mask coarsely owing to their downsampling operations, we propose balanced fine-grained features (BFGF), which are merged into the instance segmentation process via a three-stage architecture and improve the detector’s performance. Specifically, before feature incorporation, the multi-level feature pyramid is refined by rescaling and integration to obtain balanced features. Finally, to evaluate the proposed approaches, we contribute the Urban Road Facilities Dataset (URFD), which contains 1075 images and 1378 manually labeled instances and adds to the pool of open datasets.
Comparative experiments show that the proposed method achieves superior performance in SURF instance segmentation, with an mAP of 0.888 for the predicted bounding boxes and an mAP of 0.876 for the segmented binary masks.
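As a rough illustration of the two ideas behind DASS, the following minimal sketch generates anchors with slender aspect ratios and draws background samples evenly across IoU bins. It is not the paper's implementation: the function names `slender_anchors` and `iou_balanced_sample`, the example ratios, the bin count, and the 0.5 IoU threshold are all assumptions chosen for illustration, and the binning scheme follows the general idea of IoU-balanced sampling rather than any setting reported here.

```python
import math
import random


def slender_anchors(base_size, ratios):
    """Return (width, height) pairs with area base_size**2 and height/width = ratio."""
    return [(base_size / math.sqrt(r), base_size * math.sqrt(r)) for r in ratios]


def iou_balanced_sample(ious, num_samples, num_bins=3, max_iou=0.5, seed=None):
    """Pick background indices spread evenly over IoU bins in [0, max_iou).

    ious[i] is the best IoU between proposal i and any ground-truth box.
    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    candidates = [i for i, v in enumerate(ious) if 0.0 <= v < max_iou]
    if len(candidates) <= num_samples:
        return candidates

    # Group candidates into equal-width IoU bins.
    width = max_iou / num_bins
    bins = [[] for _ in range(num_bins)]
    for i in candidates:
        bins[min(int(ious[i] / width), num_bins - 1)].append(i)

    # Draw the same quota from every bin so rare, harder backgrounds
    # are not swamped by the many near-zero-IoU proposals.
    per_bin = num_samples // num_bins
    chosen = []
    for members in bins:
        chosen.extend(rng.sample(members, min(per_bin, len(members))))

    # Top up uniformly at random when some bins were short.
    taken = set(chosen)
    leftovers = [i for i in candidates if i not in taken]
    chosen.extend(rng.sample(leftovers, num_samples - len(chosen)))
    return chosen


# Hypothetical tall ratios (height / width) for pole-like objects.
anchors = slender_anchors(32, [4, 9, 16])
```

With uniform random sampling, a pool dominated by near-zero-IoU proposals would yield almost exclusively easy backgrounds; the per-bin quota is what keeps the higher-IoU proposals represented.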