The environmental physiognomy of an area can significantly diminish its aesthetic appeal, rendering it susceptible to visual pollution, the unbeaten scourge of modern urbanization. In this study, we propose using a deep learning network and a robotic vision system integrated with Google Street View to identify streets and textile-based visual pollution in Dhaka, the megacity of Bangladesh. The issue of visual pollution extends to the global apparel and textile industry, as well as to various common urban elements such as billboards, bricks, construction materials, street litter, communication towers, and entangled electric wires. Our data collection encompasses a wide array of visual pollution elements, including images of towers, cables, construction materials, street litter, cloth dumps, dyeing materials, and bricks. We employ two open-source tools to prepare and label our dataset: LabelImg and Roboflow. We develop multiple neural network models to swiftly and accurately identify and classify visual pollutants in this work, including Faster SegFormer, YOLOv5, YOLOv7, and EfficientDet. The tuna swarm optimization technique has been used to select the applied models’ final layers and corresponding hyperparameters. In terms of hardware, our proposed system comprises a Xiaomi-CMSXJ22A web camera, a 3.5-inch touchscreen display, and a Raspberry Pi 4B microcontroller. Subsequently, we program the microcontroller with the YOLOv5 model. Rigorous testing and trials are conducted on these deep learning models to evaluate their performance against various metrics, including accuracy, recall, regularization and classification losses, mAP, precision, and more. The proposed system for detecting and categorizing visual pollution within the textile industry and urban environments has achieved notable results. Notably, the YOLOv5 and YOLOv7 models achieved 98% and 92% detection accuracies, respectively. Finally, the YOLOv5 technique has been deployed into the Raspberry Pi edge device for instantaneous visual pollution detection. The proposed visual pollutants detection device can be easily mounted on various platforms (like vehicles or drones) and deployed in different urban environments for on-site, real-time monitoring. This mobility is crucial for comprehensive street-level data collection, potentially engaging local communities, schools, and universities in understanding and participating in environmental monitoring efforts. The comprehensive dataset on visual pollution will be published in the journal following the acceptance of our manuscript.