Abstract

Most existing deep learning networks for computer vision attempt to improve the performance of either semantic segmentation or object detection. This study develops a unified network architecture that uses both semantic segmentation and object detection to detect people, cars, and roads simultaneously. To achieve this goal, we create an environment in the Unity engine as our dataset. We train our proposed unified network that combines segmentation and detection approaches with the simulation dataset. The proposed network can perform end-to-end prediction and performs well on the tested dataset. The proposed approach is also efficient, processing each image in about 30 ms on an NVIDIA GTX 1070.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call