Cameras are affordable sensors, available to everyone, that can be easily mounted inside a vehicle. While stereovision setups of two or more synchronized cameras have the advantage of directly extracting 3D information, a single camera can be easily placed behind the windshield (like a dashcam) or above the dashboard, often as the built-in camera of a mobile phone used for navigation assistance.

This paper presents a framework for extracting and tracking 3D obstacle data from the environment surrounding a vehicle in traffic, using a single generic camera as the sensor. The system combines the strengths of Convolutional Neural Network (CNN)-based segmentation with a generic probabilistic model of the environment, the dynamic occupancy grid.

The main contributions of this paper are the following: (1) a method for generating the probabilistic measurement model from monocular images, based on CNN segmentation, which takes into account the particularities, uncertainties, and limitations of monocular vision; (2) a method for automatic calibration of the extrinsic and intrinsic parameters of the camera, without the need for user assistance; and (3) the integration of automatic calibration and measurement model generation into a scene tracking system that is able to work with any camera to perceive obstacles in real traffic.

The presented system can be easily fitted to any vehicle, working standalone or together with other sensors, to enhance environment perception capabilities and improve traffic safety.
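To illustrate the kind of monocular measurement model such a framework relies on, the sketch below projects the ground-contact pixels of a CNN obstacle mask onto a bird's-eye occupancy grid. It is a minimal sketch only, assuming a pinhole camera, a flat road, a lowest-obstacle-pixel ground-contact heuristic, and a simple distance-dependent down-weighting of evidence; all function and parameter names are hypothetical and do not reproduce the actual measurement model described in this paper.

```python
import numpy as np


def ground_point_from_pixel(u, v, fx, fy, cx, cy, cam_height, pitch=0.0):
    """Back-project image pixel (u, v) onto a flat ground plane.

    Pinhole model; camera at height `cam_height` (m) above the road,
    pitched down by `pitch` (rad, positive = down).  Returns the
    (lateral, forward) position in metres, or None if the viewing ray
    does not hit the ground ahead of the vehicle.
    """
    # Viewing ray in camera coordinates (x right, y down, z forward).
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    # Rotate the ray into a ground-aligned frame to account for pitch.
    c, s = np.cos(pitch), np.sin(pitch)
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0,   c,   s],
                  [0.0,  -s,   c]])
    ray = R @ ray
    if ray[1] <= 1e-6:               # ray at or above the horizon
        return None
    t = cam_height / ray[1]          # scale so the ray reaches the road plane
    return ray[0] * t, ray[2] * t


def measurement_grid(mask, fx, fy, cx, cy, cam_height, pitch=0.0,
                     cell=0.2, grid_m=(40.0, 40.0)):
    """Accumulate occupancy evidence from a binary obstacle mask.

    For each image column, the lowest obstacle pixel is treated as the
    ground-contact point, projected onto the road plane, and written
    into a bird's-eye grid.  Evidence is down-weighted with distance to
    reflect the growing depth uncertainty of monocular vision.
    """
    rows = int(grid_m[1] / cell)     # forward cells
    cols = int(grid_m[0] / cell)     # lateral cells
    grid = np.zeros((rows, cols))
    h, w = mask.shape
    for u in range(w):
        vs = np.nonzero(mask[:, u])[0]
        if vs.size == 0:
            continue
        v = vs.max()                 # lowest obstacle pixel in this column
        p = ground_point_from_pixel(u, v, fx, fy, cx, cy, cam_height, pitch)
        if p is None:
            continue
        x, z = p
        col = int((x + grid_m[0] / 2) / cell)
        row = int(z / cell)
        if 0 <= row < rows and 0 <= col < cols:
            grid[row, col] += 1.0 / (1.0 + 0.1 * z)   # weaker evidence far away
    return grid


# Hypothetical usage with a synthetic mask in place of a real CNN output.
mask = np.zeros((480, 640), dtype=bool)
mask[200:300, 280:360] = True        # a fake segmented obstacle
grid = measurement_grid(mask, fx=700, fy=700, cx=320, cy=240, cam_height=1.3)
```

In a full system, the per-frame evidence grid produced this way would be fused over time by the dynamic occupancy grid tracker, and the intrinsic and extrinsic parameters fed into the projection would come from the automatic calibration step rather than fixed values.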