Spatiotemporal models for the 3-D shape and motion of objects enabled major progress in the 1980s in the visual perception of moving objects observed from a moving platform. Despite the successes demonstrated with several vehicles, the “4-D approach” has not gained general acceptance. Its advantage is that only the last image of the sequence needs to be analyzed in detail; the full state vectors of moving objects, including their velocity components, are reconstructed by feedback of prediction errors. The vehicle carrying the cameras can thus, together with conventional measurements, directly create a visualization of the situation encountered. In 1994, at the final demonstration of the project PROMETHEUS, two sedan vehicles using this approach were the only ones worldwide capable of driving autonomously in standard heavy traffic on three-lane Autoroutes near Paris at speeds up to 130 km/h (convoy driving, lane changes, passing). Up to ten nearby vehicles could be perceived. In this paper, the three-layer architecture of the perception system is reviewed. At the end of the 1990s, the system evolved from mere recognition of objects in motion to understanding complex dynamic scenes by developing behavioral capabilities, such as fast saccadic changes in gaze direction for flexible concentration on objects of interest. By analyzing the motion of objects over time, the system assessed the situation for decision making. In the third-generation system “EMS-vision,” the behavioral capabilities of agents were represented on an abstract level to characterize their potential behaviors; these maneuvers form an additional knowledge base. The system has proven capable of driving in networks of minor roads, including off-road sections, while avoiding negative obstacles (ditches). Results are shown for road-vehicle guidance. Potential transitions to a robot mind and to the now-favored convolutional neural networks (CNNs) are touched on.
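The core mechanism named above, reconstruction of full state vectors by feedback of prediction errors, is recursive state estimation against a dynamic model. The following minimal Python sketch (an illustrative assumption, not code from the original systems; all model parameters here are hypothetical) shows how a Kalman filter with a constant-velocity model recovers an object's unmeasured velocity component from noisy position measurements, processing only the newest measurement each cycle:

import numpy as np

# Minimal sketch (not the original systems' code) of the 4-D idea:
# a constant-velocity dynamic model predicts each object's state;
# only the newest image measurement is compared against the
# prediction, and the prediction error is fed back to update the
# full state vector, including its velocity component.

dt = 0.1  # frame interval [s], assumed value

# State: [position, velocity]; constant-velocity transition model
F = np.array([[1.0, dt],
              [0.0, 1.0]])
H = np.array([[1.0, 0.0]])     # only position is measured in the image
Q = np.diag([1e-4, 1e-2])      # assumed process-noise covariance
R = np.array([[0.05]])         # assumed measurement-noise covariance

x = np.array([[0.0], [0.0]])   # initial state estimate
P = np.eye(2)                  # initial estimate covariance

def step(x, P, z):
    """One prediction/update cycle on the newest measurement z."""
    # Predict state and covariance forward with the dynamic model
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Prediction error (innovation): measured minus predicted feature
    innovation = z - H @ x_pred
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    # Feed the prediction error back into the full state vector;
    # velocity is corrected although it is never measured directly
    x_new = x_pred + K @ innovation
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new

# Simulated object moving at 2 units/s, observed with noise
rng = np.random.default_rng(0)
true_pos, true_vel = 0.0, 2.0
for k in range(50):
    true_pos += true_vel * dt
    z = np.array([[true_pos + rng.normal(0.0, 0.2)]])
    x, P = step(x, P, z)

print(f"estimated velocity: {x[1, 0]:.2f} (true: {true_vel})")

The design point mirrors the 4-D approach as described above: the dynamic model carries the state between frames, so each new image contributes only an innovation, and velocity is obtained without differencing successive images.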