We present the EuroCity Persons (ECP) 2.0 dataset, a novel image dataset for person detection, tracking and prediction in traffic. The dataset was collected on-board a vehicle driving through 29 cities in 11 European countries. It contains more than 250K unique person trajectories across more than 2.0M images, with a total size of 11 TB. ECP2.0 is about one order of magnitude larger than previous state-of-the-art person datasets in the automotive context. It offers remarkable diversity in terms of geographical coverage, time of day, weather and seasons. We discuss the novel semi-supervised approach used to generate temporally dense pseudo ground-truth (i.e., 2D bounding boxes, 3D person locations) from sparse manual annotations at keyframes. Our approach leverages auxiliary LiDAR data for 3D uplifting and vehicle inertial sensing for ego-motion compensation. It incorporates keyframe information in a three-stage approach (tracklet generation, tracklet merging into tracks, track smoothing) to obtain accurate person trajectories. We validate our pseudo ground-truth generation approach in ablation studies and show that it significantly outperforms existing methods. Furthermore, we demonstrate its benefits for training and testing state-of-the-art tracking methods. Our approach provides a speed-up factor of about 34 compared to frame-wise manual annotation. The ECP2.0 dataset is made freely available for non-commercial research use.