KITTI Dataset Research Articles

With the continuous advancement of autonomous driving technology, visual analysis techniques have emerged as a prominent research topic. The data generated by autonomous driving are large-scale and time-varying, and existing visual analytics methods are not sufficient to handle such complex data effectively. Time-varying graphs can model and visualize the dynamic relationships in complex systems and can visually describe data trends in autonomous driving systems. To this end, this paper introduces a time-varying graph-based method for visual analysis in autonomous driving. The proposed method uses a graph structure to represent the relative positional relationships between the target and interfering obstacles. By incorporating the time dimension, a time-varying graph model is constructed. The method explores how the characteristics of nodes in the graph change across time instants and establishes feature expressions that differentiate target and obstacle motion patterns. The analysis demonstrates that eigenvector centrality in the time-varying graph effectively captures the distinctions in motion patterns between targets and obstacles. These features can be used to recognize targets and obstacles with high accuracy. To evaluate the proposed time-varying graph-based method, a comparative study is conducted against a traditional visual analytic method (frame differencing) and an advanced visual analytic method (visual lidar odometry and mapping). Robustness, accuracy, and resource consumption experiments are performed on the publicly available KITTI dataset to analyze and compare the three methods. The experimental results show that the proposed time-varying graph-based method exhibits superior accuracy and robustness.
This study offers insights and solution ideas for the deep integration of intelligent networked vehicles and intelligent transportation. It provides a reference for advancing intelligent transportation systems and their integration with autonomous driving technologies.
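
The abstract above describes building one graph per time instant from relative positions and then tracking a node centrality over time. The following is a minimal sketch of that idea, not the paper's implementation. It assumes the centrality in question is standard eigenvector centrality, and the edge rule (connect objects closer than `radius`, weight `1 / (1 + distance)`), the function names, and the sample coordinates are all hypothetical choices made for illustration.

```python
import math


def build_adjacency(positions, radius):
    """Weighted adjacency matrix for one frame.

    Nodes are tracked objects; an edge links two objects closer than
    `radius` (an assumed rule), weighted by 1 / (1 + distance).
    """
    n = len(positions)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(positions[i], positions[j])
            if d < radius:
                adj[i][j] = adj[j][i] = 1.0 / (1.0 + d)
    return adj


def eigenvector_centrality(adj, iterations=200, tol=1e-9):
    """Eigenvector centrality by power iteration on A + I.

    Adding the identity keeps the dominant eigenvector of A but avoids
    oscillation on bipartite graphs. The result has unit Euclidean norm.
    """
    n = len(adj)
    x = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        nxt = [v / norm for v in nxt]
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


def centrality_over_time(frames, radius):
    """One centrality vector per frame, i.e. a per-node time series."""
    return [eigenvector_centrality(build_adjacency(p, radius)) for p in frames]


# Hypothetical data: three objects; object 2 drives away in the second frame.
frames = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)],
]
series = centrality_over_time(frames, radius=1.5)
for t, scores in enumerate(series):
    print(t, [round(s, 3) for s in scores])
```

In this toy data the object that leaves the neighbourhood sees its centrality collapse between frames, which is the kind of temporal signature a classifier could use to separate motion patterns. How the paper actually turns such series into features is not stated in the abstract.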

Cutting-edge connected vehicle (CV) technologies have drawn much attention in recent years. The real-time traffic data captured by a CV can be shared with other CVs and data centers, opening new possibilities for solving diverse transportation problems. The trajectory data of CVs have been well studied and widely used. However, image data captured by onboard cameras in a connected environment, a fundamental data source, have not been sufficiently investigated, especially for safety and health-oriented visual perception. In this paper, a bidirectional process of image synthesis and decomposition (BPISD) approach is proposed, and with it a novel self-supervised multi-task learning framework that simultaneously estimates depth map, atmospheric visibility, airlight, and PM2.5 mass concentration. Depth map and visibility are considered highly associated with traffic safety, while airlight and PM2.5 mass concentration are directly correlated with human health. Both the training and testing phases of the proposed system require only a single image as input. Owing to the training pipeline, the depth estimation network can automatically handle various levels of visibility and overcome several inherent problems in current image-synthesis-based self-supervised depth estimation, generating high-quality depth maps even in low-visibility situations and in turn supporting accurate estimation of visibility, airlight, and PM2.5 mass concentration.
Extensive experiments on original and synthesized data from the KITTI dataset and on real-world data collected in Beijing demonstrate that the proposed method can (1) achieve performance comparable to other state-of-the-art methods in self-supervised depth estimation when taking clear images as input; (2) predict vivid depth maps for images contaminated by various levels of haze, where a network trained with the previous framework fails; and (3) accurately estimate visibility, airlight, and PM2.5 mass concentration. Beneficial applications can be developed based on the presented work to contribute to high-precision and dynamic geoinformation reconstruction, transportation, meteorology, and smart cities.
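
The abstract does not spell out how depth, visibility, and airlight are tied together in the image synthesis and decomposition steps. One common way to relate them is the standard atmospheric scattering (haze) model, sketched below purely as an illustrative assumption, not as the BPISD formulation. Under that model a hazy pixel is I = J·t + A·(1 − t) with transmission t = exp(−β·d), and Koschmieder's relation with a 2% contrast threshold gives visibility V ≈ 3.912 / β. All function names and sample values are hypothetical.

```python
import math


def transmission(depth, beta):
    """Fraction of scene radiance surviving over `depth` metres (assumed model)."""
    return math.exp(-beta * depth)


def synthesize_haze(clear, depths, beta, airlight):
    """Synthesis direction: hazy pixel I = J * t + A * (1 - t)."""
    hazy = []
    for j, d in zip(clear, depths):
        t = transmission(d, beta)
        hazy.append(j * t + airlight * (1.0 - t))
    return hazy


def decompose_haze(hazy, depths, beta, airlight):
    """Decomposition direction: recover J = (I - A * (1 - t)) / t."""
    clear = []
    for i, d in zip(hazy, depths):
        t = transmission(d, beta)
        clear.append((i - airlight * (1.0 - t)) / t)
    return clear


def visibility_from_extinction(beta):
    """Koschmieder relation with a 2% contrast threshold: V = -ln(0.02) / beta."""
    return 3.912 / beta


# Hypothetical pixel intensities in [0, 1] and depths in metres.
clear_pixels = [0.2, 0.8]
depth_m = [10.0, 50.0]
beta = 0.05  # assumed extinction coefficient, 1/m
hazy_pixels = synthesize_haze(clear_pixels, depth_m, beta, airlight=0.9)
restored = decompose_haze(hazy_pixels, depth_m, beta, airlight=0.9)
print(hazy_pixels, restored, visibility_from_extinction(beta))
```

Under this model, distant pixels are pulled toward the airlight value, which is why depth and visibility estimates constrain each other. The mapping from these quantities to PM2.5 mass concentration is not given in the abstract and is deliberately left out of the sketch.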

Articles published on KITTI Dataset

PPEA-Depth: Progressive Parameter-Efficient Adaptation for Self-Supervised Monocular Depth Estimation

CTO-SLAM: Contour Tracking for Object-Level Robust 4D SLAM

MSSD: multi-scale self-distillation for object detection

WS-SSD: Achieving faster 3D object detection for autonomous driving via weighted point cloud sampling

Visual Odometry Based on Improved Oriented Features from Accelerated Segment Test and Rotated Binary Robust Independent Elementary Features

An Expository Examination of Temporally Evolving Graph-Based Approaches for the Visual Investigation of Autonomous Driving

DeLiVoTr: Deep and light-weight voxel transformer for 3D object detection

Not all points are balanced: Class balanced single-stage outdoor multi-class 3D object detector from point clouds

A lightweight vehicle detection network fusing feature pyramid and channel attention

A synthetic digital city dataset for robustness and generalisation of depth estimation models

Learnable fusion mechanisms for multimodal object detection in autonomous vehicles

PDTE: Pyramidal deep Taylor expansion for optical flow estimation

Semantics-enhanced discriminative descriptor learning for LiDAR-based place recognition

Rethinking superpixel segmentation from biologically inspired mechanisms

SiLK-SLAM: accurate, robust and versatile visual SLAM with simple learned keypoints

RGB road scene material segmentation

AMENet is a monocular depth estimation network designed for automatic stereoscopic display.

Self-supervised multi-task learning framework for safety and health-oriented road environment surveillance based on connected vehicle visual perception

CSPFormer: A cross-spatial pyramid transformer for visual place recognition

Enhancing pseudo label quality for pedestrian and cyclist in weakly supervised 3D object detection
