End-to-End Autonomous Driving Without Costly Modularization and 3D Manual Annotation.

Abstract

We propose UAD, an end-to-end framework with an Unsupervised pretext task for vision-based Autonomous Driving, achieving the best open-loop evaluation performance on nuScenes while showing robust closed-loop driving quality in CARLA. Our motivation stems from the observation that current end-to-end autonomous driving (E2EAD) models still mimic the modular architecture of typical driving stacks, with carefully designed supervised perception and prediction subtasks that provide environment information for planning. Although it has achieved groundbreaking progress, such a design has two drawbacks: 1) the preceding subtasks require massive high-quality 3D annotations as supervision, posing a significant impediment to scaling the training data; and 2) each submodule entails substantial computation overhead in both training and inference. To this end, we propose UAD, an E2EAD framework with an unsupervised proxy to address these issues. Firstly, we design a novel Angular Perception Pretext to eliminate the annotation requirement. The pretext perceives the driving scene by predicting the angular-wise spatial objectness and temporal dynamics, without manual annotation. Secondly, we propose a self-supervised training strategy, which learns the consistency of the predicted trajectories under different augmented views, to enhance planning robustness in steering scenarios. UAD achieves a 38.7% relative improvement over UniAD on the average collision rate in the nuScenes open-loop evaluation and obtains a route completion score of 98.5% in the closed-loop evaluation on CARLA's Town05 Long benchmark, outperforming the recent work VADv2. Moreover, the proposed method consumes only 44.3% of the training resources of UniAD and runs 3.4× faster in inference when employing the same backbone network. Our design not only demonstrates, for the first time, unarguable performance advantages over supervised counterparts, but also enjoys unprecedented efficiency in data, training, and inference.
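
The consistency idea in the abstract (trajectories predicted under different augmented views should agree) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the augmentation is a planar rotation of the view by a known angle, that a trajectory is an (N, 2) array of x/y waypoints, and that agreement is measured with a mean squared distance. The names `rotate_xy` and `consistency_loss` are hypothetical.

```python
import numpy as np


def rotate_xy(points, angle_rad):
    """Rotate an (N, 2) array of x/y waypoints counter-clockwise about the origin."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, -s], [s, c]])
    return points @ rot.T


def consistency_loss(traj_original, traj_augmented, angle_rad):
    """Mean squared distance between two predictions of the same plan.

    traj_original:  trajectory predicted from the original view, shape (N, 2).
    traj_augmented: trajectory predicted from the view rotated by angle_rad.
    Assumption: a consistent planner would output the original trajectory
    rotated by the same angle, so that rotated trajectory is the target.
    """
    target = rotate_xy(traj_original, angle_rad)
    return float(np.mean(np.sum((traj_augmented - target) ** 2, axis=1)))


if __name__ == "__main__":
    # Hypothetical waypoints; stand-ins for a planner's output.
    traj = np.array([[0.0, 1.0], [0.0, 2.0], [0.5, 3.0]])
    angle = np.deg2rad(30.0)

    consistent = rotate_xy(traj, angle)   # perfectly consistent prediction
    drifted = consistent + 0.1            # prediction that drifts under rotation

    print(consistency_loss(traj, consistent, angle))
    print(consistency_loss(traj, drifted, angle))
```

In a training loop, a term of this kind would be added to the planning loss so that predictions that drift under the augmentation are penalized; how the actual method weights or formulates it is not stated in the abstract.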

Similar Papers
  • Research Article
  • Cite Count: 40
  • 10.1002/hbm.24202
Spatio-temporal dynamics of resting-state brain networks improve single-subject prediction of schizophrenia diagnosis.
  • May 10, 2018
  • Human Brain Mapping
  • Akhil Kottaram + 5 more

Correlation in functional MRI activity between spatially separated brain regions can fluctuate dynamically when an individual is at rest. These dynamics are typically characterized temporally by measuring fluctuations in functional connectivity between brain regions that remain fixed in space over time. Here, dynamics in functional connectivity were characterized in both time and space. Temporal dynamics were mapped with sliding-window correlation, while spatial dynamics were characterized by enabling network regions to vary in size (shrink/grow) over time according to the functional connectivity profile of their constituent voxels. These temporal and spatial dynamics were evaluated as biomarkers to distinguish schizophrenia patients from controls, and compared to current biomarkers based on static measures of resting-state functional connectivity. Support vector machine classifiers were trained using: (a) static, (b) dynamic in time, (c) dynamic in space, and (d) dynamic in time and space characterizations of functional connectivity within canonical resting-state brain networks. Classifiers trained on functional connectivity dynamics mapped over both space and time predicted diagnostic status with accuracy exceeding 91%, whereas utilizing only spatial or temporal dynamics alone yielded lower classification accuracies. Static measures of functional connectivity yielded the lowest accuracy (79.5%). Compared to healthy comparison individuals, schizophrenia patients generally exhibited functional connectivity that was reduced in strength and more variable. Robustness was established with replication in an independent dataset. The utility of biomarkers based on temporal and spatial functional connectivity dynamics suggests that resting-state dynamics are not trivially attributable to sampling variability and head motion.

  • Research Article
  • Cite Count: 620
  • 10.1609/aaai.v33i01.33015668
Revisiting Spatial-Temporal Similarity: A Deep Learning Framework for Traffic Prediction
  • Jul 17, 2019
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Huaxiu Yao + 4 more

Traffic prediction has drawn increasing attention in the AI research field due to the increasing availability of large-scale traffic data and its importance in the real world. For example, an accurate taxi demand prediction can assist taxi companies in pre-allocating taxis. The key challenge of traffic prediction lies in how to model the complex spatial dependencies and temporal dynamics. Although both factors have been considered in modeling, existing works make strong assumptions about spatial dependence and temporal dynamics, i.e., spatial dependence is stationary in time, and temporal dynamics are strictly periodic. However, in practice the spatial dependence could be dynamic (i.e., changing from time to time), and the temporal dynamics could have some perturbation from one period to another. In this paper, we make two important observations: (1) the spatial dependencies between locations are dynamic; and (2) the temporal dependency follows daily and weekly patterns but is not strictly periodic because of its dynamic temporal shifting. To address these two issues, we propose a novel Spatial-Temporal Dynamic Network (STDN), in which a flow gating mechanism is introduced to learn the dynamic similarity between locations, and a periodically shifted attention mechanism is designed to handle long-term periodic temporal shifting. To the best of our knowledge, this is the first work that tackles both issues in a unified framework. Our experimental results on real-world traffic datasets verify the effectiveness of the proposed method.

  • Research Article
  • Cite Count: 77
  • 10.1016/j.future.2021.07.012
STGNN-TTE: Travel time estimation via spatial–temporal graph neural network
  • Jul 18, 2021
  • Future Generation Computer Systems
  • Guangyin Jin + 4 more

  • Research Article
  • 10.1016/j.eclinm.2025.103298
Development and validation of an MRI spatiotemporal interaction model for early noninvasive prediction of neoadjuvant chemotherapy response in breast cancer: a multicentre study
  • Jun 12, 2025
  • eClinicalMedicine
  • Wenjie Tang + 21 more

  • Research Article
  • Cite Count: 66
  • 10.1080/01431161.2015.1083633
A generalization of spatial and temporal fusion methods for remotely sensed surface parameters
  • Sep 2, 2015
  • International Journal of Remote Sensing
  • Hankui K Zhang + 4 more

Remotely sensed surface parameters, such as vegetation index, leaf area index, surface temperature, and evapotranspiration, show diverse spatial scales and temporal dynamics. Generally the spatial and temporal resolutions of remote-sensing data should match the characteristics of surface parameters under observation. These requirements sometimes cannot be provided by a single sensor due to the trade-off between spatial and temporal resolutions. Many spatial and temporal fusion (STF) methods have been proposed to derive the required data. However, the methodology suffers from disorderly development. To better inform future research, this study generalizes the existing methods from around 100 studies as spatial or temporal categories based on their physical assumptions related to spatial scales and temporal dynamics. To be specific, the assumptions are related to the scale invariance of the temporal information and temporal constancy of the spatial information. The spatial information can be contexture or spatial details. Experiments are conducted using Landsat data acquired on 13 dates in two study areas and simulated Moderate Resolution Imaging Spectroradiometer (MODIS) data. The results are presented to demonstrate the typical methods from each category. This study concludes the following. (1) Contexture methods depend heavily on how components maps (contexture) are defined. They are not recommended except when components maps can be estimated properly from observed images. (2) The spatial and temporal adaptive reflectance fusion model (STARFM) and enhanced STARFM (ESTARFM) methods belong to the temporal and spatial categories, respectively. Thus, STARFM and ESTARFM should be better applied to temporal variance-dominated and spatial variance-dominated areas, respectively. (3) Non-linear methods, such as the sparse representation-based spatio-temporal reflectance fusion model, can successfully address land-cover changes in addition to phenological changes, thereby providing a promising option for STF problems in the future.

  • Research Article
  • Cite Count: 24
  • 10.1371/journal.pone.0188205
Community ecology in 3D: Tensor decomposition reveals spatio-temporal dynamics of large ecological communities.
  • Nov 14, 2017
  • PLOS ONE
  • Romain Frelat + 8 more

Understanding spatio-temporal dynamics of biotic communities containing large numbers of species is crucial to guide ecosystem management and conservation efforts. However, traditional approaches usually focus on studying community dynamics either in space or in time, often failing to fully account for interlinked spatio-temporal changes. In this study, we demonstrate and promote the use of tensor decomposition for disentangling spatio-temporal community dynamics in long-term monitoring data. Tensor decomposition builds on traditional multivariate statistics (e.g. Principal Component Analysis) but extends it to multiple dimensions. This extension allows for the synchronized study of multiple ecological variables measured repeatedly in time and space. We applied this comprehensive approach to explore the spatio-temporal dynamics of 65 demersal fish species in the North Sea, a marine ecosystem strongly altered by human activities and climate change. Our case study demonstrates how tensor decomposition can successfully (i) characterize the main spatio-temporal patterns and trends in species abundances, (ii) identify sub-communities of species that share similar spatial distribution and temporal dynamics, and (iii) reveal external drivers of change. Our results revealed a strong spatial structure in fish assemblages persistent over time and linked to differences in depth, primary production and seasonality. Furthermore, we simultaneously characterized important temporal distribution changes related to the low frequency temperature variability inherent in the Atlantic Multidecadal Oscillation. Finally, we identified six major sub-communities composed of species sharing similar spatial distribution patterns and temporal dynamics. Our case study demonstrates the application and benefits of using tensor decomposition for studying complex community data sets usually derived from large-scale monitoring programs.

  • Research Article
  • Cite Count: 5
  • 10.1016/j.cagd.2021.101965
SCN: Dilated silhouette convolutional network for video action recognition
  • Feb 1, 2021
  • Computer Aided Geometric Design
  • Michelle Hua + 2 more

  • Research Article
  • Cite Count: 37
  • 10.1016/j.agrformet.2019.107618
Spatial complexity and temporal dynamics in viticulture: A review of climate-driven scales
  • Jun 13, 2019
  • Agricultural and Forest Meteorology
  • Etienne Neethling + 3 more

  • Addendum
  • Cite Count: 2
  • 10.1371/journal.pone.0196353
Correction: Community ecology in 3D: Tensor decomposition reveals spatio-temporal dynamics of large ecological communities
  • Apr 19, 2018
  • PLoS ONE
  • Romain Frelat + 8 more

[This corrects the article DOI: 10.1371/journal.pone.0188205.].

  • Research Article
  • Cite Count: 2
  • 10.3390/s25010191
Knowledge Distillation-Enhanced Behavior Transformer for Decision-Making of Autonomous Driving.
  • Jan 1, 2025
  • Sensors (Basel, Switzerland)
  • Rui Zhao + 6 more

Autonomous driving has demonstrated impressive driving capabilities, with behavior decision-making playing a crucial role as a bridge between perception and control. Imitation Learning (IL) and Reinforcement Learning (RL) have introduced innovative approaches to behavior decision-making in autonomous driving, but challenges remain. On one hand, RL's policy networks often lack sufficient reasoning ability to make optimal decisions in highly complex and stochastic environments. On the other hand, the complexity of these environments leads to low sample efficiency in RL, making it difficult to efficiently learn driving policies. To address these challenges, we propose an innovative Knowledge Distillation-Enhanced Behavior Transformer (KD-BeT) framework. Building on the successful application of Transformers in large language models, we introduce the Behavior Transformer as the policy network in RL, using observation-action history as input for sequential decision-making, thereby leveraging the Transformer's contextual reasoning capabilities. Using a teacher-student paradigm, we first train a small-capacity teacher model quickly and accurately through IL, then apply knowledge distillation to accelerate RL's training efficiency and performance. Simulation results demonstrate that KD-BeT maintains fast convergence and high asymptotic performance during training. In the CARLA NoCrash benchmark tests, KD-BeT outperforms other state-of-the-art methods in terms of traffic efficiency and driving safety, offering a novel solution for addressing real-world autonomous driving tasks.

  • Conference Article
  • Cite Count: 13
  • 10.23919/chicc.2018.8482790
A Deep Reinforcement Learning Algorithm with Expert Demonstrations and Supervised Loss and its application in Autonomous Driving
  • Jul 1, 2018
  • Kai Liu + 2 more

In this paper, we propose a deep reinforcement learning (DRL) algorithm that combines Deep Deterministic Policy Gradient (DDPG) with expert demonstrations and a supervised loss for decision making in autonomous driving. Training the DRL agent with supervised learning is adopted to accelerate the exploration process and increase stability. A supervised loss function is introduced in the algorithm to update the actor networks. In addition, reward construction is combined to make the training process more stable and efficient. The proposed algorithm is applied to a popular autonomous driving simulator called TORCS. The experimental results show that training efficiency and stability are improved by utilizing our algorithm in autonomous driving.

  • Research Article
  • Cite Count: 42
  • 10.1109/tvt.2020.2991584
Driving Maneuvers Prediction Based Autonomous Driving Control by Deep Monte Carlo Tree Search
  • Jul 1, 2020
  • IEEE Transactions on Vehicular Technology
  • Jienan Chen + 4 more

Autonomous driving has attracted significant attention in recent years. With the boom in artificial intelligence (AI), deep learning technologies have been applied to autonomous driving to help vehicles better perceive the environment. Besides perceiving the environment, predictive driving is another prominent skill that human drivers rely on for smooth control and safe driving. In this work, we develop a deep Monte Carlo Tree Search (deep-MCTS) control method for vision-based autonomous driving. Compared with existing deep learning-based autonomous driving control methods, our method can predict driving maneuvers to help improve the stability and performance of driving control. Two deep neural networks (DNNs) are employed for predicting action-state transformation and obtaining action-selection probabilities, respectively. The deep-MCTS utilizes the predicted information of the two DNNs and reconstructs multiple possible trajectories to predict driving maneuvers. An optimal trajectory is selected by the deep-MCTS based on both current road conditions and predicted driving maneuvers. The proposed method achieves high control stability by avoiding sharp turns and driving deviations. We implement our algorithm in the Udacity and TORCS self-driving environments. The testing results show that our algorithm achieves a significant improvement in training efficiency, the stability of steering control, and the stability of the driving trajectory compared to existing methods.

  • Research Article
  • 10.54254/2755-2721/2025.tj23323
Reinforcement Learning Methods for Autonomous Driving: A Survey
  • May 22, 2025
  • Applied and Computational Engineering
  • Yujie Lin

In recent years, with the rapid development of intelligent transportation, Reinforcement Learning (RL), as an adaptive decision-making method, has gradually permeated into various levels of Autonomous Driving (AD). Therefore, this paper reviews the latest advances in the application of RL in AD. In terms of high-level decision-making and behavioral planning, RL, combined with visual-language models, imitation learning, multi-stage training, and autoregressive trajectory planning, systematically improves planning accuracy and task success rates. At the motion control level, the synergistic optimization of deep reinforcement learning (DRL) based continuous control strategies and robust control methods enhances performance in path tracking, dynamic obstacle avoidance, and multi-sensor information fusion. Meanwhile, end-to-end autonomous driving leverages novel frameworks such as closed-loop RL, World Model (WM), and multimodal decision fusion, effectively narrowing the gap between simulation and real-world environments while achieving significant improvements in safety and smoothness. Additionally, the paper discusses the limitations of RL applications, including data dependency, training efficiency, safety, and interpretability. Furthermore, it explores the future prospects for achieving more intelligent autonomous driving systems through strategies such as meta-learning, transfer learning, adversarial training, and human-machine collaboration.

  • Research Article
  • Cite Count: 4
  • 10.1139/b97-157
Aquatic hyphomycetes in three rivers of southwestern France. III. Relationships between communities spatial and temporal dynamics
  • Jan 1, 1998
  • Canadian Journal of Botany
  • Eric Fabre

Spatial and temporal dynamics of individual species of aquatic hyphomycetes in three southwestern French rivers have been previously studied using water filtration techniques. In this paper, relationships between these spatial and temporal dynamics were explored using correspondence analysis. Correspondence analyses were performed for conidial concentration and presence-absence of species. The analysis of conidial concentration data indicated that species temporal dynamics is more important in determining changes in conidial communities than species spatial dynamics. However, the interconnectedness of these dynamics was revealed by a Guttman effect between the first two factorial axes. A linear gradient Tech-Adour-Nive, which corresponds to the geographical disposition of these rivers in southwestern France, was observed on the first three axes of the correspondence analysis of conidial concentrations. This gradient did not exist in the correspondence analysis of species presence-absence, but the analysis revealed a qualitative difference of the communities between the summer season and the beginning of autumn. The comparison of eigenvalues for the two correspondence analyses pointed out that conidial abundance is more significant than presence-absence of species for the structure of the data table. Key words: aquatic hyphomycete, spatial dynamic, temporal dynamic, climatic gradient.

  • Research Article
  • Cite Count: 39
  • 10.1046/j.1469-8137.1998.00154.x
Effect of physical conditions on the spatial and temporal dynamics of the soil‐borne fungal pathogen Rhizoctonia solani
  • Apr 1, 1998
  • New Phytologist
  • W Otten + 1 more

The transmission of infection by many soil‐borne fungal parasites of plants depends on the ability of the fungus to grow on or through soil. Progress in analysing the effects of soil physical factors on the temporal and spatial dynamics of fungal growth has been hindered by technical difficulties of quantifying fungal biomass in soil and heterogeneity in soil properties. In this paper we use a combination of a monoclonal antibody‐based immunosorbent assay and microscopy to analyse the effects of soil physical properties on the spatial and temporal dynamics of colonies of the economically important fungus Rhizoctonia solani Kühn growing in two dimensions and three dimensions in a sand. Combinations of different particle‐size distributions and matric potential are used to manipulate the air‐filled pore volume and pore‐size distribution independently of each other. Temporal dynamics are measured by the change in fungal biomass over time whereas spatial dynamics relate to fungal spread and are measured by the colony size, the rate of colony expansion and the biomass distribution within colonies. We show that the fungus spreads more than three times further over surfaces than through sand, even though the same amount of biomass is produced in each case. Pore‐size distribution and air‐filled pore space both affected the extent and rate of fungal spread in three dimensions within sand, with more rapid and extensive spread in a coarse sand compared with a fine sand at identical air‐filled pore volume. The spread of fungal hyphae along surfaces was affected neither by differences in surface texture nor by air‐filled volume, and was substantially more homogeneous than for three‐dimensional spread. We argue that the relative impermeability of sand surfaces to penetration by hyphae might be influenced by the ability of the fungus to branch within a confined space rather than simply to penetrate the pores. The broader epidemiological and ecological consequences of preferential spread by parasitic and saprophytic fungi along surfaces rather than through the dense soil volume are discussed.
