Abstract

This paper analyzes in detail how different loss functions influence the generalization abilities of a deep learning-based next frame prediction model for traffic scenes. Our prediction model is a convolutional long short-term memory (ConvLSTM) network that generates the pixel values of the next frame after having observed the raw pixel values of a sequence of four past frames. We trained the model with 21 combinations of seven loss terms using the Cityscapes Sequences dataset and an identical hyper-parameter setting. The loss terms range from pixel-error based terms to adversarial terms. To assess the generalization abilities of the resulting models, we generated predictions up to 20 time-steps into the future for four datasets of increasing visual distance to the training dataset—KITTI Tracking, BDD100K, UA-DETRAC, and KIT AIS Vehicles. All predicted frames were evaluated quantitatively with both traditional pixel-based evaluation metrics, that is, mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM), and recent, more advanced, feature-based evaluation metrics, that is, Fréchet inception distance (FID) and learned perceptual image patch similarity (LPIPS). The results show that solely by choosing a different combination of losses, we can boost the prediction performance on new datasets by up to 55%, and by up to 50% for long-term predictions.
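As a hedged illustration of the traditional pixel-based metrics named above, the following sketch computes MSE and PSNR for two toy grayscale frames with numpy (the frame contents are made up for the example; SSIM is omitted because it requires windowed statistics):

```python
import numpy as np

def mse(pred, target):
    """Mean squared error between two images (float arrays in [0, 1])."""
    return float(np.mean((pred - target) ** 2))

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher is better."""
    err = mse(pred, target)
    if err == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / err)

# Toy 4x4 "frames": a gradient target and a uniformly brightened prediction.
target = np.linspace(0.0, 1.0, 16).reshape(4, 4)
pred = np.clip(target + 0.1, 0.0, 1.0)
print(mse(pred, target))
print(psnr(pred, target))
```

Lower MSE and higher PSNR both indicate a closer pixel-wise match; as the highlights below note, neither captures perceptual similarity.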

Highlights

  • The ability to predict possible future actions of traffic participants is essential for anticipatory driving

  • In contrast to the traditional metrics, which directly compare the pixel values of two images, the Fréchet inception distance (FID) and the learned perceptual image patch similarity (LPIPS) measure the distance between two images not in pixel space but in feature space

  • We have shown that an intelligently designed loss function is essential for a prediction model to generate plausible frames of traffic scenes
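The distinction drawn in the highlights above can be made concrete with a small, illustrative numpy experiment (a toy construction, not from the paper): a one-pixel shift of a smooth gradient frame is almost invisible to a human observer, yet a pixel-wise metric penalizes it an order of magnitude more than a barely visible brightness change — one motivation for feature-space metrics such as FID and LPIPS.

```python
import numpy as np

# A smooth horizontal gradient serves as a toy "frame".
frame = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))

# Shift the content one pixel to the right (border column replicated).
shifted = np.empty_like(frame)
shifted[:, 1:] = frame[:, :-1]
shifted[:, 0] = frame[:, 0]

# Pixel-wise MSE of the shifted frame vs. a barely visible +0.01 brightness
# perturbation: the near-invisible shift is penalized far more strongly.
mse_shift = float(np.mean((frame - shifted) ** 2))
mse_noise = float(np.mean((frame - np.clip(frame + 0.01, 0.0, 1.0)) ** 2))
print(mse_shift, mse_noise)
```

A feature-based metric computed on activations of a pretrained network would rank the shifted frame as far closer to the original than this pixel-space comparison suggests.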


Introduction

The ability to predict possible future actions of traffic participants is essential for anticipatory driving. Predictions of probable future events can prove beneficial when used as additional inputs to the system: they help to plan actions more efficiently and to make better-informed decisions. The learned features of an ideal network for video prediction meet both of the following criteria:

  • They are generic enough to enable the model to generalize well over a variety of different scene contents

  • They produce high-quality predictions that preserve details of the observed input scene across multiple prediction steps
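The 21 loss combinations studied in the paper are weighted sums of individual loss terms. As an illustrative sketch only (the term names, weights, and frame contents below are hypothetical, not the paper's actual loss set), the following combines an L1 pixel loss with a gradient-difference term, a pairing commonly used in video prediction to counteract the blur that pure pixel losses encourage:

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute pixel error."""
    return float(np.mean(np.abs(pred - target)))

def gdl_loss(pred, target):
    """Gradient difference loss: compares horizontal and vertical image
    gradients, penalizing blurry predictions that plain pixel losses tolerate."""
    dy_p, dx_p = np.diff(pred, axis=0), np.diff(pred, axis=1)
    dy_t, dx_t = np.diff(target, axis=0), np.diff(target, axis=1)
    return float(np.mean(np.abs(np.abs(dy_p) - np.abs(dy_t)))
                 + np.mean(np.abs(np.abs(dx_p) - np.abs(dx_t))))

def combined_loss(pred, target, w_l1=1.0, w_gdl=1.0):
    # Weighted sum of loss terms; which terms are active and how they are
    # weighted defines one "combination" in the sense of the paper.
    return w_l1 * l1_loss(pred, target) + w_gdl * gdl_loss(pred, target)

target = np.tile(np.linspace(0.0, 1.0, 16), (16, 1))  # sharp gradient frame
blurry = np.full_like(target, target.mean())          # flat, "blurred" guess
print(combined_loss(blurry, target))
```

Setting one weight to zero disables that term, so sweeping the weights reproduces the kind of ablation over loss combinations that the paper performs at scale.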

