Frame‐by‐frame annotation of video recordings using deep neural networks

Alexander M Conway, Ian N Durbach, Robert N Harris, Alistair Mcinnes

doi:10.1002/ecs2.3384

Open Access

https://doi.org/10.1002/ecs2.3384

Abstract

Video data are widely collected in ecological studies, but manual annotation is a challenging and time‐consuming task that has become a bottleneck for scientific research. Classification models based on convolutional neural networks (CNNs) have proved successful in annotating images, but few applications have extended these to video classification. We demonstrate an approach that combines a standard CNN summarizing each video frame with a recurrent neural network (RNN) that models the temporal component of video. The approach is illustrated using two datasets: one collected by static video cameras detecting seal activity inside coastal salmon nets, and another collected by animal‐borne cameras deployed on African penguins, used to classify behavior. The combined RNN‐CNN led to a 25% relative reduction in test set classification error over an image‐only model for penguins (accuracy rising from 80% to 85%), and substantially improved classification precision or recall for four of six behavior classes (12–17%). Image‐only and video models classified seal activity with very similar accuracy (88% and 89%), and no seal visits were missed entirely by either model. Temporal patterns related to movement provide valuable information about animal behavior, and classifiers benefit from including these explicitly. We recommend the inclusion of temporal information whenever manual inspection suggests that movement is predictive of class membership.
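
The 25% figure quoted for penguins is consistent with a relative reduction in classification error, not a relative gain in accuracy itself. The sketch below works through that arithmetic. It is an illustrative reading of the reported numbers, not a calculation taken from the paper, and the function names are hypothetical.

```python
def relative_error_reduction(acc_before_pct, acc_after_pct):
    """Fraction by which the error rate shrinks when accuracy rises.

    Accuracies are given in percent, so the error rate is 100 minus accuracy.
    """
    err_before = 100 - acc_before_pct
    err_after = 100 - acc_after_pct
    return (err_before - err_after) / err_before


def relative_accuracy_gain(acc_before_pct, acc_after_pct):
    """Fractional increase in accuracy relative to the starting accuracy."""
    return (acc_after_pct - acc_before_pct) / acc_before_pct


# Penguin behavior: image-only model at 80%, combined RNN-CNN at 85%.
penguin_error_reduction = relative_error_reduction(80, 85)
penguin_accuracy_gain = relative_accuracy_gain(80, 85)

# Seal activity: image-only model at 88%, video model at 89%.
seal_error_reduction = relative_error_reduction(88, 89)

print(f"penguins, relative error reduction: {penguin_error_reduction:.1%}")
print(f"penguins, relative accuracy gain:   {penguin_accuracy_gain:.2%}")
print(f"seals, relative error reduction:    {seal_error_reduction:.1%}")
```

Under this reading, the penguin error rate falls from 20% to 15%, a quarter of the original error, while the same change expressed relative to accuracy is much smaller.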

Highlights

Technological advances in quality, size, battery life and storage capacity have enabled video cameras to record more data at better quality on a broader variety of animals. Cameras are now small enough to deploy on numerous animal species (Rutz & Troscianko, 2013; Takahashi et al., 2004) and on drones (Anderson & Gaston, 2013; Cruzan et al., 2016), as well as in more conventional fixed locations.
Footage captured using video cameras needs to be annotated for use in scientific research, currently a labor-intensive process that often involves highly trained scientists manually annotating the content of videos frame by frame.
Even with dedicated annotation software, this presents a major bottleneck for scientific research based on these data, necessitating the development of computer-assisted approaches (Schneider, Taylor, Linquist, & Kremer, 2019; Weinstein, 2015).

Summary

Introduction

Technological advances in quality, size, battery life and storage capacity have enabled video cameras to record more data at better quality on a broader variety of animals. Cameras are now small enough to deploy on numerous animal species (Rutz & Troscianko, 2013; Takahashi et al., 2004) and on drones (Anderson & Gaston, 2013; Cruzan et al., 2016), as well as in more conventional fixed locations. Video classification is a challenging modeling problem: the challenges of image classification are amplified because the same sources of natural visual variation occur both between videos and within videos, as objects move around and change pose, scale, illumination and background during the course of a single video. The temporal component of video presents significant modeling challenges, not only because it dramatically increases the size of video data but also because the relevant visual features required to classify a video can span several frames, with no single frame containing enough information on its own. The pixels of an image representing objects are correlated spatially to form visual object features in a single frame but are correlated through time
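
The point that evidence can span several frames can be illustrated with a toy version of the frame-summary-plus-recurrence idea. This is a minimal sketch under stated assumptions, not the authors' model: `frame_features` is a hand-written stand-in for a CNN, the recurrent update is a single Elman-style cell, and the weights are hand-picked, not trained. All names are hypothetical.

```python
import math


def frame_features(frame):
    """Stand-in for a CNN: summarise one frame (a list of pixel rows) as
    [mean intensity, horizontal centre of mass of the intensity]."""
    total = sum(sum(row) for row in frame)
    n_pixels = sum(len(row) for row in frame)
    mean = total / n_pixels
    if total:
        centre = sum(x * v for row in frame for x, v in enumerate(row)) / total
    else:
        centre = 0.0
    return [mean, centre]


def rnn_step(x, h, w_x, w_h):
    """Elman-style update: h_t = tanh(W_x x_t + W_h h_{t-1})."""
    return [
        math.tanh(
            sum(w_x[i][j] * x[j] for j in range(len(x)))
            + sum(w_h[i][k] * h[k] for k in range(len(h)))
        )
        for i in range(len(h))
    ]


def annotate_video(frames, w_x, w_h):
    """Return one hidden state per frame, each depending on all earlier frames."""
    h = [0.0] * len(w_h)
    states = []
    for frame in frames:
        h = rnn_step(frame_features(frame), h, w_x, w_h)
        states.append(h)
    return states


# Hand-picked (untrained) weights: 2 input features, 1 hidden unit.
W_X = [[0.5, 0.3]]
W_H = [[0.8]]

# Two 1x3-pixel videos that end on the same frame but differ in history:
# a bright pixel moving left to right, and a bright pixel that stays put.
moving = [[[1, 0, 0]], [[0, 1, 0]], [[0, 0, 1]]]
static = [[[0, 0, 1]], [[0, 0, 1]], [[0, 0, 1]]]

states_moving = annotate_video(moving, W_X, W_H)
states_static = annotate_video(static, W_X, W_H)

print("final-frame features equal:",
      frame_features(moving[-1]) == frame_features(static[-1]))
print("final hidden state, moving:", states_moving[-1])
print("final hidden state, static:", states_static[-1])
```

The two videos have identical final frames, so a frame-only summary cannot tell them apart, whereas the recurrent state differs because it carries information from earlier frames. In a real pipeline the per-frame summary would come from a trained CNN and the recurrent weights would be learned.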

Journal: Ecosphere	Publication Date: Mar 1, 2021
Citations: 10	License type: CC BY 3.0
