Timed-image based deep learning for action recognition in video sequences

Abdourrahmane Mahamane Atto, Alexandre Benoit, Patrick Lambert

doi:10.1016/j.patcog.2020.107353


Open Access

Journal: Pattern Recognition
Publication Date: Apr 3, 2020
Citations: 21
License type: publisher-specific-oa

Affiliation: Université Savoie Mont Blanc

Abstract

The paper addresses two issues related to machine learning on 2D + X data volumes, where 2D refers to image observation and X denotes a variable that can be associated with time, depth, wavelength, etc. The first issue is how to condition these structured volumes so that they are compatible with convolutional neural networks operating on 2D image file formats. The second issue is sensitive action detection in the “2D + Time” case (video clips and image time series). For the data conditioning issue, the paper first shows that mapping 2D spatial convolution onto its 1D Hilbert-curve-based counterpart is highly accurate in terms of information compressibility on the tight frames of convolutional networks. Given this compressibility, the paper proposes converting the 2D + X data volume into a single meta-image file format before it is passed to machine learning frameworks. In this conversion, each 2D frame of the 2D + X data is reshaped as a 1D array indexed by a Hilbert space-filling curve, and the third variable X of the initial file format becomes the second variable of the meta-image format. For the sensitive action recognition issue, the paper provides: (i) a three-category video database of non-violent, moderately violent and extremely violent actions; (ii) the conversion of this database into a timed meta-image database through the 2D + Time to 2D conditioning stage described above; and (iii) outstanding 2-level and 3-level violence classification results obtained with deep convolutional neural networks operating on meta-image databases.
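As a rough illustration of the conditioning step described in the abstract, the sketch below converts a 2D + X volume into a meta-image. It assumes square frames whose side is a power of two and uses the classic iterative Hilbert index-to-coordinate conversion, a standard textbook algorithm that is not necessarily the exact variant used by the authors. Consecutive Hilbert indices land on neighboring pixels, so the 1D ordering roughly preserves 2D locality. Following the abstract, the Hilbert index becomes the first axis of the meta-image and X the second. The function names (`hilbert_d2xy`, `hilbert_flatten`, `to_meta_image`) and the row/column convention are hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[float]]


def _rotate(s: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Rotate/flip a sub-quadrant of side s so consecutive curve pieces connect."""
    if ry == 0:
        if rx == 1:
            x, y = s - 1 - x, s - 1 - y
        x, y = y, x  # swap coordinates
    return x, y


def hilbert_d2xy(n: int, d: int) -> Tuple[int, int]:
    """Map a Hilbert index d in [0, n*n) to grid coordinates (x, y); n is a power of 2."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_flatten(frame: Frame) -> List[float]:
    """Reshape a square n x n frame into a 1D list ordered along the Hilbert curve."""
    n = len(frame)
    if n == 0 or n & (n - 1) or any(len(row) != n for row in frame):
        raise ValueError("frame must be square with a power-of-two side")
    # Assumed convention: x indexes rows and y indexes columns.
    return [frame[x][y] for x, y in (hilbert_d2xy(n, d) for d in range(n * n))]


def to_meta_image(volume: Sequence[Frame]) -> List[List[float]]:
    """Convert a 2D + X volume (a sequence of frames along X) into a meta-image.

    The result has n*n rows and len(volume) columns: row d holds Hilbert
    position d, and column k holds the k-th sample of the variable X.
    """
    flat = [hilbert_flatten(frame) for frame in volume]
    if not flat:
        return []
    if any(len(f) != len(flat[0]) for f in flat):
        raise ValueError("all frames must have the same size")
    return [[f[d] for f in flat] for d in range(len(flat[0]))]


if __name__ == "__main__":
    # Two 2 x 2 frames, e.g. two time steps of a tiny "2D + Time" clip.
    clip = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
    print(to_meta_image(clip))  # [[1, 5], [2, 6], [4, 8], [3, 7]]
```

For a video clip of T frames of size n x n, the result is a single (n·n) x T image that an ordinary 2D convolutional network can consume. Real frames that are not square powers of two would first need resizing or padding, a preprocessing choice this sketch leaves open.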

Full Text