Aggressive human behavior is a complex phenomenon that depends on social and situational context. An increase in aggressive behavior is visible every day, both in news coverage and in our surroundings. This work focuses on physical aggression in humans, which includes hitting, kicking, and punching. Such events pose a direct threat to public safety. Systems that automatically monitor surveillance videos and identify aggressive human activities in them would be of great help to the authorities. In this research, we used three datasets: Hockey Fights, Movies, and Violent Flows. The video clips from these datasets are converted into pre-processed sequences of frames, and each dataset is then divided into a training set and a validation set. The training set is passed through a model consisting of a convolutional neural network (CNN) linked to a convolutional LSTM (ConvLSTM) layer. The output of the model is a binary classification of each clip as aggressive or non-aggressive. The validation set is then used to evaluate the model's performance. If the performance is not satisfactory, training is revisited and repeated until the desired performance is achieved. The results show that ResNet50 is the best-performing CNN backbone, with an accuracy of 90%. InceptionV3 yielded 89% accuracy, close to ResNet50, while VGG19 performed markedly worse at only 79% accuracy. For future work, we suggest extending the model to more complex violence scenarios and applications, finding creative solutions for data collection, advancing generalization techniques, and pursuing real-time optimization.
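The data preparation and evaluation steps described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not state the sequence length, the split ratio, or how frames are sampled, so the evenly spaced sampling, the 80/20 stratified split, and the helper names `sample_frame_indices`, `stratified_split`, and `accuracy` are all assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple


def sample_frame_indices(num_frames: int, seq_len: int) -> List[int]:
    """Pick seq_len evenly spaced frame indices from a clip (assumed sampling scheme)."""
    if num_frames <= 0 or seq_len <= 0:
        raise ValueError("num_frames and seq_len must be positive")
    if seq_len == 1:
        return [0]
    step = (num_frames - 1) / (seq_len - 1)
    # Clips shorter than seq_len simply repeat some frames.
    return [int(i * step + 0.5) for i in range(seq_len)]


def stratified_split(
    labels: Sequence[int], val_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split clip indices into training and validation sets, class by class.

    The 0.2 validation fraction is an assumption; the abstract gives no ratio.
    """
    rng = random.Random(seed)
    train_idx: List[int] = []
    val_idx: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_val = int(len(idx) * val_fraction)
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return sorted(train_idx), sorted(val_idx)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of clips whose predicted flag matches the ground truth."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy usage: 1 = aggressive, 0 = non-aggressive (labels are made up).
labels = [0] * 10 + [1] * 10
train_idx, val_idx = stratified_split(labels, val_fraction=0.2, seed=0)
frame_ids = sample_frame_indices(num_frames=40, seq_len=5)
```

In a full pipeline, the frames selected by `frame_ids` would be resized and normalized before being fed to the CNN and ConvLSTM model, and `accuracy` would be computed on the validation clips only.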