Survey of Action Recognition, Spotting, and Spatio-Temporal Localization in Soccer—Current Trends and Research Perspectives

Abstract

Analyzing action scenes in soccer is a challenging task due to the complex and dynamic nature of the game, as well as the interactions between players. This article provides a comprehensive overview of this task, divided into action recognition, spotting key moments, and identifying actions in both time and space (spatio-temporal action localization) in soccer. We explore publicly available data sources and metrics used to evaluate models’ performance. The article reviews recent state-of-the-art methods that leverage deep learning techniques and traditional approaches. Our analysis begins with methods based on feature engineering, followed by an exploration of various deep learning techniques. This includes using Convolutional Neural Networks (CNNs) for visual information processing, Recurrent Neural Networks (RNNs) for analyzing temporal sequences, and transformer architectures to effectively capture context. In particular, we focus on the specifics of multimodal data, illustrating the potential for improved model accuracy and robustness. This includes an exploration of methods that integrate information from multiple sources, such as video and audio data, and methods that represent a single data source through multiple analytical lenses, offering a richer, more nuanced understanding of soccer actions (e.g., using a graph representation of players). Finally, the article highlights some of the open research questions and future directions in the field of soccer action analysis, especially the potential for multimodal methods to advance this field. Overall, this survey provides a valuable resource for researchers interested in the field of analyzing action scenes in soccer.
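The graph representation of players mentioned above is typically constructed before any graph network is applied. The minimal sketch below is a hypothetical illustration, not taken from any surveyed method: it treats players as nodes and connects pairs within a fixed pitch distance, a common starting point for such representations. The function name `build_player_graph` and the 10-meter radius are assumptions made for the example.

```python
import math

def build_player_graph(positions, radius=10.0):
    """Build an undirected player-interaction graph.

    positions: list of (x, y) player coordinates on the pitch.
    radius: maximum distance (same units as positions) for an edge.
    Returns an adjacency list: {player index: set of neighbor indices}.
    """
    graph = {i: set() for i in range(len(positions))}
    for i, (xi, yi) in enumerate(positions):
        for j in range(i + 1, len(positions)):
            xj, yj = positions[j]
            # Connect two players if they are within `radius` of each other.
            if math.hypot(xi - xj, yi - yj) <= radius:
                graph[i].add(j)
                graph[j].add(i)
    return graph

# Three players: 0 and 1 are 5 units apart, 2 is far from both.
players = [(0.0, 0.0), (5.0, 0.0), (50.0, 50.0)]
adjacency = build_player_graph(players, radius=10.0)
```

In practice each node would also carry features such as velocity, team, and role, and the resulting graph would be fed to a graph neural network; the thresholded-distance edge rule shown here is only one common design choice.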

Similar Papers
  • Book Chapter
  • Cited by 4
  • 10.1016/b978-0-32-385787-1.00019-1
Chapter 14 - Human activity recognition
  • Jan 1, 2022
  • Deep Learning for Robot Perception and Cognition
  • Lukas Hedegaard + 2 more

  • Research Article
  • Cited by 1
  • 10.3390/computation13010004
A Hybrid Model for Soybean Yield Prediction Integrating Convolutional Neural Networks, Recurrent Neural Networks, and Graph Convolutional Networks
  • Dec 27, 2024
  • Computation
  • Vikram S Ingole + 5 more

Soybean yield prediction is one of the most critical activities for increasing agricultural productivity and ensuring food security. Traditional models often underestimate yields because of limitations associated with single data sources and simplistic model architectures, which prevent the complex, multifaceted factors influencing crop growth and yield from being captured. This work therefore fuses multi-source data—satellite imagery, weather data, and soil properties—through multi-modal fusion using Convolutional Neural Networks and Recurrent Neural Networks. Satellite imagery provides spatial information on crop health, weather data provides temporal insights, and soil properties provide important fertility information. Fusing these heterogeneous data sources embeds an overall understanding of yield-determining factors in the model, decreasing the RMSE by 15% and improving R2 by 20% over single-source models. We further push the frontier of feature engineering by using Temporal Convolutional Networks (TCNs) and Graph Convolutional Networks (GCNs) to capture time-series trends, geographic and topological information, and pest/disease incidence. TCNs capture long-range temporal dependencies well, while the GCN models complex spatial relationships and enhances the features used for yield prediction. This increases prediction accuracy by 10% and boosts the F1 score for low-yield area identification by 5%. Additionally, we introduce further improved model architectures: a custom UNet with attention mechanisms, Heterogeneous Graph Neural Networks (HGNNs), and Variational Auto-encoders (VAEs). The attention mechanism enables more effective spatial feature encoding by focusing on critical image regions, while the HGNN captures complex interaction patterns between diverse data types. Finally, VAEs generate robust feature representations. These state-of-the-art architectures achieve an MAE improvement of 12%, while R2 for yield prediction improves by 25%. This paper advances the state of the art in yield prediction through multi-source data fusion, sophisticated feature engineering, and advanced neural network architectures, providing a more accurate and reliable soybean yield forecast. Thus, the fusion of Convolutional Neural Networks with Recurrent Neural Networks and Graph Networks enhances the efficiency of the prediction process.
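As a toy illustration of the multi-source fusion idea described above (a hypothetical sketch, not the paper's CNN/RNN architecture; the function name `fuse_predictions` and the sample numbers are invented), per-source yield estimates can be late-fused by weighted averaging:

```python
def fuse_predictions(source_preds, weights=None):
    """Late-fuse per-source predictions by weighted averaging.

    source_preds: dict mapping source name -> predicted yield (same units).
    weights: optional dict of non-negative weights per source;
             defaults to equal weighting.
    """
    if weights is None:
        weights = {name: 1.0 for name in source_preds}
    total = sum(weights[name] for name in source_preds)
    return sum(source_preds[name] * weights[name] for name in source_preds) / total

# Toy per-source yield estimates (t/ha); numbers are invented.
preds = {"satellite": 3.2, "weather": 2.8, "soil": 3.0}
fused_equal = fuse_predictions(preds)  # plain average of the three sources
fused_weighted = fuse_predictions(preds, {"satellite": 2.0, "weather": 1.0, "soil": 1.0})
```

Real fusion systems learn these weights (or fuse intermediate features rather than final predictions), but the weighted average conveys the basic principle of combining heterogeneous sources.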

  • Research Article
  • Cited by 1
  • 10.1016/j.procs.2024.04.132
Designing of VehiNet Using Convolutional Neural Networks and Deep Learning Techniques
  • Jan 1, 2024
  • Procedia Computer Science
  • Mahita Kandala + 6 more

  • Research Article
  • 10.1051/itmconf/20257403008
Comparative evaluation of deep learning and machine learning techniques for sentiment analysis of electronic product review data
  • Jan 1, 2025
  • ITM Web of Conferences
  • Archana Nagelli + 2 more

The thoughts, perceptions, attitudes, feedback, and even emotions expressed by people on social networking and e-commerce sites are the primary focus of sentiment analysis, also referred to as opinion mining. It provides meaningful information to various stakeholders and customers, influencing their next move. However, the biggest challenge is extracting relevant information from this tremendous volume of data. Machine learning and deep learning techniques have achieved remarkable success in representing and classifying information. Machine learning works with the binary classification of information, whereas deep learning provides automatic feature detection. A study was carried out to extract relevant information from the Amazon reviews dataset of electronics products. Naïve Bayes, support vector machine, decision tree, convolutional neural network, long short-term memory, recursive neural network, and recurrent neural network models were applied to the dataset after different data-preprocessing steps. To evaluate the performance of the various machine learning and deep learning techniques, the F1 score, precision, recall, and accuracy were used. The results suggest that the deep learning techniques outperformed the machine learning techniques, and the RNN shows the highest accuracy among all the techniques.
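The F1 score, precision, recall, and accuracy used above are simple functions of the confusion counts; the sketch below shows them for a binary sentiment task. The helper name `precision_recall_f1` and the toy labels are invented for illustration, not taken from the paper.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for one positive class,
    computed from parallel lists of true and predicted labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy binary sentiment labels: 1 = positive review, 0 = negative.
truth = [1, 1, 0, 0]
preds = [1, 0, 0, 1]
p, r, f = precision_recall_f1(truth, preds)
accuracy = sum(t == q for t, q in zip(truth, preds)) / len(truth)
```

For multi-class settings these per-class scores are typically combined by macro or weighted averaging, which is what the comparative studies in this list report.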

  • Research Article
  • 10.24014/ijaidm.v7i2.29898
A Hybrid CNN-RNN Model for Enhanced Anemia Diagnosis: A Comparative Study of Machine Learning and Deep Learning Techniques
  • May 23, 2024
  • Indonesian Journal of Artificial Intelligence and Data Mining
  • Gregorius Airlangga

This study proposes a hybrid Convolutional Neural Network-Recurrent Neural Network (CNN-RNN) model for the accurate diagnosis of anemia types, leveraging the strengths of both architectures in capturing spatial and temporal patterns in Complete Blood Count (CBC) data. The research involves the development and evaluation of various single-architecture deep learning (DL) models, specifically Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Fully Convolutional Network (FCN). The models are trained and validated using stratified k-fold cross-validation to ensure robust performance. Key metrics such as test accuracy are utilized to provide a comprehensive assessment of each model's performance. The hybrid CNN-RNN model achieved the highest test accuracy of 90.27%, surpassing the CNN (89.88%), FCN (85.60%), MLP (79.77%), and RNN (73.54%) models. The hybrid model also demonstrated superior performance in cross-validation, with an accuracy of 87.31% ± 1.77%. Comparative analysis highlights the hybrid model's advantages over single-architecture DL models, particularly in handling imbalanced data and providing reliable classifications across all anemia types. The results underscore the potential of advanced DL architectures in medical diagnostics and suggest pathways for further refinement, such as incorporating attention mechanisms or additional feature engineering, to enhance model performance. This study contributes to the growing body of knowledge on AI-driven medical diagnostics and presents a viable tool for clinical decision support in anemia diagnosis.

  • Research Article
  • Cited by 1
  • 10.56294/dm2023174
Transformative Progress in Document Digitization: An In-Depth Exploration of Machine and Deep Learning Models for Character Recognition
  • Dec 27, 2023
  • Data and Metadata
  • Ali Benaissa + 3 more

Introduction: This paper explores the effectiveness of character recognition models for document digitization, leveraging diverse machine learning and deep learning techniques. The study, driven by the increasing relevance of image classification in various applications, focuses on evaluating Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Recurrent Neural Network (RNN), Convolutional Neural Network (CNN), and VGG16 with transfer learning. The research employs a challenging French alphabet dataset, comprising 82 classes, to assess the models' capacity to discern intricate patterns and generalize across diverse characters. Objective: This study investigates the effectiveness of character recognition models for document digitization using diverse machine learning and deep learning techniques. Methods: The methodology begins with data preparation, involving the creation of a merged dataset from distinct sections encompassing digits, French special characters, symbols, and the French alphabet. The dataset is subsequently partitioned into training, test, and evaluation sets. Each model undergoes meticulous training and evaluation over a specific number of epochs. The recorded metrics include accuracy, precision, recall, and F1-score for CNN, RNN, and VGG16, while SVM and KNN are evaluated based on accuracy, macro average, and weighted average. Results: The outcomes highlight distinct strengths and areas for improvement across the evaluated models. SVM demonstrates remarkable accuracy of 98.63%, emphasizing its efficacy in character recognition. KNN exhibits high reliability with an overall accuracy of 97%, while the RNN model faces challenges in training and generalization. The CNN model excels with an accuracy of 97.268%, and VGG16 with transfer learning achieves notable enhancements, reaching accuracy rates of 94.83% on test images and 94.55% on evaluation images. Conclusion: Our study evaluates the performance of five models—Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Recurrent Neural Network (RNN), Convolutional Neural Network (CNN), and VGG16 with transfer learning—on character recognition tasks. SVM and KNN demonstrate high accuracy, while RNN faces challenges in training. CNN excels in image classification, and VGG16 with transfer learning enhances accuracy significantly. This comparative analysis aids in informed model selection for character recognition applications.

  • Conference Article
  • Cited by 26
  • 10.1109/siu49456.2020.9302448
Direction Finding Using Convolutional Neural Networks and Convolutional Recurrent Neural Networks
  • Oct 5, 2020
  • Fehmi Ayberk Uçkun + 3 more

In this paper, alternative direction-finding methods are proposed using deep learning techniques. First, regression and classification models were created using Convolutional Neural Networks (CNNs). In the second approach, Convolutional Neural Networks and Recurrent Neural Networks (RNNs) were combined in the proposed methods. Despite the vast number of direction-finding studies, the use of neural networks is scarce in the literature, and past works mostly include only CNNs. In this study, direction finding is performed by networks that learn the signals reaching multiple antenna arrays. The created neural networks were fed with different data formats, and their performance on noisy and noise-free data is shown. In addition, a comparative analysis of the developed models was made over a similar Signal-to-Noise Ratio (SNR) range against the subspace-based MUSIC algorithm, which is frequently used in direction finding.

  • Book Chapter
  • Cited by 2
  • 10.1007/978-981-16-1395-1_10
Classification of Covid-19 Tweets Using Deep Learning Techniques
  • Jan 1, 2021
  • Pramod Sunagar + 5 more

In this digital era, there is exponential growth of text-based content in the electronic world. Textual data exists in the form of documents, social media posts on Facebook, Twitter, etc., logs, sensor data, and emails. Twitter is a social platform where users express their views on various aspects of day-to-day life. Twitter produces over 500 million tweets daily, that is, about 6,000 tweets per second. Twitter data is, by definition, very noisy and unstructured in nature. Text classification based on machine learning techniques suffers from problems such as poor generalization ability and sparsity/dimension explosion. Classifiers based on deep learning techniques are implemented to improve accuracy, overcome the shortcomings of machine learning techniques, avoid feature-extraction processes, and obtain high prediction accuracy and strong learning ability. In this work, classification of tweets is performed on a Covid-19 dataset by implementing deep learning techniques, namely Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Recurrent Convolutional Neural Network (RCNN), Recurrent Neural Network with Long Short-Term Memory (RNN+LSTM), and Bidirectional Long Short-Term Memory with Attention (BI-LSTM + Attention). The algorithms are implemented using two word-embedding techniques, namely Global Vectors for Word Representation (GloVe) and Word2Vec. The RNN with Bidirectional LSTM model performed better than all the other classifiers considered, classifying the text with an accuracy of 93% and above when used with GloVe and Word2Vec.

  • Conference Article
  • Cited by 1
  • 10.4271/2023-01-0590
Anomaly Detection Using Convolutional Neural Network and Generative Adversarial Network
  • Apr 11, 2023
  • Amritha Mohanan + 4 more

In the automotive embedded system domain, measurements from the vehicle and Hardware-in-the-Loop are currently evaluated against test cases, either manually or via automation scripts. These evaluations are localized; they evaluate a limited number of signals for a particular measurement without considering system-level behavior, which results in defect leakage. This study aims to develop a tool that can flag anomalies at the signal level in a new measurement without referring to the test cases, considering a larger number of system-level signals and thereby significantly reducing defect leakage. The tool learns important features and patterns of each maneuver from many historical measurements using deep learning techniques. We tried two CNN (convolutional neural network) models. The first is a specially designed CNN that performs maneuver classification and class-specific feature extraction. The second is an FCN (Fully Convolutional Network) classification model. A CNN-based architecture can be trained faster than a recurrent neural network (RNN) model because it utilizes features extracted from the input data. A Generative Adversarial Network (GAN) model is used in series with the CNN model to clone each of these maneuvers for predicting anomalies. During the testing phase, the CNN model maps the test measurement to the most similar maneuver from the list of already learned maneuvers, and the GAN model then outputs the anomalies, if any. To validate the tool, 12 measurements, each of 3 different maneuvers, were selected from an old and mature function in the brake system. The class-specific feature-based classification model resulted in 33% accuracy, whereas the Fully Convolutional Network classification model achieved 100% accuracy. We injected anomalies into one CSV file for testing purposes, and the anomaly detection module predicted all the anomalies correctly. Our future goal is to implement this model at the deployment level.

  • Research Article
  • 10.14489/vkit.2024.01.pp.038-045
INTELLIGENT MODEL FOR CLASSIFYING HEMODYNAMIC PATTERNS OF BRAIN ACTIVATION TO IDENTIFY NEUROCOGNITIVE MECHANISMS OF SPATIAL-NUMERICAL ASSOCIATIONS
  • Jan 1, 2024
  • Vestnik komp'iuternykh i informatsionnykh tekhnologii
  • R G Asadullaev + 1 more

The study presents the results of the development and testing of deep learning neural network architectures that demonstrate high accuracy in classifying neurophysiological data, in particular hemodynamic brain activation patterns obtained by functional near-infrared spectroscopy (fNIRS) during the solving of mathematical problems on spatial-numerical associations. The analyzed signal represents a multidimensional time series of oxyhemoglobin and deoxyhemoglobin dynamics. Taking the specificity of the fNIRS signal into account, a comparative analysis of two types of neural network architectures was carried out: (1) architectures based on recurrent neural networks: a recurrent neural network with long short-term memory, a recurrent neural network with long short-term memory and fully connected layers, a bidirectional recurrent neural network with long short-term memory, and a convolutional recurrent neural network with long short-term memory; (2) architectures based on convolutional neural networks with 1D convolutions: a convolutional neural network, a fully convolutional neural network, and a residual neural network. The trained long short-term memory recurrent neural network architectures showed worse accuracy than the 1D convolutional neural network architectures. The residual neural network (model_Resnet) demonstrated the highest accuracy rates (more than 88%) across three experimental conditions in detecting age-related differences in brain activation during spatial-numerical association tasks, taking into account the individual characteristics of the respondents' signal.

  • Research Article
  • 10.22060/eej.2021.19826.5409
Deep Learning for Recognition of Digital Modulations: A Detailed Study
  • Sep 26, 2021
  • AUT Journal of Electrical Engineering
  • Mohammadmohsen Jadidi + 1 more

The automatic modulation recognition of a received signal is very attractive in both military and civilian applications. In recent years, deep learning techniques have received much attention due to their excellent performance in signal, audio, image, and video processing. This paper examines the feasibility of using deep learning algorithms for automatic recognition of received radio signals' modulation schemes. Modulation recognition has been performed on eight digital modulation types with a Signal-to-Noise Ratio (SNR) from -20 dB to 20 dB. First, a vanilla neural network is used to classify the type of modulation. Afterwards, a Convolutional Neural Network (CNN) and a Recurrent Neural Network are applied for modulation recognition; these neural networks are widely used in image and signal processing applications. This is followed by designing other architectures, including a Densely Connected Neural Network (DenseNet), an inception network, a Recurrent Neural Network (RNN), a Long Short-Term Memory network (LSTM), and a Convolutional Long Short-Term Memory Deep Neural Network (CLDNN) for the modulation recognition problem, and their results are presented. During this investigation, a basic model is initially considered for each architecture, and the network performance is then studied by adjusting its parameters. The simulation results show that the proposed modified CLDNN model can provide an accuracy of 98% at high SNRs.

  • Research Article
  • Cited by 4
  • 10.1016/j.asoc.2023.110430
Service-oriented model-based fault prediction and localization for service compositions testing using deep learning techniques
  • May 18, 2023
  • Applied Soft Computing
  • Roaa Elghondakly + 2 more

  • Research Article
  • 10.52783/jisem.v10i4.9528
A Study of Federated Learning based Speaker Verification System
  • Apr 30, 2025
  • Journal of Information Systems Engineering and Management
  • Kshirod Sarmah

In recent times, speaker verification has gained significant importance as a crucial component of biometric authentication systems. Deep learning (DL) techniques have revolutionized speaker verification by enabling systems to automatically learn discriminative features from raw audio signals. However, the effectiveness of DL models heavily relies on the availability of large-scale datasets, which raises privacy concerns associated with centralized data collection. To address these challenges, federated learning (FL) has emerged as a promising approach, allowing collaborative model training across distributed data sources while preserving data privacy. This paper provides a comprehensive review of recent advancements in speaker verification through the integration of deep federated learning (DFL). Different deep learning techniques, namely convolutional neural networks (CNNs), deep neural networks (DNNs), recurrent neural networks (RNNs), and deep belief networks (DBNs), are combined with federated averaging algorithms to enhance speaker verification performance. The CNN-based federated learning model exhibits the best overall performance, with an EER of 2.42% and a MinDCF of 0.048, compared to the DNN, RNN, and DBN models with EERs of 3.45%, 3.64%, and 4.18% and MinDCFs of 0.0567, 0.0670, and 0.0725, respectively.

  • Book Chapter
  • 10.1007/978-981-19-1669-4_37
Progressive Convolutional Recurrent Neural Networks for Speech Enhancement
  • Sep 14, 2022
  • S China Venkateswarlu + 3 more

The progressive technique is a promising methodology for revising network implementations for speech enhancement purposes. Newer architectures such as progressive convolutional neural networks (P-CNN) or progressive residual neural networks (P-ResNet) have already proved the true potential of the progressive technique by greatly improving speech quality and speech intelligibility through denoising and dereverberation. Expanding the technique to the recurrent neural network architecture, which is better suited to dealing with audio, produces better results. Using the best of both convolutional and recurrent networks, by combining the key characteristics of the respective architectures, i.e., using a progressive convolutional recurrent neural network (P-CRNN), produces a highly efficient and effective solution that can be deployed with ease on highly resource-sensitive hardware. This study delves into the P-CRNN implementation for speech enhancement. Keywords: Progressive convolutional neural networks; Speech intelligibility; Recurrent neural network architecture; Progressive convolutional recurrent neural network; Generative adversarial networks

  • Research Article
  • Cited by 84
  • 10.3389/fphys.2024.1344887
Robust human locomotion and localization activity recognition over multisensory.
  • Feb 21, 2024
  • Frontiers in Physiology
  • Danyal Khan + 6 more

Human activity recognition (HAR) plays a pivotal role in various domains, including healthcare, sports, robotics, and security. With the growing popularity of wearable devices, particularly Inertial Measurement Units (IMUs) and Ambient sensors, researchers and engineers have sought to take advantage of these advances to accurately and efficiently detect and classify human activities. This research paper presents an advanced methodology for human activity and localization recognition, utilizing smartphone IMU, Ambient, GPS, and Audio sensor data from two public benchmark datasets: the Opportunity dataset and the Extrasensory dataset. The Opportunity dataset was collected from 12 subjects participating in a range of daily activities, and it captures data from various body-worn and object-associated sensors. The Extrasensory dataset features data from 60 participants, including thousands of data samples from smartphone and smartwatch sensors, labeled with a wide array of human activities. Our study incorporates novel feature extraction techniques for signal, GPS, and audio sensor data. Specifically, for localization, GPS, audio, and IMU sensors are utilized, while IMU and Ambient sensors are employed for locomotion activity recognition. To achieve accurate activity classification, state-of-the-art deep learning techniques, such as convolutional neural networks (CNN) and long short-term memory (LSTM), have been explored. For indoor/outdoor activities, CNNs are applied, while LSTMs are utilized for locomotion activity recognition. The proposed system has been evaluated using the k-fold cross-validation method, achieving accuracy rates of 97% and 89% for locomotion activity over the Opportunity and Extrasensory datasets, respectively, and 96% for indoor/outdoor activity over the Extrasensory dataset. These results highlight the efficiency of our methodology in accurately detecting various human activities, showing its potential for real-world applications. 
Moreover, the research paper introduces a hybrid system that combines machine learning and deep learning features, enhancing activity recognition performance by leveraging the strengths of both approaches.
