Video Frames Research Articles

Context. Target recognition is a priority in military affairs. The task is complicated by the need to recognize moving objects while varied terrain and landscape create obstacles for recognition. Combat actions can take place at different times of day, so the angle of lighting and the overall illumination must be taken into account. The object must be detected in the video by segmenting the video frames, then recognized and classified.
Objective. The objective of the study is to develop a technology for recognizing targets in real time as a component of a fire control system, using artificial intelligence, YOLO, and machine learning.
Method. The article develops a video stream analysis technology for automatic target recognition in a fire control system based on machine learning. It proposes a target recognition module as a component of the fire control system within the proposed information technology, using artificial intelligence. The YOLOv8 family of pattern recognition models was used to develop the target recognition module. The following augmentation methods were applied to the formed dataset:
– Bounding Box: Noise: salt-and-pepper noise added within the bounding box, up to 15% of pixels.
– Bounding Box: Blur: Gaussian blur added within the bounding box, up to 2.5 pixels.
– Cutout: part of the image cut out, 3 boxes of 10% size each.
– Brightness: image brightness changed between -25% and +25% to make the model more robust to changes in lighting and camera settings.
– Rotation: the image object rotated clockwise or counterclockwise by between -15 and +15 degrees.
– Flip: the image object flipped horizontally.
Results. The data were collected from open sources, in particular from videos posted on the YouTube platform. The main task of data preprocessing is the classification of three classes of objects, APC, BMP and TANK, on video or in real time. The dataset was formed on the Roboflow platform using its labeling tools and, subsequently, its augmentation tools. The dataset consists of 1193 unique images, distributed approximately equally across the classes. Training was conducted using Google Colab resources and took 100 epochs.
Conclusions. The model was evaluated by mAP50 (0.85), mAP50-95 (0.6), precision (0.89) and recall (0.75). The large losses arise because the background was not taken into account during the research, that is, training the module on confirmed data (images) of the background without technology
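The augmentation settings listed in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline (the abstract says the augmentations were produced with Roboflow tools); it assumes a grayscale image stored as a list of rows, applies noise to the whole image rather than only inside a bounding box, reads "10% size" as 10% of each image side, and leaves out blur and rotation for brevity. All function names are hypothetical.

```python
import random

def salt_and_pepper(img, fraction, rng):
    """Overwrite at most `fraction` of the pixels with 0 or 255."""
    out = [row[:] for row in img]
    h, w = len(out), len(out[0])
    for _ in range(int(fraction * h * w)):
        y, x = rng.randrange(h), rng.randrange(w)
        out[y][x] = rng.choice((0, 255))
    return out

def adjust_brightness(img, delta):
    """Scale pixel values by (1 + delta) and clamp to the 0..255 range."""
    return [[max(0, min(255, round(p * (1 + delta)))) for p in row] for row in img]

def flip_horizontal(img):
    """Mirror every row left to right."""
    return [row[::-1] for row in img]

def cutout(img, n_boxes, size_frac, rng):
    """Black out n_boxes rectangles whose sides are size_frac of the image sides."""
    out = [row[:] for row in img]
    h, w = len(out), len(out[0])
    bh, bw = max(1, int(h * size_frac)), max(1, int(w * size_frac))
    for _ in range(n_boxes):
        y0, x0 = rng.randrange(h - bh + 1), rng.randrange(w - bw + 1)
        for y in range(y0, y0 + bh):
            for x in range(x0, x0 + bw):
                out[y][x] = 0
    return out

def augment(img, seed=0):
    """Apply the steps with the ranges quoted in the abstract (assumed order)."""
    rng = random.Random(seed)
    out = adjust_brightness(img, rng.uniform(-0.25, 0.25))  # -25% .. +25%
    if rng.random() < 0.5:                                  # random horizontal flip
        out = flip_horizontal(out)
    out = cutout(out, n_boxes=3, size_frac=0.10, rng=rng)   # 3 boxes, 10% each
    return salt_and_pepper(out, fraction=0.15, rng=rng)     # up to 15% of pixels

if __name__ == "__main__":
    image = [[100] * 20 for _ in range(20)]  # synthetic 20x20 test image
    augmented = augment(image, seed=1)
    print(len(augmented), len(augmented[0]))
```

Passing a seeded `random.Random` instance keeps each augmented copy reproducible, which makes it easier to compare training runs.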


Situation recognition is a crucial problem in scene understanding, activity understanding, and action reasoning, as it provides a structured representation of the main activity depicted in an image. Semantic role labeling is central to situation recognition and is challenging because a single action can have multiple meanings and purposes depending on its context. Understanding images beyond the highlighted actions requires inferences about the context of the scene, the objects, and their roles in the captured event. Recently, situation recognition (SR) has been introduced, which jointly derives a collection of action (activity), semantic-role, and noun (entity) pairs in the form of moving images. To label these frames as action frames, nouns (entities) must be assigned to roles based on the content of the observed image. One of the main challenges is managing the complex dependencies between the assigned roles (nouns) and the predicted action, as correct role assignment often depends on the accuracy of the action prediction. We introduce RoadSitu, a road situation recognition model that generates a structured summary of what is happening in a road scenario from a video frame, using an action and the semantic roles played by agents. The action can describe a diverse set of situations, and the same agent can play various roles depending on the situation depicted in the video frame. A situation recognition model therefore needs to understand the context of each video frame and the visual-linguistic meaning of the semantic roles in that particular frame. A further challenge in this work is the complex task of annotating video frames with semantic roles and handling the structured dependencies between the assigned roles (nouns) and the predicted action (activity). Additionally, the sparsity of meaningful semantic information within road scenarios poses further difficulties.
To overcome these challenges, we introduce a novel approach in which action recognition and noun estimation work together interactively to form structured summaries of each situation. In experiments using a road video dataset obtained from a South Korean company, RoadSitu achieved a Top-1 verb accuracy of 43.46%, a Top-5 verb accuracy of 72.48%, and a value accuracy of 34.21%, outperforming the baseline models GSRTR and JSL by 2.4% and 3.86% in Top-1 verb accuracy, respectively. These results demonstrate the effectiveness of our model in handling complex road scenarios.
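The Top-1 and Top-5 verb accuracies quoted above follow the usual top-k definition: a prediction counts as correct when the true verb is among the k highest-scoring candidates. A minimal sketch of that metric follows, assuming per-class scores and integer labels; the toy scores and the function name are illustrative and are not taken from the RoadSitu evaluation.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)

if __name__ == "__main__":
    # Toy scores for 4 samples over 3 hypothetical verb classes.
    scores = [
        [0.1, 0.7, 0.2],
        [0.5, 0.3, 0.2],
        [0.2, 0.3, 0.5],
        [0.4, 0.35, 0.25],
    ]
    labels = [1, 1, 0, 2]
    print(top_k_accuracy(scores, labels, k=1))  # 0.25
    print(top_k_accuracy(scores, labels, k=2))  # 0.5
```

Top-k accuracy never decreases as k grows, which is why a Top-5 figure is always at least as high as the Top-1 figure for the same model.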


Articles published on Video Frames

Motion-Aware Dynamic Graph Neural Network for Video Compressive Sensing.

Detection and identification of un-uniformed shape text from blurred video frames

Framework for abnormal event detection and tracking based on effective sparse factorization strategy

Internet of things assisted deep learning enabled driver drowsiness monitoring and alert system using CNN-LSTM framework

Reduction of Vision-Based Models for Fall Detection

STA-net: a deblurring network combined with spatiotemporal information for zinc froth flotation

Glottic opening detection using deep learning for neonatal intubation with video laryngoscopy.

An innovative traffic flow detection model based on temporal video frame analysis and grayscale aggregation quantification

Facial Movements Extracted from Video for the Kinematic Classification of Speech

Noise & mottle suppression methods for cumulative Cherenkov images of radiation therapy delivery

Enhancing Video Anomaly Detection with Improved UNET and Cascade Sliding Window Technique

Take good care of your fish: fish re-identification with synchronized multi-view camera system

Computer-Vision-Aided Deflection Influences Line Identification of Concrete Bridge Enhanced by Edge Detection and Time-Domain Forward Inference

ACORN+: Adaptive Compression-Reconstruction for Device-Cloud Collaboration Video Services

Nose-Driven Cursor Control: An Assistive System for Disabled Individuals Using OCRM Similarity Tracking

Vapor Channel Oscillations in Laser Lithotripsy.

INTELLIGENT VIDEO ANALYSIS TECHNOLOGY FOR AUTOMATIC FIRE CONTROL TARGET RECOGNITION BASED ON MACHINE LEARNING

RoadSitu: Leveraging Road Video Frame Extraction and Three-Stage Transformers for Situation Recognition

Text–video retrieval re-ranking via multi-grained cross attention and frozen image encoders

DFCNet +: Cross-modal dynamic feature contrast net for continuous sign language recognition
