Application of Neural Network Algorithms for Human Detection in Mine Video Footage

Abstract

The problem of applying neural network algorithms to detect a person in a video sequence in a mine is considered. Convolutional neural networks are analyzed: Faster R-CNN, SSD, and YOLOv5 and YOLOv8, the latter two in the n, m and x (Nano, Medium and Extra Large) configurations, for detecting objects in video with the classes: miner, face, head with a helmet, and helmet.
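
All of the detectors compared above (Faster R-CNN, SSD and the YOLO family) share the same post-processing step: intersection-over-union (IoU) scoring followed by non-maximum suppression over the class-labelled boxes. A minimal pure-Python sketch is given below; the `(x1, y1, x2, y2)` box format, the class names and the 0.5 threshold are illustrative assumptions, not details taken from the paper.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(detections, iou_thr=0.5):
    """Greedy class-aware non-maximum suppression.

    detections: list of (box, score, class_name) tuples; a box is dropped
    when a higher-scoring box of the same class overlaps it above iou_thr.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(det[2] != k[2] or iou(det[0], k[0]) < iou_thr for k in kept):
            kept.append(det)
    return kept
```

For example, two heavily overlapping "miner" boxes collapse to the higher-scoring one, while a distant "helmet" box survives untouched.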

Similar Papers
  • Research Article
  • Cited by 23
  • 10.1038/s41598-021-97195-6
Research on improved convolutional wavelet neural network
  • Sep 9, 2021
  • Scientific Reports
  • Jingwei Liu + 4 more

Artificial neural networks (ANN), which include deep learning neural networks (DNN), have problems such as the local-minimum problem of the back-propagation neural network (BPNN), the instability problem of the radial basis function neural network (RBFNN) and the limited maximum precision of the convolutional neural network (CNN). The performance (training speed, precision, etc.) of BPNN, RBFNN and CNN is expected to be improved. The main contributions are as follows. Firstly, based on the existing BPNN and RBFNN, a wavelet neural network (WNN) is implemented to obtain better performance for further improving CNN: WNN adopts the network structure of BPNN for faster training, and adopts a wavelet function as its activation function, whose form is similar to the radial basis function of RBFNN, in order to solve the local-minimum problem. Secondly, a WNN-based convolutional wavelet neural network (CWNN) is proposed, in which the fully connected layers (FCL) of CNN are replaced by WNN. Thirdly, comparative simulations of BPNN, RBFNN, CNN and CWNN on the MNIST and CIFAR-10 datasets are implemented and analyzed. Fourthly, the wavelet-based convolutional neural network (WCNN) is proposed, in which the wavelet transformation is adopted as the activation function in the Convolutional Pool Neural Network (CPNN) part of CNN. Fifthly, simulations of WCNN on the MNIST dataset are implemented and analyzed. The effects are as follows. Firstly, WNN solves the problems of BPNN and RBFNN and shows better performance. Secondly, the proposed CWNN reduces the mean square error and the error rate of CNN, which means CWNN has better maximum precision than CNN. Thirdly, the proposed WCNN reduces the mean square error and the error rate of CWNN, which means WCNN has better maximum precision than CWNN.
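The wavelet activation at the heart of the WNN/CWNN idea can be written in a few lines. The abstract does not say which wavelet the authors use, so the Morlet wavelet below is purely an assumed, common textbook choice for illustration.

```python
import math

def morlet(x):
    """Morlet wavelet activation: cos(1.75*x) * exp(-x^2 / 2).

    Unlike a sigmoid, it is localized (it decays to zero away from the
    origin), which is the property wavelet networks rely on when trying
    to avoid the local-minimum behaviour of plain BPNN activations.
    """
    return math.cos(1.75 * x) * math.exp(-x * x / 2.0)
```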

  • Research Article
  • Cited by 1
  • 10.30970/vam.2018.26.9837
A DROPOUT TECHNIQUE STUDY FOR THE FASTER R-CNN DETECTORS WITH PRETRAINED CONVOLUTIONAL NEURAL NETWORKS FOR DETECTING VERY SIMPLE OBJECTS THAT CAN BE MASKED
  • Jan 1, 2018
  • Applied Mathematics and Informatics
  • V Romanuke

One of the best object detection methods, Faster R-CNN, uses a pretrained convolutional neural network, which allows training the detector on the small training sets typical of object detection practice. Convolutional networks are prevented from overfitting by inserting DropOut layers. An open question is whether the DropOut technique significantly improves the accuracy of the Faster R-CNN object detector; therefore, the goal is to show how the DropOut technique influences detector performance. The image classification dataset for pretraining the convolutional neural network is CIFAR-10. An appropriate convolutional network architecture for classifying CIFAR-10 images has a 50 % DropOut layer inserted between two fully connected layers. The object detection tasks used for training and testing the Faster R-CNN detector consist of monochrome images in which small black rectangles are to be detected. Although such objects are very simple, they can be masked by dark localities so that detection is not easy. One detection task is to detect the black rectangles in frontal views of suburban houses; another is to detect them in office room views. The suburb view dataset is divided into a training set of 120 images and a testing set of 121 images, each containing a black rectangle. The office view dataset is likewise divided into a training set of 115 images and a testing set of 100 images, each containing a black rectangle. Detector performance is studied against three training parameters: the bounding-box overlap ratio for positive training samples, the minimum anchor box size, and the anchor box pyramid scale factor. Performance is measured by the number of detected objects along with the intersection-over-union. However, neither the graphs of the summed intersection-over-union nor those of the number of detected objects show that the DropOut technique influences Faster R-CNN object detector performance. Even when a few missed objects are allowed and the accuracy threshold is decreased, this influence is not significant. Therefore, a pretrained convolutional neural network to be included in the Faster R-CNN object detector should not contain a DropOut layer, especially since the network has to be trained much longer with a DropOut layer. Key words: object detection, Faster R-CNN object detector, pretrained convolutional network, DropOut, monochrome image, training set, intersection-over-union, number of detected objects.
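
For reference, the DropOut technique under study amounts to the following. This is the standard inverted-dropout formulation, assumed here for illustration; the paper's exact implementation is not specified in the abstract.

```python
import random

def dropout(activations, p=0.5, training=True, seed=None):
    """Inverted dropout: during training, zero each unit with probability p
    and rescale the survivors by 1/(1-p) so the expected activation is
    unchanged; at inference time the layer is the identity."""
    if not training or p == 0.0:
        return list(activations)
    rng = random.Random(seed)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

With p=0.5 a surviving unit is doubled, which is why no rescaling is needed at test time.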

  • Book Chapter
  • Cited by 2
  • 10.1007/978-3-319-71607-7_33
Scalable Object Detection Using Deep but Lightweight CNN with Features Fusion
  • Jan 1, 2017
  • Qiaosong Chen + 6 more

Recently, deep Convolutional Neural Networks (CNN) have become more and more popular in pattern recognition and have achieved impressive performance on multi-category datasets. Most object detection systems include three main parts: CNN feature extraction, region proposal and ROI classification, as in Fast R-CNN and Faster R-CNN. In this paper, a deep but lightweight CNN with feature fusion is presented, and our work focuses on improving the feature extraction part of the Faster R-CNN framework. Inspired by recent architectural innovations such as Inception, HyperNet and multi-scale construction, the proposed network achieves lower computational cost while remaining considerably deep. The network is trained with the help of data augmentation, fine-tuning and batch normalization. To make the feature fusion scalable, different sampling methods are applied to different layers, and kernels of various sizes extract both global and local features; these features are then fused, which handles objects of diverse sizes. The experimental results show that our method achieves better performance than Faster R-CNN with VGG16 on the VOC2007, VOC2012 and KITTI datasets while maintaining the original speed.

  • Research Article
  • Cited by 2
  • 10.34229/2707-451x.21.3.6
Comparative Analysis of the Application of Multilayer and Convolutional Neural Networks for Recognition of Handwritten Letters of the Azerbaijani Alphabet
  • Sep 30, 2021
  • Cybernetics and Computer Technologies
  • Elshan Mustafayev + 1 more

Introduction. The implementation of information technologies in various spheres of public life dictates the creation of efficient and productive systems for entering information into computer systems. In such systems it is important to build an effective recognition module. At the moment, the most effective approach to this problem is the use of artificial multilayer neural and convolutional networks. The purpose of the paper. This paper is devoted to a comparative analysis of the recognition results of handwritten characters of the Azerbaijani alphabet using multilayer and convolutional neural networks. Results. The dependence of the recognition results on the following parameters is analyzed: the architecture of the neural networks, the size of the training base, the choice of the subsampling algorithm, and the use of a feature extraction algorithm. To enlarge the training sample, image augmentation was used: from a real base of 14,000 characters, bases of 28,000, 42,000 and 72,000 characters were formed. A description of the feature extraction algorithm is given. Conclusions. Analysis of the recognition results on the test sample showed the following. As expected, convolutional neural networks outperformed multilayer neural networks, and the classical convolutional network LeNet-5 showed the highest results among all the network types. However, the 3-layer multilayer network whose input was the feature extraction results also showed rather high results, comparable with the convolutional networks. There is no definite advantage in the choice of the method in the subsampling layer; the subsampling method (max-pooling or average-pooling) for a particular model can be selected experimentally. Increasing the training database did not give a tangible improvement in recognition results for convolutional networks and networks with preliminary feature extraction; however, for networks learning without feature extraction, a larger database led to a noticeable improvement in performance. Keywords: neural networks, feature extraction, OCR.
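
The subsampling choice the authors compare can be made concrete with a tiny 2x2, stride-2 pooling sketch. This pure-Python version is illustrative only; real implementations operate on tensors, not nested lists.

```python
def pool2x2(image, mode="max"):
    """2x2 subsampling with stride 2 over a 2D list of numbers.

    mode is 'max' or 'average'; as the paper notes, neither variant has a
    definite advantage, so the choice is left as a parameter to be picked
    experimentally per model.
    """
    out = []
    for i in range(0, len(image) - 1, 2):
        row = []
        for j in range(0, len(image[0]) - 1, 2):
            window = [image[i][j], image[i][j + 1],
                      image[i + 1][j], image[i + 1][j + 1]]
            row.append(max(window) if mode == "max" else sum(window) / 4.0)
        out.append(row)
    return out
```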

  • Research Article
  • Cited by 35
  • 10.1109/access.2019.2943927
Research on Medical Data Feature Extraction and Intelligent Recognition Technology Based on Convolutional Neural Network
  • Jan 1, 2019
  • IEEE Access
  • Weidong Liu + 6 more

In order to mine information from medical health data and develop intelligent applications, feature representation learning for multi-modal medical health data was studied, and several feature learning models were proposed for disease risk assessment. For medical text feature learning, a model based on a convolutional neural network is proposed, and convolutional text analysis is applied to disease risk assessment. The medical data feature representation adopts a deep learning method, and the same method is used to learn and extract the characteristics of different diseases, giving the model versatility. After simple preprocessing of the experimental data samples, including power-frequency denoising and lead convolution regularization, a convolutional neural network is constructed for medical data feature extraction and intelligent recognition. On this basis, several sets of experiments were carried out to examine the influence of the convolution kernel and the choice of learning rate on the results. In addition, comparative experiments with a support vector machine, a BP neural network and an RBF neural network were carried out. The results show that the convolutional neural network used in this paper has obvious advantages in recognition rate and training speed over the other methods. For time-series data feature learning, a multi-channel convolutional self-encoding neural network is proposed. The connection between fatigue and emotional abnormality is analyzed, and the concept of emotional fatigue is defined. The proposed multi-channel convolutional neural network is used to learn the time-series data features, the convolutional self-encoding neural network is used to learn facial image features, and these two kinds of features are combined with the collected physiological data to perform emotional fatigue detection. An emotional fatigue detection demonstration platform with multi-modal data feature fusion is established, realizing data acquisition, emotional fatigue detection and emotional feedback. The experimental results verify the validity, versatility and stability of the model.

  • Conference Article
  • Cited by 25
  • 10.1109/cac.2017.8243120
Identification of autonomous landing sign for unmanned aerial vehicle based on faster regions with convolutional neural network
  • Oct 1, 2017
  • Junjie Chen + 4 more

In order to realize autonomous landing of an unmanned aerial vehicle (UAV) in power-line patrolling, a vision method based on Faster Regions with Convolutional Neural Network (Faster R-CNN) is studied. In this paper, we design a landing sign combining concentric circles and a pentagon, and propose a Faster R-CNN recognition algorithm to identify the target sign; successful identification of the landing mark by Faster R-CNN is the most important step in autonomous UAV landing. A vision-based estimation algorithm for position and direction is then proposed: once the landing sign is effectively identified by Faster R-CNN, the position and direction for UAV landing are obtained using least-squares ellipse fitting and the Shi-Tomasi corner detection method. The experimental results show that detection and identification with Faster R-CNN achieve a recognition speed of nearly 81 milliseconds per frame with 97.8% accuracy. The proposed method has better identification accuracy than three other target identification methods: Support Vector Machine (SVM) classification, the Back Propagation (BP) neural network, and the deep-learning-based You Only Look Once (YOLO). The position and direction estimation error of the vision algorithm is within the allowable range, and the method meets the UAV's real-time landing requirements.

  • Conference Article
  • Cited by 50
  • 10.1109/icaccs57279.2023.10112860
Comparative Investigations on Tomato Leaf Disease Detection and Classification Using CNN, R-CNN, Fast R-CNN and Faster R-CNN
  • Mar 17, 2023
  • G Priyadharshini + 1 more

This paper deals with tomato leaf disease detection and classification using different strategies: Convolutional Neural Network (CNN), Regions with CNN (R-CNN), Fast R-CNN and Faster R-CNN. A main issue in the agricultural sector is leaf disease, which affects crop yield and income, so early detection of leaf diseases in plants is essential to prevent losses. The tomato leaf classes considered include Mosaic virus, Early Blight, Septoria leaf spot, Bacterial spot, and Healthy. Using deep learning algorithms and image processing methods, the diseases in tomato leaves can be identified. The implementation procedure in the proposed work involves data collection, pre-processing, training, feature extraction, testing, and classification using the Visual Geometry Group network (VGG 16) to identify damaged or healthy leaves. VGG 16 is incorporated to categorise the leaves as healthy or diseased, and a regression bounding-box method is adopted. A Faster R-CNN model is then created to identify and categorise diseases from every input tomato leaf image, providing predictions with a considerably greater degree of accuracy. An accuracy of approximately 98% is obtained after fitting the extracted features into the neural network over 20 iterations.

  • Conference Article
  • Cited by 1
  • 10.1109/oceanskobe.2018.8559112
Segmentation of Underwater Object in Videos
  • May 1, 2018
  • Yuemei Zhu + 6 more

Video segmentation is a necessary step in object tracking. Existing methods that extract an object from the background rely on an intensive search across the frames, which involves a great deal of searching and is therefore inefficient, while other methods obtain segmentation by clustering pixels, which leads to over-segmentation. Inspired by breakthroughs in semantic segmentation, in this paper we propose to combine appearance and motion cues, which play a key role in successfully segmenting objects in videos. To implement this idea, we combine a Deep Convolutional Neural Network (DCNN) with the optical flow between two consecutive frames. Segmentation of underwater objects in videos is made difficult by various types of suspended particles, from water droplets and dust particles to poor or excessive lighting; to overcome this, Contrast-Limited Adaptive Histogram Equalization (CLAHE) and a simple color correction method are used to enhance details and reduce greenish and bluish casts. Several DCNN variants have been applied to semantic segmentation with great efficiency. In particular, because a DCNN can capture information at different spatial scales, the DeepLab variant performs well in semantic segmentation: by using atrous convolution, the filters of the DeepLab network observe a greater receptive field without reducing the feature map dimension, so the structure keeps global and positional information. Consequently, we combine the above-mentioned methods: optical flow estimation is carried out on images processed by CLAHE, and accurate segmentation results are obtained using the DeepLab network. Experiments show the good performance of our method.
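
CLAHE refines plain histogram equalization by working on local tiles and clipping the histogram before remapping. The global remapping it builds on can be sketched as follows; this simplified stand-in is not the CLAHE used in the paper, only the core idea.

```python
def equalize(pixels, levels=256):
    """Global histogram equalization over a flat list of integer intensities.

    Builds the cumulative histogram and remaps each intensity so the output
    intensities spread across the full range. CLAHE additionally does this
    per tile with histogram clipping to limit noise amplification.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:  # constant image: nothing to stretch
        return list(pixels)
    return [round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1))
            for p in pixels]
```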

  • Conference Article
  • Cited by 8
  • 10.1109/aims52415.2021.9466014
Multi-Pole Road Sign Detection Based on Faster Region-based Convolutional Neural Network (Faster R-CNN)
  • Apr 28, 2021
  • AIMS 2021 - International Conference on Artificial Intelligence and Mechatronics Systems
  • Achmad Zulfajri Syaharuddin + 2 more

Building a system able to handle various types of traffic signs is a challenge. The important stages in handling an object are finding objects, dividing them into categories, and marking them with bounding boxes. In practice, however, monitoring traffic sign objects is quite difficult because of various factors such as occlusion by other objects, driving times, and the condition of the signs. This study aims to measure the precision of traffic sign monitoring (at a detection speed of 4-6 frames per second) from single-camera video recordings using the Faster Region-based Convolutional Neural Network (Faster R-CNN) algorithm. The traffic sign detection system uses the Faster R-CNN algorithm with the Inception v2 model, implemented in the TensorFlow API framework. Faster R-CNN consists of two modules: the first is a deep convolutional neural network that proposes the regions to be detected, called the Region Proposal Network (RPN), and the second is the Fast R-CNN detector, which uses the previously proposed regions; together they form a single detection network. Based on the construction and testing of a traffic sign detection system using the Faster R-CNN method, it can be shown that there is no difference in the detection results for traffic signs in day and night conditions: the precision of traffic sign detection both during the day and at night is 100%.
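
The anchor-box pyramid that an RPN evaluates at each feature-map location can be generated with a few lines of code. The base size, scales and aspect ratios below mirror the defaults of the original Faster R-CNN paper and are assumptions here; the study above used Inception v2 features, so its exact anchor settings may differ.

```python
def make_anchors(cx, cy, base=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Generate one RPN anchor pyramid centred on (cx, cy).

    Returns one (x1, y1, x2, y2) box per scale/aspect-ratio combination;
    each ratio r keeps the anchor's area fixed while setting h/w = r.
    """
    anchors = []
    for s in scales:
        area = (base * s) ** 2
        for r in ratios:
            w = (area / r) ** 0.5
            h = w * r
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```

With the defaults this yields the familiar 9 anchors per location (3 scales x 3 ratios).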

  • Research Article
  • Cited by 2
  • 10.54254/2755-2721/47/20241325
A comparison of deep learning-based object detection for unmanned aerial vehicle
  • Mar 15, 2024
  • Applied and Computational Engineering
  • Tinghuan Li

Unmanned aerial vehicles (UAVs) play a critical role in the field of object detection, and machine learning techniques have significantly advanced the field in recent years. This paper provides a comprehensive overview of machine learning developments for UAV object detection. The process involves multiple steps. Firstly, deep learning and Convolutional Neural Networks (CNN) are widely utilized to extract precise features and train object detection models for accurate classification and efficient recognition of target objects in images and videos. Secondly, classical object detection algorithms such as Faster R-CNN, You Only Look Once (YOLO), and the Single Shot MultiBox Detector (SSD) have been enhanced to improve accuracy and real-time performance. In this review, we primarily focus on comparing the principles of CNN and YOLOv5, as well as their applications in object detection and image recognition. Ultimately, it becomes apparent that CNN is better suited to processing image data, automatically extracting features, and achieving more accurate classification and detection, whereas YOLOv5 performs detection directly on large images, significantly reducing computation time compared with CNN.

  • Research Article
  • Cited by 3
  • 10.1155/2021/8665891
Object Detection and Movement Tracking Using Tubelets and Faster RCNN Algorithm with Anchor Generation
  • Jan 1, 2021
  • Wireless Communications and Mobile Computing
  • Prabu Mohandas + 4 more

Object detection in images and videos has become an important task in computer vision, but remains challenging due to misclassification and localization errors. The proposed approach explores the feasibility of automated detection and tracking of elephant intrusions along forest border areas. Given the alarming increase in crop damage resulting from the movement of elephant herds, combined with the high risk of elephant extinction due to human activities, this paper looks into an efficient solution through elephant tracking. A convolutional neural network with transfer learning is used as the model for object classification and feature extraction. A new tracking system using automated tubelet generation and anchor generation methods in combination with Faster RCNN was developed and tested on 5,482 video sequences. The real-time video taken for analysis contained heavily occluding objects such as trees and other animals. Tubelets generated from each video sequence with intersection-over-union (IoU) thresholds have been effective in tracking elephant movement in forest areas. The proposed work has been compared with other state-of-the-art techniques, namely Faster RCNN, YOLO v3, and HyperNet. Experimental results on the real-time dataset show that the proposed work achieves an improved performance of 73.9% in detecting and tracking objects, outperforming the existing approaches.
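
The tubelet idea, linking per-frame detections into short box tracks via IoU thresholds, can be sketched as a greedy linker. This is a hypothetical simplification for illustration; the paper's tubelet generator is more elaborate.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def link_tubelets(frames, iou_thr=0.5):
    """Greedily link per-frame detections into tubelets (box sequences).

    frames: list of lists of (x1, y1, x2, y2) boxes, one list per frame.
    A detection extends the tubelet whose last box overlaps it most, if
    that IoU exceeds iou_thr; otherwise it starts a new tubelet.
    """
    tubelets = []
    for boxes in frames:
        extended = set()  # tubelets already continued in this frame
        for box in boxes:
            best, best_iou = None, iou_thr
            for idx, tube in enumerate(tubelets):
                if idx in extended:
                    continue
                o = iou(tube[-1], box)
                if o > best_iou:
                    best, best_iou = idx, o
            if best is None:
                tubelets.append([box])
            else:
                tubelets[best].append(box)
                extended.add(best)
    return tubelets
```

A box drifting slightly between frames stays in one tubelet, while a detection far from every existing track opens a new one.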

  • Conference Article
  • Cited by 50
  • 10.1117/12.2270326
Pedestrian detection in video surveillance using fully convolutional YOLO neural network
  • Jun 26, 2017
  • Proceedings of SPIE, the International Society for Optical Engineering
  • V V Molchanov + 4 more

More than 80% of video surveillance systems are used for monitoring people. Old human detection algorithms, based on background and foreground modelling, could not even deal with a group of people, to say nothing of a crowd. Recent robust and highly effective pedestrian detection algorithms are a new milestone for video surveillance systems. Based on modern approaches in deep learning, these algorithms produce very discriminative features that can be used to obtain robust inference in real visual scenes. They deal with tasks such as distinguishing different persons in a group, overcoming substantial occlusion of human bodies by the foreground, and detecting people in various poses. In our work we use a new approach which combines the detection and classification tasks into one challenge using convolutional neural networks. As a starting point we choose the YOLO CNN, whose authors propose a very efficient way of combining the above tasks by learning a single neural network. This approach showed results competitive with state-of-the-art models such as Fast R-CNN while significantly surpassing them in speed, which allows us to apply it in real-time video surveillance and other video monitoring systems. Despite all its advantages, it suffers from some known drawbacks related to the fully connected layers, which prevent applying the CNN to images of different resolutions. It also limits the ability to distinguish small, close human figures in groups, which is crucial for our tasks, since we work with rather low-quality images that often include dense small groups of people. In this work we gradually change the network architecture to overcome these problems, train it on a complex pedestrian dataset and finally obtain a CNN that detects small pedestrians in real scenes.

  • Research Article
  • Cited by 196
  • 10.1016/j.neucom.2019.01.111
Brain tumor segmentation with deep convolutional symmetric neural network
  • Apr 24, 2019
  • Neurocomputing
  • Hao Chen + 4 more


  • Research Article
  • Cited by 53
  • 10.1002/mp.12399
Detection and diagnosis of colitis on computed tomography using deep convolutional neural networks.
  • Jul 18, 2017
  • Medical Physics
  • Jiamin Liu + 8 more

Colitis refers to inflammation of the inner lining of the colon and is frequently associated with infection and allergic reactions. In this paper, we propose deep convolutional neural network methods for lesion-level colitis detection and a support vector machine (SVM) classifier for patient-level colitis diagnosis on routine abdominal CT scans. The recently developed Faster Region-based Convolutional Neural Network (Faster RCNN) is utilized for lesion-level colitis detection. For each 2D slice, rectangular region proposals are generated by region proposal networks (RPN). Then, each region proposal is jointly classified and refined by a softmax classifier and bounding-box regressor. Two convolutional neural networks, the eight-layer ZF net and the 16-layer VGG net, are compared for colitis detection. Finally, for each patient, the detections on all 2D slices are collected and an SVM classifier is applied to develop a patient-level diagnosis. We trained and evaluated our method with 80 colitis patients and 80 normal cases using 4×4-fold cross validation. For lesion-level colitis detection with ZF net, the mean average precision (mAP) was 48.7% and 50.9% for RCNN and Faster RCNN, respectively. The detection system achieved sensitivities of 51.4% and 54.0% at two false positives per patient for RCNN and Faster RCNN, respectively. With VGG net, Faster RCNN increased the mAP to 56.9% and the sensitivity to 58.4% at two false positives per patient. For patient-level colitis diagnosis with ZF net, the average areas under the ROC curve (AUC) were 0.978±0.009 and 0.984±0.008 for the RCNN and Faster RCNN methods, respectively; the difference was not statistically significant (P=0.18). At the optimal operating point, the RCNN method correctly identified 90.4% (72.3/80) of the colitis patients and 94.0% (75.2/80) of the normal cases. The sensitivity improved to 91.6% (73.3/80) and the specificity to 95.0% (76.0/80) for the Faster RCNN method. With VGG net, Faster RCNN increased the AUC to 0.986±0.007 and the diagnosis sensitivity to 93.7% (75.0/80), while the specificity was unchanged at 95.0% (76.0/80). Colitis detection and diagnosis by deep convolutional neural networks is accurate and promising for future clinical application.

  • Research Article
  • Cited by 2
  • 10.4114/intartif.vol24iss68pp21-32
A New Method of Different Neural Network Depth and Feature Map Size on Remote Sensing Small Target Detection
  • Sep 30, 2021
  • Inteligencia Artificial
  • Yaming Cao + 2 more

Convolutional neural networks (CNNs) have shown strong learning capabilities in computer vision tasks such as classification and detection. Especially with the introduction of excellent detection models such as YOLO (V1, V2 and V3) and Faster R-CNN, CNNs have greatly improved detection efficiency and accuracy. However, due to the special viewing angle, small object size, few features, and complicated background, CNNs that perform well on ground-perspective datasets fail to reach good detection accuracy on remote sensing image datasets. To this end, based on the YOLO V3 model, we used feature maps of different depths as detection outputs to explore why deep neural networks achieve a poor detection rate for small targets in remote sensing images. We also analyzed the effect of neural network depth on small target detection, and found that excessively deep semantic information contributes little to it. Finally, verification on the VEDAI dataset shows that fusing shallow feature maps, which carry precise location information, with deep feature maps, which carry rich semantics, can effectively improve the accuracy of small target detection in remote sensing images.
