Captioning an image involves using a combination of vision and language models to describe the image in an expressive and concise sentence. Successful captioning requires extracting as much information as possible from the image, and one key piece of information is the topic to which the image belongs. State-of-the-art methods extract these topics with topic modeling applied only to the caption text, which ignores the image's semantic information. Concept modeling, by contrast, extracts concepts directly from the images in addition to considering the corresponding caption text. Applied to image captioning, concept modeling can more fully capture the image context and exploit it to produce more accurate descriptions. In this paper, novel concept-based image captioning models are proposed. The first model uses an LSTM decoder, while the second pairs concept modeling with a new multi-encoder transformer architecture. The proposed models were evaluated with standard metrics on the Microsoft COCO and Flickr30K datasets, where they outperformed related methods with reduced computational complexity.
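The core idea of a multi-encoder design with concept features can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: the feature dimensions, the two encoder streams, and the fusion rule (concatenating visual and concept tokens so a decoder can attend over both) are all illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of fusing two encoder streams: one carrying visual
# region features, the other carrying concept features extracted from the
# image. All shapes and the fusion strategy are illustrative assumptions.
rng = np.random.default_rng(0)

d_model = 8
visual_feats = rng.normal(size=(4, d_model))   # 4 visual region tokens
concept_feats = rng.normal(size=(2, d_model))  # 2 detected-concept tokens

# Concatenate the streams along the sequence axis so the decoder can
# attend jointly over visual regions and image concepts.
memory = np.concatenate([visual_feats, concept_feats], axis=0)

# Toy single-head attention from one decoder query over the fused memory.
query = rng.normal(size=(d_model,))
scores = memory @ query / np.sqrt(d_model)
weights = np.exp(scores - scores.max())
weights /= weights.sum()
context = weights @ memory  # attention-weighted summary of both streams

print(memory.shape)   # (6, 8): visual + concept tokens
print(context.shape)  # (8,)
```

In a full model, `memory` would feed the cross-attention of a transformer (or LSTM) decoder that generates the caption token by token.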