Rotated R-CNN: A Two-Stage Object Detection Method Adapted To Oriented Bounding Boxes
Oriented object detection, an emerging subfield of object detection, has recently garnered significant attention. Beyond carrying directional information, oriented-object datasets exhibit notable characteristics, including large variations in object scale and a wide range of aspect ratios among ground-truth bounding boxes. Nevertheless, current state-of-the-art two-stage rotated object detection models have not sufficiently addressed these characteristics, leading to inherent limitations in accuracy. In response to these challenges, we introduce Rotated R-CNN. Our model is the first in oriented object detection to introduce trainable anchors, yielding anchor distributions similar to those of the ground-truth boxes in oriented-object datasets. Furthermore, considering the distinctive traits of oriented ground-truth boxes, we devise a novel label-assignment strategy to more effectively select positive and negative samples for oriented objects. In the regression phase of the RPN, we introduce shape constraints to alleviate the accuracy loss stemming from mismatches between the encoding method and oriented objects. We comprehensively evaluate our model on the DOTAv1.0 and HRSC2016 datasets, demonstrating the effectiveness of our carefully designed model.
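Oriented boxes like those above are commonly parameterized as (cx, cy, w, h, θ). The paper's trainable anchors and encoding are not reproduced here, but a minimal sketch of the underlying geometry, converting that parameterization to corner points, might look like:

```python
import math

def obb_to_corners(cx, cy, w, h, theta):
    """Convert an oriented box (center, size, angle in radians)
    to its four corner points, listed counter-clockwise."""
    c, s = math.cos(theta), math.sin(theta)
    # Half-extent offsets of the axis-aligned box before rotation.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2),
               (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset by theta, then translate to the center.
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in offsets]
```

With θ = 0 this reduces to the ordinary axis-aligned corners, which makes the convention easy to sanity-check.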
- Research Article
1
- 10.47000/tjmcs.1002767
- Jun 30, 2022
- Turkish Journal of Mathematics and Computer Science
Moving vehicle detection is one of the important problems in surveillance and traffic-monitoring applications for aerial images. In this study, a vehicle detection method is proposed that combines motion detection and object detection. A method based on background modeling and subtraction is applied for motion detection, while the Faster R-CNN architecture is used for object detection. The motion detection result is enhanced with the proposed superpixel-based refinement method. The experimental study shows that the proposed post-processing step improves motion detection performance by about 8\% on the $F_1$ metric. With the proposed software architecture, the object detection, motion detection, and superpixel segmentation methods interact in parallel processes, which significantly increases the running speed of the method. In the last step of the proposed method, each vehicle is tracked with a Kalman filter. The method is evaluated on the VIVID dataset, where it increases $F_1$ and recall values significantly compared to the motion and object detection methods alone. It also outperforms the SCBU and MCD methods, which are widely used for performance comparison in the motion detection literature.
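The Kalman tracking step mentioned above can be sketched with a generic constant-velocity filter; the noise parameters `q` and `r` below are illustrative assumptions, not the paper's tuning:

```python
import numpy as np

def kalman_step(x, P, z, dt=1.0, q=1e-2, r=1.0):
    """One predict+update cycle of a constant-velocity Kalman filter.
    State x = [px, py, vx, vy]; measurement z = [px, py]."""
    F = np.eye(4); F[0, 2] = F[1, 3] = dt        # constant-velocity motion model
    H = np.zeros((2, 4)); H[0, 0] = H[1, 1] = 1  # we observe position only
    Q, R = q * np.eye(4), r * np.eye(2)          # process / measurement noise
    # Predict ahead one frame.
    x, P = F @ x, F @ P @ F.T + Q
    # Update with the new detection z.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)               # Kalman gain
    x = x + K @ (np.asarray(z, dtype=float) - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P
```

Fed a sequence of per-frame vehicle detections, the filter converges to the vehicle's position and velocity and can bridge short detection gaps via its predict step.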
- Conference Article
2
- 10.1109/iciss49785.2020.9315973
- Dec 3, 2020
Traditional object detection methods are rudimentary and face many challenges in computer vision, since they attempt to locate object models from a large number of predefined categories in natural images. Building on object detectors and scene classifiers, the image representation combines various low-level image features with high-level context. This paper analyzes the components of general object detection methods, such as object proposal generation, detection frameworks, object extraction, context modeling, and region classification. It also reviews a general object detection architecture, its algorithmic techniques, and deep learning methods built on object detection frameworks. Finally, the paper proposes a Systematic Hybrid Smart Region-Based Detection method (SRBD) for object detection, which attempts to overcome the drawbacks of existing systems such as Faster R-CNN, SSD, and YOLO.
- Conference Article
68
- 10.1109/icdar.2017.46
- Nov 1, 2017
Object detection in natural scenes has been widely researched in the past decade, and many deep learning-based methods have achieved good performance on this task. This paper focuses on how to transfer and refine those object detection approaches from natural scene images to document images, and proposes a deep learning-based method for detecting page objects (e.g., tables, formulae, figures). On the basis of traditional Convolutional Neural Network (CNN)-based object detection methods, we redesign the region proposal method, the training strategy, and the network structure, and replace Non-Maximum Suppression (NMS) with a dynamic programming algorithm. The experimental results show that it is essential to adjust some modules of natural scene object detection approaches in order to better process document images. The proposed method also achieves better performance than existing page object detection methods.
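For context, the standard greedy NMS that this paper replaces with a dynamic programming algorithm can be sketched as follows (a generic baseline, not the paper's method):

```python
def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression on axis-aligned boxes
    given as (x1, y1, x2, y2); returns the indices kept."""
    def iou(a, b):
        # Intersection rectangle, clamped to zero when boxes are disjoint.
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)
    # Visit boxes from highest to lowest score; keep a box only if it
    # does not overlap an already-kept box beyond the threshold.
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep
```

Greedy NMS makes an independent keep/suppress decision per box, which is exactly the kind of local rule a dynamic programming formulation can replace with a globally optimized selection.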
- Conference Article
- 10.1109/icassp43922.2022.9746934
- May 23, 2022
Domain shift causes performance drops in cross-domain object detection. To alleviate the domain shift, a prevailing approach is global feature alignment with adversarial learning. However, such simple feature alignment is unaware of foreground/background regions and of well-aligned/poorly-aligned regions. To remedy these defects, in this paper we propose a novel divergence-guided feature alignment method for cross-domain object detection. Specifically, we generate source-like images of the target domain and seek cues about foreground regions and poorly aligned regions from the prediction divergence between the source-like and original images. The feature alignment is guided by these divergence maps and consequently achieves adaptation performance superior to alignment unaware of the cues. Unlike most previous studies, which focus on two-stage object detection, this paper is devoted to adapting one-stage object detectors, which offer simpler and faster inference. We validated the effectiveness of our method with experiments in cross-weather, cross-camera, and synthetic-to-real adaptation scenarios.
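The prediction-divergence cue described above can be illustrated with a generic per-pixel symmetric KL map; this sketch assumes softmax class-probability outputs of shape (C, H, W) and is not the paper's exact formulation:

```python
import numpy as np

def divergence_map(p, q, eps=1e-8):
    """Per-pixel symmetric KL divergence between two class-probability
    maps of shape (C, H, W); high values flag poorly aligned regions."""
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    kl_pq = np.sum(p * np.log(p / q), axis=0)  # KL(p || q) per pixel
    kl_qp = np.sum(q * np.log(q / p), axis=0)  # KL(q || p) per pixel
    return 0.5 * (kl_pq + kl_qp)
```

Pixels where the two predictions agree score near zero; pixels where they disagree (candidate foreground or poorly aligned regions) score high and can be used to weight the alignment loss.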
- Research Article
2
- 10.14569/ijacsa.2023.0140437
- Jan 1, 2023
- International Journal of Advanced Computer Science and Applications
Deep learning object detection methods are usually based on an anchor-free or anchor-based scheme for extracting object proposals, and a one-stage or two-stage structure for producing final predictions. As each scheme and structure has its own strengths and weaknesses, combining their strengths in a unified framework is an interesting research topic; however, it has not attracted much attention in recent years. This paper presents a two-stage object detection method that uses an anchor-free scheme to generate object proposals in the initial stage. For proposal generation, it employs an efficient anchor-free network that predicts object corners and assigns object proposals based on the detected corners. For object prediction, an efficient detection network is designed to enhance both detection accuracy and speed; it includes a lightweight binary classification subnetwork for removing most false-positive object candidates and a light-head detection subnetwork for generating final predictions. Experimental results on the MS-COCO dataset demonstrate that the proposed method outperforms both anchor-free and two-stage object detection baselines in detection performance.
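The corner-based proposal idea can be sketched with a purely geometric pairing rule. Real corner-based detectors additionally match corners with learned embeddings; the validity check below is an illustrative assumption, not the paper's method:

```python
def corners_to_proposals(top_lefts, bottom_rights, max_wh=512):
    """Pair detected top-left and bottom-right corners into box
    proposals; a pair is valid only when the bottom-right corner
    lies strictly below and to the right of the top-left one."""
    proposals = []
    for (x1, y1) in top_lefts:
        for (x2, y2) in bottom_rights:
            # Reject inverted pairs and implausibly large boxes.
            if 0 < x2 - x1 <= max_wh and 0 < y2 - y1 <= max_wh:
                proposals.append((x1, y1, x2, y2))
    return proposals
```

This makes it clear why a downstream classification subnetwork is useful: naive pairing produces many geometrically valid but false proposals that must be filtered out.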
- Book Chapter
1
- 10.2174/9789815223491124010015
- Oct 13, 2024
An object detection framework recognises real-world objects, such as people or automobiles, within the frames of a video or a computer-generated image and localizes them as they move through the scene. Object detection is widely used in sectors where it matters for an organization's security and growth. Its vast range of applications includes image retrieval, security strategy, inspection, machine and system assessment, and automated vehicle systems. In contrast to conventional object localization techniques, machine learning-based object identification makes use of the machine's greater capacity to learn and represent knowledge [1]. Object detection is a difficult problem in pattern analysis and computer vision, and its relationship with video analysis and image processing has since been developed. Modern detection pipelines combine both fundamental and sophisticated features, and evaluation depends on the classifiers used; this combination produces a complex system that can accurately assess and distinguish between numerous aspects of a scene. Advances in machine learning have produced several deep features that address the shortcomings of older designs [2]. We surveyed one-stage and two-stage object detectors, which are further categorised by their deep learning methodology and which CNN-based networks employ to enhance object detection. This paper presents an evaluation of machine learning methods for object detection [3] and summarises their applications. The various methods of object localization employ template-based, region-based, and part-based approaches.
- Research Article
13
- 10.1109/tase.2021.3116040
- Oct 1, 2022
- IEEE Transactions on Automation Science and Engineering
Thermomechanical processes (TMPs) such as resistance spot welding (RSW) and hot stamping are widely used in automotive manufacturing. Recent advances in sensing technology have led to increasing adoption of thermographic cameras to capture the infrared (IR) radiation of a metal part (or a component of a part) during its thermomechanical processing, or immediately afterwards while the part is still hot. Detecting the object(s) of interest in raw IR images is an essential step in analyzing these data. Deep learning (DL) has seen recent success in object detection (OD), but the application of DL-based OD to industrial IR images in manufacturing largely lags behind. The major contribution of this work, and its distinction from previous OD studies, is the capability of building the OD model with unlabeled IR images, i.e., imaging data without accurate information indicating object positions. This study presents a novel method for OD in unlabeled IR images from TMPs, called Unsupervised IR Image Net (UIR-Net), whose architecture is designed to accommodate the unique characteristics of IR images from TMPs in manufacturing. UIR-Net consists of two components: label generation and DL model construction. Two case studies from automotive manufacturing, RSW and hot stamping, are reported to demonstrate the feasibility and effectiveness of the proposed method. Note to Practitioners—This article was motivated by the problem of detecting objects such as weld nuggets or metal pieces in infrared (IR) imaging of thermomechanical processes (TMPs) in automotive manufacturing. The method is applicable to in situ IR images or videos that contain one or more objects to be detected. It only requires that the data are in image form and come from TMPs. Currently, there is no existing deep learning (DL)-based method for generic object detection (OD) in unlabeled IR images from TMPs.
The proposed method takes advantage of recent advances in DL. This article suggests a systematic approach to building a DL-based OD model, named Unsupervised IR Image Net (UIR-Net), to extract objects from raw IR images collected from TMPs. A step-by-step procedure guides users through label generation, data quality evaluation, and model training to establish the UIR-Net model. Results from resistance spot welding and hot stamping suggest that this approach is feasible and effective, and it is one of the few generic OD works designed for manufacturing applications. Simple implementation, feasibility, and effectiveness make this method a suitable candidate for online data analytics and process monitoring in a wide range of manufacturing applications.
- Research Article
7
- 10.1155/2022/3843155
- Jul 14, 2022
- Computational Intelligence and Neuroscience
Compared with traditional object detection algorithms, object detection algorithms based on deep learning are more robust to complex scenarios and are a hot direction of current research. According to their processing pipeline, deep learning-based object detection algorithms are divided into two-stage and single-stage algorithms; this survey focuses on the problems solved by some classical algorithms and on their advantages and disadvantages. For the problem of object detection, and small object detection in particular, the commonly used datasets and performance evaluation metrics are summarized; the characteristics, advantages, and detection difficulties of the common datasets are compared; the challenges faced by commonly used object detection methods and by small object detection are systematically summarized; the latest work on deep learning-based small object detection is reviewed; and small object detection methods based on multiscale features and on super-resolution are introduced. In addition, lightweight strategies for object detection and the performance of some lightweight models are presented; the characteristics, advantages, and limitations of the various methods are summarized; and future directions for deep learning-based small object detection are discussed.
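One concrete convention that small object detection work of this kind relies on is the COCO size bucketing, where an object counts as "small" below an area of 32², as "medium" below 96², and as "large" otherwise. A minimal sketch:

```python
def coco_size_bucket(w, h):
    """Classify an object by the COCO area convention:
    area < 32**2 is 'small', < 96**2 'medium', else 'large'."""
    area = w * h
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"
```

AP is then reported separately per bucket, which is why the small object gap shows up so clearly in COCO-style evaluations.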
- Conference Article
7
- 10.1109/siprocess.2017.8124494
- Aug 1, 2017
Unattended object detection is a crucial task in visual surveillance systems; however, handling false alarms and the missed detection rate remains challenging. In this paper, a two-stage method for unattended object detection is proposed. The first stage, called the unattended object proposal stage, tries to detect all possible unattended objects and prevent missed detections by considering object attributes such as staticness, foregroundness, and abandonment. In the second stage, our method reduces false alarms among the candidates obtained from the first stage by using deep learning-based similarity matching between candidates and the background model. With its ability to reduce both false alarms and missed detections, our method can be applied in large-scale deployment systems for unattended object detection.
- Research Article
105
- 10.1016/j.compbiomed.2022.106470
- Dec 28, 2022
- Computers in Biology and Medicine
An improved faster R-CNN algorithm for assisted detection of lung nodules
- Conference Article
- 10.65286/icic.v20i2.93092
- Jan 1, 2024
Traditional object detection methods typically require large-scale annotated training data. However, in some areas, acquiring a large amount of annotated data can be extremely challenging. To address Few-Shot Object Detection (FSOD), researchers have introduced the concept of meta-learning, which is currently widely applied in two-stage object detection. We identify several key issues affecting FSOD accuracy, including limited data, insufficient feature extraction capability, and the method of aggregating different features. To extract features more finely and aggregate them better, we separate the support and query branches of Meta-RCNN into two parallel branches and create a mixed feature processing model for few-shot object detection. We place the Feature Pyramid Network (FPN) only in the backbone network of the query branch, creating a strong baseline that enhances feature extraction for images of different dimensions. Additionally, for the first time in FSOD, we use a Variational Autoencoder (VAE) to extract features; adding the VAE to the support branch achieves data augmentation and improves the generalisation ability of the network by obtaining more useful information from the support set. We also design a module $R$ that aggregates the output support image features with the query image features on the query branch; the aggregated results are fed into the detection head. Experimental results demonstrate that the proposed method performs well: following the standard FSOD experimental settings, extensive experiments on the PASCAL VOC dataset show that our method is superior to other currently available methods and achieves very satisfactory results.
- Research Article
4
- 10.1007/s00521-020-05400-w
- Oct 10, 2020
- Neural Computing and Applications
Scale variation is one of the major challenges in object detection. Modern region-based object detection architectures often adopt a Feature Pyramid Network (FPN) as the feature extraction neck to obtain multi-scale feature representations. However, due to the rough feature selection strategy in the Region of Interest (RoI) feature extraction step, these methods might not perform well under strong scale variation. In this work, motivated by the limitations of current FPN-based two-stage object detectors, we present a novel module, the scale-aware feature selective (SAFS) module, that flexibly and adaptively selects feature levels in two-stage object detectors. Specifically, we first build an RoI Pyramid on the standard FPN structure to extract RoI features from all scale levels. Next, to obtain a scale-aware mechanism for handling scale variation, we develop a novel weighting gate function containing a set of trainable parameters that automatically learns the fusion weight for each RoI feature level, relieving the limitation of hard feature selection guided by online instance size. The RoI features, fused with the learned weights, are used for classification and bounding box regression. Furthermore, we design a multi-level SAFS architecture to obtain different combinations of RoI features, making our method more robust to various instance scales. Experimental results show that the SAFS module is compatible with most two-stage object detectors and achieves state-of-the-art results, with an Average Precision of 48.3 on COCO test-dev and strong results on other popular object detection benchmarks. Our code will be made publicly available.
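The weighting-gate idea, a soft and learnable fusion of RoI features across FPN levels instead of hard level selection, can be sketched as follows. The shapes and the softmax gate here are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def gated_fusion(roi_feats, gate_logits):
    """Fuse RoI features pooled from several FPN levels with a
    learned softmax gate instead of hard level selection.
    roi_feats:   (L, C) - one pooled feature vector per pyramid level.
    gate_logits: (L,)   - trainable parameters, one per level."""
    # Numerically stable softmax over the level dimension.
    w = np.exp(gate_logits - np.max(gate_logits))
    w = w / w.sum()                  # fusion weights summing to 1
    return w @ roi_feats             # weighted sum over levels
```

With equal logits this degenerates to averaging the levels; as training skews the logits, the gate concentrates on the most informative level per RoI while remaining differentiable.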
- Research Article
21
- 10.1109/tits.2023.3253509
- Jun 1, 2023
- IEEE Transactions on Intelligent Transportation Systems
Object detection and classification are key processes in advanced driver-assistance systems. The existing object detection and classification methods are effective in normal daylight conditions. However, the performance of these methods deteriorates in adverse driving conditions, such as those involving low light, illumination changes, and nighttime driving. To overcome these limitations, several feature-based algorithms have been developed that introduce local features, such as the local binary pattern, local tetra pattern, and local density encoding, for adverse driving conditions. However, these local patterns cannot effectively address the noise in real driving conditions because the relationship between neighboring pixels is not comprehensively encoded. To solve these problems, this study developed a robust feature-based method by introducing a triangular-pattern-based sigmoid function to effectively encode the relationship between neighboring pixels in the local region. The performance of the proposed pattern is evaluated by integrating it into state-of-the-art object detection algorithms. The proposed method significantly increases the vehicle detection ratio of YOLOv5s by 11.7% at an intersection over union of 0.5 in difficult driving conditions on the CCD dataset. Moreover, the detection ratios of the proposed method are comparable to those of other state-of-the-art object detection methods, such as Retina, Faster RCNN, and Deformable DETR, over datasets such as KITTI, COCO, HCI, and CCD. Additionally, the proposed algorithm is implemented on a Raspberry Pi-based autonomous car system to evaluate its performance in real driving conditions. Our proposed method supports robust input feature extraction and can thus be used to enhance the performance of existing obstacle detection and classification systems.
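The local binary pattern baseline mentioned above encodes each pixel by thresholding its 8 neighbours against the centre. A minimal sketch of plain LBP (not the paper's triangular-pattern variant):

```python
def lbp_code(patch):
    """8-neighbour local binary pattern code for the centre pixel of
    a 3x3 patch: each neighbour >= centre contributes one bit."""
    c = patch[1][1]
    # Clockwise neighbour order starting at the top-left pixel.
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    return sum((1 << i) for i, v in enumerate(neighbours) if v >= c)
```

Because the code depends only on sign comparisons, it is invariant to monotonic illumination changes, which is exactly why LBP-style features are attractive in low-light driving conditions, and each comparison taken in isolation is also why they are sensitive to pixel noise.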
- Research Article
31
- 10.1109/tip.2021.3126423
- Jan 1, 2021
- IEEE Transactions on Image Processing
Decoupling the sibling head has recently shown great potential in relieving the inherent task-misalignment problem in two-stage object detectors. However, existing works design similar structures for classification and regression, ignoring task-specific characteristics and feature demands. Besides, the shared knowledge that may benefit the two branches is neglected, leading to potential excessive decoupling and semantic inconsistency. To address these two issues, we propose the Heterogeneous Task Decoupling (HTD) framework for object detection, which utilizes a Progressive Graph (PGraph) module and a Border-aware Adaptation (BA) module for task decoupling. Specifically, we first devise a Semantic Feature Aggregation (SFA) module to aggregate global semantics with image-level supervision, serving as the shared knowledge for the task-decoupled framework. Then, the PGraph module performs progressive graph reasoning, including local spatial aggregation and global semantic interaction, to enhance the semantic representations of region proposals for classification. The proposed BA module integrates multi-level features adaptively, focusing on low-level border activation to obtain representations with spatial and border perception for regression. Finally, we utilize the aggregated knowledge from SFA to keep the instance-level semantic consistency (ISC) of the decoupled framework. Extensive experiments demonstrate that HTD outperforms existing detection works by a large margin and achieves single-model 50.4% AP and 33.2% APs on the COCO test-dev set using a ResNet-101-DCN backbone, the best among state-of-the-art methods under the same configuration. Our code is available at https://github.com/CityU-AIM-Group/HTD.
- Conference Article
28
- 10.1109/avss.2019.8909834
- Sep 1, 2019
To date, deep learning has been widely introduced in many fields, including object detection, medical imaging, and automation. One important application of deep learning-based object detection is detecting defects by simply evaluating the image of an object. Such systems must be accurate, robust, and efficient. Single-stage and two-stage object detection are the two main approaches used in defect detection systems. A revised version of the popular single shot multi-box detector (SSD) combined with a residual network (ResNet) offers a two-stage method to automatically detect defects with higher precision, but it has shown room for improvement in speed. Therefore, in this paper, we propose a fully automatic pipeline for detecting defects, especially on steel surfaces. We transform the two-stage defect detection process into a more efficient single-stage process by utilizing the state-of-the-art RetinaNet. In addition, we leverage a feature pyramid network (FPN) and focal loss optimization to address the small object detection problem and the imbalanced background-foreground sample issue, respectively. Experimental results show that the proposed single-stage pipeline achieves high accuracy and faster speed in steel surface defect detection.
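The focal loss used in RetinaNet down-weights easy, well-classified examples with a modulating factor (1 - p_t)^gamma, which is how it copes with the background-foreground imbalance. A minimal per-example sketch:

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one predicted probability p in (0, 1)
    and label y in {0, 1}; gamma down-weights easy examples and
    alpha balances the positive/negative classes."""
    pt = p if y == 1 else 1.0 - p            # probability of the true class
    a = alpha if y == 1 else 1.0 - alpha     # class-balance weight
    return -a * (1.0 - pt) ** gamma * math.log(pt)
```

With gamma = 0 and alpha = 1 this reduces to plain cross-entropy; with gamma = 2, a confidently correct background sample contributes almost nothing, so the abundant easy negatives no longer swamp the rare defect samples.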