RMRN-DETR: regression-optimized remote sensing image detection network based on multi-dimensional real-time detection and domain adaptation
ABSTRACT With the advancement of real-time object detection technology, maintaining high detection accuracy for small objects across multiple scales remains challenging. Conventional convolutional neural networks (CNNs) struggle to capture multi-scale features effectively, often failing to meet detection requirements. This study proposes RMRN-DETR, a regression-optimized remote sensing image detection network based on multi-dimensional real-time detection and domain adaptation. First, we introduce a Multi-dimensional Real-time detection module (MR) to achieve efficient end-to-end accuracy improvement. Second, a Multi-dimensional Domain Adaptation module is proposed to fuse features across different scales, effectively capturing both low-level detail and high-level semantic information in a multi-scale hierarchy. Finally, a novel bounding-box regression loss module is introduced to enhance regression accuracy by precisely reflecting the discrepancy between predicted and ground-truth boxes. Experimental results demonstrate a 1.8% accuracy improvement over the baseline on the RSOD dataset and a 2.9% gain on the DIOR dataset. The proposed method significantly enhances the detection accuracy and efficiency of small objects in remote sensing images, demonstrating strong adaptability to complex multi-scale scenarios.
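The abstract does not give the formula for its regression loss, only that it reflects the discrepancy between predicted and ground-truth boxes. As a point of reference, a widely used IoU-based penalty with that property is the GIoU loss; the sketch below is illustrative only, not the paper's actual module.

```python
# Illustrative sketch only: the paper's boundary regression loss is not
# specified in the abstract; GIoU is shown as a representative IoU-based
# penalty between a predicted box and a ground-truth box.
def giou_loss(pred, gt):
    """Boxes as (x1, y1, x2, y2); returns 1 - GIoU, in [0, 2]."""
    # Intersection rectangle
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_p + area_g - inter
    iou = inter / union

    # Smallest box enclosing both; penalizes empty space between
    # non-overlapping boxes, unlike plain IoU.
    cx1, cy1 = min(pred[0], gt[0]), min(pred[1], gt[1])
    cx2, cy2 = max(pred[2], gt[2]), max(pred[3], gt[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)

    giou = iou - (c_area - union) / c_area
    return 1.0 - giou
```

For identical boxes the loss is 0; for disjoint boxes it grows with the empty space in the enclosing rectangle, so it still provides a gradient where IoU alone would be flat at zero.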
- Conference Article
6
- 10.1109/iecon49645.2022.9968754
- Oct 17, 2022
Applying domain adaptation techniques to predictive maintenance of modern electric rotating machinery (RM) has significant potential, the goal being to transfer or adapt a fault diagnosis model developed for one machine so that it generalizes to new machines and/or new working conditions. Self-organized operational neural networks (Self-ONNs), a generalized nonlinear extension of conventional convolutional neural networks (CNNs), are known to enhance the learning capability of CNNs by introducing nonlinear neuron models and further heterogeneity in the network configuration. In this study, state-of-the-art 1D CNNs and Self-ONNs are first tested for cross-domain performance. We then propose to use Self-ONNs as the feature extractor in the well-known domain-adversarial neural network (DANN) to enhance its domain adaptation performance. Experimental results on the benchmark Case Western Reserve University (CWRU) real vibration dataset for bearing fault diagnosis across different load domains demonstrate the effectiveness and feasibility of the proposed domain adaptation approach at similar computational complexity.
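The core of DANN is the gradient reversal layer (GRL) sitting between the feature extractor and the domain classifier. A minimal framework-free sketch, assuming manual forward/backward calls for clarity; real implementations register this as a custom autograd operation:

```python
# Sketch of DANN's gradient reversal layer (GRL): the forward pass is the
# identity, while the backward pass flips the sign of the gradient (scaled
# by lambda), so the feature extractor is trained to *confuse* the domain
# classifier and thus learns domain-invariant features.
class GradientReversal:
    def __init__(self, lam=1.0):
        self.lam = lam  # trade-off between task loss and domain confusion

    def forward(self, x):
        return x  # identity on the forward pass

    def backward(self, grad_output):
        # Reverse and scale the gradient flowing back to the extractor
        return [-self.lam * g for g in grad_output]
```

In PyTorch this would be a `torch.autograd.Function` with the same two methods; `lam` is typically annealed from 0 to 1 over training.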
- Research Article
2
- 10.3390/s23229049
- Nov 8, 2023
- Sensors (Basel, Switzerland)
We present a novel architecture designed to enhance the detection of Error Potential (ErrP) signals during ErrP stimulation tasks. In the context of predicting ErrP presence, conventional Convolutional Neural Networks (CNNs) typically accept a raw EEG signal as input, encompassing both the information associated with the evoked potential and the background activity, which can potentially diminish predictive accuracy. Our approach involves advanced Single-Trial (ST) ErrP enhancement techniques for processing raw EEG signals in the initial stage, followed by CNNs for discerning between ErrP and NonErrP segments in the second stage. We tested different combinations of methods and CNNs. As far as ST ErrP estimation is concerned, we examined various methods encompassing subspace regularization techniques, Continuous Wavelet Transform, and ARX models. For the classification stage, we evaluated the performance of EEGNet, CNN, and a Siamese Neural Network. A comparative analysis against the method of directly applying CNNs to raw EEG signals revealed the advantages of our architecture. Leveraging subspace regularization yielded the best improvement in classification metrics, at up to 14% in balanced accuracy and 13.4% in F1-score.
- Research Article
104
- 10.1007/s00330-020-07274-x
- Oct 1, 2020
- European Radiology
To apply deep learning algorithms using a conventional convolutional neural network (CNN) and a recurrent CNN to differentiate three breast cancer molecular subtypes on MRI. A total of 244 patients were analyzed: 99 in the training dataset scanned at 1.5 T, and 83 in testing-1 and 62 in testing-2 scanned at 3 T. Patients were classified into three subtypes based on hormonal receptor (HR) and HER2 receptor status: (HR+/HER2-), HER2+, and triple negative (TN). Only images acquired in the DCE sequence were used in the analysis. The smallest bounding box covering the tumor ROI was used as the input for deep learning to develop the model in the training dataset, using a conventional CNN and the convolutional long short-term memory (CLSTM). Then, transfer learning was applied to re-tune the model using testing-1 (or testing-2) and evaluated on testing-2 (or testing-1). In the training dataset, the mean accuracy evaluated using tenfold cross-validation was higher with CLSTM (0.91) than with CNN (0.79). When the developed model was applied to the independent testing datasets, the accuracy was 0.4-0.5. With transfer learning by re-tuning parameters in testing-1, the mean accuracy reached 0.91 by CNN and 0.83 by CLSTM, and accuracy in testing-2 improved from 0.47 to 0.78 by CNN and from 0.39 to 0.74 by CLSTM. Overall, transfer learning improved the classification accuracy by more than 30%. The recurrent network using CLSTM could track changes in signal intensity during DCE acquisition and achieved a higher accuracy than the conventional CNN during training. For datasets acquired using different settings, transfer learning can be applied to re-tune the model and improve accuracy.
• Deep learning can be applied to differentiate breast cancer molecular subtypes.
• The recurrent neural network using CLSTM could track the change of signal intensity in DCE images and achieved a higher accuracy than the conventional CNN during training.
• For datasets acquired using different scanners with different imaging protocols, transfer learning provided an efficient method to re-tune the classification model and improve accuracy.
- Research Article
5
- 10.3390/s23020746
- Jan 9, 2023
- Sensors (Basel, Switzerland)
Object detection and tracking is one of the key applications of wireless sensor networks (WSNs). The key issues in this application are network lifetime and object detection and localization accuracy. To ensure high quality of service, there must be a trade-off between energy efficiency and detection accuracy, which is challenging in a resource-constrained WSN. Most researchers have extended the application lifetime while achieving target detection accuracy at the cost of high node density, considering neither the system cost nor object localization accuracy. Some researchers focused on object detection accuracy while achieving energy efficiency by limiting detection to a predefined target trajectory, and some focused only on node clustering and node scheduling for energy efficiency. In this study, we propose a mobile object detection and tracking framework named the Energy Efficient Object Detection and Tracking Framework (EEODTF) for heterogeneous WSNs, which minimizes energy consumption during tracking without degrading object detection and localization accuracy. It achieves energy efficiency via node optimization, mobile node trajectory optimization, node clustering, data reporting optimization and detection optimization. We compared the performance of the EEODTF with the Energy Efficient Tracking and Localization of Object (EETLO) model and the Particle-Swarm-Optimization-based Energy Efficient Target Tracking Model (PSOEETTM), and found the EEODTF to be more energy efficient than both.
- Conference Article
1
- 10.23919/ccc52363.2021.9550159
- Jul 26, 2021
In view of the difficulty and low accuracy of small object detection in remote sensing images, this paper proposes a small object detection algorithm based on contextual information fusion to address real-time small object detection accuracy. We use a bottom-up VGG16 network for multi-scale feature extraction to mitigate insufficient image feature extraction. To address the problem that each feature layer carries only a single kind of feature information, shallow and deep feature layers are fused through a feature fusion module, so that selected feature layers obtain richer fused features at the structural level. Since the detection objects in remote sensing images are mainly small and medium-sized, this paper proposes to use the combined information of four feature layers at different scales for classification and regression prediction, thereby reducing the complexity of the network model. The experimental results show that the proposed small object detection algorithm, based on the fusion of four-scale deep and shallow contextual information, achieves good accuracy and real-time performance on the NWPU VHR-10 dataset, improving detection accuracy while preserving real-time detection, and performs well in the small object detection task for remote sensing images.
- Research Article
23
- 10.3390/rs15112728
- May 24, 2023
- Remote Sensing
In remote sensing images, small objects have too few discriminative features, are easily confused with background information, and are difficult to locate, leading to degraded detection accuracy when general object detection networks are applied to aerial images. To solve these problems, we propose a remote sensing small object detection network based on the attention mechanism and multi-scale feature fusion, named AMMFN. Firstly, a detection head enhancement module (DHEM) was designed to strengthen the characterization of small object features through a combination of multi-scale feature fusion and attention mechanisms. Secondly, an attention-mechanism-based channel cascade (AMCC) module was designed to reduce redundant information in the feature layer and protect small objects from information loss during feature fusion. Then, the Normalized Wasserstein Distance (NWD) was introduced and combined with Generalized Intersection over Union (GIoU) as the location regression loss function, improving the optimization weight the model gives to small objects and the accuracy of the regression boxes. Finally, an object detection layer was added to improve object feature extraction at different scales. Experimental results on the Unmanned Aerial Vehicle (UAV) dataset VisDrone2021 and a homemade dataset show that AMMFN improves the AP_s (small-object AP) values by 2.4% and 3.2%, respectively, compared with YOLOv5s, an effective improvement in the detection accuracy of small objects.
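The NWD term above has a simple closed form: each box (cx, cy, w, h) is modelled as a 2-D Gaussian, and the squared Wasserstein-2 distance between two such Gaussians reduces to a Euclidean distance over centers and half-sizes. A sketch of the metric; the normalizing constant C and the weighting against GIoU are dataset-dependent choices not stated in the abstract:

```python
import math

# NWD models each box (cx, cy, w, h) as a Gaussian N([cx, cy],
# diag(w^2/4, h^2/4)); the squared Wasserstein-2 distance between two
# such Gaussians has the closed form below.
def wasserstein2_sq(box_a, box_b):
    """Squared W2 distance between Gaussians for boxes (cx, cy, w, h)."""
    (ax, ay, aw, ah), (bx, by, bw, bh) = box_a, box_b
    return (ax - bx) ** 2 + (ay - by) ** 2 \
        + (aw / 2 - bw / 2) ** 2 + (ah / 2 - bh / 2) ** 2

def nwd(box_a, box_b, c=12.8):
    """Exponentially normalized W2 distance; equals 1.0 for identical
    boxes and decays smoothly with distance. The constant c is an
    assumed dataset-level scale."""
    return math.exp(-math.sqrt(wasserstein2_sq(box_a, box_b)) / c)
```

Unlike IoU, this similarity stays informative for tiny or non-overlapping boxes, which is why it is paired with GIoU for small-object regression.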
- Research Article
21
- 10.1109/jsen.2021.3103612
- Oct 1, 2021
- IEEE Sensors Journal
In recent years, object detection algorithms based on deep learning have made great progress, but detection performance on small objects remains unsatisfactory. Some methods use high-resolution features or enhance shallow features to improve small object detection accuracy. However, detection on high-resolution features incurs higher computational cost, and enhancing shallow features by propagating semantic information from high levels into low levels may introduce information aliasing. To address this issue, we propose a novel object detection method based on shallow feature fusion and semantic information enhancement (FFSI). High-level semantic information is injected into low-level features to guide the enhancement of specific detail information. To reduce information aliasing in shallow features and enlarge their receptive field, we design two parallel modules: a context information enhancement module (CIE) and a receptive field enhancement module (RFE). CIE highlights object locations by establishing the relationship between local and global context information. RFE enlarges the receptive field of shallow features using dilated convolution to adapt to object detection at different scales, especially small objects. The proposed model is evaluated extensively on the PASCAL VOC and COCO datasets. The experimental results demonstrate that FFSI achieves competitive performance; more importantly, it outperforms state-of-the-art methods in detecting small objects.
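The receptive-field gain that RFE draws on is easy to quantify: a k x k kernel with dilation d covers a footprint of k + (k - 1)(d - 1) pixels at no extra parameter cost. A back-of-envelope helper illustrating the idea (not the paper's exact module):

```python
# Dilated convolution enlarges a kernel's footprint without adding
# parameters: a 3x3 kernel with dilation 2 spans a 5x5 area.
def effective_kernel(k, d):
    """Spatial footprint of a k x k kernel with dilation d."""
    return k + (k - 1) * (d - 1)

def stacked_receptive_field(layers):
    """Receptive field of a stack of stride-1 conv layers, each given
    as a (kernel_size, dilation) pair: grows by (footprint - 1) per layer."""
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf
```

For example, three 3x3 layers with dilations 1, 2, 5 reach a 17-pixel receptive field, versus 7 for three plain 3x3 layers, which is why dilation suits shallow, high-resolution feature maps.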
- Research Article
- 10.3390/s24061804
- Mar 11, 2024
- Sensors
Perception plays a crucial role in ensuring the safety and reliability of autonomous driving systems. However, the recognition and localization of small objects in complex scenarios still pose challenges. In this paper, we propose a point cloud object detection method based on dynamic sparse voxelization to enhance the detection performance of small objects. The method employs a specialized point cloud encoding network to learn and generate pseudo-images from point cloud features; the feature extraction part uses sliding windows and transformer-based methods. Furthermore, multi-scale feature fusion is performed to enhance the granularity of small object information. In this experiment, "small object" refers to objects such as cyclists and pedestrians, which occupy fewer pixels than vehicles, as well as objects in the harder detection-quality categories. The experimental results demonstrate that, compared to the PointPillars algorithm and other related algorithms on the KITTI public dataset, the proposed algorithm achieves improved detection accuracy for cyclist and pedestrian targets. In particular, there is notable improvement for objects in the moderate and hard quality categories, with an overall average accuracy increase of about 5%.
- Research Article
8
- 10.1109/access.2021.3083804
- Jan 1, 2021
- IEEE Access
A deep learning model trained under a specific operating condition of a gearbox often suffers from overfitting, which makes it impossible to diagnose faults under different operating conditions. To solve this problem, this paper proposes an ensemble of deep domain adaptation approaches with a health data map. As a fundamental approach to alleviating the domain shift caused by inhomogeneous operating conditions, the vibration signal is transformed into an image-like simplified health data map that visualizes tooth-wise faults of the gearbox. The simplified health data map enables the use of a conventional convolutional neural network (CNN) model. To address the domain shift that remains even with the simplified health data map, this study employs maximum classifier discrepancy (MCD), a typical domain adaptation method. To further enhance its performance, a discrepancy-scale-factor-based MCD and its ensemble approach are proposed. The proposed method is demonstrated on a 2 kW planetary gearbox testbed operated under stationary and non-stationary speed conditions. The results show that the proposed method outperforms conventional CNN and MCD even under inhomogeneous operating conditions of the gearbox.
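The discrepancy at the heart of MCD is simply an L1 distance between the class-probability outputs of two classifiers on target-domain samples: the classifiers are trained to maximize it and the feature extractor to minimize it. A minimal sketch; the multiplicative `scale` parameter stands in for the paper's discrepancy-scale factor and is an assumption here:

```python
# Sketch of the discrepancy term in Maximum Classifier Discrepancy (MCD):
# the mean absolute difference between two classifiers' probability
# outputs for the same target-domain sample. In adversarial training the
# two classifier heads maximize this term while the shared feature
# extractor minimizes it.
def discrepancy(p1, p2, scale=1.0):
    """Mean absolute difference between two probability vectors,
    optionally weighted by a discrepancy-scale factor (assumed form)."""
    assert len(p1) == len(p2)
    return scale * sum(abs(a - b) for a, b in zip(p1, p2)) / len(p1)
```

Identical predictions give a discrepancy of 0; maximally disagreeing one-hot predictions over two classes give 1.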
- Research Article
- 10.1145/3665649
- Sep 16, 2024
- ACM Journal on Autonomous Transportation Systems
With advances in technology, deep learning-based object detection has made unprecedented progress. However, the small spatial ratio of object pixels hinders the effective extraction of deep detail features, resulting in poor results in small object detection. To improve small object detection accuracy, an adaptive Cascading Context (ACC) small object detection method is proposed based on YOLOv5. Firstly, a separate shallow feature layer is introduced to obtain more detailed information beneficial to small object detection. Secondly, an adaptive cascade method is proposed to fuse the output features of the three pyramid layers, adaptively filtering negative semantic information while fusing with shallow features to address the low classification accuracy caused by the insufficient semantic information of shallow features. Finally, an adaptive context model uses deformable convolution to obtain the spatial context features of shallow small objects, associating targets with the background and thereby improving small object detection accuracy. The experimental results show that the detection accuracy of the proposed method improves by 6.12%, 3.35%, 3.33%, and 5.2% over the original YOLOv5 on the PASCAL VOC, NWPU VHR-10, KITTI, and RSOD datasets, respectively, fully demonstrating the effectiveness of our method in small object detection.
- Conference Article
1
- 10.1109/apccas51387.2021.9687715
- Nov 22, 2021
The accuracy of visual object detection, which estimates the locations and classes of target objects in input images, has been drastically improved by the rapid advancement of deep convolutional neural networks (CNNs). Existing CNN-based methods are usually evaluated on major datasets such as MS-COCO and PASCAL-VOC, which include target objects of several sizes. Recent methods detect larger objects with remarkable accuracy; however, it remains difficult for recent CNNs to accurately detect small objects. To address this problem, this study investigates how to improve the accuracy of small object detection using CNNs. For the investigation, two datasets comprising solely small target objects were created: the Bird and SAVMAP datasets, which contain flying objects in the sky and mammals in the savannah, respectively. Experimental results on these datasets indicate that the input size, the depth of the CNN layers, and the surrounding context of target objects are important factors for small object detection. Furthermore, these results demonstrate that EfficientDet-D0 achieved accuracies of 0.6585 and 0.6501 on the Bird and SAVMAP datasets, respectively.
- Research Article
- 10.3390/make7030064
- Jul 9, 2025
- Machine Learning and Knowledge Extraction
The underground coal mine environment is complex and dynamic, making the application of visual algorithms for object detection a crucial component of underground safety management as well as a key factor in ensuring the safe operation of workers. We look at this in the context of helmet-wearing detection in underground mines, where over 25% of the targets are small objects. To address challenges such as the lack of effective samples for unworn helmets, significant background interference, and the difficulty of detecting small helmet targets, this paper proposes a novel underground helmet-wearing detection algorithm that combines dynamic background awareness with a limited number of valid samples to improve accuracy for underground workers. The algorithm begins by analyzing the distribution of visual surveillance data and spatial biases in underground environments. By using data augmentation techniques, it then effectively expands the number of training samples by introducing positive and negative samples for helmet-wearing detection from ordinary scenes. Thereafter, based on YOLOv10, the algorithm incorporates a background awareness module with region masks to reduce the adverse effects of complex underground backgrounds on helmet-wearing detection. Specifically, it adds a convolution and attention fusion module in the detection head to enhance the model’s perception of small helmet-wearing objects by enlarging the detection receptive field. By analyzing the aspect ratio distribution of helmet wearing data, the algorithm improves the aspect ratio constraints in the loss function, further enhancing detection accuracy. Consequently, it achieves precise detection of helmet-wearing in underground coal mines. 
Experimental results demonstrate that the proposed algorithm can detect small helmet-wearing objects in complex underground scenes, with a 14% reduction in background false detection rates, and thereby achieving accuracy, recall, and average precision rates of 94.4%, 89%, and 95.4%, respectively. Compared to other mainstream object detection algorithms, the proposed algorithm shows improvements in detection accuracy of 6.7%, 5.1%, and 11.8% over YOLOv9, YOLOv10, and RT-DETR, respectively. The algorithm proposed in this paper can be applied to real-time helmet-wearing detection in underground coal mine scenes, providing safety alerts for standardized worker operations and enhancing the level of underground security intelligence.
- Research Article
21
- 10.3390/rs13132620
- Jul 3, 2021
- Remote Sensing
Accurate object detection is important in computer vision. However, detecting small objects in low-resolution images remains a challenging and elusive problem, primarily because these objects carry less visual information and cannot easily be distinguished from similar background regions. To resolve this problem, we propose a Hierarchical Small Object Detection Network for low-resolution remote sensing images, named HSOD-Net. We develop a point-to-region detection paradigm that first performs key-point prediction to obtain position hypotheses, and only later super-resolves the image and detects objects around those candidate positions. By postponing object prediction until after the resolution is increased, the obtained key-points are more stable than traditional counterparts based on early object detection with less visual information. This hierarchical approach saves significant run-time, making HSOD-Net more suitable for practical applications such as search and rescue and drone navigation. In comparison with state-of-the-art models, HSOD-Net achieves remarkable precision in detecting small objects in low-resolution remote sensing images.
- Research Article
3
- 10.1016/j.eswa.2022.116973
- Apr 4, 2022
- Expert Systems With Applications
Monocular vision-based time-to-collision estimation for small drones by domain adaptation of simulated images
- Conference Article
- 10.33012/2022.18496
- Oct 20, 2022
Object detection is one of the core tasks of computer vision. With the development of artificial neural networks, object detection has greatly improved and is gradually being applied to more fields. In urban traffic, it can be used for vehicle detection, autonomous driving, and judging traffic conditions. During navigation, satellite remote sensing images can be used to detect urban vehicles and judge traffic conditions; object detection can also identify obstacles during driving. Although accurate detection of larger objects in images has been achieved in many applications, accurate detection of smaller objects remains challenging. The main difficulties in small target detection are low resolution, blurred images, and the little information such targets carry. As a result, their feature expression ability is weak: during feature extraction, very few features can be extracted, which hinders small target detection. This paper proposes an improved Faster R-CNN algorithm for small target recognition in remote sensing images. The feature pyramid structure is used to improve Faster R-CNN: a feature pyramid extracts features from images at every scale and generates multi-scale feature representations in which feature maps at all levels, including some high-resolution ones, carry strong semantic information. The feature pyramid structure thus enhances feature expression while increasing the resolution of small target feature maps. Secondly, a k-means clustering algorithm is used to optimize the anchor box sizes and improve the match between prior boxes and ground-truth boxes. A channel attention mechanism is also introduced in feature fusion to highlight important features and reduce redundant ones.
Remote sensing images are generally obtained from space platforms, and most of their targets are small. The attention mechanism considers both the inter-channel relationship and the spatial position relationship, enabling the model to identify targets more accurately and lock onto their positions; it improves the accuracy and generalization of small object detection. Finally, we validate the model on the RSOD dataset. Experimental verification shows that the proposed framework can effectively improve the accuracy of small target detection in aerial remote sensing images and effectively reduce the false alarm rate, providing a basis for research on target detection in aerial remote sensing images.
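The anchor-size optimization described above is typically k-means over ground-truth (w, h) pairs with a 1 - IoU distance, so that anchors match the dataset's box shapes rather than pixel coordinates. A minimal sketch under that assumption; seeding centroids with the first k boxes is a simplification (real pipelines use random or k-means++ seeding):

```python
# Sketch of anchor-size optimization via k-means with an IoU distance:
# boxes are reduced to (w, h) pairs aligned at a common corner, and each
# box joins the cluster of the anchor it overlaps most.
def wh_iou(a, b):
    """IoU of two (w, h) boxes sharing a corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(boxes, k, iters=50):
    """Cluster (w, h) pairs into k anchor shapes; centroids seeded with
    the first k boxes for simplicity."""
    centroids = list(boxes[:k])
    for _ in range(iters):
        # Assign each box to the centroid with the highest IoU
        groups = [[] for _ in range(k)]
        for wh in boxes:
            best = max(range(k), key=lambda i: wh_iou(wh, centroids[i]))
            groups[best].append(wh)
        # Recompute centroids as per-cluster mean width/height
        centroids = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g))
            if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids
```

On a toy set with two small and two large boxes, the two resulting anchors converge to the per-group mean shapes, which is exactly the prior-box/ground-truth matching improvement the abstract describes.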