Generative Adversarial Networks In Object Detection: A Systematic Literature Review
The intersection of Generative Adversarial Networks (GANs) and object detection represents one of the most promising developments in modern computer vision, offering innovative solutions to longstanding challenges in visual recognition systems. This review presents a systematic analysis of how GANs are addressing these challenges, examining their applications from 2020 to 2025. The paper investigates three primary domains where GANs have demonstrated remarkable potential: data augmentation for addressing data scarcity, occlusion handling techniques designed to manage visually obstructed objects, and enhancement methods focused on improving small object detection performance. The analysis reveals significant performance gains from these GAN applications: data augmentation methods consistently boost detection metrics such as mAP and F1-score on scarce datasets, occlusion handling techniques reconstruct hidden features with high PSNR and SSIM values, and small object detection techniques increase detection accuracy by up to 10% Average Precision in some studies. Collectively, these findings demonstrate how GANs, integrated with modern detectors, are substantially advancing object detection capabilities. Despite this progress, persistent challenges remain, including computational cost and training instability. By critically analyzing these advancements and limitations, this paper provides crucial insights into the current state and likely future directions of GAN-based object detection systems.
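The detection metrics cited throughout this review (AP, mAP) all rest on intersection-over-union (IoU) between predicted and ground-truth boxes; a minimal self-contained sketch of that building block:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # intersection rectangle (empty if the boxes do not overlap)
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A prediction typically counts as a true positive when its IoU with a ground-truth box exceeds a threshold such as 0.5, which is what metrics written "AP@0.5" or "mAP@0.5:0.95" refer to.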
- Research Article
16
- 10.3390/s21155194
- Jul 31, 2021
- Sensors (Basel, Switzerland)
Despite breakthroughs in the accuracy and efficiency of object detection with deep neural networks, the performance of small object detection remains far from satisfactory. Gaze estimation has developed significantly thanks to advances in visual sensors, and combining object detection with gaze estimation can significantly improve small object detection. This paper presents a centered multi-task generative adversarial network (CMTGAN), which combines small object detection and gaze estimation. To achieve this, we propose a generative adversarial network (GAN) capable of image super-resolution and two-stage small object detection. We exploit the generator in CMTGAN for image super-resolution and the discriminator for object detection. We introduce an artificial texture loss into the generator to retain the original features of small objects. We also use a centered mask in the generator so that the network focuses on the central part of images, where small objects are more likely to appear in our setting. We propose a discriminator with a detection loss for two-stage small object detection, which can be adapted to other GANs for object detection. Compared with existing interpolation methods, the super-resolution images generated by CMTGAN are sharper and contain more information. Experiments show that our method exhibits better detection performance than mainstream methods.
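The abstract does not give the form of the centered mask; one plausible realization is a Gaussian weighting of the image centre (the `sigma_frac` parameter here is an illustrative choice, not from the paper):

```python
import numpy as np

def centered_mask(h, w, sigma_frac=0.5):
    """Gaussian weighting that emphasises the image centre, where small
    objects are assumed more likely to appear. sigma_frac is a free choice."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    sigma = sigma_frac * min(h, w)
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

mask = centered_mask(8, 8)
# weight a dummy RGB image channel-wise with the mask
weighted = mask[..., None] * np.ones((8, 8, 3))
```

In the paper the mask steers the generator's attention; here it is shown only as a plain pixel weighting.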
- Research Article
16
- 10.1016/j.asoc.2023.110224
- Mar 23, 2023
- Applied Soft Computing
Recent developments in object tracking and detection techniques have driven remarkable improvements in distinguishing attacks and adversaries. Nevertheless, adversarial attacks, intrusions, and manipulation of images/videos threaten video surveillance systems and other object-tracking applications. Generative adversarial neural networks (GANNs) are widely used in image processing and object detection because of their flexibility in processing large datasets in real time. GANN training helps ensure a tamper-proof system, but the possibility of attacks persists. Therefore, reviewing object tracking and detection techniques under GANN threats is necessary to reveal the challenges and benefits of efficient defence methods against these attacks. This paper systematically reviews object tracking and detection techniques under threats to GANN-based applications. Studies were selected based on factors such as year of publication, the method implemented in the article, the reliability of the chosen algorithms, and dataset size. Each study is summarised by assigning it to one of two predefined tasks: applying a GANN or using traditional machine learning (ML) techniques. First, the paper discusses traditional techniques applied in this field. Second, it addresses the challenges and benefits of object detection and tracking. Finally, different existing GANN architectures are covered to justify the need for tamper-proof object tracking systems that can process efficiently in a real-time environment.
- Research Article
52
- 10.1109/tip.2022.3207571
- Jan 1, 2022
- IEEE Transactions on Image Processing
Model-based single image dehazing algorithms restore haze-free images with sharp edges and rich details for real-world hazy images, at the expense of low PSNR and SSIM values on synthetic hazy images. Data-driven ones restore haze-free images with high PSNR and SSIM values on synthetic hazy images, but with low contrast, and even some remaining haze, on real-world hazy images. In this paper, a novel single image dehazing algorithm is introduced by combining model-based and data-driven approaches. Both the transmission map and the atmospheric light are first estimated by model-based methods, and then refined by dual-scale generative adversarial network (GAN) based approaches. The resulting algorithm forms a neural augmentation that converges very fast, while the corresponding purely data-driven approach might not converge. Haze-free images are restored using the estimated transmission map and atmospheric light together with Koschmieder's law. Experimental results indicate that the proposed algorithm removes haze well from both real-world and synthetic hazy images.
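Koschmieder's law models a hazy image as I = J·t + A·(1 − t), where J is the scene radiance, t the transmission map, and A the atmospheric light; once t and A are estimated, J is recovered by inverting the law. A sketch of that final restoration step (the lower clamp `t0` is a common stabilization, assumed here rather than taken from the paper):

```python
import numpy as np

def restore_haze_free(I, t, A, t0=0.1):
    """Invert Koschmieder's law I = J * t + A * (1 - t) for the radiance J.
    t is clamped below by t0 to avoid amplifying noise where haze is dense."""
    t = np.clip(t, t0, 1.0)
    return (I - A) / t[..., None] + A
```

With exact t and A the inversion recovers J exactly; in practice both are estimates, which is why the paper refines them with dual-scale GANs before this step.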
- Research Article
81
- 10.1109/access.2021.3131949
- Jan 1, 2021
- IEEE Access
Anomaly detection has become an indispensable tool for modern society, used in a wide range of applications, from detecting fraudulent transactions to malignant brain tumors. Over time, many anomaly detection techniques have been introduced. However, in general, they all suffer from the same problem: a lack of data representing anomalous behaviour. As anomalous behaviour is usually costly (or dangerous) for a system, it is difficult to gather enough data that represents such behaviour. This, in turn, makes it difficult to develop and evaluate anomaly detection techniques. Recently, generative adversarial networks (GANs) have attracted much attention in anomaly detection research due to their unique ability to generate new data. In this paper, we present a systematic review of the literature in this area, covering 128 papers. The goal of this review is to analyze the relation between anomaly detection techniques and types of GANs, to identify the most common application domains for GAN-assisted and GAN-based anomaly detection, and to assemble information on the datasets and performance metrics used to assess them. Our study helps researchers and practitioners find the most suitable GAN-assisted anomaly detection technique for their application. In addition, we present a research roadmap for future studies in this area. In summary, GANs are used in anomaly detection to address the problem of insufficient data on anomalous behaviour, either through data augmentation or representation learning. The most commonly used GAN architectures are DCGANs, standard GANs, and cGANs. The primary application domains include medicine, surveillance, and intrusion detection.
- Research Article
- 10.26583/sv.16.2.11
- May 1, 2024
- Scientific Visualization
When detecting equipment on a construction site, the detected objects can appear at very different scales relative to the image. For better detection and bounding-box visualization of small objects, a Feature-Fused modification of the SSD detector can be used. Together with overlapping image slicing at inference time, this model copes well with small object detection. However, excessive manual adjustment of the slicing parameters for better small object detection can both worsen detection on scenes different from those on which the model was tuned and lead to significant losses in the detection of large objects and problems with their bounding-box visualization. Therefore, to achieve the best quality, the image slicing parameters should be selected automatically by the model depending on the characteristic scales of objects in the image. The article presents a dual-pass version of Feature-Fused SSD for automatic determination of image slicing parameters. A fast truncated version of the detector is used on the first pass to determine the characteristic sizes of detected objects; on the second pass, the final object detection is carried out with slicing parameters selected after the first. Depending on the complexity of the task, the detector achieves 0.82 to 0.92 mAP (mean Average Precision).
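Overlapping inference slicing can be sketched as a window generator over the image; tile size and overlap are exactly the parameters the paper selects automatically, and the names and defaults below are illustrative rather than the paper's:

```python
def slice_windows(img_w, img_h, tile, overlap):
    """Boxes (x1, y1, x2, y2) of overlapping square tiles covering an image.
    The stride is tile * (1 - overlap); a final tile is appended if needed
    so the right/bottom border is always covered."""
    stride = max(1, int(tile * (1 - overlap)))
    xs = list(range(0, max(img_w - tile, 0) + 1, stride))
    ys = list(range(0, max(img_h - tile, 0) + 1, stride))
    if xs[-1] + tile < img_w:
        xs.append(img_w - tile)
    if ys[-1] + tile < img_h:
        ys.append(img_h - tile)
    return [(x, y, x + tile, y + tile) for y in ys for x in xs]
```

Each tile is then run through the detector and the per-tile boxes are merged (e.g. by non-maximum suppression), which is the part the dual-pass scheme tunes per image.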
- Research Article
- 10.1088/1361-6501/adf136
- Aug 1, 2025
- Measurement Science and Technology
Small object detection presents various challenges across different domains, with UAV aerial image detection being particularly significant and complex. The detection accuracy is primarily influenced by the high density of small objects, substantial object scale variations and background complexity. Nevertheless, existing object detection algorithms exhibit deficiencies in feature retention and multi-scale feature fusion, thereby limiting detection performance in intricate scenes. To address these challenges, this paper proposes an innovative multi-dimensional feature enhancement and multi-scale feature adaptive aggregation and diffusion small object detection network (MFEAD-SODNet) for UAV aerial images. First, a backbone network integrating edge and spatial feature enhancement is developed to enhance feature representation from multiple perspectives, which improves small object recognition accuracy and detection performance. Second, the multi-scale feature adaptive aggregation and diffusion feature pyramid network (MFAD-FPN) is innovatively introduced. This network effectively preserves multi-scale information through adaptive feature fusion driven by channel selection. Additionally, it employs a cross-layer feature aggregation and adjacent layer feature diffusion mechanism to shorten feature transfer paths and minimize information propagation loss. Finally, a lightweight shared detail-enhanced detection head is proposed to balance computational complexity while enhancing detailed feature representation. To evaluate the effectiveness of the proposed algorithm, experiments were conducted using VisDrone2019 as the baseline dataset. Results indicate that, compared to the baseline model, MFEAD-SODNet improves mAP@0.5 and mAP@0.5:0.95 by 7.6% and 5.1%, respectively, while reducing the number of parameters by 23.3%.
Furthermore, the effectiveness and generalization of the MFEAD-SODNet model for small object detection were further validated using additional public and self-built datasets.
- Research Article
- 10.30574/wjaets.2024.13.2.0559
- Dec 30, 2024
- World Journal of Advanced Engineering Technology and Sciences
CNNs and GANs, together and separately, have achieved groundbreaking developments in artificial intelligence and play prominent roles among deep learning architectures. This document is an extensive overview and side-by-side analysis of CNNs and GANs: their underlying architectures and workings, their advantages and disadvantages, and their uses in practice. Convolutional Neural Networks (CNNs), known for their outstanding feature extraction capabilities, have greatly broadened the scope of image classification, object detection, and medical diagnostics; Generative Adversarial Networks (GANs) have brought a new generalized approach to generative modelling, producing extremely realistic images, videos, and data. This analysis highlights significant differences in how CNNs and GANs are trained, the intricacy of their architectures, and the metrics used to measure performance, as well as recurrent challenges such as overfitting in CNNs and instability in GANs. Furthermore, the paper explores how these models can be coupled into hybrid systems that perform better in applications such as data augmentation and image translation. This in-depth review aims to give researchers and practitioners a clear perspective on using these models across various applications and to identify areas toward which future research can be directed.
- Research Article
62
- 10.1007/s13735-020-00196-w
- Oct 24, 2020
- International Journal of Multimedia Information Retrieval
Deep neural networks have attained great success in handling high-dimensional data, especially images. However, generating naturalistic images containing ginormous subjects for different tasks such as image classification, segmentation, object detection, and reconstruction continues to be difficult. Generative modelling has the potential to learn any kind of data distribution in an unsupervised manner. Variational autoencoders (VAEs), autoregressive models, and generative adversarial networks (GANs) are the popular generative modelling approaches used to model data distributions. Among these, GANs have gained much attention from the research community in recent years for generating quality images and for data augmentation. In this context, we collected research articles that employed GANs for solving various tasks from popular databases and summarized them by application. The main objective of this article is to present the nuts and bolts of GANs, state-of-the-art related work and its applications, evaluation metrics, challenges involved in training GANs, and benchmark datasets, to benefit new and enthusiastic researchers interested in working on GANs.
- Research Article
63
- 10.1109/tcsvt.2020.2967419
- Jan 23, 2020
- IEEE Transactions on Circuits and Systems for Video Technology
Deep learning has revolutionized classification and object detection performance, but demands sufficient labeled data for training. Given insufficient data, many techniques have been developed to help combat overfitting, but the challenge remains when training deep networks in the ill-posed extremely low data regime: only a small set of labeled data is available, and nothing else – not even unlabeled data. Such regimes arise from practical situations where not only data labeling but also data collection itself is expensive. We propose a deep adversarial data augmentation (DADA) technique to address the problem, in which we formulate data augmentation as the problem of training a class-conditional, supervised generative adversarial network (GAN). Specifically, a new discriminator loss is proposed to fit the goal of data augmentation, through which both real and augmented samples are enforced to contribute to, and be consistent in, finding the decision boundaries. Tailored training techniques are developed accordingly. To quantitatively validate its effectiveness, we first perform extensive simulations showing that DADA substantially outperforms both traditional data augmentation and several GAN-based alternatives. We then extend the experiments to three real-world small labeled classification datasets where existing data augmentation and/or transfer learning strategies are either less effective or infeasible. We also demonstrate that DADA can be extended to the detection task: we build on prior pedestrian synthesis work by substituting in our discriminator and training scheme. Validation experiments show that DADA improves detection mean average precision (mAP) compared with some traditional data augmentation techniques in object detection.
Source code is available at https://github.com/SchafferZhang/DADA.
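The abstract describes a class-conditional discriminator loss without giving its form; as a rough illustration (not DADA's exact loss), a discriminator with 2K outputs can classify real samples of class k as target k and generated samples of class k as target K + k, so both real and synthetic samples shape per-class decision boundaries:

```python
import numpy as np

def conditional_d_loss(logits, labels, is_real, num_classes):
    """Mean softmax cross-entropy over 2K outputs: real samples of class k
    target index k; generated samples of class k target index K + k.
    A generic class-conditional objective, not the paper's exact loss."""
    targets = np.where(is_real, labels, labels + num_classes)
    # numerically stabilised log-softmax
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()
```

A loss near zero means the discriminator cleanly separates real from generated samples per class; training the generator against such a split pushes generated samples toward class-consistent regions.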
- Book Chapter
1
- 10.1007/978-3-030-03766-6_76
- Dec 25, 2018
For small object detection on UAV (Unmanned Aerial Vehicle) platforms, a confidence description of moving objects is proposed to improve the accuracy, robustness, and reliability of detection and tracking. Small moving objects in aerial video have low resolution and slow motion, the images are easily affected by illumination changes and camera-jitter noise, and the correlation between video frames is often neglected, so such methods are prone to false detections, low detection accuracy, and poor robustness. For UAV video with small moving objects, the algorithm uses the ORB operator to extract reliable global feature points from each frame, performs global motion compensation on the background through an affine transformation model, and computes the difference image, from which the small object is accurately detected and a confidence value for the moving object is derived. An n-step back-off method is used to incorporate correlation information between video frames. The method was evaluated on video captured from an airborne platform through extensive experiments and tests. For objects as small as 25 pixels, the method still performs well, and it can be parallelized to run in real time, processing 1280 × 720 frames at around 45 fps.
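The difference-image step of the pipeline above can be sketched as follows, assuming global motion compensation (the ORB-plus-affine-warp stage) has already produced `prev_compensated`; the threshold value is illustrative:

```python
import numpy as np

def difference_mask(prev_compensated, curr, thresh=25):
    """Binary change mask between a motion-compensated previous frame and
    the current frame (both uint8 grayscale). In the full method the
    compensation comes from an ORB-based affine warp, omitted here."""
    diff = np.abs(curr.astype(np.int16) - prev_compensated.astype(np.int16))
    return (diff > thresh).astype(np.uint8)
```

Connected regions of the mask are then candidate moving objects, each scored with the confidence description and filtered using the n-step back-off across frames.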
- Research Article
- 10.14313/par_249/85
- Sep 16, 2023
- Pomiary Automatyka Robotyka
In recent years, thanks to the development of Deep Learning methods, there has been significant progress in object detection and other computer vision tasks. While generic object detection is becoming less of an issue for modern algorithms, with the Average Precision for medium and large objects in the COCO dataset approaching 70 and 80 percent, respectively, small object detection remains an unsolved problem. Limited appearance information, blurring, and low signal-to-noise ratio cause state-of-the-art general detectors to fail when applied to small objects. Traditional feature extractors rely on downsampling, which can cause the smallest objects to disappear, and standard anchor assignment methods have proven to be less effective when used to detect low-pixel instances. In this work, we perform an exhaustive review of the literature related to small and tiny object detection. We aggregate the definitions of small and tiny objects, distinguish between small absolute and small relative sizes, and highlight their challenges. We comprehensively discuss datasets, metrics, and methods dedicated to small and tiny objects, and finally, we make a quantitative comparison on three publicly available datasets.
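One common way to pin down "small" in absolute terms is COCO's area convention, which the per-size AP figures quoted above follow: small objects are under 32² pixels in area, medium up to 96², large beyond that:

```python
def coco_size_category(area):
    """COCO's absolute-size convention for per-size AP:
    small < 32^2 px, medium in [32^2, 96^2), large >= 96^2."""
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"
```

Relative-size definitions (object area as a fraction of image area) are the other family the survey distinguishes, and they do not reduce to fixed pixel thresholds like these.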
- Research Article
2
- 10.2174/2352096514666211026143543
- Dec 23, 2021
- Recent Advances in Electrical & Electronic Engineering (Formerly Recent Patents on Electrical & Electronic Engineering)
Background: Power line inspection is essential to ensure the safe and stable operation of the power system. Object detection for tower equipment can significantly improve inspection efficiency. However, due to the low resolution of small targets and their limited features, the detection accuracy of small targets is not easy to improve. Objective: This study aimed to improve the resolution of tiny targets while making the texture and detailed features of small targets prominent enough to be perceived by the detection model. Methods: In this paper, we propose an algorithm that employs generative adversarial networks to improve small object detection accuracy. First, the original image is converted into a super-resolution one by a super-resolution reconstruction network (SRGAN). Then the object detection framework Faster R-CNN is utilized to detect objects on the super-resolution images. Results: The experimental results on two small object recognition datasets show that the proposed model is robust. In particular, it can detect targets missed by Faster R-CNN, which indicates that SRGAN can effectively enhance the detailed information of small targets by improving resolution. Conclusion: We found that higher-resolution data is conducive to obtaining more detailed information about small targets, which helps the detection algorithm achieve higher accuracy. The small object detection model based on generative adversarial networks proposed in this paper is feasible and more efficient. Compared with Faster R-CNN, this model performs better on small object detection.
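The two-stage pipeline (super-resolve, detect, map boxes back to the original frame) can be sketched with nearest-neighbour upscaling standing in for the SRGAN generator and a caller-supplied `detect_fn` standing in for Faster R-CNN; both stand-ins are assumptions for illustration:

```python
import numpy as np

def upscale_then_detect(image, detect_fn, scale=2):
    """Upscale an (H, W, C) image, run a detector on it, and rescale the
    resulting (x1, y1, x2, y2) boxes back to original-image coordinates.
    Nearest-neighbour repeat is a placeholder for the SRGAN generator."""
    sr = image.repeat(scale, axis=0).repeat(scale, axis=1)
    boxes = detect_fn(sr)
    return [(x1 / scale, y1 / scale, x2 / scale, y2 / scale)
            for (x1, y1, x2, y2) in boxes]
```

The point of the paper is precisely that a learned super-resolver adds recoverable detail that plain interpolation (like the placeholder here) does not.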
- Research Article
- 10.1007/s10462-025-11186-x
- Mar 19, 2025
- Artificial Intelligence Review
Context is an important factor in computer vision as it offers valuable information to clarify and analyze visual data. Utilizing the contextual information inherent in an image or a video can improve the precision and effectiveness of object detectors. For example, where recognizing an isolated object might be challenging, context information can improve comprehension of the scene. This study explores the impact of various context-based approaches to object detection. Initially, we investigate the role of context in object detection and survey it from several perspectives. We then review and discuss the most recent context-based object detection approaches and compare them. Finally, we conclude by addressing research questions and identifying gaps for further studies. More than 265 publications are included in this survey, covering different aspects of context in different categories of object detection, including general object detection, video object detection, small object detection, camouflaged object detection, zero-shot, one-shot, and few-shot object detection. This literature review presents a comprehensive overview of the latest advancements in context-based object detection, providing valuable contributions such as a thorough understanding of contextual information and effective methods for integrating various context types into object detection, thus benefiting researchers.
- Research Article
1
- 10.1038/s41598-025-86949-1
- Jan 22, 2025
- Scientific Reports
Underwater images are often of low clarity and suffer from severe color distortion due to the marine environment and illumination conditions. This directly impacts tasks that rely on image processing, such as marine ecological monitoring and underwater target detection. Therefore, enhancing underwater images to improve their quality is necessary. A generative adversarial network with an encoder-decoder structure is proposed to improve the quality of underwater images. The network consists of a generative network and an adversarial network. The generative network is responsible for enhancing the images, while the adversarial network determines whether the input is an enhanced image or a real high-quality image. In the generative network, we first design a residual convolution module to extract more texture and edge information from underwater images. Next, we design a multi-scale dilated convolution module to capture underwater features at different scales. Then, we design a feature fusion adaptive attention module to reduce the interference of redundant features and enhance local perception capabilities. Finally, we construct the generative network using these modules along with conventional modules. In the adversarial network, we first design a multi-scale feature extraction module to improve feature extraction ability, and then use it along with conventional convolution modules to build the adversarial network. Additionally, we propose an improved loss function that introduces a color loss into the conventional loss function. The improved loss function better measures the color discrepancy between the enhanced image and the real image, which helps reduce color distortion in the enhanced images.
In experimental simulations, the images enhanced by the proposed method have the highest PSNR, SSIM, and UIQM values, indicating superior underwater image enhancement capabilities compared to other methods.
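The abstract does not specify the form of the color loss; one simple plausible form, shown here purely as an assumption, penalizes the gap between the per-channel mean intensities of the enhanced and reference images:

```python
import numpy as np

def color_loss(enhanced, reference):
    """Mean absolute difference between per-channel mean intensities of an
    enhanced image and a reference image, both (H, W, 3) floats.
    One plausible color loss; the paper's exact formulation may differ."""
    return float(np.abs(enhanced.mean(axis=(0, 1))
                        - reference.mean(axis=(0, 1))).mean())
```

Added to pixel-wise and adversarial terms, a term like this biases the generator toward the reference's overall color balance, which is the stated goal of reducing color distortion.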
- Research Article
125
- 10.1016/j.imavis.2022.104471
- Jul 1, 2022
- Image and Vision Computing
Deep learning-based detection from the perspective of small or tiny objects: A survey