Detecting anti-forensic deepfakes with identity-aware multi-branch networks
Deepfake detection systems have achieved impressive accuracy on conventional forged images; however, they remain vulnerable to anti-forensic or adversarial samples deliberately crafted to evade detection. Such samples introduce imperceptible perturbations that conceal forgery artifacts, causing traditional binary classifiers—trained solely on real and forged data—to misclassify them as authentic. In this paper, we address this challenge by proposing a multi-channel feature extraction framework combined with a three-class classification strategy. Specifically, one channel focuses on extracting identity-preserving facial representations to capture inconsistencies in personal identity traits, while additional channels extract complementary spatial and frequency domain features to detect subtle forgery traces. These multi-channel features are fused and fed into a three-class detector capable of distinguishing real, forged, and anti-forensic samples. Experimental results on datasets incorporating adversarial deepfakes demonstrate that our method substantially improves robustness against anti-forensic attacks while maintaining high accuracy on conventional deepfake detection tasks.
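As a rough illustration of the pipeline described above, the sketch below stubs out three feature branches (identity, spatial, frequency), fuses them by concatenation, and scores the result with a three-way softmax head for real / forged / anti-forensic. All dimensions, the random-projection "branches", and function names are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def branch(x, out_dim, seed):
    # Stub feature extractor: a fixed random projection plus tanh. In the
    # paper each branch is a deep network; this stand-in only fixes shapes.
    w = np.random.default_rng(seed).standard_normal((x.size, out_dim))
    return np.tanh(x.ravel() @ w / np.sqrt(x.size))

def extract_fused_features(img):
    ident = branch(img, 64, seed=1)                      # identity branch
    spatial = branch(img, 64, seed=2)                    # spatial branch
    freq = branch(np.abs(np.fft.fft2(img)), 64, seed=3)  # frequency branch
    return np.concatenate([ident, spatial, freq])        # fused vector

def classify3(feat, W, b):
    # Three-way softmax head: real / forged / anti-forensic.
    logits = feat @ W + b
    p = np.exp(logits - logits.max())
    return p / p.sum()

img = rng.standard_normal((32, 32))
feat = extract_fused_features(img)
W = 0.01 * rng.standard_normal((feat.size, 3))
probs = classify3(feat, W, np.zeros(3))
```

The key structural point is that the anti-forensic class gets its own output, rather than being forced into "real" or "forged".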
- Research Article
- 10.3390/jimaging11090312
- Sep 12, 2025
- Journal of Imaging
Currently, deepfake detection has garnered widespread attention as a key defense mechanism against the misuse of deepfake technology. However, existing deepfake detection networks still face challenges such as insufficient robustness, limited generalization, and reliance on a single feature extraction domain (e.g., spatial domain features only) when confronted with evolving algorithms or diverse datasets, which severely limits their practical applicability. To address these issues, this study proposes a deepfake detection network named EFIMD-Net, which enhances performance by strengthening feature interaction and integrating spatial and frequency domain features. The proposed network integrates a Cross-feature Interaction Enhancement (CFIE) module based on cosine similarity, which achieves adaptive interaction between spatial domain features (RGB stream) and frequency domain features (SRM, Spatial Rich Model, stream) through a channel attention mechanism, effectively fusing macro-semantic information with high-frequency artifact information. Additionally, an Enhanced Multi-scale Feature Fusion (EMFF) module is proposed, which effectively integrates multi-scale feature information from various layers of the network through adaptive feature enhancement and reorganization. Experimental results show that, compared to the baseline network Xception, EFIMD-Net achieves comparable or even better Area Under the Curve (AUC) scores on multiple datasets. Ablation experiments also validate the effectiveness of the proposed modules. Furthermore, compared to the baseline traditional two-stream network Locate and Verify, EFIMD-Net significantly improves forgery detection performance, with a 9-percentage-point AUC increase on the CelebDF-v1 dataset and a 7-percentage-point increase on the CelebDF-v2 dataset. These results demonstrate the effectiveness and generalization ability of EFIMD-Net in forgery detection.
Potential limitations regarding real-time processing efficiency are acknowledged.
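The SRM stream mentioned above filters the RGB input with fixed high-pass residual kernels from the Spatial Rich Model. Below is a minimal NumPy sketch using one standard 5x5 SRM kernel (the "KV" filter from the steganalysis literature); treating a single kernel as a stand-in for the full SRM filter bank is an assumption:

```python
import numpy as np

# One of the standard SRM high-pass kernels (the 5x5 "KV" filter); the
# paper's SRM stream uses a bank of such noise-residual filters.
KV = np.array([[-1,  2,  -2,  2, -1],
               [ 2, -6,   8, -6,  2],
               [-2,  8, -12,  8, -2],
               [ 2, -6,   8, -6,  2],
               [-1,  2,  -2,  2, -1]], dtype=float) / 12.0

def srm_residual(img, kernel=KV):
    """'Valid' cross-correlation of img with an SRM kernel (pure NumPy)."""
    windows = np.lib.stride_tricks.sliding_window_view(img, kernel.shape)
    return np.einsum('ijkl,kl->ij', windows, kernel)

img = np.outer(np.arange(16.0), np.ones(16))  # smooth gradient test image
res = srm_residual(img)
```

On a smooth gradient image the residual is exactly zero, which is the point: SRM filters suppress image content and keep only high-frequency noise-like artifacts.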
- Research Article
- 10.1109/thms.2022.3163211
- Aug 1, 2022
- IEEE Transactions on Human-Machine Systems
Emotion recognition from macroexpressions and microexpressions has been widely used in applications such as human–computer interaction, learning status evaluation, and mental disorder diagnosis. However, due to the complexity of human macroexpressions, recognizing macroexpressions with high accuracy is a challenging task. Moreover, the short duration and low movement intensity of microexpressions make their recognition even more difficult. For MM-FER (macro- and micro-facial expression recognition), the key information can be expressed more efficiently by a graph. In this article, a novel framework based on a graph neural network, named SSGNN (spatial and spectral domain features based on a graph neural network), is designed to extract spatial and spectral domain features from facial images for MM-FER, and can efficiently recognize both macroexpressions and microexpressions under the same model. SSGNN consists of two parts, SPAGNN and SPEGNN, which extract spatial and spectral domain features, respectively. Experiments showed that jointly using the spectral and spatial information extracted by SSGNN can largely improve the performance of MM-FER when training samples are limited. First, the influence of different neighbors and sample counts on model performance was analyzed. Then, the contributions of SPAGNN and SPEGNN were evaluated. It was found that fusing the results of SPAGNN and SPEGNN at the decision level further improved the performance of MM-FER. Experiments also showed that SSGNN can recognize microexpressions acquired by various sensors with higher accuracy than the compared state-of-the-art methods in most cases, across different image resolutions and image formats. A cross-dataset experiment demonstrated the generalization ability of SSGNN.
- Research Article
- 10.3390/app15041968
- Feb 13, 2025
- Applied Sciences
Road extraction is a key task in the field of remote sensing image processing. Existing road extraction methods primarily leverage spatial domain features of remote sensing images, often neglecting the valuable information contained in the frequency domain. Spatial domain features capture semantic information and accurate spatial details for different categories within the image, while frequency domain features are more sensitive to areas with significant gray-scale variations, such as road edges and shadows caused by tree occlusions. To fully extract and effectively fuse spatial and frequency domain features, we propose a Cross-Domain Feature Fusion Network (CDFFNet). The framework consists of three main components: the Atrous Bottleneck Pyramid Module (ABPM), the Frequency Band Feature Separator (FBFS), and the Domain Fusion Module (DFM). First, the FBFS is used to decompose image features into low-frequency and high-frequency components. These components are then integrated with shallow spatial features and deep features extracted through the ABPM. Finally, the DFM is employed to perform spatial–frequency feature selection, ensuring consistency and complementarity between the spatial and frequency domain features. The experimental results on the CHN6-CUG and Massachusetts datasets confirm the effectiveness of CDFFNet.
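A minimal analogue of the frequency-band separation step above can be written with an FFT and a radial mask: frequencies inside the mask form the low-frequency component, the rest form the high-frequency component. The cutoff fraction and the masking scheme are illustrative assumptions, not the paper's FBFS design:

```python
import numpy as np

def split_frequency_bands(img, radius_frac=0.25):
    """Split an image into low- and high-frequency parts via an FFT mask.
    radius_frac is an assumed cutoff, not a value from the paper."""
    h, w = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h / 2, xx - w / 2)   # distance from DC after shift
    low_mask = dist <= radius_frac * min(h, w)
    low = np.fft.ifft2(np.fft.ifftshift(F * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(F * ~low_mask)).real
    return low, high

img = np.random.default_rng(0).standard_normal((64, 64))
low, high = split_frequency_bands(img)
```

Because the two masks partition the spectrum, the two components sum back to the original image exactly (up to floating-point error).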
- Research Article
- 10.1049/bme2/2217175
- Jan 1, 2025
- IET Biometrics
Due to the abuse of deep forgery technology, the research on forgery detection methods has become increasingly urgent. The corresponding relationship between the frequency spectrum information and the spatial clues, which is often neglected by current methods, could be conducive to a more accurate and generalized forgery detection. Motivated by this inspiration, we propose a wavelet‐based texture mining and enhancement framework for face forgery detection. First, we introduce a frequency‐guided texture enhancement (FGTE) module that mining the high‐frequency information to improve the network’s extraction of effective texture features. Next, we propose a global–local feature refinement (GLFR) module to enhance the model’s leverage of both global semantic features and local texture features. Moreover, the interactive fusion module (IFM) is designed to fully incorporate the enhanced texture clues with spatial features. The proposed method has been extensively evaluated on five public datasets, such as FaceForensics++ (FF++), deepfake (DF) detection (DFD) challenge (DFDC), Celeb‐DFv2, DFDC preview (DFDC‐P), and DFD, for face forgery detection, yielding promising performance within and cross dataset experiments.
- Research Article
- 10.1038/s41598-025-34091-3
- Dec 29, 2025
- Scientific Reports
Road extraction from remote sensing imagery is essential for urban planning, traffic monitoring, and emergency response. However, existing methods often focus solely on spatial-domain features, limiting their ability to model complex topological structures like narrow or fragmented roads. To address this limitation, we propose a dual-branch framework, DSWFNet, that fuses spatial and frequency domain features for road extraction. The model introduces a frequency-domain branch constructed via the Discrete Wavelet Transform (DWT) to complement the RGB-based spatial branch in modeling fine image details. To further enhance feature representations, we design two dedicated attention mechanisms: the Multi-Scale Coordinate Channel Attention (MSCCA) module for spatial features, and the Enhanced Frequency-Domain Channel Attention (EFDCA) module for frequency features. These are followed by a Bidirectional Cross Attention Module (BCAM) that enables deep interaction and fusion of the two feature types, significantly improving the model's sensitivity to road targets and its ability to preserve structural continuity. Experiments on two representative datasets validate the effectiveness of our approach. Specifically, on the Massachusetts dataset, DSWFNet achieves an IoU of 66.07% and an F1 of 79.57%, improving upon the best spatial-domain method, OARENet, by 1.25% and 0.92%. On the CHN6-CUG dataset, performance is further enhanced with an IoU of 70.76% and an F1 of 82.88%, surpassing the leading baseline by 1.64% and 1.13%.
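The frequency-domain branch above is built on the Discrete Wavelet Transform. A one-level 2-D Haar DWT, the simplest member of the family, can be written directly in NumPy; whether the paper uses Haar specifically is an assumption made here for brevity:

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar DWT: LL, LH, HL, HH subbands of size H/2 x W/2."""
    a = img[0::2, 0::2]  # even rows, even cols
    b = img[0::2, 1::2]  # even rows, odd cols
    c = img[1::2, 0::2]  # odd rows, even cols
    d = img[1::2, 1::2]  # odd rows, odd cols
    ll = (a + b + c + d) / 2.0   # approximation (low-low)
    lh = (a + b - c - d) / 2.0   # horizontal detail
    hl = (a - b + c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, lh, hl, hh

img = np.ones((8, 8))            # constant image: all detail bands vanish
ll, lh, hl, hh = haar_dwt2(img)
```

The three detail subbands carry exactly the kind of fine edge information the abstract says the spatial branch tends to miss.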
- Research Article
- 10.1049/ipr2.13276
- Nov 14, 2024
- IET Image Processing
In recent years, face forgery detection has gained significant attention, resulting in considerable advancements. However, most existing methods rely on CNNs to extract artefacts from the spatial domain, overlooking the pervasive frequency-domain artefacts present in deepfake content, which makes robust and generalized detection difficult. To address these issues, we propose a dual-stream frequency–spatial fusion network for deepfake detection. The network consists of three components: the spatial forgery feature extraction module, the frequency forgery feature extraction module, and the spatial–frequency feature fusion module. The spatial forgery feature extraction module employs spatial-channel attention to extract spatial domain features, targeting artefacts in the spatial domain. The frequency forgery feature extraction module leverages focused linear attention to detect frequency domain anomalies in internal regions, enabling the identification of generated content. The spatial–frequency feature fusion module then fuses forgery features extracted from both the spatial and frequency domains, facilitating accurate detection of splicing artefacts and internally generated forgeries. This approach enhances the model's ability to capture forgery characteristics more accurately. Extensive experiments on several widely used benchmarks demonstrate that the carefully designed network exhibits superior generalization and robustness, significantly improving deepfake detection performance.
- Conference Article
- 10.1109/cisp-bmei48845.2019.8965677
- Oct 1, 2019
Stereoscopic image quality assessment (SIQA) is an essential and challenging part of image processing. Recently, many scholars have conducted image quality assessment in either the spatial or the frequency domain alone. In this paper, we propose a no-reference stereoscopic image quality evaluation method that considers both complex contourlet and spatial features. First, the original views are converted to the Lab color space. Then the luminance channel of the Lab color space and the synthetic cyclopean image are calculated from the original stereo pairs. Next, the perceptual features of these images are extracted in the complex contourlet domain and the spatial domain, respectively. Finally, all pre-extracted features are sent to a regression model for training and predicting image quality scores. Our experimental results show that the proposed algorithm achieves high consistency with human subjective perception and competes with state-of-the-art algorithms.
- Research Article
- 10.1088/1361-6560/ad5ef3
- Jul 11, 2024
- Physics in Medicine & Biology
Objective. In recent years, convolutional neural networks, which typically focus on extracting spatial domain features, have shown limitations in learning global contextual information. However, the frequency domain can offer a global perspective that spatial domain methods often struggle to capture. To address this limitation, we propose FreqSNet, which leverages both frequency and spatial features for medical image segmentation. Approach. First, we propose a frequency-space representation aggregation block (FSRAB) to replace conventional convolutions. FSRAB contains three frequency domain branches that capture global frequency information along different axial combinations, while a convolutional branch is designed to exchange information across channels in local spatial features. Second, the multiplex expansion attention block extracts long-range dependency information using dilated convolutional blocks, while suppressing irrelevant information via attention mechanisms. Finally, the introduced Feature Integration Block enhances feature representation by integrating semantic features that fuse spatial and channel positional information. Main results. We validated our method on 5 public datasets: BUSI, CVC-ClinicDB, CVC-ColonDB, ISIC-2018, and LUNA16. On these datasets, our method achieved Intersection over Union (IoU) scores of 75.46%, 87.81%, 79.08%, 84.04%, and 96.99%, and Hausdorff distance values of 22.22 mm, 13.20 mm, 13.08 mm, 13.51 mm, and 5.22 mm, respectively. Compared to other state-of-the-art methods, our FreqSNet achieves better segmentation results. Significance. Our method can effectively combine frequency domain information with spatial domain features, enhancing the segmentation performance and generalization capability in medical image segmentation tasks.
- Conference Article
- 10.1117/12.2049937
- May 29, 2014
It is well-known that a pattern recognition system is only as good as the features it is built upon. In the fields of image processing and computer vision, we have numerous spatial domain and spatial-frequency domain features to extract characteristics of imagery according to its color, shape and texture. However, these approaches extract information across a local neighborhood, or region of interest, which for target detection contains both object(s) of interest and background (surrounding context). A goal of this research is to filter out as much task irrelevant information as possible, e.g., tire tracks, surface texture, etc., to allow a system to place more emphasis on image features in spatial regions that likely belong to the object(s) of interest. Herein, we outline a procedure coined "soft" feature extraction to refine the focus of spatial domain features. This idea is demonstrated in the context of an explosive hazards detection system using forward looking infrared imagery. We also investigate different ways to spatially contextualize and calculate mathematical features from shearlet filtered candidate image chips. Furthermore, we investigate localization strategies in relation to different ways of grouping image features to reduce the false alarm rate. Performance is explored in the context of receiver operating characteristic curves on data from a U.S. Army test site that contains multiple target and clutter types, burial depths, and times of day.
- Research Article
- 10.47893/ijcsi.2014.1153
- Apr 1, 2014
- International Journal of Computer Science and Informatics
Many CADx systems developed to detect lung cancer rely on spatial domain features that process only pixel intensity values. The proposed scheme applies a frequency transform to lung images to extract frequency domain features and combines them with spatial features, so that information not revealed in the spatial domain is captured and classification performance can be tuned. The proposed CADx comprises four stages. In the first stage, the lung region is segmented using convexity-based active contour segmentation. In the second stage, ROIs are extracted using spatially constrained KFCM clustering. Next, a standard wavelet transform is applied to each ROI to extract transform domain features, which are combined with shape and Haralick histogram features. Finally, a neural network is trained on the combined feature set to identify cancerous nodules. The proposed scheme has shown a sensitivity of 95% and a specificity of 96%.
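The combination of spatial and transform-domain features described above can be sketched as a single fused vector per ROI. The intensity statistics below are simple stand-ins (the paper's shape and Haralick features require a full GLCM implementation), and the Haar subband energies illustrate the wavelet-derived part; all names and dimensions are assumptions:

```python
import numpy as np

def haar_subbands(img):
    # One-level 2-D Haar transform: approximation + three detail subbands.
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    return ((a + b + c + d) / 2, (a + b - c - d) / 2,
            (a - b + c - d) / 2, (a - b - c + d) / 2)

def combined_features(roi):
    # Spatial-domain intensity statistics (stand-ins for the paper's
    # shape and Haralick histogram descriptors).
    spatial = np.array([roi.mean(), roi.std(), roi.min(), roi.max()])
    # Transform-domain features: mean energy of each wavelet subband.
    freq = np.array([np.mean(s ** 2) for s in haar_subbands(roi)])
    return np.concatenate([spatial, freq])   # fused feature vector

roi = np.random.default_rng(1).standard_normal((16, 16))
fv = combined_features(roi)
```

The fused vector would then be the per-nodule input to the neural network classifier.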
- Research Article
- 10.1371/journal.pone.0278055
- Dec 30, 2022
- PLOS ONE
Multi-scale image decomposition is crucial for image fusion, extracting prominent feature textures from infrared and visible light images to obtain clear fused images with more texture. This paper proposes a fusion method for infrared and visible light images based on spatial domain and image features to obtain high-resolution, texture-rich images. First, an efficient hierarchical image clustering algorithm based on superpixel fast pixel clustering directly performs multi-scale decomposition of each source image in the spatial domain, obtaining high-frequency, medium-frequency, and low-frequency layers from which the maximum and minimum values of each source image are extracted and combined. Then, using the attribute parameters of each layer as fusion weights, high-definition fused images are obtained through adaptive feature fusion. In addition, by performing the multi-scale decomposition entirely in the spatial domain, the proposed algorithm avoids the information loss caused by conversion between the spatial and frequency domains in traditional frequency-domain feature extraction. Eight image quality indicators are compared against other fusion algorithms. Experimental results show that this method outperforms the comparative methods in both subjective and objective measures, and the fused images have high definition and rich textures.
- Research Article
- 10.1007/s12539-023-00567-x
- May 4, 2023
- Interdisciplinary Sciences, Computational Life Sciences
In view of major depressive disorder characteristics such as high mortality and high recurrence, it is important to explore an objective and effective detection method for major depressive disorder. Considering the complementary advantages of different machine learning algorithms in the information mining process, as well as the complementarity of different kinds of information, this study proposes a spatial–temporal electroencephalography fusion framework using neural networks for major depressive disorder detection. Since electroencephalography is a typical time series signal, we introduce a recurrent neural network with embedded long short-term memory units to extract temporal domain features, addressing the problem of long-distance information dependence. To reduce the volume conductor effect, the temporal electroencephalography data are mapped into a spatial brain functional network (BFN) using the phase lag index, and spatial domain features are then extracted from the brain functional network using 2D convolutional neural networks. Considering the complementarity between different types of features, the spatial–temporal electroencephalography features are fused to achieve data diversity. The experimental results show that spatial–temporal feature fusion can improve the detection accuracy of major depressive disorder, reaching a highest accuracy of 96.33%. In addition, our research found that the theta, alpha, and full frequency bands in the left frontal, left central, and right temporal brain regions are closely related to MDD detection, especially the theta frequency band in the left frontal region. Using only single-dimension EEG data as the decision basis, it is difficult to fully exploit the valuable information hidden in the data, which limits overall MDD detection performance. Meanwhile, different algorithms have their own advantages in different application scenarios.
Ideally, different algorithms should use their respective advantages to jointly address complex problems in engineering fields. To this end, we propose a computer-aided MDD detection framework based on spatial–temporal EEG fusion using neural networks, as shown in Fig. 1. The simplified process is as follows: (1) raw EEG data acquisition and preprocessing; (2) the time series EEG data of each channel are input to a recurrent neural network (RNN), which extracts temporal domain (TD) features; (3) the BFN among different EEG channels is constructed, and a CNN is used to extract the spatial domain (SD) features of the BFN; (4) based on the theory of information complementarity, the spatial–temporal information is fused to realize efficient MDD detection. (Fig. 1: MDD detection framework based on spatial–temporal EEG fusion.)
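The phase lag index (PLI) used above to build the brain functional network measures how consistently one signal's instantaneous phase leads or lags another's. A NumPy sketch of a common PLI formulation follows; the paper's exact preprocessing and channel pairing are not reproduced:

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (same construction as a Hilbert transform)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1
    h[1:(n + 1) // 2] = 2        # double positive frequencies
    if n % 2 == 0:
        h[n // 2] = 1            # keep Nyquist bin as-is
    return np.fft.ifft(X * h)

def phase_lag_index(x, y):
    """PLI: |mean sign of the instantaneous phase difference|, in [0, 1]."""
    dphi = np.angle(analytic_signal(x)) - np.angle(analytic_signal(y))
    return np.abs(np.mean(np.sign(np.sin(dphi))))

t = np.linspace(0, 1, 256, endpoint=False)
x = np.sin(2 * np.pi * 10 * t)               # 10 Hz oscillation
y = np.sin(2 * np.pi * 10 * t - np.pi / 2)   # consistent quarter-cycle lag
pli_lagged = phase_lag_index(x, y)
pli_self = phase_lag_index(x, x)
```

A consistent quarter-cycle lag yields a PLI of 1, while a signal compared with itself yields 0, since a zero phase difference carries no lead/lag information; this insensitivity to zero-lag coupling is why PLI reduces volume conduction effects.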
- Book Chapter
- 10.1007/978-3-319-12640-1_61
- Jan 1, 2014
This paper addresses the challenging problem of face recognition in surveillance conditions based on the recently published database called SCface. This database emphasizes the challenges of face recognition in uncontrolled indoor conditions. In this database, 4160 face images were captured using five different commercial cameras of low resolution, at three different distances, with both lighting conditions and face pose uncontrolled. Moreover, some of the images were taken under night vision mode. This paper introduces a novel feature extraction scheme that combines parameters extracted from both the spatial and frequency domains. These features will be referred to as Spatial and Frequency Domains Combined Features (SFDCF). The spatial domain features are extracted using Spatial Differential Operators (SDO), while the frequency domain features are extracted using the Discrete Cosine Transform (DCT). Principal Component Analysis (PCA) is used to reduce the dimensionality of the spatial domain features, while zonal coding is used to reduce the dimensionality of the frequency domain features. The two feature sets are simply concatenated to form a feature vector representing the face image. In this paper we provide a comparison, in terms of recognition results, between the proposed features and other typical features; namely, eigenfaces, discrete cosine coefficients, wavelet subband energies, and Gray Level Co-occurrence Matrix (GLCM) coefficients. The comparison shows that the proposed SFDCF feature set yields superior recognition rates, especially for images captured at far distances and images captured in the dark. The recognition rates using SFDCF reach 99.23% for images captured by different cameras at the same distance. For images captured at different distances, SFDCF reaches a recognition rate of 93.8%.
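The DCT-plus-zonal-coding part of the scheme above can be sketched as follows: take the 2-D DCT of the face image and keep only a low-frequency zone of coefficients as the feature vector. The zone size and the square-image restriction are illustrative assumptions:

```python
import numpy as np

def dct2(x):
    """Orthonormal 2-D DCT-II of a square array, built as a matrix product."""
    n = x.shape[0]
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(
        np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0] /= np.sqrt(2.0)     # DC row scaled to make the basis orthonormal
    return C @ x @ C.T

def zonal_dct_features(img, zone=8):
    # Zonal coding: keep only the top-left zone x zone block of DCT
    # coefficients (the lowest frequencies) as the feature vector.
    return dct2(img)[:zone, :zone].ravel()

img = np.random.default_rng(2).standard_normal((32, 32))
f = zonal_dct_features(img)
```

In the paper this frequency vector is concatenated with the PCA-reduced spatial features to form the final SFDCF descriptor.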
- Conference Article
- 10.1109/icais50930.2021.9396008
- Mar 25, 2021
Content-Based Image Retrieval (CBIR) has become one of the trending areas of research in computer vision. In traditional CBIR, spatial domain features such as color, texture, shape, and point features are extracted. It is often considered that, apart from the spatial features, features extracted from the frequency domain of an image can give further information about its characteristics. This paper proposes two novel methods for feature extraction from the 2-dimensional Discrete Cosine Transform (DCT) of an image: DCT_256_Zigzag and DCT_256_2×2. These methods take the lower frequencies into consideration in order to determine the features in the frequency domain. The advantage of zigzag scanning is that the low-frequency values, which carry comparatively higher energy, appear first in the scan. These two features are combined with two existing spatial domain features, Local Binary Patterns (LBP) and interchannel voting features, to generate a global feature vector for an image. For a query image, its feature vector is compared with the feature vectors of every other image in the database using dl-distance, and the images with the least distance are considered the most similar to the query image. To evaluate the efficiency of these two methods, five standard performance measures are used: Average Precision Rate (APR), Average Recall Rate (ARR), F-Measure, Average Normalized Modified Retrieval Rank (ANMRR), and Total Minimum Retrieval Epoch (TMRE). Six benchmark image datasets (Corel-1000, Corel-5000, Corel-10000, VisTex, STex, and Color-Brodatz) are used to corroborate the performance of these methods.
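The zigzag scanning underlying DCT_256_Zigzag orders DCT coefficients so that the high-energy low frequencies come first. Below is a sketch of the standard JPEG-style scan, shown on a 4x4 block for brevity; the paper's 256-coefficient selection and distance computation are not reproduced:

```python
import numpy as np

def zigzag_indices(n):
    """JPEG-style zigzag scan order over an n x n coefficient grid."""
    # Sort by anti-diagonal (i + j); alternate traversal direction so the
    # path zigzags: odd diagonals go down-left, even diagonals go up-right.
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def zigzag_scan(block):
    return np.array([block[i, j] for i, j in zigzag_indices(block.shape[0])])

block = np.arange(16).reshape(4, 4)   # entry (i, j) holds 4*i + j
zz = zigzag_scan(block)
```

Applied to a DCT coefficient block, truncating `zz` after the first k entries keeps the k lowest-frequency coefficients, which is exactly the rationale the abstract gives for the zigzag variant.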
- Research Article
- 10.1142/s0129156425402104
- Dec 24, 2024
- International Journal of High Speed Electronics and Systems
To address the low recognition rate, low recognition efficiency, and poor recognition performance of current dance motion recognition methods affected by the surrounding environment, this study proposes a dance action recognition model based on the spatial frequency domain features of contour images. The study uses texture information to extract the dance action contour image, computes the feature vector of the contour image with the hypercomplex Fourier transform, and applies phase spectrum and energy spectrum transformations to smooth the contour image and generate a saliency map, completing the extraction and preprocessing of the dance action contour image. The method distinguishes the high-frequency and low-frequency parts of a dance action using the discrete cosine transform, counts the pixels contained in the dance action images to be recognized, extracts the spatial frequency domain features of the contour image, and builds a human posture model. The model performs dance action recognition by feeding the extracted feature vectors and their labels to a classifier. Experimental results show that the recognition performance of the proposed model is good, its recognition rate is high across different dance action types, and it can effectively meet the needs of dance action recognition.