Breaking the Loop: Adversarial Attacks on Cognitive-AI Feedback via Neural Signal Manipulation
INTRODUCTION: Brain-Computer Interfaces (BCIs) embedded with Artificial Intelligence (AI) have created powerful closed-loop cognitive systems for neurorehabilitation, robotics, and assistive technologies. However, this tight human-AI integration exposes such systems to new security vulnerabilities, including adversarial distortion of neural signals. OBJECTIVES: The paper formally develops and assesses neuro-adversarial attacks, a new class of attack vector that targets AI cognitive feedback systems by manipulating electroencephalographic (EEG) signals. The research aimed to simulate such attacks, measure their effects, and propose countermeasures. METHODS: Adversarial machine learning (AML) techniques, including the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD), were applied to open EEG datasets using Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and Transformer-based models. Closed-loop simulations of BCI-AI systems with real-time feedback were conducted, and both the attack vectors and candidate countermeasures (e.g., VAEs, wavelet denoising, adversarial detectors) were tested. RESULTS: Neuro-adversarial perturbations yielded up to a 30% reduction in classification accuracy and over 35% user-intent misalignment. Transformer-based models fared relatively better, but overall performance degradation was significant. Defenses such as variational autoencoders and real-time adversarial detectors restored classification accuracy to over 80% and reduced successful attacks to below 10%. CONCLUSION: The threat model presented in this paper is a significant addition to neuroscience and AI security. Neuro-adversarial attacks pose a real risk to cognitive-AI systems by misaligning human intent and action with machine response. Multi-layer signal sanitization and detection are recommended as safeguards.
- Research Article
- 10.4108/eetss.v9.9502
- Sep 30, 2025
- ICST Transactions on Security and Safety
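The single-step FGSM perturbation used against the EEG decoders above can be sketched on a logistic-regression stand-in (all weights and data below are synthetic assumptions, not taken from the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, epsilon):
    """Shift x by epsilon in the sign of the loss gradient (the FGSM step)."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w          # d(BCE)/dx for a logistic model
    return x + epsilon * np.sign(grad_x)

rng = np.random.default_rng(0)
w, b = rng.normal(size=8), 0.1    # assumed "trained" decoder weights
x, y = rng.normal(size=8), 1.0    # one flattened EEG window, true label 1

x_adv = fgsm_perturb(x, y, w, b, epsilon=0.05)

# Binary cross-entropy for y = 1, written stably as log(1 + e^{-z}).
loss_clean = float(np.logaddexp(0.0, -(w @ x + b)))
loss_adv = float(np.logaddexp(0.0, -(w @ x_adv + b)))
```

For a logistic model the FGSM step provably increases the loss, which is why even the small epsilon here degrades the decoder.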
- Research Article
- 10.1158/1557-3265.adi21-po-078
- Mar 1, 2021
- Clinical Cancer Research
Purpose: Deep learning (DL) models have shown the ability to automate the classification of diagnostic images used for cancer detection. Unfortunately, recent evidence has suggested DL models are also vulnerable to adversarial image attacks, in which manipulated image pixels force models to make incorrect predictions with high confidence. The existence of adversarial images, which are indistinguishable from unmodified images to the human eye, poses a roadblock to the safe implementation of DL models in clinical settings. The extent to which diagnostic imaging is vulnerable to adversarial image attacks remains underexplored. We investigated the effectiveness of adversarial image attacks on DL models for three common imaging tasks within oncology. Additionally, we explored whether adversarial image attack vulnerability could be used as a metric to improve deep learning model performance. Methods: We employed adversarial image attacks on DL models for three common imaging tasks within oncology: 1) classifying malignant lung nodules on CT imaging, 2) classifying brain metastases on MRI imaging, 3) classifying malignant breast lesions on mammograms. To assess relative vulnerability to adversarial image attacks, we also employed two DL models on non-medical images: 1) CIFAR10, 2) MNIST. We considered three first-order adversarial attacks: Fast Gradient Sign Method, Projected Gradient Descent, and Basic Iterative Method. Vulnerability to adversarial image attacks was assessed by comparing model accuracy at fixed levels of image perturbation. Model performance was also measured after removing the images most susceptible to adversarial image attacks. Results: We observed that all three diagnostic imaging types were susceptible to adversarial image attacks. Overall, diagnostic images were more vulnerable to adversarial attacks than non-medical images. Mammograms [29.6% accuracy] appeared to be the most vulnerable to adversarial image attacks, followed by lung CTs [30.6% accuracy] and brain MRIs [30.8% accuracy]. Finally, we determined that removing the images most vulnerable to adversarial manipulation leads to improved deep learning model performance [mammogram: 73.6% accuracy, CT: 83.0% accuracy, MRI: 84.2% accuracy]. Conclusion: Our study demonstrates that diagnostic imaging modalities in cancer are likely more vulnerable to adversarial attacks than non-medical images. Susceptibility to adversarial image attacks varies across diagnostic imaging modalities. Adversarial susceptibility for an individual image can be used as a valuable metric to improve DL model performance on diagnostic images. Citation Format: Marina Joel, Sachin Umrao, Enoch Chang, Rachel Choi, Daniel Yang, Aidan Gilson, Roy Herbst, Harlan Krumholz, Sanjay Aneja. Exploring adversarial image attacks on deep learning models in oncology [abstract]. In: Proceedings of the AACR Virtual Special Conference on Artificial Intelligence, Diagnosis, and Imaging; 2021 Jan 13-14. Philadelphia (PA): AACR; Clin Cancer Res 2021;27(5_Suppl):Abstract nr PO-078.
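The filtering idea above (score each image by its adversarial susceptibility, then drop the most vulnerable fraction) can be sketched with a linear stand-in classifier, for which the minimal L-infinity budget that flips the decision has a closed form; the linear model and the 20% drop rate are assumptions, since the study used deep models on medical images:

```python
import numpy as np

def flip_epsilon(X, w, b):
    """Smallest L-inf perturbation that can flip sign(w.x + b), per sample."""
    margin = np.abs(X @ w + b)
    return margin / np.abs(w).sum()

def drop_most_vulnerable(X, y, w, b, frac=0.2):
    """Discard the frac of samples with the smallest flip budget."""
    eps = flip_epsilon(X, w, b)
    keep = eps.argsort()[int(len(X) * frac):]
    return X[keep], y[keep]

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 4))
y = (X @ np.ones(4) > 0).astype(int)
w, b = np.ones(4), 0.0
X_f, y_f = drop_most_vulnerable(X, y, w, b, frac=0.2)
```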
- Research Article
- 10.54939/1859-1043.j.mst.csce5.2021.14-23
- Dec 15, 2021
- Journal of Military Science and Technology
In recent years, with the explosion of research in artificial intelligence, deep learning models based on convolutional neural networks (CNNs) have become one of the most promising architectures for practical applications thanks to their reasonably good achievable accuracy. However, CNNs, characterized by convolutional layers, often have a large number of parameters and a heavy computational workload, leading to large energy consumption for training and inference. The binarized neural network (BNN) model has recently been proposed to overcome this drawback. BNNs use binary representations for inputs and weights, which inherently reduces memory requirements and simplifies computations while still maintaining acceptable accuracy. BNNs are therefore well suited for realizing Edge-AI applications on resource- and energy-constrained devices such as embedded or mobile devices. Since CNNs and BNNs are both composed of linear transformation layers, they can be fooled by adversarial attack patterns. This topic has been actively studied recently, but most work targets CNNs. In this work, we examine the impact of adversarial attacks on BNNs and propose a solution to improve the accuracy of BNNs against this type of attack. Specifically, we use an Enhanced Fast Adversarial Training (EFAT) method that helps the BNN become more robust against major adversarial attack models with a very short training time. Experimental results on the MNIST dataset show that EFAT increased our trained BNN's accuracy under Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) attacks from 31.34% and 0.18% to 96.96% and 85.08%, respectively.
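The multi-step PGD attack mentioned above extends FGSM with iterated steps and a projection back into the epsilon-ball around the clean input; a sketch on a logistic stand-in (not a BNN, and with illustrative settings) looks like this:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pgd_attack(x, y, w, b, epsilon, alpha, steps):
    """Iterated FGSM steps of size alpha, each followed by a projection
    (clip) back into the L-infinity ball of radius epsilon around x."""
    x_adv = x.copy()
    for _ in range(steps):
        p = sigmoid(w @ x_adv + b)
        grad = (p - y) * w
        x_adv = x_adv + alpha * np.sign(grad)
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)  # projection step
    return x_adv

rng = np.random.default_rng(2)
w, b = rng.normal(size=16), 0.0
x, y = rng.normal(size=16), 1.0
x_adv = pgd_attack(x, y, w, b, epsilon=0.1, alpha=0.02, steps=10)

# Loss (BCE for y = 1) before and after the attack.
loss_clean = float(np.logaddexp(0.0, -(w @ x + b)))
loss_adv = float(np.logaddexp(0.0, -(w @ x_adv + b)))
```

The projection is what distinguishes PGD from simply repeating FGSM: the perturbation can never leave the stated budget.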
- Research Article
- 10.1038/s41598-025-34024-0
- Jan 30, 2026
- Scientific Reports
Brain–Computer Interfaces (BCIs) based on electroencephalography (EEG) are widely used in motor rehabilitation, assistive communication, and neurofeedback due to their non-invasive nature and ability to decode movement-related neural activity. Recent advances in deep learning, particularly convolutional neural networks, have improved the accuracy of motor imagery (MI) and motor execution (ME) classification. However, EEG-based BCIs remain vulnerable to adversarial attacks, in which small, imperceptible perturbations can alter classifier predictions, posing risks in safety-critical applications such as rehabilitation therapy and assistive device control. To address this issue, this study proposes a three-level Hierarchical Convolutional Neural Network (HCNN) designed to improve both classification performance and adversarial robustness. The framework decodes motor intention through a structured hierarchy: Level 1 distinguishes MI from ME, Level 2 differentiates unilateral and bilateral motor tasks, and Level 3 performs fine-grained movement classification. The model is evaluated on the publicly available BCI Competition IV-2a dataset, which contains multi-class MI EEG recordings from nine healthy subjects. Robustness is assessed under gradient-based adversarial attacks, including Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), and DeepFool, across varying perturbation strengths, with adversarial training incorporated during learning. Experimental results show that the proposed HCNN achieves a clean-data accuracy of 91.2% and exhibits reduced performance degradation under adversarial attacks compared with conventional CNN baselines. These results indicate that hierarchical architectures offer a viable approach for improving the reliability of EEG-based BCIs. All experiments were conducted exclusively on the BCI Competition IV-2a dataset using EEG data from healthy subjects.
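The three-level routing described above can be sketched as a simple dispatcher. The stub classifiers and label names below are hypothetical placeholders; in the paper each level is a trained CNN:

```python
def classify_hierarchical(x, level1, level2, level3):
    """Level 1: MI vs. ME; Level 2: unilateral vs. bilateral;
    Level 3: fine-grained movement class."""
    mode = level1(x)          # e.g. "MI" or "ME"
    laterality = level2(x)    # e.g. "unilateral" or "bilateral"
    movement = level3(x)      # e.g. a specific limb movement
    return mode, laterality, movement

# Hypothetical stub predictors standing in for the three CNN levels.
pred = classify_hierarchical(
    [0.1, 0.2],               # a fake EEG feature vector
    level1=lambda x: "MI",
    level2=lambda x: "unilateral",
    level3=lambda x: "left_hand",
)
```

One design appeal of the hierarchy is that an adversarial perturbation must now defeat several coarser decisions, not just one flat softmax.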
- Research Article
- 10.1016/j.comcom.2023.09.030
- Oct 11, 2023
- Computer Communications
Untargeted white-box adversarial attack with heuristic defence methods in real-time deep learning based network intrusion detection system
- Research Article
- 10.1016/j.asr.2024.11.054
- Nov 26, 2024
- Advances in Space Research
Research on deep learning techniques for autonomous spacecraft relative navigation has grown continuously in recent years. Adopting these techniques offers enhanced performance; however, such approaches also heighten concerns regarding the trustworthiness and security of deep learning methods, given their susceptibility to adversarial attacks. In this work, we propose a novel explainability-based approach for detecting adversarial attacks on deep neural network-based relative pose estimation schemes. For an orbital rendezvous scenario, we develop a relative pose estimation technique built on our proposed Convolutional Neural Network (CNN), which takes an image from the chaser's onboard camera and accurately outputs the target's relative position and rotation. We seamlessly perturb the input images using adversarial attacks generated by the Fast Gradient Sign Method (FGSM). The adversarial attack detector is then built on a Long Short-Term Memory (LSTM) network that takes an explainability measure, namely SHapley values, from the CNN-based pose estimator and flags adversarial attacks when they occur. Simulation results show that the proposed adversarial attack detector achieves a detection accuracy of 99.21%. Both the deep relative pose estimator and the adversarial attack detector are then tested on real data captured from our laboratory-designed setup, where the detector achieves an average detection accuracy of 96.29%.
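For intuition on the explainability signal used above: SHAP values have a closed form for a linear model with independent features, phi_i = w_i(x_i - mu_i), and they sum to f(x) - f(mu). The sketch below pairs that closed form with a simple threshold as a stand-in for the paper's learned LSTM detector (model, data, and threshold are all illustrative):

```python
import numpy as np

def linear_shap(x, w, mu):
    """Exact SHAP values for the linear model f(x) = w.x with baseline mu."""
    return w * (x - mu)

def flag_adversarial(x, w, mu, threshold):
    """Flag inputs whose explanation deviates unusually far from the
    clean-data baseline (a toy stand-in for the LSTM detector)."""
    phi = linear_shap(x, w, mu)
    return bool(np.abs(phi).max() > threshold)

w = np.array([0.5, -1.0, 2.0])
mu = np.zeros(3)
x_clean = np.array([0.2, 0.1, -0.3])
x_attacked = x_clean + np.array([0.0, 0.0, 5.0])   # a crude "perturbation"

phi = linear_shap(x_clean, w, mu)
```

The additivity property (SHAP values summing to the model-output shift) is what makes explanation traces a usable feature stream for a downstream detector.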
- Research Article
- 10.1038/s41598-025-00890-x
- May 14, 2025
- Scientific Reports
Deep learning, particularly convolutional neural networks (CNNs), has proven valuable for brain tumor classification, aiding diagnostic and therapeutic decisions in medical imaging. Despite their accuracy, these models are vulnerable to adversarial attacks, compromising their reliability in clinical settings. In this research, we utilized a VGG16-based CNN model to classify brain tumors, achieving 96% accuracy on clean magnetic resonance imaging (MRI) data. To assess robustness, we exposed the model to Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) attacks, which reduced accuracy to 32% and 13%, respectively. We then applied a multi-layered defense strategy, including adversarial training with FGSM and PGD examples and feature squeezing techniques such as bit-depth reduction and Gaussian blurring. This approach improved model resilience, achieving 54% accuracy on FGSM and 47% on PGD adversarial examples. Our results highlight the importance of proactive defense strategies for maintaining the reliability of AI in medical imaging under adversarial conditions.
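The feature-squeezing transforms mentioned above can be sketched in a few lines; the bit depth and kernel size are illustrative choices, and a mean (box) blur stands in for the paper's Gaussian blurring:

```python
import numpy as np

def reduce_bit_depth(img, bits):
    """Quantize pixel values in [0, 1] down to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(img * levels) / levels

def box_blur(img, k=3):
    """Simple mean blur over a k x k window (stand-in for Gaussian blur)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

img = np.linspace(0, 1, 16).reshape(4, 4)          # a toy 4x4 "MRI slice"
squeezed = box_blur(reduce_bit_depth(img, bits=3), k=3)
```

Both transforms shrink the input space available to an attacker: quantization removes low-amplitude pixel detail, and blurring removes high-frequency perturbation patterns.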
- Research Article
- 10.63345/sjaibt.v2.i4.101
- Oct 2, 2025
- Scientific Journal of Artificial Intelligence and Blockchain Technologies
Adversarial attacks have emerged as one of the most critical vulnerabilities in modern computer vision systems powered by deep learning. Despite their remarkable accuracy and generalization capabilities, convolutional neural networks (CNNs), vision transformers (ViTs), and other deep models remain highly susceptible to imperceptible perturbations crafted by adversaries. These perturbations can mislead models into producing incorrect outputs with high confidence, leading to severe consequences in domains such as autonomous driving, biometric authentication, medical imaging, and surveillance. This paper provides an extensive examination of adversarial attacks in computer vision, categorizing them into white-box, black-box, targeted, and untargeted variants. We explore well-known attack techniques such as the Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), Carlini & Wagner (C&W), and transferability-based black-box strategies. Furthermore, we review state-of-the-art defense mechanisms, including adversarial training, input preprocessing, gradient masking, certified defenses, and robust optimization. A statistical analysis is provided to evaluate the performance degradation of vision models under adversarial conditions and the improvement achieved through defense strategies. Our methodology integrates systematic literature review, empirical evaluation, and comparative simulation on benchmark datasets such as MNIST, CIFAR-10, and ImageNet. Results highlight that adversarial training remains the most effective defense but comes at the cost of computational overhead and reduced clean accuracy. The paper concludes by identifying gaps in current defense research and outlining future directions, including adaptive hybrid defenses, explainable adversarial robustness, and biologically inspired vision architectures. 
The study contributes a comprehensive understanding of adversarial machine learning in computer vision and provides a roadmap for building more secure and trustworthy AI systems.
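The adversarial-training defense the review highlights can be sketched as a single training step on a logistic stand-in model: craft an FGSM example against the current weights, then take the gradient update on that example instead of the clean one (all settings below are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adv_train_step(w, b, x, y, epsilon, lr):
    """One adversarial-training update: attack, then learn from the attack."""
    p = sigmoid(w @ x + b)
    x_adv = x + epsilon * np.sign((p - y) * w)   # FGSM against current model
    p_adv = sigmoid(w @ x_adv + b)
    grad_w = (p_adv - y) * x_adv                 # BCE gradient wrt weights
    grad_b = p_adv - y
    return w - lr * grad_w, b - lr * grad_b, x_adv

rng = np.random.default_rng(3)
w, b = rng.normal(size=4), 0.0
x, y = rng.normal(size=4), 1.0
w2, b2, x_adv = adv_train_step(w, b, x, y, epsilon=0.1, lr=0.1)

# Loss on the adversarial example before vs. after the update (y = 1).
loss_before = float(np.logaddexp(0.0, -(w @ x_adv + b)))
loss_after = float(np.logaddexp(0.0, -(w2 @ x_adv + b2)))
```

The computational-overhead cost noted in the abstract is visible even here: every update requires an extra forward/backward pass to build the attack.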
- Research Article
- 10.54254/2755-2721/109/20241413
- Nov 26, 2024
- Applied and Computational Engineering
With the widespread use of deep learning models in various applications, people are gradually realizing the vulnerability of these models to adversarial attacks. Adversarial training is an effective strategy to defend against such attacks. Building on the advantages and disadvantages of the current mainstream Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) adversarial training, this paper proposes a hybrid adversarial training that integrates the FGSM and PGD methods, tested with the ResNet-18 model on the SVHN dataset. Experimental results show that hybrid adversarial training can effectively reduce training time. Its accuracy on the original dataset is about 2% higher than that of PGD adversarial training. Its performance under FGSM attacks is almost the same as that of pure FGSM adversarial training, while its performance under PGD attacks decreases more noticeably, about 2% to 3% lower than that of PGD adversarial training. This study not only helps in understanding the robustness of hybrid adversarial training against adversarial attacks but also informs the design of new adversarial training strategies.
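One plausible way to interleave the two attacks in a hybrid scheme is a fixed schedule: cheap single-step FGSM on most batches and the costlier multi-step PGD periodically. The 1-in-4 rule below is an assumption for illustration, not the paper's exact mixing strategy:

```python
def pick_attack(batch_idx, pgd_every=4):
    """Choose the attack used to craft this batch's adversarial examples.
    PGD on every pgd_every-th batch, single-step FGSM otherwise."""
    return "PGD" if batch_idx % pgd_every == 0 else "FGSM"

schedule = [pick_attack(i) for i in range(8)]
```

Schedules like this trade robustness against strong iterative attacks for training time, which matches the accuracy trade-offs reported above.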
- Research Article
- 10.1016/j.jid.2020.07.034
- Sep 12, 2020
- Journal of Investigative Dermatology
Clinically Relevant Vulnerabilities of Deep Machine Learning Systems for Skin Cancer Diagnosis
- Book Chapter
- 10.5772/intechopen.112442
- Sep 27, 2023
This chapter introduces the concept of adversarial attacks on image classification models built on convolutional neural networks (CNNs). CNNs are very popular deep learning models used in image classification tasks. However, even powerful pre-trained CNN models that classify images very accurately may perform disastrously when under adversarial attack. In this work, two well-known adversarial attacks are discussed and their impact on the performance of image classifiers is analyzed: the fast gradient sign method (FGSM) and the adversarial patch attack. These attacks are launched on three powerful pre-trained image classifier architectures: ResNet-34, GoogleNet, and DenseNet-161. The classification accuracy of the models in the absence and presence of the two attacks is computed on images from the publicly accessible ImageNet dataset. The results are analyzed to evaluate the impact of the attacks on the image classification task.
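Unlike FGSM's image-wide perturbation, a patch attack overwrites a single region with a fixed adversarial pattern; mechanically it is just an array paste (patch contents and placement below are illustrative, not an optimized patch):

```python
import numpy as np

def apply_patch(img, patch, top, left):
    """Paste an adversarial patch into the image at (top, left)."""
    out = img.copy()
    h, w = patch.shape[:2]
    out[top:top + h, left:left + w] = patch
    return out

img = np.zeros((8, 8))          # toy grayscale image
patch = np.ones((3, 3))         # stand-in for an optimized adversarial patch
attacked = apply_patch(img, patch, top=2, left=2)
```

The practical danger is that, unlike pixel-level perturbations, a patch can be printed and placed in a physical scene.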
- Research Article
- 10.1371/journal.pone.0307363
- Oct 21, 2024
- PloS one
Convolutional Neural Network (CNN)-based models are prone to adversarial attacks, which present a significant hurdle to their reliability and robustness. The vulnerability of CNN-based models may be exploited by attackers to launch cyber-attacks. An attacker typically adds small, carefully crafted perturbations to original medical images. When a CNN-based model receives the perturbed medical image as input, it misclassifies the image, even though the added perturbation is often imperceptible to the human eye. The emergence of such attacks has raised security concerns regarding the deployment of deep learning-based medical image classification systems in clinical environments. To address this issue, a reliable defense mechanism is required to detect adversarial attacks on medical images. This study focuses on the robust detection of pneumonia in chest X-ray images with CNN-based models. Various adversarial attacks and defense strategies are evaluated and analyzed in the context of CNN-based pneumonia detection. Earlier studies have observed that a single defense mechanism is usually not effective against more than one type of adversarial attack; therefore, this study proposes a defense mechanism that is effective against multiple attack types. A reliable defense framework for pneumonia detection models would enable secure clinical deployment, supporting radiologists and doctors in diagnosis and treatment planning, and can save time and money by automating routine tasks. The proposed defense mechanism includes a convolutional autoencoder to denoise images perturbed by the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD), two state-of-the-art attacks carried out at five magnitudes, i.e., ε (epsilon) values. Two pre-trained models, VGG19 and VGG16, and our hybrid model of MobileNetV2 and DenseNet169, called the Stack Model, have been used to compare results. This study shows that the proposed defense mechanism outperforms state-of-the-art approaches. The PGD attack on the VGG16 model achieves the highest attack success rate, reducing overall accuracy by up to 67%. The autoencoder improves accuracy by up to 16% against PGD attacks in both the VGG16 and VGG19 models.
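The paper's denoiser is a trained convolutional autoencoder; as a self-contained stand-in, the sketch below "denoises" by projecting onto the top principal components of clean data, i.e. a linear autoencoder (the component count and synthetic data are assumptions for illustration):

```python
import numpy as np

def fit_linear_autoencoder(X_clean, n_components):
    """Learn a mean and principal subspace from clean samples via SVD."""
    mu = X_clean.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_clean - mu, full_matrices=False)
    V = Vt[:n_components].T          # columns span the clean-data subspace
    return mu, V

def denoise(x, mu, V):
    """Encode into the principal subspace and decode back."""
    return mu + V @ (V.T @ (x - mu))

rng = np.random.default_rng(4)
basis = rng.normal(size=(2, 6))
X_clean = rng.normal(size=(50, 2)) @ basis          # rank-2 "clean" data
mu, V = fit_linear_autoencoder(X_clean, n_components=2)

x_noisy = X_clean[0] + 0.3 * rng.normal(size=6)     # adversarially perturbed
x_denoised = denoise(x_noisy, mu, V)
```

The defense intuition carries over: perturbation components that leave the clean-data manifold are discarded by the encode/decode round trip.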
- Book Chapter
- 10.1007/978-981-16-0708-0_3
- Jan 1, 2021
This paper primarily focuses on slot tagging of Gujarati dialogue, which enables Gujarati-language communication between human and machine, allowing machines to perform a given task and provide the desired output. The accuracy of tagging depends entirely on the bifurcation of slots and word embedding. Proper slot tagging is also challenging because dialogue and speech differ from person to person, making the slot tagging methodology more complex. Various deep learning models are available for slot tagging; this paper mainly focuses on Long Short-Term Memory (LSTM), Convolutional Neural Network - Long Short-Term Memory (CNN-LSTM), Long Short-Term Memory - Conditional Random Field (LSTM-CRF), Bidirectional Long Short-Term Memory (BiLSTM), Convolutional Neural Network - Bidirectional Long Short-Term Memory (CNN-BiLSTM), and Bidirectional Long Short-Term Memory - Conditional Random Field (BiLSTM-CRF). Comparing these models, it is observed that BiLSTM models perform better than LSTM models by about 2% in F1-measure, as they contain an additional layer that processes the word sequence from backward to forward. Among the BiLSTM models, BiLSTM-CRF outperformed the other two: its F1-measure is better than CNN-BiLSTM by 1.2% and BiLSTM by 2.4%. Keywords: Spoken Language Understanding (SLU), Long Short-Term Memory (LSTM), slot tagging, Bidirectional Long Short-Term Memory (BiLSTM), Convolutional Neural Network - Bidirectional Long Short-Term Memory (CNN-BiLSTM), Bidirectional Long Short-Term Memory - Conditional Random Field (BiLSTM-CRF)
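All of the tagger architectures compared above emit per-token labels; a small decoder then groups B-/I- tags into slot spans. The tag names and tokens below are illustrative, not from the paper's Gujarati data:

```python
def bio_to_slots(tokens, tags):
    """Collect (slot_name, surface_text) pairs from BIO-style tags."""
    slots, current, words = [], None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                slots.append((current, " ".join(words)))
            current, words = tag[2:], [tok]
        elif tag.startswith("I-") and current == tag[2:]:
            words.append(tok)
        else:                       # "O" tag or inconsistent I- tag
            if current:
                slots.append((current, " ".join(words)))
            current, words = None, []
    if current:
        slots.append((current, " ".join(words)))
    return slots

slots = bio_to_slots(
    ["book", "flight", "to", "Ahmedabad", "tomorrow"],
    ["O", "O", "O", "B-city", "B-date"],
)
```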
- Research Article
- 10.1007/s11042-025-21008-5
- Jul 30, 2025
- Multimedia Tools and Applications
Network intrusion detection systems (NIDS) act as a premier defense to protect computer networks from cybersecurity threats. In Adversarial Machine Learning (AML), adversaries aim to deceive Machine Learning (ML) and Deep Learning (DL) models into producing false predictions with deliberately prepared adversarial samples. These intentionally generated adversarial samples have become a significant vulnerability of ML and DL-based systems, posing major challenges for their adoption in real-world, critical applications such as NIDS. In this study, we present a novel hybrid defense model that enhances the performance of DL-based NIDS against adversarial attacks. The MinMaxScaler is used for data normalization. We employed Independent Component Analysis (ICA) for feature extraction and Recursive Feature Elimination (RFE) for feature selection to reduce complexity and overfitting. The proposed model comprises two defense strategies: Projected Gradient Descent (PGD) with a Pigeon-Inspired Optimization Algorithm (PIOA) during the training phase, aiming to enhance the model's ability to recognize adversarial examples, and Spatial Smoothing (SS) during the testing phase, to decrease the potential impact of adversarial noise and sensitivity to minor feature changes. We have implemented three adversarial attack generation methods: Jacobian Saliency Map Attacks (JSMA), Fast Gradient Sign Method (FGSM), and Carlini and Wagner (C&W), and evaluated them in five distinct scenarios. The proposed model demonstrates an accuracy of 99.65%, a recall of 99.87%, an attack success rate (ASR) of 1.29%, and a specificity of 99.05%. We further present the computational efficiency and a hyperparameter sensitivity analysis to validate the model's real-time processing feasibility. The scope of the presented study extends beyond computer security.
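The test-time spatial-smoothing defense can be sketched as a sliding mean over a feature vector, damping isolated spikes that adversarial perturbations tend to introduce (the window size and toy data are illustrative choices):

```python
import numpy as np

def spatial_smooth(x, k=3):
    """Sliding mean of window k over a 1-D feature vector, edge-padded."""
    pad = k // 2
    padded = np.pad(x, pad, mode="edge")
    return np.array([padded[i:i + k].mean() for i in range(len(x))])

x = np.array([1.0, 1.0, 9.0, 1.0, 1.0])   # one spiky "adversarial" feature
x_s = spatial_smooth(x, k=3)
```

Because the defense is applied only at inference time, it adds no training cost, which suits the real-time constraints of a NIDS.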
- Research Article
- 10.3174/ajnr.a8650
- Jan 10, 2025
- AJNR. American journal of neuroradiology
Robustness against input data perturbations is essential for deploying deep learning models in clinical practice. Adversarial attacks involve subtle, voxel-level manipulations of scans to increase deep learning models' prediction errors. Testing deep learning model performance on examples of adversarial images provides a measure of robustness, and including adversarial images in the training set can improve the model's robustness. In this study, we examined adversarial training and input modifications to improve the robustness of deep learning models in predicting hematoma expansion (HE) from admission head CTs of patients with acute intracerebral hemorrhage (ICH). We used a multicenter cohort of n = 890 patients for cross-validation/training, and a cohort of n = 684 consecutive patients with ICH from 2 stroke centers for independent validation. Fast gradient sign method (FGSM) and projected gradient descent (PGD) adversarial attacks were applied for training and testing. We developed and tested 4 different models to predict ≥3 mL, ≥6 mL, ≥9 mL, and ≥12 mL HE in an independent validation cohort, using receiver operating characteristic area under the curve (AUC). We examined varying mixtures of adversarial and nonperturbed (clean) scans for training, as well as adding input from the hyperparameter-free Otsu multithreshold segmentation to the model. When deep learning models trained solely on clean scans were tested with PGD and FGSM adversarial images, the average HE prediction AUC decreased from 0.8 to 0.67 and 0.71, respectively. Overall, the best performing strategy to improve model robustness was training with a 5:3 mix of clean and PGD adversarial scans plus the addition of Otsu multithreshold segmentation to the model input, increasing the average AUC to 0.77 against both PGD and FGSM adversarial attacks. Adversarial training with FGSM improved robustness against similar-type attacks but offered limited cross-attack robustness against PGD-type images. Adversarial training and the inclusion of threshold-based segmentation as an additional input can improve deep learning model robustness in predicting HE from admission head CTs in acute ICH.
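The 5:3 clean-to-adversarial training mix reported above can be sketched as a simple mixing rule over the training scans: out of every 8 samples, 5 stay clean and 3 are replaced with attacked versions. The perturbation function below is a stub standing in for PGD, and the slot layout within each period is an assumption:

```python
import numpy as np

def mix_batches(X, adv_fn, clean=5, adv=3):
    """Replace the last `adv` of every (clean + adv) samples with
    adversarially perturbed versions produced by adv_fn."""
    period = clean + adv
    out = X.copy()
    for i in range(len(X)):
        if i % period >= clean:          # slots 5, 6, 7 of each 8 samples
            out[i] = adv_fn(X[i])
    return out

X = np.zeros((16, 4))                    # 16 toy "scans"
mixed = mix_batches(X, adv_fn=lambda x: x + 1.0)   # stub perturbation
n_adv = int((mixed.sum(axis=1) > 0).sum())
```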