AI-driven protein pocket detection through integrating deep Q-networks for structural analysis.
Protein pockets, or small cavities on the protein surface, are critical sites for enzymatic catalysis, molecular recognition, and drug binding. Accurately identifying these pockets is crucial for understanding protein function and designing therapeutic interventions. Traditional computational methods such as molecular docking, surface grid mapping, and molecular dynamics simulations rely on fixed protein structures, making it challenging to identify cryptic pockets that appear only under physiological conditions. We propose a deep reinforcement learning (DRL) technique based on deep Q-networks (DQN) to identify protein pockets precisely. To improve the prediction of functional binding sites, our strategy incorporates key molecular descriptors such as spatial coordinates, solvent-accessible surface area (SASA), hydrophobicity, and electrostatic charge. We pre-process protein structure data from the Protein Data Bank (PDB) through feature extraction and selection, including variance-threshold filtering and dimensionality reduction with an autoencoder. The resulting sparse feature representation enables efficient training of a DQN agent, which navigates protein surfaces and iteratively refines pocket predictions. Guided by learned reward signals, the model adapts its pocket-detection strategy, increasing both sensitivity and specificity. Tested on benchmark datasets, the method outperforms traditional computational approaches in detecting both well-defined and cryptic pockets. Experimental evidence suggests that our model successfully identifies binding sites across diverse protein families, with significant implications for drug discovery and protein-ligand interaction studies. Moreover, the model's ability to incorporate both geometric and biochemical features allows for a better understanding of pocket functionality, and its scalability makes it a valuable tool for large-scale virtual screening and personalized medicine. By applying deep reinforcement learning, this research provides a new and effective framework for protein pocket prediction, opening up opportunities for new tools in structural bioinformatics, drug design, and molecular biology research.
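The abstract does not include an implementation, so the following PyTorch-style sketch is only an illustration of the pipeline it describes (variance-threshold filtering, autoencoder compression of the four descriptor groups, and an epsilon-greedy DQN agent moving over surface points). All class and function names here are assumptions, not the authors' published code.

```python
# Illustrative sketch only -- PocketEnv-style details, layer sizes, and the
# descriptor layout are assumptions, not the paper's implementation.
import numpy as np
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Compresses per-residue descriptors (xyz, SASA, hydrophobicity, charge)."""
    def __init__(self, n_features: int, latent: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(),
                                     nn.Linear(32, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 32), nn.ReLU(),
                                     nn.Linear(32, n_features))
    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

class QNetwork(nn.Module):
    """Maps the encoded local state to Q-values over discrete moves on the surface."""
    def __init__(self, latent: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))
    def forward(self, z):
        return self.net(z)

def variance_filter(X: np.ndarray, threshold: float = 1e-3) -> np.ndarray:
    """Drop near-constant descriptor columns before autoencoding."""
    return X[:, X.var(axis=0) > threshold]

def select_action(qnet: QNetwork, z: torch.Tensor, epsilon: float, n_actions: int) -> int:
    """Epsilon-greedy move of the agent across neighbouring surface points."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    with torch.no_grad():
        return int(qnet(z).argmax().item())
```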
37
- 10.1038/s41598-022-23014-1
- Oct 28, 2022
- Scientific Reports
21
- 10.1186/s12859-019-3058-0
- Sep 18, 2019
- BMC Bioinformatics
9
- 10.1089/omi.2023.0197
- Dec 1, 2023
- Omics : a journal of integrative biology
656
- 10.1016/j.csbj.2020.03.025
- Jan 1, 2020
- Computational and Structural Biotechnology Journal
87
- 10.1093/bioinformatics/btaa858
- Dec 7, 2020
- Bioinformatics
47
- 10.1177/11779322211030364
- Jan 1, 2021
- Bioinformatics and Biology Insights
7
- 10.1186/s13321-024-00923-z
- Nov 11, 2024
- Journal of Cheminformatics
33
- 10.1016/j.pbiomolbio.2023.08.002
- Oct 9, 2023
- Progress in Biophysics and Molecular Biology
56
- 10.1038/s42004-020-0261-x
- Feb 11, 2020
- Communications Chemistry
4
- 10.1007/s12033-022-00605-x
- Dec 4, 2022
- Molecular Biotechnology
- Research Article
- 10.36676/urr.v8.i4.1399
- Dec 2, 2021
- Universal Research Reports
Deep Reinforcement Learning (DRL) is a rapidly evolving field that has significantly influenced autonomous systems such as self-driving cars, drones, and robotics. This survey aims to provide a comprehensive overview of the state-of-the-art DRL techniques and their applications in real-world autonomous systems. The paper discusses various architectures such as Deep Q-Networks (DQNs), Policy Gradient methods, and Actor-Critic models, analyzing their strengths and weaknesses in dynamic and complex environments. Moreover, the study focuses on how DRL can address specific challenges in decision-making, navigation, and obstacle avoidance for autonomous vehicles. The integration of DRL with sensor data, such as LIDAR and camera inputs, is explored to understand how these systems can learn more efficiently in real-time environments. Furthermore, the paper examines the scalability of DRL models in large-scale autonomous systems and presents the most recent advancements in this domain. Challenges such as overfitting, reward shaping, and sample inefficiency are discussed, alongside potential future directions like multi-agent systems and cooperative DRL. The review also highlights real-world applications and case studies, illustrating how DRL is implemented in autonomous systems. Finally, the ethical and safety concerns associated with autonomous DRL systems are considered, particularly in the context of self-driving cars and other autonomous technologies that interact with humans.
- Research Article
- 10.11591/ijece.v15i2.pp1924-1932
- Apr 1, 2025
- International Journal of Electrical and Computer Engineering (IJECE)
Offloading compute-intensive, time-sensitive mobile application tasks to distant cloud-based data centers has become a popular way of working around the limitations of mobile devices (MDs). Deep reinforcement learning (DRL) techniques for offloading in mobile edge computing (MEC) environments struggle to adapt to new situations because of low sample efficiency in each new context. To address these issues, a novel computational offloading in mobile edge computing (COOL-MEC) algorithm is proposed that combines the benefits of attention modules and bi-directional long short-term memory. The algorithm improves server resource utilization by lowering the combined cost of processing latency, processing energy consumption, and task throughput for latency-sensitive tasks. Experimental findings show that the proposed COOL-MEC algorithm minimizes energy consumption: compared with the existing deep convolutional attention reinforcement learning with adaptive reward policy (DCARL-ARP) and DRL techniques, the energy consumption of COOL-MEC is reduced by 0.06% and 0.08%, respectively, and the average time per channel used for execution is reduced by 0.051% and 0.054%, respectively.
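As a rough sketch of the network family this abstract describes (a bi-directional LSTM over task histories, an attention module, and a Q-head for the offloading decision), the snippet below shows one plausible layout. The dimensions, two-action space (local vs. offload), and attention form are assumptions, not the authors' COOL-MEC design.

```python
# Hypothetical sketch of a BiLSTM + attention Q-network for offloading decisions.
import torch
import torch.nn as nn

class AttentiveBiLSTMQNet(nn.Module):
    def __init__(self, state_dim: int, hidden: int = 64, n_actions: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)             # scalar score per time step
        self.q_head = nn.Linear(2 * hidden, n_actions)   # Q-values: execute locally vs. offload

    def forward(self, task_seq):                          # task_seq: (batch, T, state_dim)
        h, _ = self.lstm(task_seq)                        # (batch, T, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)            # attention weights over time steps
        context = (w * h).sum(dim=1)                      # weighted summary of the task history
        return self.q_head(context)
```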
- Research Article
5
- 10.1016/j.cose.2024.103825
- Mar 28, 2024
- Computers & Security
Cyber-physical systems (CPS) play a vital role in modern society across various sectors, ranging from smart grid to water treatment, and their security has become one of the major concerns. Due to the significantly growing complexity and scale of CPS and cyber-attacks, it is imperative to develop defense and prevention strategies specifically for CPS that are adaptive, scalable, and robust. An important research and application direction in this domain is time series anomaly detection within CPS utilizing advanced machine learning techniques, such as deep learning and reinforcement learning. However, many anomaly detectors fail to balance between detection performance and computational overhead, limiting their applicability in CPS. In this paper, we introduce a novel agent-based dynamic thresholding (ADT) method based on the deep reinforcement learning technique, i.e. deep Q-network (DQN), to model thresholding in anomaly detection as a Markov decision process. By utilizing anomaly scores generated from an autoencoder and other useful information perceived from a simulated environment, ADT performs the optimal dynamic thresholding control, facilitating real-time adaptive anomaly detection for time series. Rigorous evaluations were conducted on realistic datasets from water treatment and industrial control systems, specifically SWaT, WADI, and HAI, comparing against established benchmarks. The experimental results demonstrate ADT's superior detection performance, dynamic thresholding capability, data-efficient learning, and robustness. Notably, ADT, even when trained on minimal labeled data, consistently outperforms benchmarks with F1 scores ranging from 0.995 to 0.999 across all datasets. It is effective even in challenging scenarios where the environmental feedback is noisy, delayed, or partial. Beyond its direct application as an advanced anomaly detector, ADT possesses the versatility to act as a lightweight dynamic thresholding controller, boosting other anomaly detection models. This underscores ADT's considerable promise in sophisticated and dynamic CPS environments.
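To make the thresholding-as-MDP idea concrete, here is a minimal environment sketch in the spirit of ADT: the state carries the latest autoencoder anomaly score and the current threshold, the actions nudge the threshold, and the reward reflects whether the thresholded decision matched the label. The action increments, reward values, and state layout are assumptions, not the paper's exact design.

```python
# Minimal sketch of dynamic thresholding modelled as an MDP (illustrative only).
import numpy as np

class ThresholdEnv:
    ACTIONS = (-0.05, 0.0, +0.05)   # lower / keep / raise the threshold

    def __init__(self, scores: np.ndarray, labels: np.ndarray, init_thr: float = 0.5):
        self.scores, self.labels = scores, labels
        self.t, self.thr = 0, init_thr

    def step(self, action: int):
        self.thr = float(np.clip(self.thr + self.ACTIONS[action], 0.0, 1.0))
        pred = self.scores[self.t] > self.thr        # flag an anomaly if the score exceeds the threshold
        truth = bool(self.labels[self.t])
        reward = 1.0 if pred == truth else -1.0      # correct decisions are rewarded
        self.t += 1
        state = np.array([self.scores[self.t - 1], self.thr], dtype=np.float32)
        done = self.t >= len(self.scores)
        return state, reward, done
```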
- Research Article
209
- 10.1016/j.enbuild.2019.109675
- Dec 3, 2019
- Energy and Buildings
Study on deep reinforcement learning techniques for building energy consumption forecasting
- Research Article
39
- 10.1109/tmc.2020.2990399
- Apr 28, 2020
- IEEE Transactions on Mobile Computing
This paper investigates a new class of carrier-sense multiple access (CSMA) protocols that employ deep reinforcement learning (DRL) techniques, referred to as carrier-sense deep-reinforcement learning multiple access (CS-DLMA). The goal of CS-DLMA is to enable efficient and equitable spectrum sharing among a group of co-located heterogeneous wireless networks. Existing CSMA protocols, such as the medium access control (MAC) protocol of WiFi, are designed for a homogeneous network in which all nodes adopt the same protocol. Such protocols suffer from severe performance degradation in a heterogeneous environment where there are nodes adopting other MAC protocols. CS-DLMA aims to circumvent this problem by making use of DRL. In particular, this paper adopts α-fairness as the general objective of CS-DLMA. With α-fairness, CS-DLMA can achieve a range of different objectives (e.g., maximizing sum throughput, achieving proportional fairness, or achieving max-min fairness) when coexisting with other MACs by changing the value of α. A salient feature of CS-DLMA is that it can achieve these objectives without knowing the coexisting MACs through a learning process based on DRL. The underpinning DRL technique in CS-DLMA is deep Q-network (DQN). However, the conventional DQN algorithms are not suitable for CS-DLMA due to their uniform time-step assumption. In CSMA protocols, time steps are non-uniform in that the time duration required for carrier sensing is smaller than the duration of data transmission. This paper introduces a non-uniform time-step formulation of DQN to address this issue. Our simulation results show that CS-DLMA can achieve the general α-fairness objective when coexisting with TDMA, ALOHA, and WiFi protocols by adjusting its own transmission strategy. Interestingly, we also find that CS-DLMA is more Pareto efficient than other CSMA protocols, e.g., p-persistent CSMA, when coexisting with WiFi. Although this paper focuses on the use of our non-uniform time-step DQN formulation in wireless networking, we believe this new DQN formulation can also find use in other domains.
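The core formulation change is that each step is discounted by its actual duration (a short carrier-sensing slot versus a long packet transmission) rather than by a fixed unit step. The helper below is one plausible way to write such a target under the assumption that reward accrues per unit time over the step; it is a sketch of the idea, not the CS-DLMA paper's exact equations.

```python
# Sketch of a non-uniform time-step TD target: discount by the real step duration tau.
def non_uniform_td_target(reward_rate: float, duration: float, gamma: float,
                          q_next_max: float) -> float:
    """
    reward_rate : throughput-like reward earned per unit time during the step
    duration    : tau, the real time the chosen action occupied the channel
    gamma       : per-unit-time discount factor (0 < gamma < 1)
    q_next_max  : max_a Q_target(s', a) at the next decision point
    """
    # Reward accumulated (and discounted) over tau, then bootstrap with gamma**tau.
    discounted_reward = reward_rate * (1 - gamma ** duration) / (1 - gamma)
    return discounted_reward + (gamma ** duration) * q_next_max
```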
- Book Chapter
84
- 10.1007/978-981-15-4095-0_2
- Jan 1, 2020
In this chapter, we introduce the fundamentals of classical reinforcement learning and provide a general overview of deep reinforcement learning. We first start with the basic definitions and concepts of reinforcement learning, including the agent, environment, action, and state, as well as the reward function. Then, we describe a classical reinforcement learning problem, the bandit problem, to provide the readers with a basic understanding of the underlying mechanism of traditional reinforcement learning. Next, we introduce the Markov process, together with the Markov reward process and the Markov decision process. These notions are the cornerstones in formulating reinforcement learning tasks. The combination of the Markov reward process and value function estimation produces the core results used in most reinforcement learning methods: the Bellman equations. The optimal value functions and optimal policy can be derived through solving the Bellman equations. Three main approaches for solving the Bellman equations are then introduced: dynamic programming, Monte Carlo method, and temporal difference learning. We further introduce deep reinforcement learning for both policy and value function approximation in policy optimization. The contents in policy optimization are introduced in two main categories: value-based optimization and policy-based optimization. In value-based optimization, the gradient-based methods are introduced for leveraging deep neural networks, like Deep Q-Networks. In policy-based optimization, the deterministic policy gradient and stochastic policy gradient are introduced in detail with sufficient mathematical proofs. The combination of value-based and policy-based optimization produces the popular actor-critic structure, which leads to a large number of advanced deep reinforcement learning algorithms. This chapter will lay a foundation for the rest of the book, as well as providing the readers with a general overview of deep reinforcement learning.
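For reference, the Bellman expectation and optimality equations that the chapter builds on can be written in their standard form (standard textbook notation, not quoted from the chapter):

```latex
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma V^{\pi}(s')\bigr]
\qquad
V^{*}(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma V^{*}(s')\bigr]
```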
- Research Article
199
- 10.1002/mp.12625
- Nov 14, 2017
- Medical Physics
To investigate deep reinforcement learning (DRL) based on historical treatment plans for developing automated radiation adaptation protocols for non-small cell lung cancer (NSCLC) patients that aim to maximize tumor local control at reduced rates of radiation pneumonitis grade 2 (RP2). In a retrospective population of 114 NSCLC patients who received radiotherapy, a three-component neural network framework was developed for deep reinforcement learning (DRL) of dose fractionation adaptation. Large-scale patient characteristics included clinical, genetic, and imaging radiomics features in addition to tumor and lung dosimetric variables. First, a generative adversarial network (GAN) was employed to learn patient population characteristics necessary for DRL training from a relatively limited sample size. Second, a radiotherapy artificial environment (RAE) was reconstructed by a deep neural network (DNN) utilizing both original and synthetic data (from the GAN) to estimate the transition probabilities for adaptation of personalized radiotherapy patients' treatment courses. Third, a deep Q-network (DQN) was applied to the RAE for choosing the optimal dose in a response-adapted treatment setting. This multicomponent reinforcement learning approach was benchmarked against real clinical decisions applied in an adaptive dose escalation clinical protocol, in which 34 patients were treated based on avid PET signal in the tumor and constrained by a 17.2% normal tissue complication probability (NTCP) limit for RP2. The uncomplicated cure probability (P+) was used as a baseline reward function in the DRL. Taking our adaptive dose escalation protocol as a blueprint for the proposed DRL (GAN + RAE + DQN) architecture, we obtained an automated dose adaptation estimate for use at ∼2/3 of the way into the radiotherapy treatment course. When the DQN component was allowed to freely control the estimated adaptive dose per fraction (ranging from 1-5 Gy), the DRL automatically favored dose escalation/de-escalation between 1.5 and 3.8 Gy, a range similar to that used in the clinical protocol. The same DQN yielded two patterns of dose escalation for the 34 test patients, depending on the reward variant. First, using the baseline P+ reward function, individual adaptive fraction doses of the DQN had tendencies similar to the clinical data with an RMSE = 0.76 Gy, but the adaptations suggested by the DQN were generally lower in magnitude (less aggressive). Second, by adjusting the P+ reward function to place greater emphasis on mitigating local failure, better matching of doses between the DQN and the clinical protocol was achieved, with an RMSE = 0.5 Gy. Moreover, the decisions selected by the DQN appeared to have better concordance with patients' eventual outcomes. In comparison, the traditional temporal difference (TD) algorithm for reinforcement learning yielded an RMSE = 3.3 Gy due to numerical instabilities and insufficient learning. We demonstrated that automated dose adaptation by DRL is a feasible and promising approach for achieving results similar to those chosen by clinicians. The process may require customization of the reward function if individual cases are to be considered. However, development of this framework into a fully credible autonomous system for clinical decision support would require further validation on larger multi-institutional datasets.
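The DQN component of this architecture effectively picks a dose per fraction from a bounded range (1-5 Gy). The snippet below is a hypothetical illustration of such a discretized dose action space and a greedy choice over it; the grid resolution, state features, and network shape are assumptions, not the study's implementation.

```python
# Illustrative only: a discretized dose-per-fraction action space with a greedy DQN choice.
import torch
import torch.nn as nn

DOSE_ACTIONS = torch.arange(1.0, 5.01, 0.1)      # candidate doses per fraction, in Gy

class DoseQNet(nn.Module):
    def __init__(self, n_patient_features: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_patient_features, 128), nn.ReLU(),
                                 nn.Linear(128, len(DOSE_ACTIONS)))
    def forward(self, x):
        return self.net(x)

def choose_adaptive_dose(qnet: DoseQNet, patient_state: torch.Tensor) -> float:
    """Pick the dose whose Q-value (expected P+-style reward) is highest."""
    with torch.no_grad():
        q = qnet(patient_state)
    return float(DOSE_ACTIONS[q.argmax()])
```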
- Research Article
3
- 10.1016/j.jksuci.2024.102177
- Aug 31, 2024
- Journal of King Saud University - Computer and Information Sciences
EETS: An energy-efficient task scheduler in cloud computing based on improved DQN algorithm
- Research Article
58
- 10.1109/access.2021.3054909
- Jan 1, 2021
- IEEE Access
Cloud radio access network (CRAN) has been shown to be an effective means of boosting network performance. Such gains stem from the intelligent management of remote radio heads (RRHs) in terms of on/off operation mode and power consumption. Most conventional resource allocation (RA) methods, however, optimize the network utility without considering the switching overhead of RRHs in adjacent time intervals. When the network environment becomes time-correlated, mathematical optimization is not directly applicable. In this paper, we aim to optimize the energy efficiency (EE) subject to constraints on per-RRH transmission power and user data rates. To this end, we formulate the EE problem as a Markov decision process (MDP) and subsequently adopt a deep reinforcement learning (DRL) technique to reap the cumulative EE rewards. Our starting point is the deep Q-network (DQN), which combines deep learning and Q-learning. In each time slot, DQN configures the status of RRHs yielding the largest Q-value (known as the state-action value) before solving a power minimization problem for the active RRHs. To overcome the Q-value overestimation issue of DQN, we propose a Double DQN (DDQN) framework that achieves higher rewards than DQN by decoupling action selection from target Q-value evaluation. Simulation results validate that the DDQN-based RA method is more energy-efficient than the DQN-based RA algorithm and a baseline solution.
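The overestimation fix at the heart of Double DQN is small and worth seeing in code: the online network selects the next action while the target network evaluates it. Below is a generic sketch of that target computation (variable names and batching conventions are illustrative, not the paper's code).

```python
# Sketch of the Double DQN target: online net selects the action, target net evaluates it.
import torch

def ddqn_target(reward, gamma, done, next_state, online_net, target_net):
    with torch.no_grad():
        best_action = online_net(next_state).argmax(dim=1, keepdim=True)   # action selection
        q_eval = target_net(next_state).gather(1, best_action).squeeze(1)  # action evaluation
    return reward + gamma * (1.0 - done) * q_eval
```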
- Conference Article
2
- 10.1109/icufn55119.2022.9829674
- Jul 5, 2022
Despite great advances in controlling vehicles for autonomous driving and in deep reinforcement learning (DRL) techniques, designing an end-to-end architecture that supports autonomous driving using DRL techniques while facing uncertainties in complex and dynamic environments remains challenging. By examining state-of-the-art work in the domain of DRL for autonomous driving, and inspired by the work of [1], we have designed an end-to-end autonomous driving system using the Ape-X algorithm [2] in the Carla simulation environment [3] and have evaluated its performance by comparing its results to those obtained using other DRL techniques.
- Conference Article
- 10.1063/5.0108911
- Jan 1, 2022
This paper reviews complex decision and control problems that are out of reach for simple machines and motivates Reinforcement Learning with deep Q-networks as a solution. Reinforcement learning techniques are now being researched for their applicability in a wide range of situations. Because of the rising complexity and unpredictability in the generation and distribution sectors of power systems, traditional approaches frequently struggle when attempting to handle decision and control issues that are out of reach for a basic machine. Deep Reinforcement Learning (DRL) is one of these data-driven approaches and is considered true Artificial Intelligence (AI). DRL is a hybrid of Deep Learning (DL) and Reinforcement Learning (RL). Our study examines the fundamental concepts, models, methods, and approaches of DRL. It also presents power system applications such as smart grids, energy management, demand response, the electricity market, operational control, and many others. Furthermore, current advancements in DRL, the coupling of RL with other classical techniques, and the prospects and problems of its applications in power systems are explored.
- Research Article
108
- 10.1109/tnnls.2018.2790981
- Jun 1, 2018
- IEEE Transactions on Neural Networks and Learning Systems
In this paper, a new training paradigm is proposed for deep reinforcement learning using self-paced prioritized curriculum learning with coverage penalty. The proposed deep curriculum reinforcement learning (DCRL) takes the most advantage of experience replay by adaptively selecting appropriate transitions from replay memory based on the complexity of each transition. The criteria of complexity in DCRL consist of self-paced priority as well as coverage penalty. The self-paced priority reflects the relationship between the temporal-difference error and the difficulty of the current curriculum for sample efficiency. The coverage penalty is taken into account for sample diversity. With comparison to deep Q network (DQN) and prioritized experience replay (PER) methods, the DCRL algorithm is evaluated on Atari 2600 games, and the experimental results show that DCRL outperforms DQN and PER on most of these games. More results further show that the proposed curriculum training paradigm of DCRL is also applicable and effective for other memory-based deep reinforcement learning approaches, such as double DQN and dueling network. All the experimental results demonstrate that DCRL can achieve improved training efficiency and robustness for deep reinforcement learning.
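To illustrate the flavor of the DCRL sampling rule described here, the sketch below combines a self-paced term (favoring transitions whose TD error sits near the current curriculum pace) with a coverage penalty (discouraging repeatedly replayed transitions). The functional forms are assumptions for illustration, not the paper's equations.

```python
# Rough sketch of complexity-driven replay sampling in the spirit of DCRL (illustrative forms).
import numpy as np

def sampling_probabilities(td_errors: np.ndarray, replay_counts: np.ndarray,
                           pace_threshold: float, penalty: float = 0.1) -> np.ndarray:
    # Self-paced priority: favour transitions whose TD error is close to the current pace;
    # transitions that are far too easy or too hard are down-weighted for now.
    self_paced = np.exp(-np.abs(np.abs(td_errors) - pace_threshold))
    # Coverage penalty: discourage repeatedly sampling the same transitions.
    coverage = 1.0 / (1.0 + penalty * replay_counts)
    score = self_paced * coverage
    return score / score.sum()
```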
- Research Article
5
- 10.1155/2022/9921885
- Jan 1, 2022
- Wireless Power Transfer
This paper aims to solve the optimization problems in far-field wireless power transfer systems using deep reinforcement learning techniques. The Radio-Frequency (RF) wireless transmitter is mounted on a mobile robot, which patrols near the harvested energy-enabled Internet of Things (IoT) devices. The wireless transmitter intends to continuously cruise on the designated path in order to fairly charge all the stationary IoT devices in the shortest time. The Deep Q-Network (DQN) algorithm is applied to determine the optimal path for the robot to cruise on. When the number of IoT devices increases, the traditional DQN cannot converge to a closed-loop path or achieve the maximum reward. In order to solve these problems, an area division Deep Q-Network (AD-DQN) is invented. The algorithm can intelligently divide the complete charging field into several areas. In each area, the DQN algorithm is utilized to calculate the optimal path. After that, the segmented paths are combined to create a closed-loop path for the robot to cruise on, which can enable the robot to continuously charge all the IoT devices in the shortest time. The numerical results prove the superiority of the AD-DQN in optimizing the proposed wireless power transfer system.
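The area-division step can be pictured as: partition device locations into regions, plan one path per region with a DQN-style planner, then stitch the segments into a single closed loop. The sketch below shows that outline; `plan_path_with_dqn` is a hypothetical stand-in for the per-area DQN, and the use of k-means for partitioning is an assumption, not necessarily the paper's division scheme.

```python
# Illustrative sketch of area division followed by per-area path planning and stitching.
import numpy as np
from sklearn.cluster import KMeans

def plan_closed_loop(device_xy: np.ndarray, n_areas: int, plan_path_with_dqn):
    areas = KMeans(n_clusters=n_areas, n_init=10).fit_predict(device_xy)
    full_path = []
    for a in range(n_areas):                       # one DQN-planned segment per area
        segment = plan_path_with_dqn(device_xy[areas == a])
        full_path.extend(segment)
    full_path.append(full_path[0])                 # close the loop for continuous cruising
    return full_path
```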
- Research Article
- 10.62411/jcta.12560
- Apr 23, 2025
- Journal of Computing Theories and Applications
Cybersecurity is continuously challenged by increasingly sophisticated and dynamic cyber-attacks, necessitating advanced adaptive defense mechanisms. Deep Reinforcement Learning (DRL) has emerged as a promising approach, offering significant advantages over traditional intrusion detection methods through real-time adaptability and self-learning capabilities. This paper presents an advanced adaptive cybersecurity framework utilizing five prominent DRL algorithms: Deep Q-Network (DQN), Proximal Policy Optimization (PPO), Twin Delayed DDPG (TD3), Soft Actor-Critic (SAC), and Asynchronous Advantage Actor-Critic (A3C). The effectiveness of these algorithms is evaluated within complex, realistic simulation environments using live-streaming data, emphasizing key metrics such as accuracy (AUC-ROC), response latency, and network throughput. Experimental results demonstrate that the SAC algorithm consistently achieves superior detection accuracy (95% AUC-ROC) and minimal disruption to network performance compared to other approaches. Additionally, A3C provides the fastest response times suitable for real-time defense scenarios. This comprehensive comparative analysis addresses critical research gaps by integrating both traditional and novel DRL techniques and validates their potential to substantially improve cybersecurity defense strategies in realistic operational settings.
- Research Article
52
- 10.1109/access.2019.2958873
- Jan 1, 2019
- IEEE Access
The prediction of hidden or missing links in a criminal network, which represent possible interactions between individuals, is a significant problem. Criminal network prediction models commonly rely on Social Network Analysis (SNA) metrics and leverage machine learning (ML) techniques to enhance predictive accuracy and processing speed. The problem with classical ML techniques such as support vector machines (SVMs) is their dependency on the availability of large datasets for training. However, recent groundbreaking advances in research on deep reinforcement learning (DRL) techniques have produced methods for training ML models on self-generated datasets. In view of this, DRL could be applied to domains with relatively smaller datasets, such as criminal networks. Prior to this research, few, if any, previous works have explored the prediction of links within criminal networks that could appear and/or disappear over time by leveraging DRL techniques. Therefore, the primary objective of this paper is to construct a time-based link prediction model (TDRL) by leveraging a DRL technique trained on a relatively small real-world criminal dataset that evolves over time. The experimental results indicate that the predictive accuracy of the DRL model trained on the temporal dataset is significantly better than that of other ML models trained only on the dataset at a specific snapshot in time.
- Research Article
- 10.1007/s10822-025-00703-3
- Nov 8, 2025
- Journal of computer-aided molecular design
- Research Article
- 10.1007/s10822-025-00699-w
- Nov 4, 2025
- Journal of computer-aided molecular design
- Research Article
- 10.1007/s10822-025-00702-4
- Nov 4, 2025
- Journal of computer-aided molecular design
- Research Article
- 10.1007/s10822-025-00692-3
- Nov 4, 2025
- Journal of computer-aided molecular design
- Research Article
- 10.1007/s10822-025-00687-0
- Nov 4, 2025
- Journal of computer-aided molecular design
- Research Article
- 10.1007/s10822-025-00695-0
- Nov 4, 2025
- Journal of computer-aided molecular design
- Research Article
- 10.1007/s10822-025-00697-y
- Nov 4, 2025
- Journal of computer-aided molecular design
- Research Article
- 10.1007/s10822-025-00681-6
- Oct 28, 2025
- Journal of computer-aided molecular design
- Research Article
- 10.1007/s10822-025-00691-4
- Oct 28, 2025
- Journal of computer-aided molecular design
- Research Article
- 10.1007/s10822-025-00685-2
- Oct 28, 2025
- Journal of computer-aided molecular design