Abstract

One popular method for optimizing systems, referred to as ANN-PSO, uses an artificial neural network (ANN) to approximate the system and an optimization method such as particle swarm optimization (PSO) to select inputs. However, given recent developments in reinforcement learning, it is important to compare ANN-PSO to newer algorithms such as Proximal Policy Optimization (PPO). To investigate the performance and applicability of ANN-PSO and PPO, we compare their methodologies, apply them to steady-state economic optimization of a chemical process, and compare their results to conventional first-principles modeling with nonlinear programming (FP-NLP). Our results show that ANN-PSO and PPO achieve profits nearly as high as FP-NLP, with PPO achieving slightly higher profits than ANN-PSO. We also find that PPO has the fastest online computational times, 10 and 10,000 times faster than FP-NLP and ANN-PSO, respectively. However, PPO requires more training data than ANN-PSO to converge to an optimal policy. This case study suggests that PPO performs better, achieving higher profits and faster online computation, while ANN-PSO shows better applicability through its ability to train on historical operational data and its higher training efficiency.
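
As a rough illustration of the ANN-PSO idea described above, the minimal sketch below trains an ANN surrogate on data from a hypothetical two-input process with a scalar profit, then uses a basic PSO loop to search the surrogate for profit-maximizing inputs. The process function, input bounds, network size, and PSO coefficients are illustrative assumptions, not the setup used in the paper.

```python
# Minimal ANN-PSO sketch: ANN surrogate of a (hypothetical) process profit,
# plus a simple particle swarm search over the surrogate's predictions.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def process_profit(x):
    """Stand-in for plant data: profit as a function of two inputs (assumed)."""
    return -(x[:, 0] - 2.0) ** 2 - (x[:, 1] - 1.0) ** 2 + 10.0

# 1) Fit an ANN surrogate on (historical) operating data.
X_train = rng.uniform(low=[0.0, 0.0], high=[4.0, 4.0], size=(500, 2))
y_train = process_profit(X_train) + rng.normal(scale=0.1, size=500)
surrogate = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
surrogate.fit(X_train, y_train)

# 2) Basic PSO loop: search for inputs that maximize predicted profit
#    within the operating bounds.
n_particles, n_iters = 30, 100
lo, hi = np.array([0.0, 0.0]), np.array([4.0, 4.0])
pos = rng.uniform(lo, hi, size=(n_particles, 2))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = surrogate.predict(pbest)
gbest = pbest[np.argmax(pbest_val)].copy()

w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients (typical values)
for _ in range(n_iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    val = surrogate.predict(pos)
    improved = val > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], val[improved]
    gbest = pbest[np.argmax(pbest_val)].copy()

print("PSO-selected inputs:", gbest,
      "predicted profit:", surrogate.predict(gbest[None, :])[0])
```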

Highlights

  • Machine learning has shown success in optimizing complex systems such as scheduling electricity prices to manage demand and maximize power grid performance [4,5,6].

  • To investigate whether the widely used artificial neural network (ANN) with particle swarm optimization (PSO) approach can be replaced by newer actor–critic methods, this paper presents a novel comparison between ANN-PSO and Proximal Policy Optimization (PPO), comparing the two algorithms' methodologies and evaluating them on a case study of a stochastic steady-state chemical optimization problem.

  • An optimization problem is considered in which an agent interacts with an environment that is assumed to be fully observable. This problem can be formulated as a Markov Decision Process (MDP) in which the environment is described by a set of possible states S ⊆ ℝⁿ, a set of possible actions A ⊆ ℝᵐ, a distribution of initial states p(s₀), a reward distribution function R(sₜ, aₜ) given state sₜ and action aₜ, a transition probability p(sₜ₊₁ | sₜ, aₜ), and a future-reward discount factor γ (a minimal sketch of these elements is given after this list).
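
As a rough illustration of the MDP elements listed in the last highlight, the sketch below defines a toy fully observable environment with an initial-state distribution, a reward function, a stochastic transition, and a discount factor. The dimensions, dynamics, and reward are hypothetical placeholders and do not reproduce the chemical-process environment studied in the paper.

```python
# Toy MDP sketch: S ⊆ R^n, A ⊆ R^m, p(s0), R(s_t, a_t), p(s_{t+1} | s_t, a_t), gamma.
import numpy as np

class ToyStochasticMDP:
    """Fully observable toy MDP with continuous states and actions (assumed)."""

    def __init__(self, n=3, m=2, gamma=0.99, seed=0):
        self.n, self.m, self.gamma = n, m, gamma
        self.rng = np.random.default_rng(seed)

    def reset(self):
        # Sample s0 ~ p(s0): here a standard normal over R^n (assumed).
        self.state = self.rng.normal(size=self.n)
        return self.state

    def step(self, action):
        # Reward R(s_t, a_t): toy quadratic penalty on state and action (assumed).
        reward = -float(self.state @ self.state) - 0.1 * float(action @ action)
        # Transition s_{t+1} ~ p(. | s_t, a_t): linear dynamics plus noise (assumed).
        self.state = (0.9 * self.state + 0.1 * action.sum()
                      + self.rng.normal(scale=0.05, size=self.n))
        return self.state, reward

def discounted_return(rewards, gamma):
    """Objective the agent maximizes: sum over t of gamma^t * r_t."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

env = ToyStochasticMDP()
s = env.reset()
rewards = []
for _ in range(50):
    a = np.zeros(env.m)          # placeholder policy: always the zero action
    s, r = env.step(a)
    rewards.append(r)
print("discounted return:", discounted_return(rewards, env.gamma))
```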



Introduction

Machine learning has shown success in optimizing complex systems, such as scheduling electricity prices to manage demand and maximize power grid performance [4,5,6]. This motivates exploration of other machine learning techniques, such as reinforcement learning (RL), for model-free optimization [7]. RL research has seen many breakthroughs in recent years, with new algorithms capable of defeating most humans in difficult games [8,9,10,11]. These algorithms are not designed merely to play games, but to learn and accomplish general tasks. A real-world example is OpenAI's algorithm that learned to control a robotic hand to solve a Rubik's cube under disturbances [12].

