ABSTRACT We investigate how to port the standard interior-point method to new exascale architectures for block-structured nonlinear programs with state equations. Computationally, we decompose the interior-point algorithm into two successive operations: the evaluation of the derivatives and the solution of the associated Karush-Kuhn-Tucker (KKT) linear system. Our method accelerates both operations using two levels of parallelism. First, we distribute the computations across multiple processes using coarse-grained parallelism. Second, each process uses SIMD/GPU accelerators locally to speed up its operations using fine-grained parallelism. The KKT system is reduced by eliminating the inequalities and the state variables from the corresponding equations. We demonstrate our method's capability on the supercomputer Polaris, a testbed for the future exascale Aurora system. Each Polaris node is equipped with four GPUs, a setup amenable to our two-level approach. Our experiments on the stochastic optimal power flow problem show that the reduction method is 50x faster than the sparse linear solver HSL MA57 running in serial on the CPU, and 6x faster than Pardiso running in parallel on the CPU with the same number of processes.
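To make the reduction step concrete, the following is a minimal sketch of how inequalities and state variables can be eliminated from a KKT system. The notation is assumed for illustration and is not taken from the text above: $w = (x, u)$ stacks the states $x$ and the remaining variables $u$, $G = [G_x \; G_u]$ is the Jacobian of the state equations with $G_x$ assumed square and nonsingular, $H$ is the Jacobian of the inequalities, $W$ is the Hessian of the Lagrangian, $S$ and $Z$ are the diagonal matrices of slacks and their multipliers, and $\bar r_w$, $r_g$ collect the corresponding residuals.

```latex
% Step 1 (assumed notation): eliminating slacks and inequality multipliers
% yields a condensed KKT system in (\Delta w, \Delta\lambda).
\begin{equation*}
\begin{bmatrix} K & G^\top \\ G & 0 \end{bmatrix}
\begin{bmatrix} \Delta w \\ \Delta \lambda \end{bmatrix}
= -\begin{bmatrix} \bar r_w \\ r_g \end{bmatrix},
\qquad
K = W + H^\top \Sigma H, \quad \Sigma = S^{-1} Z .
\end{equation*}

% Step 2: the linearized state equations G_x \Delta x + G_u \Delta u = -r_g
% express the state step as a function of \Delta u.
\begin{equation*}
\Delta w = N \Delta u + p,
\qquad
N = \begin{bmatrix} -G_x^{-1} G_u \\ I \end{bmatrix},
\quad
p = \begin{bmatrix} -G_x^{-1} r_g \\ 0 \end{bmatrix}.
\end{equation*}

% Step 3: since G N = 0, premultiplying the first block row by N^\top
% removes \Delta\lambda and leaves a reduced system in \Delta u only.
\begin{equation*}
\left( N^\top K N \right) \Delta u = - N^\top \left( \bar r_w + K p \right).
\end{equation*}
```

Under these assumptions the reduced matrix $N^\top K N$ has the dimension of $u$ only and is typically dense, which is one reason such a reduction can suit GPU accelerators better than a sparse indefinite factorization of the full KKT matrix.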