Bernoulli multi-armed bandits are a reinforcement learning model used to optimize sequences of decisions with binary outcomes. Well-known bandit algorithms, including the optimal policy, assume that the outcomes of all previous decisions are known before the next decision is made. This assumption is often violated in real-life scenarios. As demonstrated in this article, when decision outcomes are subject to delays, the performance of existing algorithms can degrade severely. We present the first practically applicable method for computing statistically optimal decisions in the presence of outcome delays. Our method has a predictive component, abstracted out into a meta-algorithm called predictive algorithm reducing delay impact (PARDI), which significantly reduces the impact of delays on commonly used algorithms. We demonstrate empirically that the PARDI-enhanced Whittle index is nearly optimal for a wide range of Bernoulli bandit parameters and delays. Across a wide spectrum of experiments, it outperformed every other suboptimal algorithm tested, including UCB1-tuned and Thompson sampling. The PARDI-enhanced Whittle index can therefore be used when the computational requirements of the optimal policy are too high.
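The delayed-feedback setting can be illustrated with a small simulation. The sketch below is a hypothetical illustration, not the article's PARDI method: it runs standard Thompson sampling on a Bernoulli bandit in which each pull's outcome only becomes observable `delay` steps after the pull, so early decisions are made before any feedback arrives. The arm probabilities, horizon, and function name are illustrative assumptions.

```python
import random

def thompson_delayed(probs, horizon, delay, seed=0):
    """Thompson sampling on a Bernoulli bandit where each pull's
    outcome is revealed only `delay` steps later (illustrative sketch)."""
    rng = random.Random(seed)
    k = len(probs)
    alpha = [1] * k          # Beta posterior: successes + 1 per arm
    beta = [1] * k           # Beta posterior: failures + 1 per arm
    pending = []             # queue of (reveal_time, arm, reward)
    total = 0
    for t in range(horizon):
        # Incorporate only the outcomes whose delay has elapsed.
        while pending and pending[0][0] <= t:
            _, a, r = pending.pop(0)
            alpha[a] += r
            beta[a] += 1 - r
        # Sample a mean for each arm from its posterior; pull the best.
        arm = max(range(k), key=lambda a: rng.betavariate(alpha[a], beta[a]))
        reward = 1 if rng.random() < probs[arm] else 0
        total += reward
        pending.append((t + delay, arm, reward))
    return total

# With delay = 0 this reduces to standard Thompson sampling; a positive
# delay forces the first `delay` pulls to be made from the prior alone.
no_delay = thompson_delayed((0.7, 0.4), horizon=2000, delay=0, seed=1)
delayed = thompson_delayed((0.7, 0.4), horizon=2000, delay=50, seed=1)
```

Comparing the cumulative reward of the two runs shows the kind of degradation the article quantifies: the delayed learner keeps acting on a stale posterior while its feedback is in transit.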