The optimal online transmission policy and outage performance of energy harvesting (EH) two-way relay (TWR) networks are investigated using a stochastic EH model driven by real data. In this network, two solar-powered source nodes, each equipped with a finite-sized battery, communicate with each other via a relay node. Two energy supply modes are studied for the relay node: fixed power (FP), or solar-powered operation as an EH node with a finite-sized battery. The long-term outage probability of the TWR network is minimized by adapting the transmission power of all EH nodes to the causal solar EH condition, the fading channel states, and the energy stored in their batteries. The problem is formulated as a Markov decision process (MDP) whose reward function is the complement of the network's conditional outage probability, for which an exact closed-form expression is derived. Based on the MDP model, the optimal transmission policy and the expected outage probability of the network are computed. In the FP relay mode, the optimal policy is shown to have a two-dimensional (2-D) on-off threshold structure, under which the power actions of the two EH source nodes exhibit 2-D zero-action, non-conservative, and power-converged properties. These properties indicate the transmission interaction between the two EH source nodes. Further, the outage performance limit of the network is shown to be determined by the deficiency probability of the network batteries, which reflects the impact of the EH capabilities of the two source nodes on network performance. In the EH relay mode, a more complicated three-dimensional (3-D) on-off threshold structure of the optimal policy and the corresponding outage performance limit are also derived, reflecting the impact of the EH relay on the network's power allocation and outage performance, respectively. Finally, computer simulations are presented to verify the theoretical analysis.
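
As a rough illustration of the MDP formulation described above, the following is a minimal sketch of value iteration for a single EH node, not the two-source TWR model itself. The state is a (battery, solar state, channel state) triple, the action is the number of energy units spent, and the reward is one minus a conditional outage probability. Every number here is an assumption made for illustration: the battery size `B`, the harvest amounts, the transition probabilities, the discount factor `GAMMA`, and in particular the table `P_OUT`, which is a hypothetical stand-in for the closed-form outage expression.

```python
import itertools

B = 3                      # battery capacity in energy units (assumed)
HARVEST = [0, 1]           # energy units harvested in each solar state (assumed)
P_SOLAR = [[0.8, 0.2],     # solar-state transition probabilities (assumed)
           [0.3, 0.7]]
P_CHAN = [0.5, 0.5]        # i.i.d. channel states: 0 = bad, 1 = good (assumed)
# Hypothetical conditional outage probability, indexed as P_OUT[channel][power units].
P_OUT = [[1.0, 0.9, 0.6, 0.4],
         [1.0, 0.5, 0.2, 0.1]]
GAMMA = 0.9                # discount factor (assumed)

# Each state is a (battery level, solar state, channel state) triple.
STATES = list(itertools.product(range(B + 1), range(2), range(2)))


def q_value(state, action, value):
    """Expected discounted return of spending `action` energy units in `state`."""
    battery, solar, chan = state
    reward = 1.0 - P_OUT[chan][action]   # complement of the outage probability
    next_battery = min(battery - action + HARVEST[solar], B)
    future = sum(
        P_SOLAR[solar][s2] * P_CHAN[c2] * value[(next_battery, s2, c2)]
        for s2 in range(2)
        for c2 in range(2)
    )
    return reward + GAMMA * future


def value_iteration(tol=1e-9):
    """Iterate the Bellman optimality update until the values stop changing."""
    value = {s: 0.0 for s in STATES}
    while True:
        # Feasible actions are limited by the energy currently in the battery.
        new_value = {
            s: max(q_value(s, a, value) for a in range(s[0] + 1))
            for s in STATES
        }
        if max(abs(new_value[s] - value[s]) for s in STATES) < tol:
            break
        value = new_value
    # Greedy policy with respect to the converged value function.
    policy = {
        s: max(range(s[0] + 1), key=lambda a: q_value(s, a, new_value))
        for s in STATES
    }
    return new_value, policy


value, policy = value_iteration()
```

Printing `policy` for each state shows in which battery and channel conditions this toy node stays silent and in which it transmits. Extending the sketch toward the network studied here would require a joint state and action space over both source nodes (and the relay in the EH relay mode).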