Transit Signal Priority (TSP) is a widely used traffic signal control strategy designed to reduce transit delays at signalized intersections. Although recent TSP systems have begun to consider additional objectives, those that address transit reliability commonly focus on improving schedule adherence and can reduce schedule delays only by expediting buses; buses running ahead of schedule are not considered. This paper proposes a dual-objective, two-way TSP algorithm (D2 TSP) based on Deep Reinforcement Learning (DRL). D2 TSP concurrently optimizes transit delay and reliability (i.e., headway adherence) by expediting late buses or delaying early buses. Further, the DRL agents are enhanced with a coordination algorithm that produces an optimized solution balancing opposite directions. D2 TSP reacts adaptively and efficiently to real-time bus performance using data from readily available technology (loop detectors) at low communication frequencies. We trained and tested the algorithm in a stochastic microsimulation environment in Aimsun Next that modelled a transit route segment with reliability issues in the City of Toronto. The performance of D2 TSP was compared with four baseline scenarios: one without TSP, one with the TSP algorithm currently used in the field in the City of Toronto, one with conditional TSP using an arrival prediction model, and one using DRL agents with First-Come-First-Served logic. D2 TSP provided an efficient and balanced solution, reducing headway variability and travel time in both directions.
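
The two-way idea (expedite late buses, delay early ones) can be illustrated with a minimal rule-based sketch. This is not the learned DRL policy of D2 TSP, only a hand-written stand-in for intuition; the names `BusObservation`, `two_way_priority`, and the 60-second tolerance are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    EXPEDITE = "expedite"  # favour the bus, e.g. extend green or shorten red
    HOLD = "hold"          # delay the bus, e.g. withhold or postpone its green
    NONE = "none"          # leave the signal plan unchanged


@dataclass
class BusObservation:
    headway_s: float         # observed headway to the preceding bus (seconds)
    target_headway_s: float  # scheduled headway (seconds)


def two_way_priority(obs: BusObservation, tolerance_s: float = 60.0) -> Action:
    """Pick a priority action from the headway deviation.

    A headway longer than the target means the bus is falling behind its
    leader, so it is expedited; a shorter headway means it is catching up
    (bunching), so it is delayed. The tolerance band is an assumed value.
    """
    deviation = obs.headway_s - obs.target_headway_s
    if deviation > tolerance_s:
        return Action.EXPEDITE
    if deviation < -tolerance_s:
        return Action.HOLD
    return Action.NONE
```

In a learned controller, a fixed threshold rule like this would be replaced by a policy that trades off delay and headway adherence from observed state, with a separate coordination step for buses approaching from opposite directions.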