Abstract: Achieving low-latency data services has become increasingly crucial for data center applications. In modern distributed storage systems, efficient data placement plays a vital role in minimizing data movement delays and thus overall service latency. Previous data placement solutions have often relied on pre-assumed request distributions or trace analysis, but dynamic network conditions and evolving user request patterns make data placement a complex online decision-making problem that static model-based approaches cannot handle well. To address these challenges while accounting for both data movement and analytical latency, we introduce DataBot+, a framework that leverages reinforcement learning to autonomously learn optimal data placement policies. DataBot+ trains neural networks with a modified version of Q-learning, taking real-time data flow measurements as input and producing a value function that estimates the expected near-future latency. To ensure timely decision-making, DataBot+ adopts a two-fold asynchronous architecture that decouples production from training, so that learning introduces no additional overhead to the handling of data flows. Extensive evaluations driven by real-world traces demonstrate the effectiveness of the proposed design.
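
To make the learning loop concrete, the sketch below shows one plausible reading of the abstract: a Q-network maps a vector of real-time flow measurements to per-node latency estimates, a production path picks placements epsilon-greedily, and a separate training path consumes a shared replay buffer so that learning never blocks flow handling. This is a minimal sketch, not the authors' implementation; the state and action encodings, the network sizes, the replay buffer, and the plain DQN-style update (standing in for the unspecified "modified" Q-learning) are all assumptions, as are names such as place and train_step.

    # Minimal sketch (hypothetical, not the authors' code). Assumptions:
    # state = vector of real-time flow measurements, action = index of a
    # candidate storage node, reward = negative observed latency
    # (data movement + analytical). Plain DQN-style Q-learning stands in
    # for the paper's unspecified "modified" variant.
    import random
    from collections import deque

    import torch
    import torch.nn as nn

    N_NODES = 8      # hypothetical number of candidate placement nodes
    STATE_DIM = 16   # hypothetical size of the flow-measurement vector
    GAMMA = 0.9      # discount factor for future latency

    q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_NODES))
    target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_NODES))
    target_net.load_state_dict(q_net.state_dict())  # refreshed from q_net periodically
    optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
    replay = deque(maxlen=10_000)  # shared buffer: production writes, training reads

    def place(state, epsilon=0.1):
        """Production path: pick a node for an incoming flow (epsilon-greedy)."""
        if random.random() < epsilon:
            return random.randrange(N_NODES)
        with torch.no_grad():
            return int(q_net(torch.as_tensor(state, dtype=torch.float32)).argmax())

    def train_step(batch_size=32):
        """Training path: runs asynchronously, adding no latency to placement."""
        if len(replay) < batch_size:
            return
        batch = random.sample(list(replay), batch_size)
        s, a, r, s2 = (torch.as_tensor(x, dtype=torch.float32) for x in zip(*batch))
        q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = r + GAMMA * target_net(s2).max(1).values
        loss = nn.functional.mse_loss(q, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Production side, after a flow completes: record the transition with the
    # observed latency as a negative reward, e.g.
    #   replay.append((state, place(state), -observed_latency, next_state))

The key design point the abstract emphasizes is the last one: because place only performs a forward pass and train_step reads from the buffer on its own schedule, the two components can run asynchronously and the training never sits on the critical path of a data flow.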