In this paper, an online policy iteration reinforcement learning (RL) algorithm is proposed for motion control of four-wheeled omni-directional robots. The algorithm solves the linear quadratic tracking (LQT) problem online using real-time measurement data from the robot. This property enables the tracking controller to compensate for changes in the robot's dynamics and its environment. The online policy iteration based tracking method is employed as the low-level controller, while a proportional-derivative (PD) scheme serves as the supervisory planning system (high-level controller). The paths followed under the online and offline policy iteration algorithms are compared on a rectangular trajectory in the presence of wheel slippage and motor heating. Simulation and implementation results demonstrate that the online algorithm is more effective than the offline one in reducing the command trajectory tracking error and the robot's path deviations. In addition, the proposed online controller shows a considerable ability to learn an appropriate control policy on different types of surfaces. The novelty of this paper is the proposal of a simple-structure, learning-based adaptive optimal scheme that tracks the desired path, optimizes energy consumption, and addresses the uncertainty problem in omni-directional wheeled robots.