Plenoptic cameras have been increasingly applied in three-dimensional (3D) optical imaging and measurement. The micro-lens array is the key component that captures images at different depths simultaneously to realize 3D reconstruction of a scene. However, only part of the captured information is eventually used when focusing on specific objects, which wastes time and resources and renders such systems unsuitable for fast-response tasks like gesture recognition. Furthermore, it remains quite challenging to accurately calibrate depths under complex circumstances; part of the target information may even be lost due to a low signal-to-noise ratio. In this paper, a new hybrid optical system combining a geometrical waveguide and a micro-lens array is developed to project a virtual scale network for quantitative depth calibration and fast target tracking. This methodology, based on the augmented reality (AR) mechanism, rapidly scopes the targeted objects/features without reconstructing the full-range model, significantly reducing processing time for a low-latency response. By establishing a geometrical model that quantitatively correlates images in the auxiliary coordinate system with those in the virtual scale coordinate system, the depth error introduced by the plenoptic camera can be calibrated and corrected. The coefficient of determination R² is used to evaluate the depth accuracy of 3D images and acts as the threshold controlling the depth-correction iterations: the closer R² is to 1, the more accurate the depth information. Experiments proved that even under complex backgrounds or insufficient light, by virtue of the virtual coordinate networks, backgrounds could be rapidly filtered and targets effectively identified with accurate depth information. The algorithm flow chart for depth-error correction is also given.
In this way, the system can capture 3D objects faster and more accurately in real time than conventional plenoptic cameras.
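The R²-thresholded correction loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear correction model, function names, and the default threshold are assumptions introduced here for clarity.

```python
import numpy as np

def r_squared(reference, measured):
    """Coefficient of determination R^2 between reference depths
    (from the virtual scale network) and measured depths."""
    ss_res = np.sum((reference - measured) ** 2)
    ss_tot = np.sum((reference - np.mean(reference)) ** 2)
    return 1.0 - ss_res / ss_tot

def iterative_depth_correction(measured, reference, threshold=0.99, max_iter=20):
    """Iteratively correct measured depths until R^2 against the
    virtual-scale reference reaches the threshold.

    The linear correction model below is a hypothetical stand-in for
    the paper's geometrical correction; only the use of R^2 as the
    stopping criterion reflects the described method.
    """
    corrected = np.asarray(measured, dtype=float).copy()
    reference = np.asarray(reference, dtype=float)
    for _ in range(max_iter):
        if r_squared(reference, corrected) >= threshold:
            break
        # Fit a linear map from the current estimate to the reference
        # depths and apply it as the correction step.
        a, b = np.polyfit(corrected, reference, 1)
        corrected = a * corrected + b
    return corrected
```

For a purely linear depth distortion this converges in a single step; the loop structure matters when the correction model is refined per iteration, with R² serving as the exit condition.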