Several important 3-D depth cues, such as focus, motion, occlusion, and disparity, can be estimated reliably only at distinct sparse image locations, such as edges and corners. Hence, for 2-D to 3-D video conversion, a stable and smooth sparse-to-dense conversion is required to propagate these sparse estimates to the complete video. To this end, optimization-, segmentation-, and triangulation-based approaches have recently been proposed. While optimization-based approaches produce accurate dense maps, the resulting energy functions are very hard to minimize within the stringent requirements of real-time video processing. In addition, segmentation- and triangulation-based approaches can delineate object boundaries incorrectly, and dense maps estimated independently for each video frame suffer from temporal instability. To address the real-time issue, we propose a low-latency, line-scanning-based sparse-to-dense conversion algorithm with low computational complexity. To mitigate the stability and smoothness issues, we additionally propose a recursive spatiotemporal postprocessing step and an efficient joint bilateral upsampling method. We illustrate the performance of the resulting sparse-to-dense converter on dense defocus maps, and we report a subjective paired-comparison assessment of 2-D to 3-D conversion results on a variety of challenging low-depth-of-field test sequences. The results demonstrate that the proposed approach matches the 3-D depth and video quality of state-of-the-art sparse-to-dense converters at significantly reduced computational complexity and memory usage.
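For readers unfamiliar with joint bilateral upsampling, the following is a minimal sketch of the *standard* formulation (not the paper's efficient variant): each high-resolution pixel averages nearby low-resolution depth samples, weighted by spatial distance and by range similarity in a high-resolution guide image. All function and parameter names here (`joint_bilateral_upsample`, `sigma_s`, `sigma_r`) are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

def joint_bilateral_upsample(low_depth, guide, factor, radius=2,
                             sigma_s=1.0, sigma_r=0.1):
    """Upsample a low-res depth map using a high-res grayscale guide image.

    Textbook joint bilateral upsampling: for each high-res pixel, average
    low-res depth samples in a small window, weighted by a spatial Gaussian
    (in low-res coordinates) and a range Gaussian on guide-image similarity.
    """
    H, W = guide.shape
    h, w = low_depth.shape
    out = np.zeros((H, W), dtype=np.float64)
    for y in range(H):
        for x in range(W):
            # corresponding (fractional) low-res coordinates of this pixel
            yl, xl = y / factor, x / factor
            wsum, vsum = 0.0, 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    qy = int(round(yl)) + dy
                    qx = int(round(xl)) + dx
                    if not (0 <= qy < h and 0 <= qx < w):
                        continue
                    # spatial weight, measured in low-res coordinates
                    ws = np.exp(-((qy - yl) ** 2 + (qx - xl) ** 2)
                                / (2 * sigma_s ** 2))
                    # range weight from the high-res guide image
                    gq = guide[min(int(qy * factor), H - 1),
                               min(int(qx * factor), W - 1)]
                    wr = np.exp(-((guide[y, x] - gq) ** 2)
                                / (2 * sigma_r ** 2))
                    wsum += ws * wr
                    vsum += ws * wr * low_depth[qy, qx]
            out[y, x] = (vsum / wsum if wsum > 0
                         else low_depth[int(yl), int(xl)])
    return out
```

The range weight is what keeps upsampled depth edges aligned with image edges: low-resolution depth samples whose guide intensity differs strongly from the target pixel receive near-zero weight, so depth does not bleed across object boundaries. The paper's contribution is an *efficient* variant of this idea; the brute-force double loop above is for exposition only.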