Numerous stereo matching algorithms have been proposed to estimate disparity from a single pair of stereo images. However, even applying the best of them to temporal frames independently, i.e., without considering the temporal consistency between consecutive frames, may produce undesirable artifacts. Here, we propose a systematic method based on adaptive, spatiotemporally consistent constraints that generates spatiotemporally consistent disparity maps for stereo video sequences. First, a reliable temporal neighborhood is used to enforce the “self-similarity” assumption and to prevent errors caused by false optical-flow matches from propagating between consecutive frames. Second, we formulate an adaptive temporally predicted disparity map as prior knowledge for the current frame. It serves as a soft constraint that enhances the temporal consistency of disparities, increases robustness to luminance variation, and restricts the range of potential disparities for each pixel. Third, to further enforce smooth variation of disparities, an adaptive temporal segment confidence is incorporated as a soft constraint to reduce ambiguities caused by under- and over-segmentation, and to retain the disparity discontinuities that align with 3D object boundaries while suppressing those arising in geometrically smooth but strongly color-textured regions. Experimental evaluations demonstrate that our method significantly improves spatiotemporal consistency, both quantitatively and qualitatively, compared with other state-of-the-art methods on the synthetic DCB and realistic KITTI datasets.