Depth estimation from stereo images is essential to many applications such as robotics and autonomous vehicles, most of which ask for the real-time response, high energy and storage efficiency. Recent work has shown deep neural networks (DNN) perform extremely well for stereo estimation. However, these state-of-the-art DNN based algorithms are challenging to be deployed into real-world applications due to the high computational complexities of DNNs. Most of them are too slow for real-time inference and require several seconds of GPU computation to process image frames. In this article, we address the problem of fast stereo estimation and propose an efficient and light-weighted stereo matching system, called StereoBit, to produce a disparity map in a real-time manner while achieving close to state-of-the-art accuracy. To achieve this goal, we propose a binary neural network to generate weighted Hamming distance for an efficient similarity join in stereo estimation. In addition, we propose a novel approximation approach to derive StereoBit network directly from the well-trained network with the cosine similarity. Our approximation strategies enable a significant speedup while maintaining almost the same accuracy compared to the network with the cosine similarity. Furthermore, we present an optimization framework for fully exploiting the computing power of StereoBit. The framework provides a significant speedup of stereo estimation routines, and at the same time, reduces the memory usage for storing parameters. The effectiveness of StereoBit is evaluated by comprehensive experiments. StereoBit can achieve 60 frames per second on an NVIDIA TITAN Xp GPU on KITTI 2012 benchmark while achieving 3-pixel non-occluded stereo error 3.56 percent.