In recent years, owing to the rapid growth of the number of videos on the Internet, near-duplicate video retrieval (NDVR) by video hashing has attracted huge attention. In the existing methods, the visual features of videos, including single feature and multiple visual feature fusion, are widely used in the NDVR algorithms. However, low-level visual features have some disadvantages in expressing high-level semantics, which may lead to low performance in NDVR. In this paper, we propose a video hashing method for NDVR based on hierarchical feature fusion to address this issue. In the proposed method, low-level handcrafted features from videos were first extracted; then, the intermediate-level deep features and high-level semantic features extracted from the convolutional neural network are obtained. Finally, these semantic features are combined with low-level visual features, where the global structural relationships and complementarity discovered among the hierarchical features are utilized to learn the hash code for NDVR. Extensive experiments are performed on the CC-WEB-VIDEO dataset; the proposed framework proves to have a better retrieval performance compared with the state-of-the-art approaches.