Recently, deep hashing methods have attracted much attention in multimedia retrieval tasks, and some of them can even handle cross-modal retrieval. However, almost all existing deep cross-modal hashing methods rely on pairwise optimization, which becomes time-consuming when they are extended to large-scale datasets. In this paper, we propose a novel three-stage deep cross-modal hashing method, Dual Deep Neural Networks Cross-Modal Hashing (DDCMH), which employs two deep networks to generate hash codes for the different modalities. Specifically, in Stage 1, it leverages a single-modal hashing method to generate the initial binary codes for the textual modality of the training samples; in Stage 2, these binary codes serve as supervised information to train an image network, which maps the visual modality to a binary representation; in Stage 3, the visual-modality codes are refined by a reconstruction procedure and used as supervised information to train a text network, which generates the binary codes for the textual modality. In this way, DDCMH makes full use of inter-modal information to obtain high-quality binary codes, and avoids pairwise optimization by optimizing each modality independently. The proposed method can be viewed as a framework that extends any single-modal hashing method to cross-modal search. DDCMH is evaluated on several benchmark datasets, and the results demonstrate that it outperforms both deep and shallow state-of-the-art hashing methods.
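
As a rough illustration of the three-stage pipeline described above, the sketch below uses plain NumPy, with linear least-squares models standing in for the deep image and text networks, a random-projection hash standing in for the Stage 1 single-modal hashing method, and the image codes reused directly in place of the paper's reconstruction procedure; all function names, variable names, and dimensions are hypothetical and not taken from the paper.

```python
import numpy as np

def binarize(z):
    """Map real-valued outputs to {-1, +1} hash codes."""
    return np.where(z >= 0, 1, -1)

def single_modal_hash(features, code_length, rng):
    """Stage 1 stand-in: a random-projection (LSH-style) hash producing
    the initial binary codes for the textual modality."""
    proj = rng.standard_normal((features.shape[1], code_length))
    return binarize(features @ proj)

def fit_hash_network(features, target_codes):
    """Stand-in for training a deep hashing network: a least-squares
    linear map whose sign approximates the supervising codes."""
    W, *_ = np.linalg.lstsq(features, target_codes, rcond=None)
    return W

rng = np.random.default_rng(0)
n, d_img, d_txt, k = 500, 512, 300, 32       # hypothetical sizes
img_feat = rng.standard_normal((n, d_img))   # visual features
txt_feat = rng.standard_normal((n, d_txt))   # textual features

# Stage 1: initial binary codes for the textual modality.
B_txt = single_modal_hash(txt_feat, k, rng)

# Stage 2: train the "image network" with B_txt as supervision.
W_img = fit_hash_network(img_feat, B_txt)
B_img = binarize(img_feat @ W_img)

# Stage 3: the visual-modality codes (reused directly here, in place of
# the paper's reconstruction procedure) supervise the "text network".
W_txt = fit_hash_network(txt_feat, B_img)
B_txt_final = binarize(txt_feat @ W_txt)
```

Because each stage fixes the codes of one modality before fitting the other, the two models are optimized independently rather than through pairwise similarity constraints, which is the property the abstract highlights.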