Deep hashing feature learning methods have attracted the attention of cross-modal semantics understanding researchers due to their low storage cost and efficient query speed. Typically, heterogeneous cross-modal data are first embedded into a semantic space and then converted into their corresponding binary hash codes through learned hash functions. However, when mapping heterogeneous data into a common Hamming space, some existing works ignore the joint cross-correlation that helps interactively explore the latent semantic information between different modalities, resulting in sub-optimal features. To address these issues, we present a novel deep discriminative feature learning method for cross-modal semantics understanding, named Deep Discriminant Semantic Joint Hashing (DDSJH). To maximize the joint cross-correlation, we employ mutual information, which contributes to semantics understanding. Features in the semantic space are exchanged with their pairwise counterparts from the other modality to calculate a loss between the semantic space and the Hamming space. Thus, the corresponding information in cross-modal data is collaboratively utilized to explore the underlying mutual joint semantic correlation. Hash codes of similar categories should be as close as possible, while hash codes of data from different categories should be as discriminative as possible; we therefore harness a linear classifier to learn discriminative hash codes. Extensive experiments on two image-text cross-modal datasets show that our proposed approach achieves better accuracy than several state-of-the-art methods.
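As a minimal illustrative sketch (the notation below is ours and is not taken from the method itself), the training objective described above can be viewed as a weighted combination of a mutual-information-based joint correlation term $\mathcal{L}_{\mathrm{joint}}$ obtained by exchanging pairwise semantic-space features across modalities, a consistency term $\mathcal{L}_{\mathrm{ham}}$ between the semantic space and the Hamming space, and a linear-classification term $\mathcal{L}_{\mathrm{cls}}$ that keeps codes of different categories discriminative:
\[
\min_{\theta_{x},\,\theta_{y},\,W}\;\mathcal{L}_{\mathrm{joint}} \;+\; \alpha\,\mathcal{L}_{\mathrm{ham}} \;+\; \beta\,\mathcal{L}_{\mathrm{cls}},
\]
where $\theta_{x}$ and $\theta_{y}$ denote the modality-specific network parameters, $W$ the linear classifier, and $\alpha,\beta$ are assumed trade-off hyperparameters; the precise form of each term is given in the method description rather than in this summary.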