Mining discriminative graph topological information plays an important role in improving graph representation ability. However, this task faces two main challenges: (1) the difficulty of computing the global inter-class and intra-class scatters required for discriminant learning, which typically depend on the mean and covariance of graph samples; and (2) the great complexity and variety of graph topological structures, which makes them difficult to characterize robustly. In this paper, we propose the Wasserstein Discriminant Dictionary Learning (WDDL) framework, which achieves discriminant learning on graphs with robust graph topology modeling and thereby facilitates graph-based pattern analysis tasks. To avoid computing global inter-class and intra-class scatters, we first construct a reference set of graphs (a.k.a. graph dictionary) by generating representative graph samples (a.k.a. graph keys) with expressive topological structures. Then, a Wasserstein Graph Representation (WGR) process is proposed to project input graphs into a succinct dictionary space via graph dictionary lookup. To further achieve discriminant graph learning, a Wasserstein discriminant loss (WD-loss) is defined on the graph dictionary, whose graph keys are optimizable, so that intra-class keys become more compact and inter-class keys more dispersed. As a result, the calculation of global Wasserstein-metric (W-metric) centers can be bypassed. For sophisticated topology mining in the WGR process, a joint-Wasserstein graph embedding module is constructed to model both between-node and between-edge relationships across input graphs and graph keys, by combining the W-metric (between cross-graph nodes) with a novel Kron-Gromov-Wasserstein (KGW) metric (between cross-graph adjacencies). Specifically, the KGW metric comprehensively characterizes cross-graph connection patterns with the Kronecker product, and then adaptively captures the salient patterns through connection pooling.
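The dictionary lookup in WGR can be pictured as comparing an input graph against every graph key with an optimal-transport distance. The sketch below is only an illustration under stated assumptions, not the authors' implementation: it uses node features alone (so it covers just the node-level W-metric part, not KGW), uniform node masses, a squared-Euclidean ground cost, and entropic (Sinkhorn) regularization with a hypothetical strength `eps`. The names `sinkhorn_distance` and `dictionary_lookup` are hypothetical.

```python
import numpy as np

def _logsumexp(z, axis):
    # Numerically stable log(sum(exp(z))) along one axis.
    z_max = z.max(axis=axis, keepdims=True)
    out = z_max + np.log(np.exp(z - z_max).sum(axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)

def sinkhorn_distance(x, y, eps=0.1, n_iters=200):
    """Entropic-regularized OT cost between two sets of node features.

    x: (n, d) node features of an input graph.
    y: (m, d) node features of one graph key.
    Assumes uniform node masses and a squared-Euclidean ground cost.
    Returns (transport cost, transport plan).
    """
    n, m = x.shape[0], y.shape[0]
    log_a = np.full(n, -np.log(n))  # log of uniform masses 1/n
    log_b = np.full(m, -np.log(m))  # log of uniform masses 1/m
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    f, g = np.zeros(n), np.zeros(m)  # dual potentials
    for _ in range(n_iters):
        # Log-domain Sinkhorn updates avoid underflow of exp(-cost / eps).
        f = eps * log_a - eps * _logsumexp((g[None, :] - cost) / eps, axis=1)
        g = eps * log_b - eps * _logsumexp((f[:, None] - cost) / eps, axis=0)
    plan = np.exp((f[:, None] + g[None, :] - cost) / eps)
    return float((plan * cost).sum()), plan

def dictionary_lookup(x, keys, eps=0.1):
    """Embed a graph as its vector of transport costs to each graph key."""
    return np.array([sinkhorn_distance(x, k, eps)[0] for k in keys])

rng = np.random.default_rng(0)
keys = [rng.normal(size=(4, 3)), rng.normal(loc=10.0, size=(5, 3))]
graph = rng.normal(size=(6, 3))
print(dictionary_lookup(graph, keys))  # cost to the first (nearby) key is far smaller
```

Graphs with different node counts map to vectors of the same length (one entry per key), which is what makes the dictionary space succinct in this toy setting.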
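The WD-loss is described above only qualitatively: it acts on the keys, makes intra-class keys compact and inter-class keys dispersed, and needs no global class centers. One hypothetical form consistent with that description is the ratio of mean intra-class to mean inter-class key distance, computed purely from pairwise key distances. The name `wd_loss`, the ratio form, and the assumption that `dist` holds precomputed Wasserstein-type distances between keys are all illustrative; the paper's exact loss may differ, and in training it would be evaluated with automatic differentiation over the key parameters rather than in plain NumPy.

```python
import numpy as np

def wd_loss(dist, labels):
    """Hypothetical discriminant loss on graph keys (illustrative form only).

    dist:   (K, K) symmetric matrix of pairwise distances between graph keys,
            assumed here to be precomputed Wasserstein-type distances.
    labels: (K,) class label of each key; needs >= 2 classes and at least
            one class with >= 2 keys.
    Returns mean intra-class distance / mean inter-class distance.
    Lowering it pulls same-class keys together and pushes different-class
    keys apart, using only key-to-key distances (no class centers).
    """
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    intra = dist[same & off_diag].mean()
    inter = dist[~same].mean()
    return float(intra / inter)

labels = [0, 0, 1, 1]
tight = [[0, 1, 4, 4], [1, 0, 4, 4], [4, 4, 0, 1], [4, 4, 1, 0]]
loose = [[0, 4, 1, 1], [4, 0, 1, 1], [1, 1, 0, 4], [1, 1, 4, 0]]
print(wd_loss(tight, labels), wd_loss(loose, labels))  # 0.25 4.0
```

Because the loss only touches distances among the keys themselves, its cost scales with the dictionary size rather than the training-set size, which mirrors the motivation for bypassing global W-metric centers.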
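The Kronecker product is what lets every connection of one graph be paired with every connection of another. The sketch below shows only that standard property, plus the cross term that appears when the squared-loss Gromov-Wasserstein objective is expanded under a node transport plan `T`. It does not implement the paper's KGW metric or its connection pooling, and the function names `kron_connections` and `aligned_connection_score` are hypothetical.

```python
import numpy as np

def kron_connections(A1, A2):
    """Pair every connection of G1 with every connection of G2.

    For A1 of shape (n, n) and A2 of shape (m, m), entry
    [i*m + k, j*m + l] of the result equals A1[i, j] * A2[k, l], so it is
    nonzero only when edge (i, j) is in G1 and edge (k, l) is in G2.
    """
    return np.kron(A1, A2)

def aligned_connection_score(A1, A2, T):
    """Compute sum_{i,j,k,l} A1[i,j] * A2[k,l] * T[i,k] * T[j,l] two ways.

    T is an (n, m) node transport plan between G1 and G2.
    """
    t = T.reshape(-1)  # row-major, so t[i*m + k] == T[i, k]
    explicit = t @ kron_connections(A1, A2) @ t  # O(n^2 m^2) memory
    factored = np.sum((A1 @ T) * (T @ A2))       # same value, no Kronecker
    return float(explicit), float(factored)

# G1: path 0-1-2; G2: a single edge 0-1; uniform plan (3x2 entries of 1/6).
A1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A2 = np.array([[0, 1], [1, 0]], dtype=float)
T = np.full((3, 2), 1 / 6)
print(aligned_connection_score(A1, A2, T))  # both values equal 8/36, about 0.2222
```

The explicit Kronecker form makes every cross-graph connection pair individually addressable (which a pooling step could then select from), while the factored form shows the same aggregate can be computed without materializing the n²m² matrix.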
To evaluate the proposed framework, we study two graph-based pattern analysis problems, namely graph classification and cross-modal retrieval, flexibly adapting the graph dictionary to each task. Extensive experiments compare WDDL with existing state-of-the-art methods and dissect the critical components of the proposed architecture. The experimental results validate the effectiveness of the WDDL framework.