We develop the first model-free policy gradient (PG) algorithm for minimax state estimation of discrete-time linear dynamical systems in which adversarial disturbances may corrupt both the dynamics and the measurements. The proposed algorithm learns, in a unified fashion, a minimax-optimal solution to three fundamental tasks in robust (minimax) estimation: terminal-state filtering, terminal-state prediction, and smoothing. We further establish convergence and finite-sample complexity guarantees for the proposed PG algorithm. In addition, we propose a model-free algorithm that evaluates the attenuation (robustness) level of any given estimator or smoother, thereby identifying, without knowledge of the model, the maximum disturbance size under which the estimator remains robust. We demonstrate the effectiveness of the proposed algorithms through extensive numerical experiments.
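For concreteness, the attenuation (robustness) level referenced above can be illustrated with the standard $H_\infty$-type estimation criterion; the notation below (system matrix $A$, output matrix $C$, disturbances $w_t$, $v_t$, estimate $\hat{x}_t$, initial weight $P_0 \succ 0$, horizon $N$) is assumed for illustration only and may differ from the exact formulation adopted in the paper. For the system $x_{t+1} = A x_t + w_t$, $y_t = C x_t + v_t$, an estimator producing $\hat{x}_t$ is said to achieve attenuation level $\gamma$ if
\[
\sup_{(w,\,v,\,x_0-\hat{x}_0)\neq 0}\;
\frac{\sum_{t=0}^{N}\lVert x_t-\hat{x}_t\rVert^{2}}
{\sum_{t=0}^{N}\bigl(\lVert w_t\rVert^{2}+\lVert v_t\rVert^{2}\bigr)+\lVert x_0-\hat{x}_0\rVert^{2}_{P_0^{-1}}}
\;<\;\gamma^{2},
\]
i.e., the energy of the estimation error is bounded by $\gamma^{2}$ times the total energy of the disturbances and the initial-state uncertainty; the smallest such $\gamma$ quantifies how large a disturbance the estimator can tolerate while remaining robust.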