Over the past decade, Wyner-Ziv video coding (WZVC) has attracted considerable research interest because of its distinctive "simple encoding, complex decoding" characteristic. However, the performance gap between WZVC and conventional video coding has never been narrowed to the extent promised by information theory. In this paper, we illustrate the chicken-and-egg dilemma encountered in WZVC: high-efficiency WZVC requires good estimation of side information (SI), yet the decoder cannot estimate SI well without access to the decoded current frame. To resolve this dilemma, we present and advocate a framework built on the concept of decoder-side progressive learning. Specifically, we propose a decoder-side multi-resolution motion refinement (MRMR) scheme, in which the decoder learns from already-decoded lower-resolution data to refine its motion estimation (ME). This in turn greatly improves the SI quality, and with it the coding efficiency, for the higher-resolution data. Theoretical analysis shows that at high rates, decoder-side MRMR outperforms motion extrapolation by as much as 5 dB while trailing conventional encoder-side inter-frame ME by only about 1.5 dB. In addition, because decoder-side ME incurs no bit-rate overhead for transmitting motion information, decoder-side MRMR can gain further by incorporating fractional-pel motion search, block matching with smaller block sizes, and multiple-hypothesis prediction. We also present a practical WZVC implementation with MRMR, which achieves coding performance comparable to that of H.264 at very high bit rates.
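The core idea of decoder-side MRMR can be illustrated with a single refinement step: the decoder runs block matching between the already-decoded low-resolution current frame and a downsampled reference frame, then scales the resulting motion vectors by two to motion-compensate the full-resolution reference into side information. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes dyadic 2x2-average downsampling, integer-pel full-search block matching with a SAD criterion, and a single resolution level, and the function names `downsample2`, `block_motion_search`, and `side_information` are hypothetical.

```python
def downsample2(frame):
    """Halve resolution by averaging each 2x2 pixel group (assumed decimation filter)."""
    h, w = len(frame), len(frame[0])
    return [[(frame[2 * i][2 * j] + frame[2 * i][2 * j + 1]
              + frame[2 * i + 1][2 * j] + frame[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(w // 2)] for i in range(h // 2)]


def sad(cur, ref, y, x, dy, dx, bs):
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    return sum(abs(cur[y + i][x + j] - ref[y + dy + i][x + dx + j])
               for i in range(bs) for j in range(bs))


def block_motion_search(cur, ref, bs, rng):
    """Integer-pel full search; returns {(y, x): (dy, dx)} for each bs x bs block."""
    h, w = len(cur), len(cur[0])
    field = {}
    for y in range(0, h - bs + 1, bs):
        for x in range(0, w - bs + 1, bs):
            best = None
            for dy in range(-rng, rng + 1):
                for dx in range(-rng, rng + 1):
                    # Skip candidates whose displaced block leaves the reference frame.
                    if not (0 <= y + dy and y + dy + bs <= h
                            and 0 <= x + dx and x + dx + bs <= w):
                        continue
                    cost = sad(cur, ref, y, x, dy, dx, bs)
                    if best is None or cost < best[0]:
                        best = (cost, dy, dx)
            field[(y, x)] = (best[1], best[2])  # (0, 0) is always valid, so best is set
    return field


def side_information(low_res_current, full_res_reference, bs=4, rng=2):
    """Estimate full-resolution SI using motion learned from decoded low-resolution data."""
    low_ref = downsample2(full_res_reference)
    field = block_motion_search(low_res_current, low_ref, bs, rng)
    h, w = len(full_res_reference), len(full_res_reference[0])
    si = [[0.0] * w for _ in range(h)]  # pixels outside the block grid stay 0.0
    for (y, x), (dy, dx) in field.items():
        # Scale block position and motion vector up to full resolution.
        Y, X, DY, DX = 2 * y, 2 * x, 2 * dy, 2 * dx
        for i in range(2 * bs):
            for j in range(2 * bs):
                si[Y + i][X + j] = full_res_reference[Y + DY + i][X + DX + j]
    return si
```

In the full scheme described above, this step would be repeated progressively, with each newly decoded resolution level feeding finer motion refinement for the next; the sketch stops after one level to keep the mechanism visible.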