This paper studies a multi-stage newsvendor problem with an advance purchase discount (APD). At the beginning of each stage, the decision maker (DM) places an advance order covering all periods in that stage; at the start of each period within the stage, the DM places a regular order. The only information available to the DM is the history of past demands. To solve this problem, we extend the single-decision-variable weak aggregating algorithm (WAA), an online learning approach based on the theory of prediction and learning with expert advice, to a two-dimensional setting in which regular ordering decisions for each period are nested within the advance ordering decision for each stage. The main difficulty lies in transferring learned demand information from one stage to the next. We design a cross-stage knowledge transfer scheme and obtain online ordering solutions for both advance and regular orders. We show that these solutions converge asymptotically to the optimal solutions. In addition, we derive theoretical guarantees on the total gain within a single stage and on the cumulative gain over all stages in the planning horizon. Numerical studies show that our solutions are competitive with those of the best experts in hindsight. Finally, a sensitivity analysis illustrates the effectiveness of our algorithm under different parameter values.
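
For orientation, the one-variable WAA that serves as the starting point can be sketched roughly as follows. This is a minimal illustration only, not the two-dimensional algorithm or the cross-stage transfer scheme developed in the paper. It assumes that each expert always orders a fixed quantity, that the gain is revenue on units sold minus purchase cost (no salvage value or shortage penalty), and that the learning rate is 1/sqrt(t). The names `newsvendor_gain`, `waa_orders`, `price`, and `cost` are hypothetical.

```python
import math


def newsvendor_gain(order, demand, price, cost):
    """Assumed single-period gain: revenue on units sold minus purchase cost."""
    return price * min(order, demand) - cost * order


def waa_orders(demands, experts, price, cost):
    """Return the learner's order quantity for each period.

    Each expert always orders a fixed quantity. Before period t, expert i
    receives weight exp(G_i / sqrt(t)), where G_i is its cumulative gain so
    far, and the learner orders the weighted average of the expert orders.
    """
    cum_gain = [0.0] * len(experts)
    orders = []
    for t, demand in enumerate(demands, start=1):
        eta = 1.0 / math.sqrt(t)  # assumed learning-rate schedule
        top = max(cum_gain)       # shift by the max for numerical stability
        weights = [math.exp(eta * (g - top)) for g in cum_gain]
        total = sum(weights)
        orders.append(sum(w * x for w, x in zip(weights, experts)) / total)
        # Demand is observed only after the order is placed.
        for i, x in enumerate(experts):
            cum_gain[i] += newsvendor_gain(x, demand, price, cost)
    return orders


if __name__ == "__main__":
    experts = [10.0 * k for k in range(11)]  # fixed orders 0, 10, ..., 100
    demands = [60.0] * 200                   # illustrative constant demand
    orders = waa_orders(demands, experts, price=2.0, cost=1.0)
    print(orders[0], orders[-1])
```

With uniform initial weights the first order is simply the average of the expert orders; as gains accumulate, the weight concentrates on the experts that have performed best on the observed demands.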