In this paper, we study channel estimation and optimal training design for relay networks that operate under the <i>decode-and-forward</i> (DF) strategy with knowledge of the interference covariance. In addition to the total power constraint on all relays, we introduce an individual power constraint for each relay, which reflects the practical scenario where the relays are separated from one another. This individual power constraint is the major difference from traditional point-to-point communication systems, where only a total power constraint exists for all colocated antennas. Two types of channel estimation are considered: maximum likelihood (ML) and minimum mean square error (MMSE). For ML channel estimation, the channels are assumed to be <i>deterministic</i>, and the optimal training results from an efficient multilevel waterfilling-type solution derived from majorization theory. For MMSE channel estimation, the second-order statistics of the channels are assumed known, and the general optimization problem turns out to be nonconvex. We therefore consider three special yet reasonable scenarios instead. The problem in the first scenario is convex and can be solved efficiently by state-of-the-art optimization tools. Closed-form waterfilling-type solutions are found in the remaining two scenarios, the first of which has an interesting physical interpretation as pouring water into caves.
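
For readers unfamiliar with the waterfilling structure mentioned above, the following is a minimal sketch of the classical single-level waterfilling allocation. It is not the multilevel solution or the "caves" variant derived in this paper. The function name `waterfill`, the per-channel floor values, and the single total power budget are illustrative assumptions; the individual per-relay power constraints are not modeled here.

```python
def waterfill(floors, total_power):
    """Classical waterfilling (illustrative, not the paper's multilevel variant).

    Finds a water level mu such that p_i = max(mu - floors[i], 0)
    and sum(p_i) == total_power.
    """
    if not floors or total_power < 0:
        raise ValueError("need at least one channel and non-negative power")

    levels = sorted(floors)
    # Try the k lowest floors as the active set, shrinking k until the
    # water level sits above the highest active floor.
    for k in range(len(levels), 0, -1):
        mu = (total_power + sum(levels[:k])) / k
        if mu > levels[k - 1] or k == 1:
            break

    # Channels whose floor is above the water level receive no power.
    powers = [max(mu - f, 0.0) for f in floors]
    return mu, powers


if __name__ == "__main__":
    mu, powers = waterfill([1.0, 2.0, 5.0], 3.0)
    print(mu, powers)
```

In this toy example the channel with the highest floor receives no power, which is the qualitative behavior that waterfilling-type solutions share; the constraint set treated in the paper is more general.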