A relay channel consists of an input <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">x_{l}</tex> , a relay output <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">y_{1}</tex> , a channel output <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">y</tex> , and a relay sender <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">x_{2}</tex> (whose transmission is allowed to depend on the past symbols <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">y_{1}</tex> . The dependence of the received symbols upon the inputs is given by <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">p(y,y_{1}|x_{1},x_{2})</tex> . The channel is assumed to be memoryless. In this paper the following capacity theorems are proved. 1)If <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">y</tex> is a degraded form of <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">y_{1}</tex> , then <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">C \: = \: \max \!_{p(x_{1},x_{2})} \min \,{I(X_{1},X_{2};Y), I(X_{1}; Y_{1}|X_{2})}</tex> . 2)If <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">y_{1}</tex> is a degraded form of <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">y</tex> , then <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">C \: = \: \max \!_{p(x_{1})} \max_{x_{2}} I(X_{1};Y|x_{2})</tex> . 3)If <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">p(y,y_{1}|x_{1},x_{2})</tex> is an arbitrary relay channel with feedback from <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">(y,y_{1})</tex> to both <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">x_{1} \and x_{2}</tex> , then <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">C\: = \: \max_{p(x_{1},x_{2})} \min \,{I(X_{1},X_{2};Y),I \,(X_{1};Y,Y_{1}|X_{2})}</tex> . 4)For a general relay channel, <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">C \: \leq \: \max_{p(x_{1},x_{2})} \min \,{I \,(X_{1}, X_{2};Y),I(X_{1};Y,Y_{1}|X_{2})</tex> . Superposition block Markov encoding is used to show achievability of <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">C</tex> , and converses are established. The capacities of the Gaussian relay channel and certain discrete relay channels are evaluated. Finally, an achievable lower bound to the capacity of the general relay channel is established.