We study neural data-to-text generation, which aims to generate a sentence that describes a target entity based on its attributes. Specifically, we address two problems of the encoder-decoder framework for data-to-text generation: i) how to encode a non-linear input (e.g., a set of attributes); and ii) how to order the attributes in the generated description. Existing studies focus on the encoding problem but do not address the ordering problem, i.e., they learn content-planning implicitly. Other approaches use two-stage models but overlook the encoding problem. To address both problems at once, we propose a model named TransCP that explicitly learns a content-plan and integrates it into a description generation model in an end-to-end fashion. We propose a novel Transformer-based Pointer Network with gated residual attention and importance masking to learn a content-plan. To integrate the content-plan with a description generator, we propose a tracking mechanism that traces the extent to which the content-plan has been exposed in the previous decoding time-step. This helps the description generator select the attributes to be mentioned in the proper order. Experimental results show that our model consistently outperforms state-of-the-art baselines by up to 2% and 3% in terms of BLEU score on two real-world datasets.
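The abstract names gated residual attention and importance masking but does not give their exact formulation. As a rough illustration only, the following is a minimal sketch, assuming a standard multi-head self-attention layer, of how a gated residual connection and an importance mask over attributes could be combined; all names here (GatedResidualAttention, importance_mask, the gate projection) are hypothetical and not taken from the paper.

    # Minimal sketch (not the authors' implementation) of a Transformer-style
    # attention layer with a gated residual connection and importance masking.
    import torch
    import torch.nn as nn

    class GatedResidualAttention(nn.Module):
        def __init__(self, d_model: int, n_heads: int):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            # The gate decides, per position, how much of the attention output
            # versus the residual input to keep.
            self.gate = nn.Linear(2 * d_model, d_model)

        def forward(self, x: torch.Tensor, importance_mask: torch.Tensor) -> torch.Tensor:
            # importance_mask: (batch, seq) bool tensor; True marks attribute
            # slots to suppress. Passed as key_padding_mask so that masked
            # (unimportant) attributes receive no attention weight.
            attn_out, _ = self.attn(x, x, x, key_padding_mask=importance_mask)
            # Gated residual: g * attn_out + (1 - g) * x, instead of the
            # plain x + attn_out residual of a vanilla Transformer layer.
            g = torch.sigmoid(self.gate(torch.cat([attn_out, x], dim=-1)))
            return g * attn_out + (1.0 - g) * x

    # Toy usage: 2 entities, 5 attribute slots, 16-dim attribute embeddings.
    layer = GatedResidualAttention(d_model=16, n_heads=4)
    x = torch.randn(2, 5, 16)
    mask = torch.zeros(2, 5, dtype=torch.bool)
    mask[:, -1] = True  # pretend the last attribute is unimportant
    print(layer(x, mask).shape)  # torch.Size([2, 5, 16])

The gating follows the familiar highway-style interpolation between a sublayer output and its input; whether TransCP gates exactly this way, or masks importance elsewhere in the network, is not specified in the abstract.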