We study two-qubit circuits over the Clifford+CS gate set, which consists of the Clifford gates together with the controlled-phase gate CS = diag(1, 1, 1, i). The Clifford+CS gate set is universal for quantum computation and its elements can be implemented fault-tolerantly in most error-correcting schemes through magic state distillation. Since non-Clifford gates are typically more expensive to perform in a fault-tolerant manner, it is often desirable to construct circuits that use few CS gates. In the present paper, we introduce an efficient and optimal synthesis algorithm for two-qubit Clifford+CS operators. Our algorithm inputs a Clifford+CS operator U and outputs a Clifford+CS circuit for U, which uses the least possible number of CS gates. Because the algorithm is deterministic, the circuit it associates to a Clifford+CS operator can be viewed as a normal form for that operator. We give an explicit description of these normal forms and use this description to derive a worst-case lower bound of 5{{rm{log}}}_{2}(frac{1}{epsilon })+O(1) on the number of CS gates required to ϵ-approximate elements of SU(4). Our work leverages a wide variety of mathematical tools that may find further applications in the study of fault-tolerant quantum circuits.