The task of spatial layout estimation from a monocular image is to segment an RGB image of an indoor scene into semantic surface labels (i.e., ceiling, floor, front wall, left wall, and right wall). Most recent methods first generate layout hypotheses from an estimated edge map or semantic labels and then rank these hypotheses. In this paper, we present an end-to-end framework that directly outputs the layout type and keypoint coordinates (as defined in the LSUN challenge). The proposed method takes advantage of transfer learning by training on fake samples: a large number of artificial {type, keypoints, edge map} triplets are generated to learn the mapping from edge maps to keypoint coordinates. A generative adversarial network (GAN) is employed for domain adaptation of the edge maps. Experimental results show that the proposed method achieves state-of-the-art layout estimation performance on benchmark datasets.
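To make the synthetic-triplet idea concrete, the sketch below generates one artificial {type, keypoints, edge map} triplet for a simplified full-box room layout and rasterizes its wall boundaries into a binary edge map. This is an illustrative reconstruction, not the paper's implementation: the layout-type index, the keypoint sampling ranges, and the function names (`rasterize_edges`, `make_synthetic_triplet`) are all hypothetical, and the edge topology is a simplification of the LSUN layout types.

```python
import numpy as np

def rasterize_edges(keypoints, edges, size=64):
    """Render a binary edge map by drawing line segments between
    keypoints given in normalized [0, 1] image coordinates."""
    canvas = np.zeros((size, size), dtype=np.uint8)
    for i, j in edges:
        p, q = np.asarray(keypoints[i], float), np.asarray(keypoints[j], float)
        # sample densely enough along the segment to leave no gaps
        n = int(np.abs(q - p).max() * size) + 2
        for t in np.linspace(0.0, 1.0, n):
            x, y = ((1.0 - t) * p + t * q) * (size - 1)
            canvas[int(round(y)), int(round(x))] = 1
    return canvas

def make_synthetic_triplet(rng, size=64):
    """Sample one artificial {type, keypoints, edge map} triplet for a
    'box' layout: the four corners of the front wall plus edges running
    out to the image corners (a simplified, hypothetical layout type)."""
    layout_type = 0  # hypothetical index for the full-box layout
    cx, cy = rng.uniform(0.3, 0.7, size=2)   # front-wall center
    w, h = rng.uniform(0.1, 0.25, size=2)    # front-wall half-extents
    inner = [(cx - w, cy - h), (cx + w, cy - h),
             (cx + w, cy + h), (cx - w, cy + h)]
    outer = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    keypoints = inner + outer
    # front-wall rectangle plus the four corner-to-corner wall edges
    edges = [(0, 1), (1, 2), (2, 3), (3, 0),
             (0, 4), (1, 5), (2, 6), (3, 7)]
    edge_map = rasterize_edges(keypoints, edges, size)
    return layout_type, np.array(keypoints), edge_map

rng = np.random.default_rng(0)
layout_type, keypoints, edge_map = make_synthetic_triplet(rng)
```

A regressor trained on many such (edge_map, keypoints, type) samples learns the mapping from edge maps to keypoint coordinates; the GAN-based domain adaptation mentioned above would then bridge the gap between these clean synthetic edge maps and the noisier edge maps estimated from real images.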