User identity linkage, which establishes the correspondence between users across networks, is a fundamental problem in many social network applications. Recent efforts have introduced network embedding techniques that map users from different networks into a common representation space and then infer user correspondence from the similarities of their representations. However, existing studies that train network embedding and space alignment separately, in two stages, may suffer from conflicts between the objectives of the two stages. In addition, similarities between unlabeled cross-network user pairs are difficult to define, yet they strongly affect the results. Moreover, many previous methods still require a large number of labeled aligned user pairs to perform well, and such labels may not be available. To address these problems, we propose JORA (JOintly learning to Represent and Align), a model for the weakly supervised user identity linkage problem. JORA adopts an inductive graph convolutional network (GCN) to learn user representations for each network, and the model is optimized jointly through a representation learning component and an alignment learning component. The former preserves the similarities between intra-network users. The latter aligns the different embedding spaces through a projection function and preserves the similarities between cross-network users. A dedicated attention mechanism learns self-adaptive similarities for unlabeled user pairs during alignment learning, which reduces the error propagation caused by predefined similarities. Joint optimization lets the model capture network characteristics during alignment and reduces the number of labeled users required. Experiments on real-world social networks show that the proposed model significantly outperforms state-of-the-art methods.
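To make the idea of a single joint objective more concrete, the following is a minimal plain-Python sketch of how a representation term and an alignment term could be summed into one loss. Every concrete choice here is an illustrative assumption, not the JORA formulation: the mean-aggregation GCN layer, the squared-error intra-network term, the cosine loss on labeled anchor pairs, the softmax attention over target candidates (with a hypothetical `temperature`), the weight `lam`, and all function names are hypothetical. The sketch only evaluates the objective; actual training would minimize it with an automatic-differentiation framework.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """Hypothetical GCN layer: average each node with its neighbours, project, apply tanh."""
    n, d = len(adj), len(feats[0])
    agg = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j] or i == j]  # include a self-loop
        agg.append([sum(feats[j][k] for j in nbrs) / len(nbrs) for k in range(d)])
    return [[math.tanh(v) for v in row] for row in matmul(agg, weight)]


def cosine(u: list[float], v: list[float]) -> float:
    nu = math.sqrt(sum(x * x for x in u)) or 1.0  # guard against zero vectors
    nv = math.sqrt(sum(x * x for x in v)) or 1.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def intra_loss(adj: Matrix, emb: Matrix) -> float:
    """Representation term (assumed form): pairwise cosine similarity should match adjacency (1 or 0)."""
    n = len(emb)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum((cosine(emb[i], emb[j]) - adj[i][j]) ** 2 for i, j in pairs) / len(pairs)


def anchor_loss(proj_src: Matrix, tgt: Matrix, anchors: list[tuple[int, int]]) -> float:
    """Alignment term for labeled pairs: a projected source user should match its target counterpart."""
    return sum(1.0 - cosine(proj_src[i], tgt[j]) for i, j in anchors) / len(anchors)


def attention_loss(proj_src: Matrix, tgt: Matrix, unlabeled: list[int], temperature: float = 0.1) -> float:
    """Alignment term for unlabeled users (assumed form): softmax attention over all target users
    decides how much each candidate pair's dissimilarity counts, instead of a fixed similarity."""
    total = 0.0
    for i in unlabeled:
        sims = [cosine(proj_src[i], t) for t in tgt]
        m = max(sims)
        exps = [math.exp((s - m) / temperature) for s in sims]  # numerically stable softmax
        z = sum(exps)
        total += sum(e / z * (1.0 - s) for e, s in zip(exps, sims))
    return total / max(len(unlabeled), 1)


def joint_loss(adj_s, x_s, w_s, adj_t, x_t, w_t, proj, anchors, lam=0.5) -> float:
    """One objective: both representation terms plus both alignment terms."""
    z_s, z_t = gcn_layer(adj_s, x_s, w_s), gcn_layer(adj_t, x_t, w_t)
    p_s = matmul(z_s, proj)  # projection function: map source embeddings into the target space
    anchored = {i for i, _ in anchors}
    unlabeled = [i for i in range(len(z_s)) if i not in anchored]
    return (intra_loss(adj_s, z_s) + intra_loss(adj_t, z_t)
            + anchor_loss(p_s, z_t, anchors) + lam * attention_loss(p_s, z_t, unlabeled))


if __name__ == "__main__":
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    w = [[0.5, -0.2], [0.3, 0.8]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(joint_loss(adj, x, w, adj, x, w, identity, anchors=[(0, 0)]))
```

Because all four terms share one scalar, gradients from the alignment terms would also reach both GCN weight matrices, which is the property that distinguishes joint training from the two-stage pipelines described above.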