Face representation in the wild is extremely challenging due to large-scale face variations. Deep convolutional neural networks (CNNs) trained with carefully designed margin-based losses learn discriminative features that perform well on easy samples but often fail on hard ones. Although some methods re-weight hard samples during training to improve feature discrimination, they overlook the distribution properties of the features; notably, misclassified hard samples may be corrected from the feature-distribution perspective. To address this problem, this paper proposes a hard-sample-guided optimal transport (OT) loss for deep face representation, OTFace for short. OTFace aims to improve performance on hard samples by introducing a feature distribution discrepancy term while maintaining performance on easy samples. Specifically, we adopt a triplet scheme to identify hard sample groups within each mini-batch during training. OT is then used to characterize the distribution differences between features from the high-level convolutional layer. Finally, we combine a margin-based softmax loss (e.g., ArcFace or AM-Softmax) with the OT loss to guide deep CNN learning. Extensive experiments were conducted on several benchmark databases, and the quantitative results demonstrate the advantages of the proposed OTFace over state-of-the-art methods. The code is available at <uri xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">https://github.com/FST-ZHUSHUMIN/OTFace</uri>.
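To make the two components of the combined objective concrete, the following is a minimal NumPy sketch of (a) an entropic-regularized Sinkhorn approximation to the OT distance between two feature batches and (b) an additive-margin (AM-Softmax) loss on cosine logits. All function names, hyperparameters (`eps`, `n_iters`, `s`, `m`, the balance weight `lam`), and the Sinkhorn formulation are illustrative assumptions for exposition, not the paper's exact implementation; consult the linked repository for the authors' code.

```python
import numpy as np

def sinkhorn_ot(X, Y, eps=0.1, n_iters=200):
    """Entropic-regularized OT cost between feature batches X (n,d) and Y (m,d).

    Uses Sinkhorn-Knopp scaling with uniform marginals and a squared
    Euclidean cost matrix; returns the transport cost <P, C>.
    """
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    K = np.exp(-C / eps)                                # Gibbs kernel
    a = np.full(X.shape[0], 1.0 / X.shape[0])           # uniform source marginal
    b = np.full(Y.shape[0], 1.0 / Y.shape[0])           # uniform target marginal
    u = np.ones_like(a)
    for _ in range(n_iters):                            # alternating scaling
        u = a / (K @ (b / (K.T @ u)))
    v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                     # transport plan
    return float((P * C).sum())

def am_softmax_loss(cosines, labels, s=30.0, m=0.35):
    """AM-Softmax: additive margin m on the target-class cosine, scale s."""
    n = cosines.shape[0]
    z = s * cosines.copy()
    z[np.arange(n), labels] = s * (cosines[np.arange(n), labels] - m)
    z -= z.max(axis=1, keepdims=True)                   # numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_p[np.arange(n), labels].mean())

def otface_style_loss(cosines, labels, feats_easy, feats_hard, lam=0.1):
    """Illustrative combined objective: margin softmax + lam * OT discrepancy."""
    return am_softmax_loss(cosines, labels) + lam * sinkhorn_ot(feats_easy, feats_hard)
```

In a training loop, `feats_easy` and `feats_hard` would be the embeddings of the easy and hard sample groups selected by the triplet scheme within a mini-batch, so that minimizing the OT term pulls the hard-sample feature distribution toward the easy-sample one.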