Signature verification is a widely used forensic technique in numerous safety-critical applications. Although convolutional neural networks (CNNs) have significantly advanced signature verification, their reliance on local neighborhood operations limits their ability to capture global contextual relationships among signature strokes. To overcome this weakness, we propose TransOSV, a novel holistic-part unified model for offline signature verification built on the vision transformer framework. First, the proposed transformer-based holistic encoder encodes each signature image as a sequence of patches and learns a global signature representation. Second, because genuine and forged signatures often differ only in subtle local details, we design a contrast-based part decoder, together with a sparsity loss, to learn discriminative part features. The model is then optimized with a contrastive loss over the learned holistic and part features. To reduce the influence of sample imbalance, we further formulate a new focal contrast loss function. In addition, we use the proposed model to learn signature representations for the writer-dependent signature verification task. Experimental results on four widely used offline signature datasets demonstrate the potential of TransOSV for both writer-independent and writer-dependent signature verification, with remarkable performance improvements and competitive results.
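To illustrate the kind of pairwise objective the abstract refers to, the following is a minimal sketch of a common margin-based contrastive loss for pairs of feature vectors, written in pure Python. It assumes that a positive pair is two genuine signatures from the same writer and a negative pair is a genuine signature and a forgery. The function names, the Euclidean distance, and the `margin` default are illustrative assumptions, not details from the paper, and the sketch does not reproduce the proposed focal contrast loss, whose formulation is not given in this abstract.

```python
import math
from typing import List, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(
    f1: Sequence[float],
    f2: Sequence[float],
    is_positive_pair: bool,
    margin: float = 1.0,
) -> float:
    """Margin-based contrastive loss for one pair (hypothetical helper).

    Positive pairs (assumed: genuine vs. genuine, same writer) are pulled
    together with loss d^2. Negative pairs (assumed: genuine vs. forgery)
    are pushed apart until their distance reaches `margin`, with loss
    max(0, margin - d)^2.
    """
    d = euclidean_distance(f1, f2)
    if is_positive_pair:
        return d ** 2
    return max(0.0, margin - d) ** 2


def mean_contrastive_loss(
    pairs: List[Tuple[Sequence[float], Sequence[float], bool]],
    margin: float = 1.0,
) -> float:
    """Average the per-pair loss over a batch of (f1, f2, label) tuples."""
    if not pairs:
        raise ValueError("batch must contain at least one pair")
    losses = [contrastive_loss(a, b, y, margin) for a, b, y in pairs]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    batch = [
        ([0.0, 0.0], [3.0, 4.0], True),   # distant genuine pair: large loss
        ([0.0, 0.0], [0.0, 0.5], False),  # forgery inside the margin: penalized
    ]
    print(mean_contrastive_loss(batch))
```

Positive pairs are penalized by their squared distance, while negative pairs contribute only when they fall inside the margin. The paper's focal contrast loss modifies this kind of objective to reduce the influence of sample imbalance; how it does so is not specified here.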