Heterogeneous temporal network embedding aims to map each node, whatever its type, in each snapshot of a heterogeneous temporal network to a low-dimensional vector representation that can be used for network analysis tasks such as node classification and relationship prediction. We propose TemporalHAN, a novel heterogeneous temporal graph neural network embedding framework based on hierarchical attention and a temporal convolutional network (TCN). In particular, we introduce node-level and semantic-level attention into heterogeneous graph neural networks to capture importance at both levels. For each snapshot, we first use a new random walk algorithm (NRWA) to collect strongly connected heterogeneous neighbours for each node and group them by node type. The algorithm uses a damping factor so that more recent snapshots are allocated more random walk steps. We then apply node-level attention to learn the importance of each random walk neighbour of a given node type to a node, and semantic-level attention to learn the importance of the different node types to that node. Finally, we adopt a TCN to capture how the network evolves across snapshots. Experimental results on relationship prediction and node classification show that TemporalHAN is competitive with diverse state-of-the-art approaches. Our code is available at https://github.com/Legendary-L/THAN.
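
The damping-factor idea can be illustrated with a minimal sketch. The abstract does not specify how NRWA distributes its steps, so the geometric decay, the function name `allocate_walk_steps`, and the parameters `total_steps` and `damping` below are all assumptions made for illustration, not the authors' implementation.

```python
def allocate_walk_steps(num_snapshots, total_steps, damping=0.5):
    """Split a random-walk step budget across snapshots, favouring recent ones.

    Hypothetical scheme (not taken from the paper): snapshot t, with t = 0
    the oldest, receives a share proportional to damping ** age, where
    age = num_snapshots - 1 - t.
    """
    if num_snapshots <= 0:
        raise ValueError("num_snapshots must be positive")
    if not 0.0 < damping <= 1.0:
        raise ValueError("damping must be in (0, 1]")

    # Geometric decay: the newest snapshot has weight 1, older ones shrink.
    weights = [damping ** (num_snapshots - 1 - t) for t in range(num_snapshots)]
    norm = sum(weights)

    # Round each share down, then hand the leftover steps to the newest snapshot
    # so the allocation always sums to the full budget.
    steps = [int(total_steps * w / norm) for w in weights]
    steps[-1] += total_steps - sum(steps)
    return steps
```

Under these assumptions, `allocate_walk_steps(3, 70, 0.5)` returns `[10, 20, 40]`: the most recent snapshot receives the largest share, and `damping=1.0` falls back to an even split.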