Graph neural networks (GNNs) offer a viable solution for modeling the inter-dependencies among labeled and unlabeled samples in a semi-supervised manner. However, their performance can degrade dramatically when the number of labels is extremely limited, owing to limitations in the typical graph convolutional design of most existing GNNs, including over-smoothing, difficulty in extending the propagation step, and failure to preserve the distinctiveness of each node. To address these issues, we propose a dual separated attention-based graph neural network (DSA-GNN) to deal with label scarcity in semi-supervised node classification. Firstly, DSA-GNN decouples feature propagation from representation transformation to alleviate the problems of over-smoothing and overfitting. Secondly, DSA-GNN separates self-representation learning from neighbor-representation learning via two feature extractors with different learnable parameters, so that both the commonality between connected nodes and the distinctiveness of each node are preserved. Thirdly, DSA-GNN incorporates an attention-based label propagation mechanism that refines the label prediction of each node by aggregating label predictions over its neighborhood according to adaptive edge coefficients. Extensive experimental results on real-world datasets demonstrate the superiority of DSA-GNN for semi-supervised node classification, especially when the observed labels are extremely limited.
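To make the third component concrete, the following is a minimal NumPy sketch of attention-based label propagation: soft label predictions are aggregated over each node's neighborhood using edge coefficients derived from feature similarity. The scoring function, the residual mixing weight `alpha`, and the update rule here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_label_propagation(Y, H, adj, alpha=0.5, steps=2):
    """Refine soft label predictions Y (n x c) by aggregating them over
    neighbors with attention coefficients derived from features H (n x d).
    NOTE: a hypothetical sketch; the similarity score and residual update
    are assumptions standing in for the learned coefficients in the paper.
    """
    # Raw attention scores: feature similarity, masked to existing edges.
    scores = H @ H.T
    scores = np.where(adj > 0, scores, -np.inf)
    coeffs = softmax(scores, axis=1)   # row-normalized adaptive edge weights
    coeffs = np.nan_to_num(coeffs)     # isolated nodes get an all-zero row

    # Iteratively propagate labels, mixing back the initial prediction.
    Z = Y.copy()
    for _ in range(steps):
        Z = alpha * (coeffs @ Z) + (1 - alpha) * Y
    return Z
```

Because each row of the coefficient matrix sums to one, the propagated predictions for connected nodes remain valid probability distributions; the residual term keeps each node's own prediction from being washed out by its neighbors, in the spirit of preserving node distinctiveness.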