In real industrial processes, fault diagnosis methods must learn from limited fault samples, because plants operate mostly under normal conditions and faults rarely occur. Although attention mechanisms have become increasingly popular for fault diagnosis, existing attention-based methods remain unsatisfactory for such practical applications. First, pure attention-based architectures such as transformers require a substantial number of fault samples to compensate for their lack of inductive biases, and therefore perform poorly when fault samples are limited. Moreover, this poor fault classification in turn causes existing attention-based methods to fail to identify the root causes. To address these problems, we propose a supervised contrastive convolutional attention mechanism (SCCAM) with ante-hoc interpretability, which solves the root cause analysis problem under limited fault samples for the first time. First, SCCAM obtains accurate classification results under limited fault samples. Specifically, we integrate a convolutional neural network (CNN) with attention mechanisms to provide strong intrinsic inductive biases of locality and spatial invariance, thereby strengthening the representational power of the model under limited fault samples. We further enhance the classification capability of SCCAM under limited fault samples by employing a supervised contrastive learning (SCL) loss. Second, we design a novel ante-hoc interpretable attention-based architecture that obtains the root causes directly, without expert knowledge. The convolutional block attention module (CBAM) directly provides the feature contributions behind each prediction, thus achieving feature-level explanations. The proposed SCCAM method is validated on a continuous stirred tank heater (CSTH) and on the Tennessee Eastman (TE) industrial process benchmark.
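To make the role of the SCL loss concrete, the following is a minimal sketch. The abstract does not state the exact loss formula, so the sketch assumes the widely used supervised contrastive (SupCon) formulation of Khosla et al. (2020): embeddings are L2-normalized, and each anchor is pulled toward same-class samples and pushed away from all others via a temperature-scaled softmax. The function names and the `temperature` default are illustrative assumptions, and the sketch uses plain Python so that it runs without a deep learning framework; it is not the authors' implementation, and in SCCAM such a loss would presumably be applied to features produced by the CNN-attention encoder, which is not reproduced here.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    # Project a feature vector onto the unit sphere; a zero vector is left as-is.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supervised_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """Assumed SupCon-style loss:
    L = mean_i  -1/|P(i)| * sum_{p in P(i)} log( exp(z_i.z_p/t) / sum_{a != i} exp(z_i.z_a/t) )
    where P(i) are the other samples sharing anchor i's label.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, n_anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without a same-class partner contributes nothing
        # Temperature-scaled cosine similarities to every non-anchor sample.
        sims = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability assigned to the positives.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        n_anchors += 1
    return total / n_anchors if n_anchors else 0.0


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(supervised_contrastive_loss(feats, [0, 0, 1, 1]))  # classes separated
    print(supervised_contrastive_loss(feats, [0, 1, 0, 1]))  # classes mixed
```

With the same four feature vectors, labelling them so that each class forms a tight cluster gives a much lower loss than a labelling in which same-class samples point in different directions. This class-aware clustering is the property that makes a supervised contrastive term attractive when only a few fault samples per class are available.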
Three common fault diagnosis scenarios are covered: a balanced scenario for additional verification and two scenarios with limited fault samples (i.e., an imbalanced scenario and a long-tail scenario). Comprehensive results show that SCCAM outperforms state-of-the-art methods in both fault classification and root cause analysis, demonstrating the effectiveness of the proposed method.