Event temporal relation extraction is an important task in information extraction. Existing methods usually rely on feature engineering and require post-processing to achieve optimization, but because the post-processing module and the main neural network are independent, their optimization may be inconsistent. Recently, a few works have begun to incorporate temporal logic rules into neural networks to achieve joint optimization. However, these methods still suffer from two shortcomings: (1) although joint optimization is applied, the differences between rules are neglected in the unified design of rule losses, which reduces the interpretability and flexibility of the model design; (2) for lack of rich syntactic connections between events and rule-match features, model performance may be suppressed by inefficient interaction between features and rules during training. To tackle these issues, this paper proposes PIPER, a logic-driven deep contrastive optimization pipeline for event temporal reasoning. Specifically, we apply joint optimization (covering both multi-stage and single-stage paradigms) by combining independent rule losses (i.e., flexibility), making PIPER more interpretable. Moreover, we propose a hierarchical graph distillation network to obtain richer syntactic information, so that the designed rule-match features can effectively aid the interaction between low-level features and high-level rules during training. Experiments on TB-Dense and MATRES demonstrate that the proposed model achieves competitive performance compared with recent advances.