Emotion-cause pair extraction (ECPE) is a challenging task that aims to automatically identify emotions and their corresponding causes, as pairs, in documents. The difficulty of ECPE lies in distinguishing valid emotion-cause pairs from the many irrelevant candidates. Most previous methods rely on multi-task learning to extract semantic information from the document alone, without explicitly encoding the relations between clauses. We propose a new approach based on the textual entailment paradigm: the original document serves as the premise, and individual clauses or candidate pairs are phrased as hypotheses whose entailment by the premise is to be inferred. Our approach designs label-view hypothesis templates that improve ECPE by filtering out irrelevant emotion and cause clauses. Furthermore, we formulate candidate emotion-cause pairs as hypothesis statements and define explicit multi-view symmetric templates to capture the semantics of the emotion-cause relation. Textual entailment recognition for ECPE is then implemented by fusing the multi-view semantic information with a simplified capsule network. Our proposed model achieves state-of-the-art performance on ECPE compared with previous baselines. More importantly, this work demonstrates a novel and effective way of applying the textual entailment paradigm to ECPE and to clause-level causal discovery, through multi-view hypothesis inference and information fusion.
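As a rough illustration of the premise/hypothesis formulation described above, the following minimal sketch builds entailment instances from a list of clauses. The template wordings, the function names, and the choice of exactly two symmetric pair views are assumptions made for illustration only; the actual templates, the entailment encoder, and the capsule-network fusion are not reproduced here.

```python
from itertools import product

# Hypothetical label-view templates (wording assumed, not taken from the paper).
EMOTION_TEMPLATE = 'The clause "{clause}" expresses an emotion.'
CAUSE_TEMPLATE = 'The clause "{clause}" describes the cause of an emotion.'

# Hypothetical symmetric views of one candidate pair:
# one phrased from the emotion side, one from the cause side.
PAIR_TEMPLATES = (
    'The emotion in "{emotion}" is caused by "{cause}".',
    '"{cause}" causes the emotion in "{emotion}".',
)


def build_clause_hypotheses(clauses):
    """Return (premise, hypothesis, (label, clause_index)) for every clause."""
    premise = " ".join(clauses)  # the whole document is the premise
    instances = []
    for i, clause in enumerate(clauses):
        instances.append(
            (premise, EMOTION_TEMPLATE.format(clause=clause), ("emotion", i))
        )
        instances.append(
            (premise, CAUSE_TEMPLATE.format(clause=clause), ("cause", i))
        )
    return instances


def build_pair_hypotheses(clauses, emotion_ids, cause_ids):
    """Return (premise, [view hypotheses], (emotion_index, cause_index))
    for every candidate pair drawn from the retained clause indices."""
    premise = " ".join(clauses)
    instances = []
    for e, c in product(emotion_ids, cause_ids):
        views = [
            t.format(emotion=clauses[e], cause=clauses[c]) for t in PAIR_TEMPLATES
        ]
        instances.append((premise, views, (e, c)))
    return instances


if __name__ == "__main__":
    doc = ["I lost my keys yesterday", "so I was really upset"]
    for premise, hypothesis, key in build_clause_hypotheses(doc):
        print(key, "|", hypothesis)
    # Suppose clause 1 was kept as an emotion and clause 0 as a cause.
    for premise, views, key in build_pair_hypotheses(doc, [1], [0]):
        print(key, "|", views)
```

In this sketch, each instance would be passed to an entailment classifier, and the per-view predictions for a pair would then be fused; restricting `emotion_ids` and `cause_ids` to clauses that passed the clause-level step reflects the filtering role the label-view templates are described as playing.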