Template-Based Contrastive Distillation Pretraining for Math Word Problem Solving.

Jinghui Qin, Liang Lin, Xiaodan Liang, Jiaqi Chen, Zhicheng Yang

doi:10.1109/tnnls.2023.3265173

Abstract

Math word problem (MWP) solving aims to transform a natural language problem description into executable solution equations. An MWP solver therefore needs not only to comprehend the real-world narrative described in the problem text but also to identify the relationships among the quantifiers and variables implied in the problem and map them into a reasonable solution equation logic. Although deep learning models have recently made great progress on MWPs, they ignore the grounding equation logic implied by the problem text. Moreover, pretrained language models (PLMs) have a wealth of knowledge and high-quality semantic representations, which may help solve MWPs, but they have not been explored in the MWP-solving task. To harvest both the equation logic and the real-world knowledge, we propose a template-based contrastive distillation pretraining (TCDP) approach built on a PLM-based encoder. It incorporates mathematical logic knowledge through multiview contrastive learning while retaining rich real-world knowledge and high-quality semantic representations through knowledge distillation. We name the PLM-based encoder pretrained by our approach MathEncoder. Specifically, the mathematical logic is first summarized by clustering the symbolic solution templates among MWPs. It is then injected into the deployed PLM-based encoder by supervised contrastive learning on these symbolic solution templates, which represent the underlying solving logic of the problems. Meanwhile, the rich knowledge and high-quality semantic representations are retained by distilling them from a well-trained PLM-based teacher encoder into our MathEncoder. To validate the effectiveness of the pretrained MathEncoder, we construct a new solver named MathSolver by replacing the GRU-based encoder in GTS, a state-of-the-art MWP solver, with our pretrained MathEncoder. The experimental results demonstrate that our method improves a solver's ability to understand MWPs, outperforming existing state-of-the-art methods on two widely adopted benchmarks, Math23K and CM17K. Code will be available at https://github.com/QinJinghui/tcdp.
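
The combination of template-supervised contrastive learning and distillation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact loss functions, so this sketch assumes a standard supervised contrastive loss in which problems sharing a solution template are positives, and a mean-squared-error distillation term between student and teacher embeddings. The function names (sup_con_loss, distill_loss, tcdp_loss), the temperature, and the weight alpha are hypothetical and are not taken from the authors' code.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sup_con_loss(embeddings, template_ids, temperature=0.1):
    """Supervised contrastive loss (assumed form, not the paper's exact one).

    Problems with the same template id are treated as positives for each
    other; all other problems in the batch act as negatives.
    """
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and template_ids[j] == template_ids[i]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        logits = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                  for j in range(n) if j != i}
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def distill_loss(student, teacher):
    """Mean squared error between student and teacher embeddings (assumption)."""
    count = sum(len(s) for s in student)
    sq = sum((a - b) ** 2 for s, t in zip(student, teacher) for a, b in zip(s, t))
    return sq / count


def tcdp_loss(student, teacher, template_ids, alpha=1.0):
    """Hypothetical combined objective: contrastive term plus weighted distillation."""
    return sup_con_loss(student, template_ids) + alpha * distill_loss(student, teacher)


# Toy batch: two problems per template, embeddings grouped by template.
student = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
teacher = [[0.9, 0.1], [0.9, 0.2], [0.1, 0.9], [0.2, 0.9]]
loss = tcdp_loss(student, teacher, template_ids=[0, 0, 1, 1])
```

In this toy batch the contrastive term is small because embeddings that share a template id already lie close together; assigning template ids that cut across the two groups would make it much larger. The distillation term grows as the student embeddings drift away from the teacher's.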

Full Text