Abstract

In recent years, there has been a significant increase in the design of neural network models for solving math word problems (MWPs). These neural solvers have been designed with various architectures and evaluated on diverse datasets, posing challenges in fair and effective performance evaluation. This paper presents a comparative study of representative neural solvers, aiming to elucidate their technical features and performance variations in solving different types of MWPs. Firstly, an in-depth technical analysis is conducted from the initial deep neural solver DNS to the state-of-the-art GPT-4. To enhance the technical analysis, a unified framework is introduced, which comprises highly reusable modules decoupled from existing MWP solvers. Subsequently, a testbed is established to conveniently reproduce existing solvers and develop new solvers by combing these reusable modules, and finely regrouped datasets are provided to facilitate the comparative evaluation of the designed solvers. Then, comprehensive testing is conducted and detailed results for eight representative MWP solvers on five finely regrouped datasets are reported. The comparative analysis yields several key findings: (1) Pre-trained language model-based solvers demonstrate significant accuracy advantages across nearly all datasets, although they suffer from limitations in math equation calculation. (2) Models integrated with tree decoders exhibit strong performance in generating complex math equations. (3) Identifying and appropriately representing implicit knowledge hidden in problem texts is crucial for improving the accuracy of math equation generation. Finally, the paper also discusses the major technical challenges and potential research directions in this field. The insights gained from this analysis offer valuable guidance for future research, model development, and performance optimization in the field of math word problem solving.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call