Abstract

Researchers have long been interested in developing AI tools to help students learn mathematical subjects. One particularly challenging task for school students is learning to solve math word problems. We explore how recent advances in natural language processing, specifically the rise of powerful transformer-based models, can be applied to help math learners with such problems. Concretely, we evaluate GPT-3, GPT-3.5, and GPT-4, all transformer models with billions of parameters recently released by OpenAI, on three related challenges involving math word problems that correspond to systems of two linear equations: classifying word problems, extracting equations from word problems, and generating word problems. For the first challenge, we define a set of problem classes and find that the GPT models classify word problems with an overall accuracy of around 70%. All models struggle with one class in particular, the “item and property” class, which significantly lowers the overall accuracy. For the second challenge, our findings match expectations: newer models are better at extracting equations from word problems. The highest accuracy we obtain by fine-tuning GPT-3 on 1,000 examples (78%) is surpassed by GPT-4 given only 20 examples (79%). For the third challenge, we again find that GPT-4 outperforms the other two models: it generates problems with accuracy ranging from 76.7% to 100%, depending on the problem type.
