Abstract

The emergence of large language models (LLMs) such as OpenAI’s GPT-3, Google’s LaMDA, Meta’s OPT [2, 3, 7, 10] etc. have revolutionized the field of natural language processing (NLP). These models with upwards of hundreds of billions of parameters are trained on large unlabeled text corpora and can subsequently solve downstream tasks with little to no labeled data. While these models are increasingly versatile in their abilities, e.g., solving math word problems, the larger question of their ability to reason remains. Using and modifying the SVAMP dataset, we find that GPT-3’s davinci-002 model, in addition to having good performance on numerical math word problems, also performs well on the potentially harder symbolic version of the same problems. Furthermore, adopting a two-step approach (solve symbolically and then substitute numerical values) leads to better accuracy on the numerical test set in the zero-shot regime. Additionally, we find that the use of specific prompting techniques pushes the model, in many cases, to actively describe its thought process and aid in the final answer output when faced with a complex, multi-step problem, aligning with recent observations.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call