Arithmetic with language models: From memorization to computation

Davide Maltoni, Matteo Ferrara

doi:10.1016/j.neunet.2024.106550

Abstract

A better understanding of the emergent computation and problem-solving capabilities of recent large language models is of paramount importance to further improve them and broaden their applicability. This work investigates how a language model, trained to predict the next token, can perform arithmetic computations generalizing beyond training data. Binary addition and multiplication constitute a good testbed for this purpose, since they require a very small vocabulary and exhibit relevant input/output discontinuities making smooth input interpolation ineffective for novel data. We successfully trained a light language model to learn these tasks and ran a number of experiments to investigate the extrapolation capabilities and internal information processing. Our findings support the hypothesis that the language model works as an Encoding–Regression–Decoding machine where the computation takes place in the value space once the input token representation is mapped to an appropriate internal representation.
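To make the testbed concrete, the following is a minimal illustrative sketch of how binary addition and multiplication problems could be written as token sequences over a five-symbol vocabulary, and of how a unit change in an operand can flip most of the output bits. The helper names (`to_bits`, `result_bits`, `make_example`, `output_jump`), the 7-bit operand width, and the `a+b=c` string layout are assumptions made for illustration only; the abstract does not specify the encoding the authors used.

```python
# Illustrative sketch only: sequence layout, operand width and helper names
# are assumptions, not the authors' published setup.

VOCAB = ["0", "1", "+", "*", "="]  # assumed five-token vocabulary


def to_bits(n: int, width: int) -> str:
    """Return n as a zero-padded binary string of the given width."""
    return format(n, f"0{width}b")


def result_bits(a: int, b: int, op: str, width: int = 7) -> str:
    """Binary result of a op b, padded so it always fits without overflow."""
    if op == "+":
        return to_bits(a + b, width + 1)   # a sum needs at most width + 1 bits
    if op == "*":
        return to_bits(a * b, 2 * width)   # a product needs at most 2 * width bits
    raise ValueError(f"unsupported operator: {op!r}")


def make_example(a: int, b: int, op: str, width: int = 7) -> str:
    """Build one sequence: the prompt up to '=' followed by the target bits."""
    return f"{to_bits(a, width)}{op}{to_bits(b, width)}={result_bits(a, b, op, width)}"


def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    return sum(c1 != c2 for c1, c2 in zip(x, y))


def output_jump(a: int, b: int, op: str, width: int = 7) -> int:
    """Output bits that change when operand a is incremented by one."""
    return hamming(result_bits(a, b, op, width), result_bits(a + 1, b, op, width))


if __name__ == "__main__":
    print(make_example(3, 5, "+", width=4))
    print(make_example(3, 5, "*", width=4))
    print(output_jump(62, 1, "+"))
```

Under these assumptions, `output_jump(62, 1, "+")` compares the outputs for 62 + 1 and 63 + 1: a carry propagates through the whole result, so adjacent inputs produce outputs that differ in most bit positions. This is the kind of input/output discontinuity that makes smooth interpolation between training examples ineffective.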

Journal: Neural Networks
Publication Date: Jul 1, 2024
License type: cc-by

