Training AI systems to perform simple mathematical reasoning is an important task, because numbers are ubiquitous in textual data.

A recent paper on arXiv.org presents a multi-task benchmark consisting of eight different tasks, each of which requires an understanding of simple arithmetic at its core. Some tasks also require commonsense reasoning or reading comprehension in combination with these basic arithmetic skills.
The researchers show that the benchmark is challenging even for state-of-the-art large-scale language models, which score poorly even after fine-tuning. They also propose a memory-augmented neural model to demonstrate the usefulness of such a multi-task meta-dataset. Compared with task-specific training, the model improves by 3.4% on average when trained on all tasks combined.
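To make the "average gain" figure concrete, here is a minimal sketch of how such a number could be computed, assuming it is the absolute difference in score between joint and task-specific training, averaged over tasks. The function name, task names, and scores are hypothetical placeholders chosen for illustration; they are not results from the paper.

```python
from statistics import mean


def average_gain(task_specific, joint):
    """Mean per-task difference (joint minus task-specific), in score points."""
    if task_specific.keys() != joint.keys():
        raise ValueError("both score tables must cover the same tasks")
    return mean(joint[task] - task_specific[task] for task in task_specific)


# Placeholder scores for three hypothetical tasks; NOT results from the paper.
task_specific_scores = {"task_a": 40.0, "task_b": 55.0, "task_c": 62.0}
joint_scores = {"task_a": 45.0, "task_b": 57.0, "task_c": 64.0}

gain = average_gain(task_specific_scores, joint_scores)
print(f"average gain: {gain:.1f} points")
```

Averaging per-task differences weights every task equally, regardless of how many examples each task contains, which is one common way to report a gain "on each task".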
Given the ubiquitous nature of numbers in text, reasoning with numbers to perform simple calculations is an important skill of AI systems. While many datasets and models have been developed to this end, state-of-the-art AI systems are brittle, failing to perform the underlying mathematical reasoning when it appears in a slightly different scenario. Drawing inspiration from GLUE, which was proposed in the context of natural language understanding, we propose NumGLUE, a multi-task benchmark that evaluates the performance of AI systems on eight different tasks that at their core require simple arithmetic understanding. We show that this benchmark is far from being solved, with neural models, including state-of-the-art large-scale language models, performing significantly worse than humans (lower by 46.4%). In addition, NumGLUE promotes knowledge sharing across tasks, especially those with limited training data, as evidenced by the improved performance (average gain of 3.4% on each task) when a model is jointly trained on all tasks as opposed to task-specific modeling. Ultimately, we hope that NumGLUE will encourage systems that perform robust and general arithmetic reasoning within language, a first step towards being able to perform more complex mathematical reasoning.
Research Article: Mishra, S., et al., “NumGLUE: A Suite of Fundamental Yet Challenging Mathematical Reasoning Tasks”, 2022. Link: https://arxiv.org/abs/2204.05660