Large language models (LLMs) can perform tasks such as generating long-form text from prompts, answering questions, or even engaging in a dialogue on a wide range of topics. This knowledge can be used to broaden the set of tasks that robots can plan and perform.
A recent paper published on arXiv.org therefore looks at the problem of how to extract the knowledge in an LLM so that an embodied agent, such as a robot, can follow high-level textual instructions.

Image credit: Fabrice Florin, CC BY-SA 2.0 via Flickr
The researchers' goal is to make the LLM aware of the agent's available and viable repertoire of skills, giving it an awareness of both the agent's capabilities and the current state of the environment. They propose an algorithm that extracts and leverages the knowledge within the LLM for physically grounded tasks.
Evaluation on real-world robotic tasks confirms that the algorithm can execute temporally extended, complex, and abstract instructions.
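To make the idea concrete, below is a minimal Python sketch of this kind of skill selection: the language model scores each candidate skill's text description as a possible next step, a learned value function estimates whether that skill can actually succeed in the current state, and the two scores are combined to pick the next action. This is an illustrative sketch, not the authors' implementation; the helper names (`llm_log_prob`, `affordance`, `select_skill`, `plan`) are hypothetical placeholders.

```python
# Illustrative sketch of grounded skill selection, in the spirit of the paper.
# Placeholders (assumptions, not the authors' API):
# - llm_log_prob(instruction, history, skill): log-probability the language
#   model assigns to the skill's text description as the next plan step.
# - affordance(skill, state): a learned value function estimating the
#   probability that the skill can succeed from the current state.

import math

def select_skill(instruction, history, skills, state,
                 llm_log_prob, affordance):
    """Pick the skill with the best combined semantic and feasibility score."""
    best_skill, best_score = None, -math.inf
    for skill in skills:
        # Semantic score: how useful the language model thinks this step is.
        semantic = llm_log_prob(instruction, history, skill)
        # Grounding score: how likely the skill is to succeed here and now.
        feasible = affordance(skill, state)
        # Combine multiplicatively; in log space that is a sum.
        score = semantic + math.log(max(feasible, 1e-9))
        if score > best_score:
            best_skill, best_score = skill, score
    return best_skill

def plan(instruction, skills, state, llm_log_prob, affordance, max_steps=10):
    """Greedily append the highest-scoring skill until 'done' is chosen."""
    history = []
    for _ in range(max_steps):
        skill = select_skill(instruction, history, skills, state,
                             llm_log_prob, affordance)
        history.append(skill)
        if skill == "done":
            break
    return history
```

Because the semantic and feasibility scores are combined, a skill that reads well in the plan but cannot succeed in the current environment is filtered out, which is exactly the grounding the abstract below describes.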
Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack real-world experience, which makes it difficult to leverage them for decision making within a given embodiment. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment. We propose to provide real-world grounding by means of pretrained skills, which are used to constrain the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model's "hands and eyes," while the language model supplies high-level semantic knowledge about the task. We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally extended instructions, while value functions associated with these skills provide the grounding necessary to connect this knowledge to a particular physical environment. We evaluate our method on a number of real-world robotic tasks, where we show the need for real-world grounding and that this approach is capable of completing long-horizon, abstract, natural language instructions on a mobile manipulator. The project website and video can be found at this https URL
Research Article: Ahn, M., et al., "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances", 2022. Link: https://arxiv.org/abs/2204.01691