Vision-Based Manipulators Need to Also See from Their Hands

Recent advances have improved the learning and generalization of object manipulation skills. However, some aspects are still often overlooked, for example, the design of the agent's observation space. In the natural world, perspective can play an important role in learning and generalization.

Robot Manipulator. Image credits: Vincent Diamante, CC BY-SA 2.0 via Flickr

A recent paper examines these insights for vision, a ubiquitous sensory modality in robot learning. In a grasping task, it makes a head-to-head comparison between hand-centric and third-person perspectives.

The paper shows that using a hand-centric perspective leads to a significant reduction in the aggregate out-of-distribution failure rate. To retain the benefits of the hand-centric perspective when its observability is insufficient, the researchers propose using the hand-centric and third-person perspectives in conjunction.

We study how the choice of visual perspective affects learning and generalization in the context of physical manipulation from raw sensor observations. Compared with the more commonly used global third-person perspective, a hand-centric (eye-in-hand) perspective affords reduced observability, but we find that it consistently improves training efficiency and out-of-distribution generalization. These benefits hold across a variety of learning algorithms, experimental settings, and distribution shifts, and for both simulated and real robot apparatuses. However, this is only the case when hand-centric observability is sufficient; otherwise, including a third-person perspective is necessary for learning, but also harms out-of-distribution generalization. To mitigate this, we propose to regularize the third-person information stream via a variational information bottleneck. On six representative manipulation tasks with varying hand-centric observability adapted from the Meta-World benchmark, this results in a state-of-the-art reinforcement learning agent operating from both perspectives improving its out-of-distribution generalization on every task. While some practitioners have long put cameras in the hands of robots, our work systematically analyzes the benefits of doing so and provides simple and broadly applicable insights for improving end-to-end learned vision-based robotic manipulation.
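The regularization idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it only shows the generic variational information bottleneck penalty, assuming a diagonal Gaussian encoder for the third-person image and a standard normal prior. The function names, the example encoder outputs, and the weight `beta` are hypothetical and chosen for illustration.

```python
import math
import random

def sample_latent(mean, log_var, rng):
    """Draw z = mean + std * eps (reparameterization) from a diagonal Gaussian."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]

def kl_to_standard_normal(mean, log_var):
    """KL( N(mean, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mean, log_var))

def regularized_loss(task_loss, mean, log_var, beta=0.01):
    """Task loss plus a beta-weighted bottleneck penalty on the third-person latent.

    beta is a hypothetical weight; larger values compress the third-person
    stream more strongly.
    """
    return task_loss + beta * kl_to_standard_normal(mean, log_var)

# Hypothetical encoder outputs for a single third-person observation.
mean = [0.5, -1.0, 0.0]
log_var = [0.0, -0.5, 0.2]

rng = random.Random(0)
z = sample_latent(mean, log_var, rng)  # latent passed on to the policy
loss = regularized_loss(task_loss=1.0, mean=mean, log_var=log_var)
```

In this sketch only the third-person latent is penalized. The hand-centric stream would be left unregularized, in line with the abstract's description of regularizing the third-person information stream.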

Research Paper: Hsu, K., Kim, M. J., Rafailov, R., Wu, J., and Finn, C., “Vision-Based Manipulators Need to Also See from Their Hands”, 2022. Link to paper:
Project Website:
