While standing in the kitchen, you push a few metal bowls across the counter into the sink with a clang and drape a towel over the back of a chair. In the other room, some precariously stacked wooden blocks have toppled over, and there’s an epic toy car crash. These are interactions with our environment that humans experience daily at home, but this world, however real it may seem, is not.
A new study from researchers at MIT, the MIT-IBM Watson AI Lab, Harvard University, and Stanford University enables just such a rich virtual world, rather like stepping into “The Matrix.” Their platform, called ThreeDWorld (TDW), simulates high-fidelity audio and visual environments, both indoor and outdoor, and allows users, objects, and mobile agents to interact as they would in real life and according to the laws of physics. Object orientations, physical characteristics, and velocities are calculated and updated for liquids, soft bodies, and rigid objects as interactions occur, producing accurate collision and impact sounds.
TDW is unique in that it is designed to be flexible and general-purpose, generating synthetic, photo-realistic visual and audio renderings in real time. These can be compiled into audio-visual datasets, modified through interactions within the scene, and adapted for human and neural-network learning and prediction tests. A variety of robotic agents and avatars can also be created within controlled simulations to perform, say, action planning and execution. And using virtual reality (VR), human attention and play behavior within the space can provide real-world data, for example.
“We are trying to build a general-purpose simulation platform that mimics the interactive richness of the real world for a variety of AI applications,” says study lead author Chuang Gan, MIT-IBM Watson AI Lab research scientist.
Creating realistic virtual worlds with which to investigate human behavior and train robots has long been a dream of AI and cognitive science researchers. “Most AI right now is based on supervised learning, which relies on huge datasets of human-annotated images or sounds,” says Josh McDermott, an associate professor in the Department of Brain and Cognitive Sciences (BCS) and an MIT-IBM Watson AI Lab project lead. These annotations are expensive to compile, creating a bottleneck for research. And for physical properties of objects, such as mass, which are not always readily apparent to human observers, labels may not be available at all. A simulator like TDW overcomes this problem by generating scenes in which all parameters and annotations are known. Many competing simulators were motivated by this concern but were designed for specific applications; through its flexibility, TDW aims to enable many applications that are poorly suited to other platforms.
Another advantage of TDW, McDermott notes, is that it provides a controlled setting for understanding the learning process and facilitating the improvement of AI robots. Robotic systems that rely on trial and error can be taught in an environment where they cannot cause physical harm. Furthermore, “many of us are excited about the doors this type of virtual world opens for doing experiments on humans to understand human perception and cognition. There is the potential to create these very rich sensory scenarios, where you still have total control and complete knowledge of what is happening in the environment.”
McDermott, Gan, and their colleagues are presenting this research at the Conference on Neural Information Processing Systems (NeurIPS) in December.
Behind the framework
The work began as a collaboration between a group of MIT professors, along with Stanford and IBM researchers, whose individual research interests span hearing, vision, cognition, and perceptual intelligence. TDW brought these together on one platform. “We were all interested in the idea of building a virtual world for the purpose of training AI systems that we could actually use as models of the brain,” says McDermott, who studies human and machine hearing. “So we thought that this sort of environment, where you can have objects that interact with each other and then render realistic sensory data from them, would be a valuable way to start to study that.”
To achieve this, the researchers built TDW on a video game platform called Unity3D Engine, committing to render both visual and auditory data without any pre-made animation. The simulation consists of two components: the build, which renders images, synthesizes audio, and runs physics simulations; and the controller, a Python-based interface through which the user sends commands to the build. Researchers construct and populate a scene by pulling furniture pieces, animals, and vehicles from an extensive library of 3D object models. These models respond accurately to lighting changes, and their material composition and orientation in the scene dictate their physical behavior in the space. Dynamic lighting models accurately simulate scene illumination, creating shadows and dimming that correspond to the appropriate time of day and sun angle. The team has also created furnished virtual floor plans that researchers can fill with agents and avatars. To synthesize true-to-life audio, TDW uses generative models of impact sounds that are triggered by collisions or other object interactions within the simulation. TDW also simulates noise attenuation and reverberation in accordance with the geometry of the space and the objects within it.
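The build-and-controller split described above can be sketched as a simple command protocol, where a Python controller serializes commands and sends them to a build that updates the scene each step. The class and command names below are illustrative stand-ins, not TDW's actual API:

```python
import json

class MockBuild:
    """Stand-in for the build process: it receives serialized commands,
    updates scene state, and would normally return rendered images and
    synthesized audio for each frame."""
    def __init__(self):
        self.objects = {}

    def step(self, commands):
        for cmd in commands:
            if cmd["$type"] == "add_object":
                self.objects[cmd["id"]] = {"model": cmd["model"],
                                           "position": cmd["position"]}
            elif cmd["$type"] == "teleport_object":
                self.objects[cmd["id"]]["position"] = cmd["position"]
        return {"frame": len(self.objects), "objects": self.objects}

class Controller:
    """Python-side interface: the user composes a list of commands and
    sends them to the build once per simulation step."""
    def __init__(self):
        self.build = MockBuild()

    def communicate(self, commands):
        # Round-trip through JSON to mimic sending over a socket.
        payload = json.loads(json.dumps(commands))
        return self.build.step(payload)

c = Controller()
resp = c.communicate([{"$type": "add_object", "id": 1,
                       "model": "chair", "position": [0.0, 0.0, 0.0]}])
```

In the real platform the build is a separate Unity process, so the command stream has to carry everything the scene needs; this is what lets the same Python script drive rendering, audio synthesis, and physics at once.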
Two physics engines in TDW power deformations and reactions between interacting objects: one for rigid bodies, and another for soft objects and fluids. TDW performs instantaneous calculations regarding mass, volume, and density, as well as any friction or other forces acting on the materials. This allows machine learning models to learn how objects with different physical properties would behave together.
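To give a flavor of the kind of calculation a rigid-body engine performs, here is a minimal one-dimensional collision model using the coefficient of restitution. This is a textbook sketch of the underlying physics, not code from TDW:

```python
def collision_impulse(m1, m2, v1, v2, restitution):
    """Impulse magnitude for a 1-D head-on collision between two rigid
    bodies, from the coefficient-of-restitution model."""
    v_rel = v1 - v2                       # relative approach velocity
    reduced_mass = (m1 * m2) / (m1 + m2)
    return -(1 + restitution) * v_rel * reduced_mass

def resolve(m1, m2, v1, v2, restitution):
    """Post-collision velocities after applying the impulse equally
    and oppositely to the two bodies."""
    j = collision_impulse(m1, m2, v1, v2, restitution)
    return v1 + j / m1, v2 - j / m2

# Equal masses, perfectly elastic collision: the bodies swap velocities.
u1, u2 = resolve(1.0, 1.0, 2.0, 0.0, restitution=1.0)
```

An engine repeats this kind of impulse resolution for every contact each frame, with mass typically derived from a model's volume and material density, which is why those properties shape how objects behave together.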
Users, agents, and avatars can bring a scene to life in several ways. A researcher can apply force directly to an object through controller commands, for instance to set a virtual ball in motion. Avatars can be empowered to act or behave in a certain way within the space, for example with articulated limbs capable of performing task experiments. Finally, VR headsets and hand controllers allow users to interact with the virtual environment, potentially generating human behavioral data that machine learning models could learn from.
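Applying a force and watching the object respond amounts to one step of numerical integration inside the physics engine. A minimal sketch, using semi-implicit Euler integration on a point mass (the function and field names are illustrative, not TDW's):

```python
def apply_force(state, force, dt=0.01):
    """Advance a point mass one physics step after an applied force,
    using semi-implicit Euler integration (update velocity first,
    then position)."""
    acceleration = force / state["mass"]
    state["velocity"] += acceleration * dt
    state["position"] += state["velocity"] * dt
    return state

# "Set a virtual ball in motion": one impulse-like push on a 0.5 kg ball.
ball = {"mass": 0.5, "velocity": 0.0, "position": 0.0}
ball = apply_force(ball, force=10.0)
```

The semi-implicit variant is the common choice in game physics because it stays stable over long simulations where plain explicit Euler drifts.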
Rich AI experiences
To test and demonstrate TDW’s unique features, capabilities, and applications, the team ran a battery of tests comparing datasets generated by TDW with those from other virtual simulations. The team found that neural networks trained on scene image snapshots with randomly placed camera angles from TDW outperformed networks trained on snapshots from other simulations in image classification tests, and came closer to the performance of systems trained on real-world images. The researchers also designed and trained a material classification model in TDW on audio clips of small objects dropping onto surfaces, and asked it to identify the types of interacting materials; here, too, TDW held a significant advantage over its competitors. Additional object-drop testing with neural networks trained on TDW showed that combining audio and vision is the best way to identify the physical properties of objects, motivating further study of audio-visual integration.
TDW is proving particularly useful for designing and testing systems that understand how the physical events in a scene will evolve over time. This includes facilitating benchmarks of how well a model or algorithm makes physical predictions of, for instance, the stability of a stack of objects, or the motion of objects after a collision. Humans learn many of these concepts as children, but many machines need to demonstrate this capability to be useful in the real world. TDW has also enabled comparisons of human curiosity and prediction against those of machine agents designed to evaluate social interactions within different scenarios.
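The stack-stability prediction mentioned above can be illustrated with a naive geometric check: a vertical stack of boxes stays up only if, for each box, the combined center of mass of everything above it lies within that box's footprint. This is a simplified heuristic for intuition, not the benchmark's actual scoring code:

```python
def stack_is_stable(blocks):
    """Naive 2-D stability check for a vertical stack of boxes.

    blocks: list of (x_center, width, mass), ordered bottom to top.
    Returns True if, for every block, the combined center of mass of
    all blocks above it lies within its horizontal extent.
    """
    for i in range(len(blocks) - 1):
        above = blocks[i + 1:]
        total_mass = sum(m for _, _, m in above)
        com_x = sum(x * m for x, _, m in above) / total_mass
        x, width, _ = blocks[i]
        if not (x - width / 2 <= com_x <= x + width / 2):
            return False
    return True

# Slightly staggered stack: combined centers of mass stay supported.
aligned = [(0.0, 1.0, 1.0), (0.1, 1.0, 1.0), (0.2, 1.0, 1.0)]
# Heavily staggered stack: the upper blocks' center of mass overhangs.
offset = [(0.0, 1.0, 1.0), (0.9, 1.0, 1.0), (1.8, 1.0, 1.0)]
```

A physics-prediction benchmark asks a learned model to make exactly this sort of judgment from raw sensory input, then checks it against what the simulator's physics engine actually does.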
Gan points out that these applications are only the tip of the iceberg. By expanding TDW’s physical simulation capabilities to depict the real world more accurately, “we are hoping to create new benchmarks for advancing AI technologies, and to use these benchmarks to open up many new problems that have been difficult to study until now.”
The research team on the paper also includes MIT engineers Jeremy Schwartz and Seth Alter, who are critical to the operation of TDW; BCS professors James DiCarlo and Joshua Tenenbaum; graduate students Aidan Curtis and Martin Shrimpf; and former postdocs James Traer (now an adjunct professor at the University of Iowa) and Jonas Kubilius PhD ’08. Their collaborators are David Cox, IBM director of the MIT-IBM Watson AI Lab; research software engineer Abhishek Bhandardar; and Dan Gutfreund, an IBM research staff member. Additional co-authors are Harvard University assistant professor Julian de Freitas; and, from Stanford University, assistant professors Daniel L.K. Yamins (a TDW founder) and Nick Haber, postdoc Daniel M. Bear, and graduate students Megumi Sano, Kuno Kim, Elias Wang, Damien Moroca, Kevin Feigelis, and Michael Lingelbach.
This research was supported by the MIT-IBM Watson AI Lab.