Navigating confined spaces that are inaccessible to humans is an important goal of robotics and a challenge for robot design and control. One promising approach for this task is the simultaneous eversion and inversion of the two ends of a toroidal membrane that makes up the robot's body.

Self-propelled toroidal robot. Still image from YouTube video.

A recent study published on arXiv.org presents a novel self-propelled soft everting toroidal robot. It continuously recirculates its air-filled membrane using a motorized device that sits inside the pressurized robot body.

The robot needs only a single control signal to move and can conform to obstacles in its environment. The researchers demonstrate that the robot can successfully navigate a cluttered environment, squeeze through an aperture, and climb a pipe. While climbing vertically, it can support significant weight without slipping, with a motor torque that is independent of the force used to brace the robot against its environment.

There are many places that are inaccessible to humans where robots could help deliver sensors and equipment. Many of these spaces contain three-dimensional passageways and uneven terrain that pose challenges for robot design and control. Everting toroidal robots, which move through simultaneous eversion and inversion of their body's material, are promising for navigation in these types of spaces. We present a novel soft everting toroidal robot that propels itself using a motorized device inside an air-filled membrane. Our robot requires only a single control signal to move, passively conforms to its environment, and can climb vertically with a motor torque that is independent of the force used to brace the robot against its environment. We derive and validate models of the forces involved in its motion, and we demonstrate the robot's ability to navigate a maze and climb a pipe.

Research Article: Badillo Perez, N. G. and Coad, M. M., “Self-Propelled Soft Everting Toroidal Robot for Navigation and Climbing in Confined Spaces”, 2022. Link of Paper: https://arxiv.org/abs/2203.14455
Video Link: https://www.youtube.com/watch?v=tSlkCkNAT44


Read More

Garment transfer, the task of transferring clothing onto the image of a query person without changing the person's identity, has great commercial potential. A recent paper published on arXiv.org explores the in-the-wild garment transfer problem.

Image credits: PXFuel, CC0 Public Domain

The researchers suggest a self-supervised training scheme that works on easily accessible dance videos. A novel generative network is proposed to facilitate arbitrary garment transfer under complex poses. It combines the advantages of two methods currently in use: 2D pixel flow and 3D vertex flow. A cyclic online optimization is designed to further enhance synthesis quality.

A new large-scale video dataset has also been created to facilitate related human-centered research areas, not limited to virtual try-on. The model successfully produces results with sharp textures and intact garment shapes.

While significant advances have been made in garment transfer, one of the most applicable directions of human-centric image generation, existing work overlooks in-the-wild imagery, presenting severe garment-person misalignment as well as noticeable degradation in fine texture details. Therefore, this paper attacks the virtual try-on task in real-world scenes and brings essential improvements in authenticity and naturalness, especially for loose garments (e.g., skirts, formal dresses), challenging poses (e.g., crossed arms, bent legs), and cluttered backgrounds. Specifically, we find that pixel flow excels at handling loose garments whereas vertex flow is preferred for hard poses, and by combining their advantages we propose a novel generative network called wFlow that can effectively push up garment transfer to in-the-wild context. Moreover, former approaches require paired images for training. Instead, we cut down on this laboriousness by working on a newly constructed large-scale video dataset named Dance50k, with self-supervised cross-frame training and an online cycle optimization. The proposed Dance50k can further promote real-world virtual dressing by covering a wide variety of garments under dancing poses. Extensive experiments demonstrate the superiority of our wFlow in generating realistic garment transfer results for in-the-wild images without resorting to expensive paired datasets.

Research Article: Dong, X., “Dressing in the Wild by Watching Dance Videos”, 2022. Link of Paper: https://arxiv.org/abs/2203.15320
Project site: https://awesome-wflow.github.io/


Read More

Surveillance cameras have an identity problem, driven by an inherent tension between usability and privacy. As these powerful little devices appear seemingly everywhere, machine learning tools have automated video content analysis at massive scale – but with increasing surveillance, there are currently no legally enforceable rules to limit privacy invasions.

A camera for video surveillance. Image credit: Claudio Balcazar (free Pexels license) via Pexels

Security cameras can do a lot – they’ve become smarter and supremely more capable than their ghosts of grainy pictures past, the “hero tool” often used in crime media. (“See that little blurry blue blob in the right-hand corner of that densely populated corner – we got him!”)

Now, video surveillance can help health officials measure the fraction of people wearing masks, enable transportation departments to monitor the density and flow of vehicles, bikes and pedestrians, and provide businesses with a better understanding of shopping behaviors. But why has privacy remained such a weak consideration?

The status quo is to retrofit videos with blurred faces or black boxes. Not only does this prevent analysts from asking some genuine queries (e.g., are people wearing masks?), it also doesn’t always work; the system may miss some faces and leave them unblurred for the world to see.

Dissatisfied with this status quo, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), in collaboration with other institutions, came up with a system to better guarantee privacy in video footage from surveillance cameras.

Called “Privid”, the system lets analysts submit video data queries and adds a little bit of noise (extra data) to the final result to ensure that an individual cannot be identified. The system builds on a formal definition of privacy – “differential privacy” – which allows access to aggregate statistics about private data without revealing personally identifiable information.

Typically, analysts would get access to the entire video to do whatever they want with it, but Privid makes sure the video isn’t a free buffet. Honest analysts can get access to the information they need, but that access is restrictive enough that malicious analysts can’t do much with it.

To enable this, rather than running the code over the entire video in one shot, Privid breaks the video into small chunks and runs the processing code over each chunk. Instead of getting results back from each piece, the segments are aggregated, and additional noise is added. (There’s also the benefit of knowing the error bound you’re going to get on your result – maybe a 2 percent error margin, given the extra noisy data added.)

For example, the code might output the number of people observed in each video chunk, and the aggregation might be the “sum”, to count the total number of people wearing face coverings, or the “average”, to estimate the density of the crowd.
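In code, that chunk-aggregate-noise pipeline reduces to a few lines. The sketch below is a minimal illustration of the idea using the standard Laplace mechanism from differential privacy – not Privid’s actual implementation. The function name and the sensitivity and epsilon values are hypothetical, and a real system must also bound how much any one person can contribute across chunks.

```python
import numpy as np

def private_aggregate(chunk_results, sensitivity, epsilon, how="sum"):
    """Aggregate per-chunk model outputs and add Laplace noise.

    sensitivity: the most any one individual could change the aggregate
                 (an upper bound the system must enforce).
    epsilon:     the privacy budget; smaller means noisier and more private.
    """
    if how == "sum":
        value = float(np.sum(chunk_results))
    else:  # "average"
        value = float(np.mean(chunk_results))
    # Laplace noise scaled to sensitivity/epsilon hides any single
    # person's contribution to the released result.
    return value + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# Hypothetical per-chunk counts from an analyst's mask-detection model:
counts = [12, 9, 14, 11, 13]
print(private_aggregate(counts, sensitivity=5.0, epsilon=1.0, how="sum"))
```

The released value is the true aggregate plus a small random offset, which is why the analyst knows an error bound up front.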

Privid allows analysts to use their own deep neural networks, which are commonplace for video analytics today. This lets analysts ask questions that Privid’s designers didn’t anticipate. Across a variety of videos and queries, Privid was accurate to within 79 to 99 percent of a non-private system.

“We’re at a stage right now where cameras are practically ubiquitous. If there’s a camera on every street corner, every place you go, and if someone could actually process all of those videos in aggregate, you can imagine that entity building a very precise timeline of when and where a person has gone,” says MIT CSAIL PhD student Frank Cangialosi, lead author on a paper about Privid. “People are already worried about location privacy with GPS – video data in aggregate could capture not only your location history, but also moods, behaviors, and more at each location.”

Privid introduces a new notion of “duration-based privacy”, which decouples the definition of privacy from its enforcement – with obfuscation, if your privacy goal is to protect all people, the enforcement mechanism needs to do some work to find the people to protect, which it may or may not do perfectly. With this mechanism, you don’t need to fully specify everything, and you’re not hiding more information than you need to.

Let’s say we have a video with a street scene. Two analysts, Alice and Bob, both claim that they want to count the number of people passing through each hour, so they submit a video processing module and ask for a sum aggregation.

The first analyst is the city planning department, which uses this information to understand footfall patterns and to plan sidewalks for the city. Their model counts people and outputs this count for each video segment.

The second analyst is malicious. They hope to identify every time “Charlie” appears in view of the camera. Their model only looks for Charlie’s face, and outputs a large number if Charlie is present (i.e., the “signal” they’re trying to extract), or zero otherwise. Their hope is that the sum will be non-zero if Charlie was present.

From Privid’s point of view, these two queries look identical. It’s hard to reliably determine what their models might be doing internally, or what the analysts hope to use the data for. This is where the noise comes in. Privid executes both queries and adds the same amount of noise to each. In the first case, because Alice was counting all people, this noise will have only a small effect on the result, and likely won’t affect its usefulness.

In the second case, since Bob was looking for a specific signal (Charlie was visible only for a few chunks), the noise is enough to prevent them from knowing whether Charlie was there or not. If they see a non-zero result, it might be because Charlie was actually there, or because the model output “zero” but the noise made it non-zero. Privid didn’t need to know anything about when or where Charlie appeared; the system just needed to know a rough upper bound on how long Charlie might appear, which is easier to specify than the exact locations that prior methods rely on.
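A toy calculation shows why the same noise treats Alice and Bob so differently. The sketch below assumes, as one plausible simplification, that per-chunk outputs are clipped to a small declared bound so the noise scale can be fixed; under that assumption Alice’s large aggregate barely moves, while Bob’s zero-or-one signal drowns.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
noise_scale = 2.0  # the same Laplace scale is added to every query result

# Alice sums honest per-chunk people counts over 100 chunks.
alice_true = int(np.sum(rng.integers(8, 15, size=100)))  # roughly 1,100
alice_noisy = alice_true + rng.laplace(0, noise_scale)   # tiny relative error

# Bob's clipped detector contributes at most 1 if Charlie ever appears.
bob_true = 1                                             # Charlie was there
bob_noisy = bob_true + rng.laplace(0, noise_scale)       # sign is unreliable

print(alice_true, round(alice_noisy, 1))  # noise barely matters here
print(bob_true, round(bob_noisy, 1))      # could easily come out <= 0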

The challenge is determining how much noise to add – Privid wants to add just enough to hide everyone, but not so much that the result becomes useless to analysts. Adding noise to the data and insisting on queries over time windows means that your result won’t be as accurate as it could be, but the results are still useful while providing better privacy.

Written by Rachel Gordon

Source: Massachusetts Institute of Technology


Read More


An immersive visualization platform that virtually recreates the experience of being in a wildfire will help artists, designers, firefighters and scientists better understand and communicate the dynamics of these extreme events.

iFire, based at the iCinema Centre for Interactive Cinema Research at UNSW Sydney, is a first-of-its-kind artificially intelligent (AI) immersive environment that visualizes the unpredictable behavior of wildfires. It gives users and researchers a clear understanding of wildfire dynamics at a 1:1 scale and in real time, in a safe virtual environment.

An AI-powered immersive visualization suite that recreates the experience of being in a wildfire will transform how we perceive, react to and prepare for such events. Image credit: UNSW.

The five-year project is funded through an Australian Research Council (ARC) fellowship awarded to lead researcher Scientia Professor Dennis Del Favero, director of the iCinema Research Centre at UNSW Art, Design & Architecture. The artist uses simulation to viscerally explore diverse high-risk scenarios, directly addressing issues like global warming in compelling ways.

“Wildfires are a whole new generation of fires,” says Prof. Del Favero. “We are experiencing an accelerating level of global warming that is driving fires of a scale, speed and ferocity never before seen in recorded human history.”

“iFire uses real-world data to visualize not only what these fires look like, but also what they feel [and sound] like. Sound is important… [because] wildfire has a particular acoustic that is completely unique.”

The project, like the Centre, is interdisciplinary in approach, working across art, design, computing and science. It brings together global experts in fire research, including computer and fire scientists at UNSW such as Professor Maurice Pagnucco and Professor Jason Sharples, along with Data61, the University of Melbourne, San José State University, and more than 15 international industry and government partners, including the Australasian Fire and Emergency Service Authorities Council, Fire and Rescue NSW, CalFire, the Pau Costa Foundation and the ARC Centre of Excellence for Climate Extremes.

equipping fire-prone areas

Unlike traditional bushfires, which spread relatively predictably, wildfires are fundamentally unpredictable. They can create their own weather systems, generating lightning storms that ignite new fires; this, in addition to their scale and speed, makes their behavior difficult to predict, Prof. Del Favero says.

“Situational awareness is critical in a wildfire… It’s like being in a war zone. You don’t know where the dangers are. They can surround you and get on top of you,” he says. “So we are developing a way to visualize this type of dynamic by using artificial intelligence to drive the visualization, so that the fire behaves unexpectedly according to its own logic, not according to our expectations.”

The platform will provide a single tool for two different groups of users.

For fire scientists, firefighters and fire agencies in Australia and internationally, it will facilitate research and training in the dynamics of wildfire scenarios, opening the way to a more agile and collaborative approach to fire planning, group training and fire management decision-making.

It will enable artists, curators and designers to imaginatively explore wildfire landscapes using a digital palette with a vast range of atmospheres, flora and topographies, to enhance public engagement with and understanding of these landscapes.

The platform will foster more lateral and collaborative thinking among users working in fire planning, group training and firefighting. Image credit: UNSW.

Users can share and explore the environment across multiple locations and platforms, including mobile 360-degree 3D cinemas as well as more portable 3D projection screens, 3D head-mounted displays, laptops and tablets.

Beyond its uses in fire science and art, the iFire project will develop a geo-specific software application as part of its resource toolkit. The application can be downloaded in fire-prone areas for use by fire researchers, first responders and the community.

“Local councils can implement this in their own geographic area to show people how a wildfire could unfold in their community. It will become part of their portfolio of educational tools for fire preparedness,” Prof. Del Favero says.

He says the project will also develop a pipeline for sharing and integrating the diverse data sets – fire behavior, management procedures and protocols, for example – collected by a range of agencies, to facilitate research into wildfires.

“This will set the benchmark for using this data to effectively observe these events.”

AI is a powerful research partner

It is imperative to use artificial intelligence to understand these data sets.

“AI optimizes our ability to perceive the dynamics of fire in landscapes,” Prof. Del Favero says. “It can help us process this complex data more quickly and in more practical ways than we [as humans] can.”

“And we really need the help at this time, because extreme events like wildfires are beyond our imagination in terms of impact and are difficult to model.”

The project will also explore the wildfire landscape through a range of creative applications for film, museums and contemporary arts.

“[AI-driven immersive visualisations] allow you to imagine whole new creative worlds that you otherwise wouldn’t be able to with human cognition alone,” he says.

The iFire platform will also be developed for more specific industry needs, potentially of a commercial nature. For example, Data61 will work with iCinema to create an immersive experience of its fire application, Spark, which simulates bushfire spread to help plan for and manage bushfires.

Beneficial visual technologies across disciplines

Prof. Del Favero says these types of advanced art and technology frameworks are applicable to a wide variety of needs.

UNSW iCinema’s research includes interactive art landscapes, intelligent database systems, immersive design modeling and extreme event simulation. Past projects have contributed to contemporary art, cultural heritage, defense monuments, digital museums and mining simulations.

The iCast project, for example, delivered a suite of virtual reality simulations to the Shenyang Research Institute of the China Coal Technology and Engineering Group, China’s leading research and training institute for mine safety. The project, later commercialized, produced a highly realistic simulation of an underground mine that allowed up to 30 trainees to interact with hazard and technology scenarios simultaneously. The immersive module provided a highly effective alternative to training through lengthy manuals, training over 30,000 miners and reducing fatal and serious injuries in the mining industries of China and Australia.

Prof. Del Favero says artistic technologies that provide lived experiences can help us better understand and address the unpredictable and turbulent landscapes that characterize the terrestrial changes we are experiencing.

“I’m very interested in creating virtual worlds to enhance the way I connect with the physical world around me,” he says.

“Creating a simulated world is a way of collaborating with an artificially intelligent twin, creating a new type of partnership that integrates human situational understanding and decision-making with the speed, scale and adaptability of AI to help establish patterns and predict behavior.”

Source: UNSW


Read More

The growing gap in the transition of inventions from research laboratories to the market is slowing the development and scaling of new hardware technologies in the United States. This is especially evident in complex microelectronics, an area in which US leadership has lagged. A workshop on semiconductor technology translation and hard-tech startups recently gathered stakeholders from across the country to analyze this challenge and propose solutions.

Presented jointly by MIT, the State University of New York (SUNY), and Rensselaer Polytechnic Institute (RPI) last month, the virtual event brought together academic researchers, members of industry, venture capital firms, state and federal agencies, nonprofits, and entrepreneurs for a broader conversation about accelerators and how startups can restore American leadership.

In his opening address, MIT Provost Martin Schmidt challenged speakers and attendees to seize the moment. “We need to think boldly, act decisively, be willing to reinvent our practices and not be burdened by outdated models of engagement,” said Schmidt, who becomes president of RPI in July. “This workshop will tackle an area that is ripe for this work, and that is the process by which we translate academic work through commercialization.”

An audience of 632 individuals joined 30 invited speakers across four sessions: innovation ecosystems, stakeholder perspectives, building proto-companies and startups, and startup experiences and shared lessons. “It takes a village to develop an innovative idea into a prototype that can then become a product expected to reach millions of people,” said Vladimir Bulović, faculty director of MIT.nano and workshop co-organizer. “Initiating and maintaining more hard-tech startups will create more technology and more new jobs, benefiting established industry partners, nurturing new industries and revitalizing American leadership in microelectronics.”

creating an environment for success

What contributes to a thriving innovation ecosystem – and how do we build one for hard tech? Fiona Murray, the William Porter (1967) Professor of Entrepreneurship and associate dean for innovation and inclusion at the MIT Sloan School of Management, suggested three main features: a strategic focus, a system of key resources for founders such as human talent and funding, and stakeholder connectivity – a community built intentionally around priority areas.

Watch a video playlist of Tech Translation Workshop presentations.

This concept of connectivity resonated throughout the workshop. “Proximity fosters connectivity fosters collaboration,” said Bob Metcalfe, professor emeritus of innovation and entrepreneurship at the University of Texas at Austin. Metcalfe noted seven “species” needed for a thriving startup ecosystem: funding agencies, research professors, graduate students, scaling entrepreneurs, venture capitalists, strategic partners and early adopters.

To explore the perspectives of these multiple stakeholders, the workshop drew experts in different roles from different geographic locations. Speaking from an industry and venture capital perspective, Applied Materials Senior Vice President and Chief Technology Officer Omkaram Nalamasu, Intel Capital Managing Director Sean Doyle and In-Q-Tel Managing Director Eileen Tanghal offered advice on what they look for when investing in hard tech. Common themes included proof of concept, shared development facilities for both cost and efficiency, access to talent and the ability to engage customers.

“Hard-tech startups have this very wide valley of death. They face a lot of challenges when it comes to finding the right people, getting money and securing partnerships,” said Tanghal, who described barriers to startups in the semiconductor sector such as an aging workforce, supply chain issues and difficulty finding a first customer.

from university to commercialization

The expansion of the talent pool has its constraints. Julie Lenzer, chief innovation officer at the Advanced Regenerative Manufacturing Institute, discussed the challenges universities face in supporting hard tech – the slow pace of academia, risk aversion, intellectual property (IP) ownership, mission misalignment when entrepreneurial activity by faculty or students is not valued by the institution, and the fact that universities do not produce finished products for the market.

“Often, when technology comes out of the lab, it is at a very low and slow technology readiness level; it is very early technology,” said Lenzer, formerly chief innovation officer at the University of Maryland. “That presents a high technical risk – is it going to work? We don’t know yet, but it will take a lot of capital to get there. Is the market ready for it? Is it that much better than what it’s disrupting? It’s not just about the technology, it’s about the market opportunity.”

Support systems are needed to help build hard-tech startups. Greentown Labs Senior Director of Membership Jason Ethier, Activate Executive Managing Director Amy Rose, and Howard University College of Engineering and Architecture Director of Innovation Grant Warner explored best practices for preparing founders, emphasizing the importance of articulating a business vision, understanding market need, and being open-minded and coachable.

startup perspectives

The workshop also called upon startup founders to share their experiences and pain points. Veronika Stelmakh, CEO and co-founder of Mesodyne, outlined the entrepreneurship competitions and accelerator programs through which she and her co-founders learned to discover customers, run a business, and decide which grants to apply for.

Now, she said, their main challenge is cost reduction. “For that we need volume. To get volume, we need traction with customers. To get traction with customers, you need a product, and if your product is too expensive, you’re not there. It’s a chicken-and-egg problem, so we need programs, especially for hard-tech startups, that enable us to build things with limited resources.”

Access to shared facilities and tool sets that help reduce costs and promote the development of new hard-tech technologies was a theme repeated throughout the workshop. “Space is precious,” said John Icoponi, vice president of technology strategy at NY CREATES, which runs the Albany Nanotech Complex. “We find that startups need the ability to convert materials, flexible space and equipment that no single entity can afford to buy.”

In closing, Bob Karlicek, professor of electrical, computer and systems engineering at RPI and co-organizer of the workshop, spoke about the challenges facing academia. “We need fab technology earlier in the educational process,” Karlicek said. “We need more student- and faculty-accessible fabs to build that talent pool and drive much faster innovation at the university level. We need to think about IP strategies that protect startups. We need better Phase 1 funding models and larger pools of early-stage capital.”

“Universities should be viewed not only as training the next engineers, but as talent generators for the next batch of entrepreneurs,” continued co-organizer Nick Querques, director of new ventures at the SUNY Research Foundation. “Significant capital is needed at all levels, starting with non-dilutive government funding for technologies and multi-institutional centers, and extending to investment from industry in both startups and facilities.”

The workshop on semiconductor technology translation and hard-tech startups demonstrated a high level of concern and interest from stakeholders across the country in improving the process of bringing new hard tech to market. Making that change will take the whole ecosystem.

“It’s not just the idea, it’s the process of growing that idea, training the talent, and understanding the stakeholders you face,” Bulović summarized. “We need a national program that can bring many more startups to a big scale. We need more shots on goal, and that will lead to greater success and a resurgence in the national ability to recapture dominance in microelectronics and many other hard-tech industries.”

Read More

Automated speech-recognition technology has become more common with the popularity of virtual assistants like Siri, but many of these systems only perform well with the most widely spoken of the world’s roughly 7,000 languages.

Since these systems largely do not exist for less common languages, the millions of people who speak them are cut off from the many technologies that rely on speech, from smart home devices to assistive technologies and translation services.

Recent advances have enabled machine learning models that can learn the world’s less common languages, which lack the large amounts of transcribed speech needed to train algorithms. However, these solutions are often too complex and expensive to be applied widely.

Researchers at MIT and elsewhere have now tackled this problem by developing a simple technique that reduces the complexity of an advanced speech-learning model, enabling it to run more efficiently and achieve higher performance.

Their technique involves removing unnecessary parts of a common but complex speech recognition model and then making minor adjustments so it can recognize a specific language. Because only small tweaks are needed once the larger model is cut down to size, it is much less expensive and time-consuming to teach this model an uncommon language.

The work could help level the playing field and bring automatic speech-recognition systems to many areas of the world where they have yet to be deployed. The systems are important in some academic environments, where they can assist students who are blind or have low vision, and are being used to improve efficiency in health care settings through medical transcription and in the legal field through court reporting. Automatic speech recognition can also help users learn new languages and improve their pronunciation skills. The technique could even be used for the transcription and documentation of rare languages that are in danger of vanishing.

“This is an important problem to solve because we have amazing technology in natural language processing and speech recognition, but research in this direction will help us scale the technology to many more under-explored languages in the world,” says Cheng-I Jeff Lai, a PhD student in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and first author of the paper.

Lai wrote the paper with fellow MIT PhD students Alexander H. Liu, Yi-Lun Liao, Sameer Khurana and Yung-Sung Chuang; his advisor and senior author James Glass, senior research scientist and head of the Spoken Language Systems Group at CSAIL; MIT-IBM Watson AI Lab research scientists Yang Zhang, Shiyu Chang and Kaizhi Qian; and David Cox, the IBM director of the MIT-IBM Watson AI Lab. The research will be presented at the Conference on Neural Information Processing Systems in December.

learning speech from audio

The researchers studied a powerful neural network that has been pretrained to learn basic speech from raw audio, called wav2vec 2.0.

A neural network is a series of algorithms that can learn to recognize patterns in data; loosely modeled on the human brain, neural networks are organized into layers of interconnected nodes that process data inputs.

wav2vec 2.0 is a self-supervised learning model, so it learns to recognize a spoken language after being fed a large amount of unlabeled speech. The training process then requires only a few minutes of transcribed speech. This opens the door to speech recognition for uncommon languages that lack large amounts of transcribed speech, like Wolof, which is spoken by 5 million people in West Africa.
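As a concrete picture of that workflow (a sketch, not the paper’s code), here is how a pretrained wav2vec 2.0 checkpoint is typically finetuned for recognition using the Hugging Face transformers library. The checkpoint IDs are real public models; the one-second silent waveform and the transcript are stand-ins for real labeled data.

```python
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Tokenizer/feature extractor from a released ASR checkpoint:
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
# Self-supervised weights; the CTC output head starts untrained:
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base")

# One finetuning step on a (waveform, transcript) pair:
waveform = torch.zeros(16000)  # stand-in for one second of 16 kHz audio
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
labels = processor.tokenizer("HELLO WORLD", return_tensors="pt").input_ids
loss = model(inputs.input_values, labels=labels).loss
loss.backward()  # step with any optimizer, e.g. torch.optim.AdamW
```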

However, such a neural network contains about 300 million individual connections, so training it for a specific language requires an enormous amount of computing power.

The researchers set out to improve the network’s efficiency by pruning it. Just as a gardener cuts off unnecessary branches, neural network pruning involves removing connections that aren’t necessary for a specific task – in this case, learning a language. Lai and his colleagues wanted to see how the pruning process would affect this model’s speech recognition performance.
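One standard way to do this is unstructured magnitude pruning: zero out the smallest weights. The minimal PyTorch sketch below illustrates the idea (it is not the authors’ code, and the 90 percent sparsity level is just an example value).

```python
import torch
import torch.nn as nn

def magnitude_prune(model, sparsity=0.9):
    """Zero out the smallest-magnitude weights in each weight matrix.

    Returns {name: mask} recording which connections survived.
    """
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:                    # skip biases and norm params
            continue
        k = max(1, int(param.numel() * sparsity))
        threshold = param.abs().flatten().kthvalue(k).values
        mask = (param.abs() > threshold).float()
        param.data.mul_(mask)                  # remove the pruned connections
        masks[name] = mask
    return masks

# Toy stand-in for a large speech model:
model = nn.Sequential(nn.Linear(100, 50), nn.ReLU(), nn.Linear(50, 10))
masks = magnitude_prune(model, sparsity=0.9)
```

Comparing the masks produced after finetuning on two different languages is, in essence, how the overlap described below can be measured.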

After pruning the full neural network to create a smaller subnetwork, they trained the subnetwork with a small amount of labeled Spanish speech and then with French speech, a process called finetuning.

“We would expect these two models to be very different because they are finetuned for different languages. But the surprising part is that if we prune these models, they end up with highly similar pruning patterns. For French and Spanish, they have 97 percent overlap,” Lai says.

They ran experiments using 10 languages, from Romance languages like Italian and Spanish to languages with completely different alphabets, like Russian and Mandarin. The results were the same – there was a huge overlap across all the finetuned models.

a simple solution

Drawing on that unique finding, they developed a simple technique to improve the efficiency and boost the performance of neural networks, called PARP (Prune, Adjust, and Re-Prune).

In the first step, a pretrained speech recognition neural network like wav2vec 2.0 is pruned by removing unnecessary connections. Then, in the second step, the resulting subnetwork is adjusted for a specific language and then pruned again. During this second phase, connections that had been removed are allowed to grow back if they are important for that particular language.

Because connections are allowed to grow back during the second phase, the model only needs to be finetuned once, rather than over multiple iterations, which vastly reduces the amount of computing power required.
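Put together, the three steps can be sketched as a short loop. This reuses the illustrative magnitude_prune() helper above and simplifies details of the paper’s actual recipe, such as how the adjustment phase is scheduled.

```python
def parp(model, train_step, batches, sparsity=0.9):
    """Prune, Adjust, Re-Prune: a simplified sketch of the PARP idea."""
    magnitude_prune(model, sparsity)         # 1) prune the pretrained network
    for batch in batches:                    # 2) adjust: finetune on the target
        train_step(model, batch)             #    language with no mask enforced,
                                             #    so zeroed weights that matter
                                             #    for this language can grow back
    return magnitude_prune(model, sparsity)  # 3) re-prune to the final subnetwork
```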

testing the technique

The researchers put PARP to the test against other common pruning techniques and found that it outperformed them all for speech recognition. It was especially effective when there was only a very small amount of transcribed speech to train on.

They also showed that PARP can create one small subnetwork that can be finetuned for 10 languages at once, eliminating the need for separate subnetworks for each language, which could also reduce the expense and time required to train these models.

Going forward, the researchers want to apply PARP to text-to-speech models and see how their technique could improve the efficiency of other deep learning networks.

“There is an increasing need to put large deep learning models on edge devices. Having more efficient models allows these models to be squeezed onto more primitive systems, such as cell phones. Speech technology is very important for cell phones, for instance, but having a smaller model doesn’t necessarily mean it computes faster. We need additional technology to bring about faster computation, so there is still a long way to go,” says Zhang.

Self-supervised learning (SSL) is changing the field of speech processing, so making SSL models smaller without degrading performance is an important research direction, says Hung-yi Lee, associate professor in the Department of Electrical Engineering and the Department of Computer Science and Information Engineering at National Taiwan University, who was not involved in this research.

“PARP trims the SSL model, and at the same time, surprisingly improves the recognition accuracy. Moreover, the paper shows there is a subnet in the SSL model that is suitable for the ASR tasks of many languages. This finding will stimulate research on language/task-agnostic network pruning. In other words, the SSL model can be compressed while maintaining its performance on different tasks and languages,” he says.

This work is partially funded by the MIT-IBM Watson AI Lab and the 5k Language Learning Project.

Read More

Robots can deliver food on a college campus and hit a hole-in-one on a golf course, but even the most sophisticated robots can’t perform the basic social interactions that are critical to everyday human life.

MIT researchers have now incorporated certain social interactions into a framework for robotics, enabling machines to understand what it means to help or hinder one another, and to learn to perform these social behaviors on their own. In a simulated environment, a robot watches its companion, guesses what task it wants to accomplish, and then helps or hinders this other robot based on its own goals.

The researchers also showed that their model creates realistic and predictable social interactions. When they showed videos of these simulated robots interacting with one another to humans, the human viewers mostly agreed with the model about what type of social behavior was occurring.

Enabling robots to demonstrate social skills can lead to smoother and more positive human-robot interactions. For example, a robot in an assisted living facility could use these capabilities to help create a more caring environment for elderly individuals. The new model could enable scientists to quantitatively measure social interactions, which could help psychologists study autism or analyze the effects of antidepressants.

“Robots will live in our world soon enough, and they really need to learn how to communicate with us on human terms. They need to understand when it is time for them to help and when it is time for them to see what they can do to prevent something from happening. This is very early work and we are barely scratching the surface, but I feel like this is the first very serious attempt to understand what it means for humans and machines to interact socially,” says Boris Katz, principal research scientist and head of the InfoLab Group at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and a member of the Center for Brains, Minds, and Machines (CBMM).

Joining Katz on the paper are co-lead author Ravi Tejwani, a research assistant at CSAIL; co-lead author Yen-Ling Kuo, a CSAIL PhD student; Tianmin Shu, a postdoc in the Department of Brain and Cognitive Sciences; and senior author Andrei Barbu, a research scientist at CSAIL and CBMM. The research will be presented at the Conference on Robot Learning in November.

a social simulation

To study social interactions, the researchers created a simulated environment where robots pursue physical and social goals as they move around a two-dimensional grid.

A physical goal relates to the environment. For example, a robot’s physical goal might be to navigate to a tree at a certain point on the grid. A social goal involves guessing what another robot is trying to do and then acting on that guess, like helping another robot water the tree.

The researchers use their model to specify what a robot’s physical goals are, what its social goals are, and how much emphasis it should place on one over the other. The robot is rewarded for actions that bring it closer to accomplishing its goals. If a robot is trying to help its companion, it adjusts its reward to match that of the other robot; if it is trying to hinder, it adjusts its reward to be the opposite. The planner, an algorithm that decides which actions the robot should take, uses this continually updating reward to guide the robot to carry out a blend of physical and social goals.
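One simple way to write down that reward adjustment is sketched below. This is an illustration of the idea, not the paper’s exact formulation, which also has to infer the partner’s reward from observed behavior; the function name, the stance encoding, and the weighting are all hypothetical.

```python
def social_reward(physical_reward, partner_reward, stance, social_weight=0.5):
    """Blend a robot's own physical reward with a social term.

    stance: +1 to help the partner (adopt its reward),
            -1 to hinder it (invert its reward),
             0 for a purely physical agent.
    social_weight: how much emphasis the planner puts on the social goal.
    """
    return physical_reward + social_weight * stance * partner_reward

# A helping robot values its partner's progress as its own:
print(social_reward(physical_reward=1.0, partner_reward=2.0, stance=+1))  # 2.0
# A hindering robot is rewarded when its partner's reward drops:
print(social_reward(physical_reward=1.0, partner_reward=2.0, stance=-1))  # 0.0
```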

“We have opened a new mathematical framework for how you model the social interaction between two agents. If you are a robot, and you want to go to location X, and I am another robot and I see that you are trying to go to location X, I can cooperate by helping you get to location X faster. That might mean moving X closer to you, finding another better X, or taking whatever action you had to take at X. Our formulation allows the planner to discover the ‘how’; we specify the ‘what’ in terms of what social interactions mean mathematically,” says Tejwani.

It’s important to blend a robot’s physical and social goals to create realistic interactions, since humans who help each other have limits on how far they will go. For instance, a rational person likely wouldn’t hand a stranger their wallet, Barbu says.

The researchers used this mathematical framework to define three types of robots. A level 0 robot has only physical goals and cannot reason socially. A level 1 robot has physical and social goals but assumes that all other robots have only physical goals. Level 1 robots can take actions based on other robots’ physical goals, such as helping and hindering. A Level 2 robot assumes that other robots have social and physical goals; These robots can perform more sophisticated actions such as joining together to help.

model evaluation

To see how their model compared to human perspectives on social interactions, they created 98 different scenarios with robots at levels 0, 1, and 2. Twelve humans watched 196 video clips of the robots interacting, and were then asked to estimate the physical and social goals of those robots.

In most instances, their model agreed with what the humans thought about the social interactions that were occurring in each frame.

“We have this long-term interest, both in building computational models for robots, and also in digging deeper into the human aspects of this. We want to find out what features from these videos humans use to understand social interactions. Can we make an objective test for your ability to recognize social interactions? Maybe there is a way to teach people to recognize these social interactions and improve their abilities. We are a long way from this, but even just being able to measure social interactions effectively is a big step forward,” Barbu says.

towards greater sophistication

The researchers are working on developing a system with 3D agents in an environment that allows many more types of interactions, such as the manipulation of household objects. They also plan to modify their model to include environments where actions can fail.

The researchers also want to incorporate a neural network-based robot planner into the model, which learns from experience and performs faster. Finally, they hope to run an experiment to collect data about the features humans use to determine whether two robots are engaging in a social interaction.

“Hopefully, we will have a benchmark that allows all researchers to work on these social interactions and inspires the kinds of science and engineering advances we have seen in other areas, such as object and action recognition,” says Barbu.

“I think this is a lovely application of structured reasoning to a complex yet urgent challenge,” says Tomer Ullman, assistant professor in the Department of Psychology at Harvard University and head of the Computation, Cognition, and Development Lab, who was not involved in this research. “Even young infants understand social interactions like helping and hindering, but we don’t yet have machines that can perform this reasoning at anything like human-level flexibility. I believe the models proposed in this work, in which agents reason about the rewards of others and socially plan how best to thwart or support them, are a good step in the right direction.”

This research was supported by the Center for Brains, Minds, and Machines; the National Science Foundation; the MIT CSAIL Systems that Learn Initiative; the MIT-IBM Watson AI Lab; the DARPA Artificial Social Intelligence for Successful Teams program; the US Air Force Research Laboratory; the US Air Force Artificial Intelligence Accelerator; and the Office of Naval Research.

Read More

The Diablo Canyon nuclear plant in California, the last one still operating in the state, is set to close in 2025. A team of researchers from MIT’s Center for Advanced Nuclear Energy Systems, the Abdul Latif Jameel Water and Food Systems Lab, and the Center for Energy and Environmental Policy Research; Stanford’s Precourt Institute for Energy; and the energy analysis firm LucidCatalyst, LLC has analyzed the potential benefits the plant could provide if its operations were extended to 2030 or 2045.

They found that the nuclear plant could simultaneously help stabilize the state’s electric grid, provide desalinated water to offset the state’s chronic water shortages, and provide carbon-free hydrogen fuel for transportation. MIT News asked the report’s co-authors Jacopo Buongiorno, the TEPCO Professor of Nuclear Science and Engineering, and John Lienhard, the Abdul Latif Jameel Professor of Water and Food, to discuss the group’s findings.

Q: Your report suggests co-locating a major desalination plant with the existing Diablo Canyon power plant. What would be the potential benefits of operating a desalination plant in conjunction with the power plant?

Lienhard: The cost of desalinated water produced at Diablo Canyon would be lower than at a stand-alone plant because the cost of electricity would be significantly lower and you could take advantage of the existing infrastructure for seawater intake and brine outfall. Electricity would be affordable because the co-location takes advantage of Diablo Canyon’s unique ability to provide low-cost, zero-carbon baseload electricity.

Depending on the scale at which the desalination plant was built, you could have a very significant impact on the water scarcity of state and federal projects in that area. In fact, one of the findings of this study was that an intermediate-sized desalination plant there would produce more fresh water than the highest estimate of the net yield of the proposed Delta Conveyance Project on the Sacramento River. You could get that amount of water at Diablo Canyon for less than half the investment cost, and without the associated impacts that come with the Delta Conveyance Project.

And the technology envisioned here for desalination, reverse osmosis, is available off the shelf. You can buy this equipment today. In fact, it is already in use in California and at thousands of other places around the world.

Q: Your report discusses three potential products from the Diablo Canyon plant: desalinated water, electricity for the grid, and clean hydrogen. How well can the plant accommodate all of those efforts, and are there advantages to combining them, as opposed to pursuing any one of them separately?

Buongiorno: California, like many other regions of the world, faces multiple challenges as it seeks to cut carbon emissions on a grand scale. First, the widespread deployment of intermittent power sources such as solar and wind creates a great deal of variability on the grid, which can be balanced by dispatchable firm power generators such as Diablo. Therefore, Diablo’s first mission is to continue to provide reliable, clean electricity to the grid.

The second challenge is the prolonged droughts and water scarcity for the state in general. One way to address this is through seawater desalination co-located with the nuclear plant at the Diablo site, as John explained.

The third challenge relates to the decarbonization of the transportation sector. One possible approach is to replace conventional cars and trucks with vehicles powered by fuel cells that consume hydrogen. That hydrogen has to be produced from a primary energy source. Nuclear power can do so quite efficiently and in a carbon-free manner, through a process called electrolysis.

Our economic analysis took into account the expected revenue from these multiple products – electricity for the grid, hydrogen for the transportation sector, water for farmers or other local users – as well as the costs associated with deploying the new facilities needed to produce the desalinated water and hydrogen. We found that, if Diablo’s operating license were extended through 2035, it would cut carbon emissions by an average of 7 million metric tons per year – a reduction of more than 11 percent from 2017 levels – and would save $2.6 billion in power system costs.

Delaying Diablo’s retirement further, to 2045, would spare 90,000 acres of land that would otherwise need to be dedicated to renewable energy generation to replace the facility’s capacity, and would save up to $21 billion in electricity system costs.

Finally, if Diablo were operated as a polygeneration facility that provides electricity, desalinated water and hydrogen simultaneously, its value, in terms of dollars per unit of electricity generated, could increase by as much as 50 percent.

Lienhard: Most of the desalination scenarios we considered did not consume the full power output of the plant, meaning that under most scenarios you would continue generating electricity and doing something with it beyond desalination alone. I think it’s also important to remember that this power plant produces 15 percent of California’s carbon-free electricity today and is responsible for 8 percent of the state’s total electricity generation. In other words, Diablo Canyon is a huge factor in California’s decarbonization. When or if this plant goes offline, reliance on natural gas for electricity generation is likely to rise in the near term, meaning an increase in California’s carbon emissions.

Q: This plant in particular has been controversial since its inception. What is your assessment of the safety of the plant for operation past its scheduled shutdown, and how do you see this report contributing to decision-making about that shutdown?

Buongiorno: The Diablo Canyon nuclear power plant has a very strong safety record. The key potential safety concern for Diablo is related to the several fault lines near it. Being located in California, the plant was designed from the outset to withstand large earthquakes. Following the Fukushima accident in 2011, the Nuclear Regulatory Commission reviewed the plant’s ability to withstand external events of exceptionally rare and severe magnitude (e.g., earthquakes, tsunamis, floods, tornados, wildfires, hurricanes). After nine years of assessments, the NRC concluded that the “existing seismic capacity or effective flood protection” [at Diablo Canyon] would address the reevaluated hazards. That is, Diablo was designed and built to withstand even the rarest and strongest earthquakes that are physically possible at that site.

As an added level of safety, the plant has been retrofitted with special equipment and procedures meant to ensure reliable cooling of the reactor core and spent fuel pool under a hypothetical scenario in which all design-basis safety systems have been disabled by a severe external event.

Lienhard: With regard to the potential impact of this report, PG&E [the California utility] has already made the decision to close the plant, though we and others hope that decision will be reconsidered and reversed. We believe this report gives relevant stakeholders and policymakers a lot of information about the options and the value associated with keeping the plant running, and about how California could benefit from the clean water and clean electricity generated at Diablo Canyon. It’s not our decision to make, of course – it’s a decision for the people of California. We can only provide information.

Q: What are the biggest challenges or obstacles to seeing these ideas implemented?

Lienhard: California has very strict environmental protection rules, and it’s good that it does. One area of great concern for California is the health of the ocean and the protection of coastal ecosystems. As a result, there are very strict regulations regarding the intake and outfall of both power plants and desalination plants, to protect marine life. Our analysis shows that this combined plant could be implemented within the parameters set by the California Ocean Plan and could meet the regulatory requirements.

We believe a thorough analysis will be required before proceeding: you have to do a site study and really get into the water to see in detail what’s out there. But the initial analysis is positive. The second challenge is that discussions about nuclear power in California have generally not been very supportive, and similarly some groups in California oppose desalination. We expect both of those perspectives to be part of the conversation about whether or not to proceed with this project.

Q: How specific is this analysis to the unique features of this site? Are there aspects that apply to other nuclear plants domestically or globally?

Lienhard: Hundreds of nuclear plants around the world are located along coastlines, and many are in water-scarce areas. Although our analysis focused on Diablo Canyon, we believe the general findings apply to many other seaside nuclear plants, so this approach and these findings could potentially be applied to hundreds of sites worldwide.

Read More

Half the population lives with a monthly ovarian hormone cycle. Those cycles affect menstrual patterns, fertility, and more, but the stigma surrounding hormone problems has limited awareness of hormone health.

Now, Aavia is working to help people understand their hormone cycle and its effects.

“These cycles affect sleep quality, muscle tone, energy levels, sex drive, skin health, mental health – you name it – but no one is talking about them,” says CEO Aagya Mathur MBA ’18. “We see a world where people can use their hormone cycles to their benefit from day to day – to make them a superpower rather than something they’re afraid of.”

The startup, which was conceived during the MIT Entrepreneurship and Maker Skills Integrator (MEMSI), pursues that goal through a combination of education, community, and technology.

Aavia’s flagship product is a patented smart pill case that can sense when users take their birth control pills and remind them via a mobile app if they forget. Beyond those notifications, the app and its accompanying website allow users to track the changes they notice throughout their cycle, receive personalized recommendations, learn from peers and medical experts, and connect with a community dealing with similar problems.
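To make the reminder mechanism concrete, here is a minimal sketch of the kind of logic such a device implies. The function, grace period, and sensing event below are hypothetical illustrations for this article, not Aavia’s actual implementation, whose internals are not public.

```python
from datetime import datetime, timedelta
from typing import Optional

GRACE_PERIOD = timedelta(minutes=30)  # assumed tolerance before nudging the user

def should_send_reminder(scheduled: datetime,
                         sensed_intake: Optional[datetime],
                         now: datetime) -> bool:
    """Remind only if the dose window has passed with no sensed intake.

    `sensed_intake` stands in for the smart case reporting that a pill
    left its slot; None means no intake event was received.
    """
    if sensed_intake is not None and sensed_intake >= scheduled - GRACE_PERIOD:
        return False  # pill was taken around (or after) the scheduled time
    return now >= scheduled + GRACE_PERIOD  # window missed: push a notification

# Dose scheduled at 9:00, nothing sensed, it is now 9:45 -> remind.
print(should_send_reminder(datetime(2022, 5, 1, 9, 0), None,
                           datetime(2022, 5, 1, 9, 45)))  # True
```

The key design point is that the reminder is driven by the absence of a sensed event rather than by a fixed alarm, so users who have already taken their pill are not nagged.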

“[Raising awareness about hormonal health] is not something that can happen overnight, so we decided to address a problem people already understand they have, which is remembering to take their birth control pill,” Mathur says. “From there, we’ve expanded our services based on what we’ve learned and what is or isn’t working for our users.”

Mathur, along with Aavia co-founders Alexis Wong and Aya Suzuki ’18, says the team is inspired by the stories they hear from people who have used Aavia’s services to address problems like anxiety and acne that they had been battling for years without knowing the problems were related to their hormones.

“[We’re] helping people have a better health journey than their mothers did,” says Mathur. “Hopefully my future daughter’s health journey will be better than mine. No one is paying attention to this problem, but half the population has ovaries, so it’s something that’s extremely common.”

An idea is born

Wong and Suzuki met at MEMSI, an intensive two-week bootcamp that challenges student participants from MIT and Hong Kong to build hardware startups. Suzuki had worked in a rehabilitation facility and observed the problems people had adhering to treatment. The two began developing a pill pack that could sense when pills were still in their foil packaging and send reminders to users via smartphone. They were later introduced to Mathur through a mutual friend who had also participated in MEMSI.

The founders talked to hundreds of people with various health problems to determine where they could make the biggest difference. The trio made for a diverse founding team: Mathur had studied neuroscience as an undergraduate and thought she was going to be a doctor before going into consulting and deciding that an MBA at MIT was the better path. Wong was studying electrical engineering at the University of Hong Kong, and Suzuki was studying mechanical engineering and design at MIT.

Mathur, who used to wake up in the middle of the night to write down ideas and questions about business, was thrilled to be in an interdisciplinary environment during her MBA program.

“Sloan was one of the only schools that says, ‘One MIT: we’re a business school, and you have everything you need,’” says Mathur. “I thought it was really cool because then you could be in clubs with people from other fields. Multidisciplinary teams are extremely important for having the kind of impact we want.”

The founders say that being students was helpful when they started building the company. They received support from MIT Sandbox and the MIT Venture Mentoring Service, and went through the startup accelerators MIT Delta V and MIT Fuse. They also won the audience choice award at the MIT $100K Entrepreneurship Competition.

“Starting a company at MIT is amazing because you have so many resources, both financial and educational,” says Mathur.

Today the founders continue to draw value from MIT’s network, meeting with former classmates and alumni – and Mathur still refers from time to time to the class notes she took as an MBA student.

A new approach to hormone health

After surveying thousands of people, the founders learned that people want an app that goes beyond tracking periods or moods – one that actually gives users health and behavioral suggestions.

“We are helping you understand your hormone cycle through our own reporting, but the great thing is that we also give you actionable information,” says Mathur. “For example, these are the three days when you have the highest energy, and here’s how you can take advantage of it; or these are the four days when you worry the most, and here’s what you can do to help reduce it; or your period is coming, so here are steps you can take to make sure it’s not as bad as it has been.”
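Recommendations like these boil down to mapping the current cycle day onto a phase and attaching phase-specific guidance. The sketch below illustrates that idea under simplifying assumptions; the phase boundaries and tips are placeholders invented for this article, not Aavia’s model, and real cycles vary from person to person.

```python
def cycle_phase(day: int, cycle_length: int = 28) -> str:
    """Map a 1-indexed cycle day to a coarse phase (illustrative boundaries)."""
    day = ((day - 1) % cycle_length) + 1  # wrap around the cycle
    if day <= 5:
        return "menstrual"
    if day <= 13:
        return "follicular"
    if day <= 16:
        return "ovulatory"
    return "luteal"

# Placeholder guidance keyed by phase; a real app would personalize this
# from each user's own tracked data rather than a fixed 28-day template.
TIPS = {
    "menstrual": "Energy may be lower; plan lighter workloads.",
    "follicular": "Energy often rises; a good window for demanding tasks.",
    "ovulatory": "Typically peak-energy days.",
    "luteal": "Mood dips are common; build in recovery time.",
}

phase = cycle_phase(15)
print(phase, "->", TIPS[phase])  # ovulatory -> Typically peak-energy days.
```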

Aavia, which has an advisory board of MDs, devotes substantial resources to educational efforts: creating blog posts and videos, hosting events, running an in-app forum with doctors four days a week, and engaging on social media. In the community forums, users can ask questions, share stories or fears, and offer support. Aavia also groups together members with similar experiences, such as those taking the same acne medication or navigating a similar health journey.

“The more people we can get in front of, the more they can tell other people and help each other,” says Mathur.

The results have been promising. One Aavia member was undergoing treatment for clinical depression when she started tracking her mood on the app. She brought that data to her doctor, who saw that the depressive symptoms were much more severe during a specific part of her cycle. The data helped the doctor change her diagnosis to premenstrual dysphoric disorder, which is treated differently from clinical depression.

“The stories we hear really get me out of bed in the morning,” says Mathur. “Seeing that we changed this person’s perspective on something, that we helped this person understand they really needed a different treatment, or that this knowledge changed someone’s confidence or stress – those are really our two biggest success metrics, reducing stress and building confidence. That’s where we’re seeing significant changes.”

Most of Aavia’s members are 18 to 24 years old, a demographic Mathur says is more open to talking about hormone problems. For the founders, it all goes back to Aavia’s mission to set a new paradigm for hormone health.

“We hear most consistently from people who feel they have been neglected or who are dealing with a problem that no one else is paying attention to,” says Mathur. “Our members tell us that they don’t necessarily trust the big health care companies, but they do trust us. We are focused on what we can do to have a lasting impact as users go through their hormone health journeys.”

Read More