
Ruidong Zhang, a doctoral student in information science, demonstrates the SpeeChin silent-speech recognition device. Credit: Ryan Young / Cornell University
Speech recognition technology allows us to ask Siri to check the weather for tomorrow or to ask Alexa to play our favorite song.
But those technologies require audible speech. What if a person cannot speak, or if voiced speech is not appropriate in a particular setting?
Cheng Zhang, assistant professor of information science in the Cornell Ann S. Bowers College of Computing and Information Science, and doctoral student Ruidong Zhang have an answer: SpeeChin, a silent-speech recognition (SSR) device that can identify silent commands using images of skin deformation in the neck and face captured by a neck-mounted infrared (IR) camera.
The technology is detailed in “SpeeChin: A Smart Necklace for Silent Speech Recognition,” which was published on December 31 in the Proceedings of the Association for Computing Machinery on Interactive, Mobile, Wearable and Ubiquitous Technologies.
Ruidong Zhang will also present the paper at the Ubiquitous Computing (UbiComp 2022) conference in October.
“There are two questions: First, why a necklace? And second, why silent speech?” Cheng Zhang said. “We think a necklace is a form factor that people are used to, as opposed to ear-mounted devices, which may not be as comfortable. As far as silent speech, people might think, ‘I already have a speech recognition device on my phone.’ But you need to vocalize sound for those, and that may not always be socially appropriate, or the person may not be able to vocalize speech.”
“This device has the ability to learn speech patterns of a person even with silent speech,” he said.
“We are introducing a completely new form factor, new hardware, to this area,” said Ruidong Zhang, who built the original prototype at his home in China while remotely completing the first year of his doctoral program in 2020.
The device is similar in appearance to NeckFace, a technology unveiled last year by Cheng Zhang and members of his SciFi Lab team. NeckFace continuously tracks facial expressions by using infrared cameras to capture images of the chin and face from beneath the neck.
Like NeckFace, SpeeChin features an IR camera mounted in a 3D-printed necklace case, which hangs on a silver chain with the camera pointed up at the wearer’s chin. For added stability, the developers designed a wing on each side of the case and attached a coin to the bottom.
Convenience and privacy, Cheng Zhang said, are two reasons why a necklace-mounted IR camera may be preferable to a traditional front-facing camera. “A camera in front of your face is taking pictures of what’s behind you,” he said, “and that raises privacy concerns.”
For their initial experiment, which involved 20 participants (10 speaking English, 10 Mandarin Chinese), measurements were taken to determine the baseline position of the chin, then difference images were used to train the device to recognize simple commands.
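The idea of comparing each camera frame against a baseline chin position can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say how the difference is computed, so the absolute per-pixel difference used here, the function name difference_image, and the tiny 3x3 grayscale frames are all hypothetical and are not taken from the researchers' code or data.

```python
def difference_image(frame, baseline):
    """Return the per-pixel absolute difference between a frame and a baseline.

    Both arguments are grayscale images stored as lists of rows of integers.
    Absolute difference is an assumption made for this sketch.
    """
    same_height = len(frame) == len(baseline)
    same_widths = all(len(r) == len(b) for r, b in zip(frame, baseline))
    if not (same_height and same_widths):
        raise ValueError("frame and baseline must have the same shape")
    return [
        [abs(pixel - base_pixel) for pixel, base_pixel in zip(row, base_row)]
        for row, base_row in zip(frame, baseline)
    ]


# Hypothetical 3x3 frames with values in 0-255 (not data from the study).
baseline = [
    [10, 10, 10],
    [10, 50, 10],
    [10, 10, 10],
]
frame = [
    [10, 10, 10],
    [10, 50, 10],
    [10, 80, 10],  # one pixel brightens as the skin moves
]

diff = difference_image(frame, baseline)
print(diff)  # [[0, 0, 0], [0, 0, 0], [0, 70, 0]]
```

In this toy case only the pixel that changed is nonzero, which is the appeal of the approach: a recognizer sees the movement relative to the resting chin rather than the static appearance of the wearer.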
Ruidong Zhang had participants pronounce 54 commands in English, including numerals, interactive commands, voice assistant commands, punctuation, and navigation commands. Then he did the same with 44 simple Mandarin words or phrases.
SpeeChin recognized commands in English and Mandarin with an average accuracy of 90.5% and 91.6%, respectively. To further test its limits, the researchers conducted another study with 10 participants, all of whom silently uttered a specially designed list of 72 one-syllable “non-words,” made up of combinations of 18 consonants and four vowels.
Finally, the researchers recruited six participants to utter 10 Mandarin and 10 English phrases while walking. Because of the variation in walking styles (e.g., slight versus exaggerated head movement) among participants, the success rate in this study was lower.
The project demonstrates the power of determination: Ruidong Zhang built a lab in his home, complete with a soldering station, and recruited people in his hometown as study participants.
“But because I live in a small town and it is difficult to find people who speak English,” he said, “we actually ended up recruiting English speakers at Zhejiang University in Hangzhou. It was an unforgettable experience for me.”
Ruidong Zhang et al, SpeeChin: A Smart Necklace for Silent Speech Recognition, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (2021). DOI: 10.1145/3494987
Provided by Cornell University
Citation: Smart Necklace Recognizes ‘Silent’ English, Mandarin Commands (2022, February 15). Retrieved 30 March 2022 from https://techxplore.com/news/2022-02-smart-necklace-silent-english-mandarin.html