Franke A600 coffee machine with Picovoice

Coffee By Command: The Speech2Touch Voice Hack

If you wanted to troll your colleagues, you could slap a sticker that says ‘voice activated’ on the office coffee maker any day of the week. Now [edholmes2232] has made it actually come true. With Speech2Touch, he grafts voice control onto a Franke A600 coffee machine using an STM32WB55 USB dongle and some clever firmware hacking.

The office coffee machine has been a target for hacking for years and years. Nearly 35 years ago, at Cambridge University, a webcam served up a live view of the office coffee pot, making sure nobody made the trip for nothing. The funny, but in practice useless, HTTP status 418 was brought to life so that a server could declare itself a teapot and politely refuse to brew coffee. Enter this hack, which could get you coffee by shouting from your desk – if only your arms were long enough to hold your cup in place.

Back to the details. The machine itself doesn’t support USB keyboards, but it does accept a USB mouse, most likely as a last resort in case the touchscreen becomes unresponsive. That loophole is enough: by emulating touchscreen HID packets instead of mouse movement, the hack avoids clunky cursors and delivers a slick ‘sci-fi’ experience. The STM32 listens through an INMP441 MEMS mic, hands speech recognition to Picovoice, and then translates voice commands straight into touch inputs. The result: simply speaking to the machine taps the buttons for you.
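To get a feel for the approach, here’s a minimal Python sketch of the core idea: map a recognized voice command to a screen coordinate and pack it into an absolute-position HID digitizer report. The report layout (tip switch, contact ID, 16-bit X/Y) is a common single-touch digitizer format rather than the project’s actual descriptor, and the drink names and coordinates are made up for illustration – on the real dongle, equivalent reports are assembled in the STM32 firmware and sent out over USB.

```python
# Illustration of the Speech2Touch idea: a recognized voice command is mapped
# to a screen position and packed as an absolute-position HID digitizer report.
# The report layout and coordinates here are illustrative assumptions, not the
# descriptor used in the actual project.
import struct

# Hypothetical drink buttons and their on-screen positions (0..32767 logical units)
COMMANDS = {
    "espresso":   (8000, 12000),
    "cappuccino": (16000, 12000),
    "latte":      (24000, 12000),
}

def touch_report(x: int, y: int, pressed: bool) -> bytes:
    """Build one HID digitizer input report: tip switch, contact ID, X, Y."""
    tip = 0x01 if pressed else 0x00
    contact_id = 0
    return struct.pack("<BBHH", tip, contact_id, x, y)

def tap(command: str):
    """Emit a press/release pair for the button mapped to a voice command."""
    x, y = COMMANDS[command]
    yield touch_report(x, y, pressed=True)   # finger down on the button
    yield touch_report(x, y, pressed=False)  # finger up again

if __name__ == "__main__":
    for report in tap("cappuccino"):
        print(report.hex())  # on the real dongle these bytes go out as USB HID reports
```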

It’s a neat example of sidestepping SDK lock-in. No reverse-engineering of the machine’s firmware, no shady soldering inside. Instead, it’s USB-level mischief, modular enough that the same trick could power voice control on other touchscreen-only appliances.

Researchers Create A Brain Implant For Near-Real-Time Speech Synthesis

Brain-to-speech interfaces have been promising to help paralyzed individuals communicate for years. Unfortunately, many systems have had significant latency that has left them lacking somewhat in the practicality stakes.

A team of researchers across UC Berkeley and UC San Francisco has been working on the problem and made significant strides forward in capability. A new system developed by the team offers near-real-time speech—capturing brain signals and synthesizing intelligible audio faster than ever before.

Continue reading “Researchers Create A Brain Implant For Near-Real-Time Speech Synthesis”

Hacked teddy bear on a desk

Turning GLaDOS Into Ted: A Tale Of A Talking Toy

What if your old, neglected toys could come to life — with a bit of sass? That’s exactly what [Binh] achieved when he transformed his sister’s worn-out teddy bear into ‘Ted’, an interactive talking plush with a personality of its own. This project, which combines the GLaDOS Personality Core project from the Portal series with clever microcontroller tinkering, brings a whole new personality to a childhood favorite.

[Binh] started with the basics: a teddy bear already equipped with buttons and speakers, which he overhauled with an ESP32 microcontroller. The bear’s personality originated from GLaDOS, but was rewritten by [Binh] to fit a cheeky, teddy-bear tone. With a few tweaks to the Python-based fork, [Binh] created threads to handle touch-based interaction. For example, the ESP32 detects where the bear is touched and sends this input to a modified neural network, which then generates a response. The bear can, for instance, call you out for holding his paw for too long, or sarcastically plead for mercy. I hear you say ‘but that bear Ted could do a lot more!’ Well, maybe – but all of this is just what an innocent bear with a personality should be capable of.
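For a sense of how that touch handling might hang together, here’s a rough Python sketch in the spirit of the threads described above: one watcher per touch zone that distinguishes a quick tap from a long hold and hands the event to whatever generates the bear’s reply. The zone names, hold threshold, and generate_reply() stand-in are all illustrative assumptions, not code from [Binh]’s fork.

```python
# Sketch of touch handling: one background thread per touch zone watches how
# long the zone stays pressed and reports taps and long holds as events.
# Everything here is an illustrative stand-in for the real project's code.
import threading
import time

HOLD_THRESHOLD_S = 3.0  # a paw held longer than this counts as "too long"
touch_state = {"left paw": False, "right paw": False, "belly": False}

def generate_reply(event: str) -> str:
    # Placeholder for the personality-model call in the real project.
    return f"[Ted, sarcastically, reacts to: {event}]"

def watch_zone(zone: str):
    """Poll one touch zone; report quick taps and long holds as separate events."""
    pressed_since = None
    hold_reported = False
    while True:
        if touch_state[zone]:
            if pressed_since is None:
                pressed_since = time.monotonic()
                hold_reported = False
            elif not hold_reported and time.monotonic() - pressed_since > HOLD_THRESHOLD_S:
                print(generate_reply(f"{zone} held for far too long"))
                hold_reported = True  # only complain once per hold
        else:
            if pressed_since is not None and not hold_reported:
                print(generate_reply(f"{zone} tapped"))
            pressed_since = None
        time.sleep(0.05)

for zone in touch_state:
    threading.Thread(target=watch_zone, args=(zone,), daemon=True).start()

# Simulate the ESP32 reporting a long paw-hold so the sketch prints something.
touch_state["left paw"] = True
time.sleep(4)
touch_state["left paw"] = False
time.sleep(0.2)
```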

Still, we can imagine future iterations featuring capacitive touch sensors or accelerometers to detect movement. The project is simple, but it showcases the potential of intelligent plush toys. It might raise some questions, too.

Continue reading “Turning GLaDOS Into Ted: A Tale Of A Talking Toy”

Hypersonic Speech Jammer Works At A Distance

Speech jammers were a meme a little while back. Feeding delayed voice audio back into a person’s ears makes it near-impossible for most people to speak, as our speech system runs on a continual feedback loop. [Benn Jordan] decided to rework that concept by replacing the headphones with a directed sound projector.

The key to the project is the use of hypersonic sound arrays. These use high-frequency sound beyond the human range of hearing to carry a lower-frequency audio signal. By modulating this higher-frequency carrier to create the perception of lower-frequency sound, it’s possible to produce an audible signal that is highly directional – a “sound laser” that can be pointed directly at a person so they hear it, and that becomes inaudible when aimed slightly away.
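A toy numerical example helps make the modulation idea concrete. The Python sketch below amplitude-modulates a 40 kHz carrier with an audible test tone and then crudely mimics the nonlinear demodulation that happens in the air in front of the array; the carrier frequency, modulation depth, and 440 Hz tone are arbitrary illustration values, not figures from [Benn]’s build.

```python
# Toy illustration of the directional-speaker principle: an ultrasonic carrier
# is amplitude-modulated by an audible signal, and nonlinear propagation in air
# acts roughly like an envelope detector that recovers the audible content.
import numpy as np

FS = 192_000                     # sample rate high enough to represent 40 kHz
CARRIER_HZ = 40_000              # typical ultrasonic transducer frequency
t = np.arange(0, 0.01, 1 / FS)   # 10 ms of signal

audio = np.sin(2 * np.pi * 440 * t)            # the audible content (a 440 Hz tone)
carrier = np.sin(2 * np.pi * CARRIER_HZ * t)   # the ultrasonic carrier

m = 0.8                                         # modulation depth
emitted = (1 + m * audio) * carrier             # what the transducer array radiates

# The air's nonlinearity behaves roughly like a squaring (envelope) detector,
# which is what makes the audible signal reappear in front of the array.
demodulated = emitted ** 2
print("peak emitted:", emitted.max(), "peak after demodulation:", demodulated.max())
```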

The arrays allow the delayed voice signal to be fired at a person’s head with a relatively narrow spatial spread. When an individual speaks into a microphone hooked up to the device, the delayed audio is sent through the hypersonic array back to the speaker’s ears, garbling their speech as their brain gets confused by the feedback.
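The delayed-feedback core is simple enough to sketch on a laptop. The following Python snippet, assuming the third-party sounddevice library and ordinary speakers or headphones in place of the hypersonic array, plays the microphone back to the talker roughly 200 ms late; the delay, sample rate, and block size are illustrative choices rather than the project’s actual parameters.

```python
# Minimal speech-jammer core: play the microphone signal back about 200 ms late.
# Uses python-sounddevice for full-duplex audio; values are illustrative only.
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 16_000
DELAY_S = 0.2                                   # ~200 ms is a classic jamming delay
delay_samples = int(SAMPLE_RATE * DELAY_S)
ring = np.zeros(delay_samples, dtype=np.float32)  # holds the not-yet-played samples

def callback(indata, outdata, frames, time_info, status):
    """Output the oldest buffered samples while appending the newest mic input."""
    global ring
    mono = indata[:, 0]
    combined = np.concatenate((ring, mono))
    outdata[:, 0] = combined[:frames]            # oldest samples = delayed voice
    ring = combined[frames:]                     # keep the rest for later blocks

with sd.Stream(samplerate=SAMPLE_RATE, channels=1, dtype="float32",
               callback=callback, blocksize=256):
    input("Speak into the mic; press Enter to stop.\n")
```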

[Benn] demonstrated the device in public by offering random individuals $100 to read a paragraph out of a book. The speech jammer worked a treat, and [Benn] was able to keep his money… until one amazingly immune individual breezed through the test. Check out our prior coverage of speech jamming technology. Video after the break.

Continue reading “Hypersonic Speech Jammer Works At A Distance”

New Wearable Detects Imminent Vocal Fatigue

“The show must go on,” so they say. These days, whether you’re an opera singer, a teacher, or just someone with a lot of video meetings, you rely on your voice to work. But what if your voice is under threat? Work it too hard, or for too long, and you might find that it suddenly lets you down.

Researchers from Northwestern University have developed a new technology to protect against this eventuality. It’s the first wearable device that monitors vocal usage and calls for a time-out before damage occurs. The research has been published in the Proceedings of the National Academy of Sciences.

Continue reading “New Wearable Detects Imminent Vocal Fatigue”

Voice Without Sound

Voice recognition is becoming more and more common, but anyone who’s ever used a smart device can attest that they aren’t exactly foolproof. They can activate seemingly at random, fail to activate when called, or, most annoyingly, completely misunderstand the voice commands. Thankfully, researchers from the University of Tokyo are looking to improve the performance of devices like these by attempting to use them without any spoken voice at all.

The project is called SottoVoce and uses an ultrasound imaging probe placed under the user’s jaw to detect internal movements in the speaker’s larynx. The imaging generated from the probe is fed into a series of neural networks, trained with hundreds of speech patterns from the researchers themselves. The neural networks then piece together the likely sounds being made and generate an audio waveform which is played to an unmodified Alexa device. Obviously a few improvements would need to be made to the ultrasonic imaging device to make this usable in real-world situations, but it is interesting from a research perspective nonetheless.
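Structurally, the pipeline is straightforward even if the models inside it are not. The Python skeleton below shows only how the stages chain together – ultrasound frames in, a spectrogram-like intermediate, a synthesized waveform out, played aloud to the smart speaker – with every function body a placeholder rather than anything from the actual SottoVoce implementation.

```python
# Structural sketch of a SottoVoce-style pipeline. Every function body is a
# stand-in: the real project uses trained neural networks on ultrasound image
# sequences; this only shows how the stages connect.
import numpy as np

def grab_ultrasound_frames(n_frames: int = 30) -> np.ndarray:
    """Stand-in for the probe: a short sequence of grayscale ultrasound images."""
    return np.random.rand(n_frames, 128, 128).astype(np.float32)

def frames_to_spectrogram(frames: np.ndarray) -> np.ndarray:
    """Stand-in for the first network: image sequence -> speech spectrogram."""
    return np.zeros((frames.shape[0], 80), dtype=np.float32)

def spectrogram_to_waveform(spec: np.ndarray, sr: int = 16_000) -> np.ndarray:
    """Stand-in for the second stage: spectrogram -> audio samples."""
    return np.zeros(spec.shape[0] * (sr // 100), dtype=np.float32)

def play_to_speaker(waveform: np.ndarray):
    """In the real setup this audio is played out loud to an unmodified Alexa."""
    print(f"would play {len(waveform)} samples")

play_to_speaker(spectrogram_to_waveform(frames_to_spectrogram(grab_ultrasound_frames())))
```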

The research paper with all the details is also available (PDF warning). It’s an intriguing approach to improving the performance and quality of voice interfaces, especially in situations where the voice may be muffled, non-existent, or overlaid with a lot of background noise. Machine learning like this seems to be one of the more powerful tools for improving speech recognition, as we saw with this robot that can walk across town and order food for you using voice commands alone.

Continue reading “Voice Without Sound”

Classic 80s Text-To-Speech On Classic 80s Hardware

Those of us who were around in the late 70s and into the 80s might remember the Speak & Spell, a children’s toy with a remarkable text-to-speech synthesizer. While it sounds dated by today’s standards, it was revolutionary for the time and rode a wave of text-to-speech functionality that was starting to arrive on various computers of the era. While many of them used dedicated hardware to perform the speech synthesis, some computers were powerful enough to do it in software; others were not quite up to the task. The VIC-20 was one of the latter, but thanks to an ESP8266 it has retroactively been given this function.

This project comes to us from [Jan Derogee], a connoisseur of this retrocomputer, and builds on the work of [Earle F. Philhower], who ported the retro speech synthesis software known as SAM from assembly to C, making it possible to run on the ESP8266. Audio playback is handled on the I2S port, but some work was needed to get this running smoothly since this port also handles the communication with the VIC-20. Once that was sorted out, a patch was made so the computer’s audio could be heard alongside the speech synthesizer’s. Finally, [Jan] designed a serial command interface that allows control of the module.

While not many of us have VIC-20s sitting at home, it’s still an interesting project that shows off the versatility of a small and inexpensive chip like the ESP8266, which would have carried a hefty price tag back in the 1980s. If you have other 80s hardware lying around waiting to be put to work, though, take a look at this project which brings new vocabulary words to that old classic Speak & Spell.

Continue reading “Classic 80s Text-To-Speech On Classic 80s Hardware”