Using techniques similar to human perception to localize sounds and speakers.
The final goal of the project is AR glasses that transcribe audio from microphones and show the text to the user on a heads-up display. The system integrates a microphone array and a camera to localize sound, transcribe speech, and visually present captions on a laptop screen connected to the glasses. Ideally this project would later be combined with a smart-glasses display platform.
In the current state of the project, no processing happens locally on the boards; all data is streamed to a computer to be processed and displayed. One ESP32-S3-WROOM-1U captures the audio, and one ESP32-S3-EYE acts as an access point, captures camera frames, and handles the data routing. Together they form a battery-powered data-collection unit.
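The receiving side of that streaming path can be sketched as a small packet codec. The layout below (1-byte mic ID, 4-byte sequence number, then little-endian int16 PCM samples) is an assumption for illustration only, not the project's actual wire format.

```python
import struct

# Hypothetical packet layout (illustrative, not the project's actual format):
# 1-byte mic ID, 4-byte little-endian sequence number, then int16 PCM samples.
HEADER = struct.Struct("<BI")

def build_audio_packet(mic_id: int, seq: int, samples) -> bytes:
    """Pack samples the way the firmware side might send them over Wi-Fi."""
    return HEADER.pack(mic_id, seq) + struct.pack(f"<{len(samples)}h", *samples)

def parse_audio_packet(packet: bytes):
    """Split a raw payload back into (mic_id, seq, samples) on the computer."""
    mic_id, seq = HEADER.unpack_from(packet)
    n_samples = (len(packet) - HEADER.size) // 2  # 2 bytes per int16 sample
    samples = struct.unpack_from(f"<{n_samples}h", packet, HEADER.size)
    return mic_id, seq, list(samples)
```

A UDP or TCP listener on the laptop would call `parse_audio_packet` on each received payload, using the sequence number to detect dropped packets before feeding samples onward.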
The project consists of:
Each part is organized into its own folder and subfolders.
1. Firmware
This is the code running on the ESP32 boards. Most of it handles data collection and transmission. It has two main parts: one that goes on the arm boards and one that goes on the EYE. It was written, built, and flashed using VS Code.
- ESP-IDF
- Target set to ESP32-S3
2. Software
This is the code that runs on the computer to process the streamed data. Due to time constraints, the audio localization was not integrated with the OpenCV face-detection and speech-to-text display.
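The caption side of that display can be sketched as a rolling buffer of wrapped transcript lines, which a video loop could then draw onto each frame (for example with OpenCV's `cv2.putText`). The class name and parameters below are illustrative assumptions, not the project's actual code.

```python
import textwrap
from collections import deque

class CaptionBuffer:
    """Rolling buffer of wrapped caption lines for an on-screen overlay."""

    def __init__(self, width: int = 40, max_lines: int = 3):
        self.width = width                      # characters per caption line
        self.lines = deque(maxlen=max_lines)    # oldest lines drop off automatically

    def add_transcript(self, text: str) -> None:
        # Wrap each new speech-to-text chunk and keep only the newest lines.
        for line in textwrap.wrap(text, self.width):
            self.lines.append(line)

    def current(self) -> list:
        """Lines to draw on the current video frame, newest last."""
        return list(self.lines)
```

Each phrase returned by the speech-to-text engine would be fed to `add_transcript()`, and the frame loop would render `current()` near the bottom of the image.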
3. Hardware
This has the PCB and KiCad files and the glasses CAD models. The main components are:
- Arm boards - Hold the power management and the mics that collect the audio data.
- ESP32-S3-EYE - Collects the visual data and acts as an access point to route the data.
- PowerBoost 1000 Charger - Handles battery charging and power delivery.
- Frames - Holds all components in a glasses form to be a wearable design.
This has the KiCad models that were used to produce the boards, including the schematics, the PCB files, and the Gerber files used to order them.
Onshape was used to design the frames that the PCBs fit into on the glasses.
This is all the documentation created while testing and researching the glasses. It also includes the pinouts used on the glasses, for reference.
- Faster processing, moving computation onto the board.
- Better integration between the programs: face detection, sound localization, and speech to text.
- Implement sound localization.
- Battery management within code.
- Redesign/flip the ports on the EYE adapter board so it can connect to the EYE.
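The sound-localization item above could start from the time difference of arrival between the two arm-board microphones; a common estimator for that delay is GCC-PHAT. This is a generic sketch under assumed parameters (sample rate, 14 cm mic spacing, function names), not code from the project.

```python
import numpy as np

def gcc_phat_delay(sig, ref, fs):
    """Estimate the delay (seconds) of `sig` relative to `ref` via GCC-PHAT."""
    n = len(sig) + len(ref)
    # Cross-power spectrum, whitened by its magnitude (the PHAT weighting).
    R = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(ref, n))
    cc = np.fft.irfft(R / (np.abs(R) + 1e-15), n)
    max_shift = n // 2
    # Reorder so index 0 corresponds to the most negative lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (int(np.argmax(np.abs(cc))) - max_shift) / fs

def bearing_deg(tau, mic_distance=0.14, c=343.0):
    """Convert a delay into a direction-of-arrival angle (degrees off broadside)."""
    return float(np.degrees(np.arcsin(np.clip(tau * c / mic_distance, -1.0, 1.0))))
```

For example, a 5-sample delay at 16 kHz (about 0.31 ms) maps to roughly 50 degrees off broadside with the assumed 14 cm spacing; the output of this stage is what would eventually feed the on-screen speaker indicator.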
https://github.com/espressif/esp-who/tree/master
