Hey folks. We built SAA (Selective Auditory Attention) while trying to make multi-robot / multi-agent setups pleasant to use. What kept happening was that the agents would never stop talking.
This is an SDK you place in front of your STT (speech-to-text). It tells you whether your device is being spoken to, without needing a wakeword. You can use it for:
- Single AI, multiple humans
- Multiple AIs, single human
- Multiple AIs, multiple humans (here we recommend also adding a wakeword on top for a better system)
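To make the "in front of your STT" placement concrete, here is a minimal sketch of a gate in a voice pipeline. It assumes the attention model reports, per utterance, a probability that the device is being addressed. `Utterance`, `addressed_prob`, `wakeword_heard`, and `gate_for_stt` are hypothetical names for illustration, not SAA's actual API, and the 0.5 threshold is an arbitrary placeholder.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Utterance:
    audio: bytes
    # Hypothetical: probability (0..1) that this utterance is addressed to
    # this device, as an addressee-detection model might report it.
    addressed_prob: float
    # Hypothetical: whether a wakeword detector fired for this utterance.
    wakeword_heard: bool = False


def gate_for_stt(
    utterances: Iterable[Utterance],
    threshold: float = 0.5,
    require_wakeword: bool = False,
) -> Iterator[bytes]:
    """Yield only the audio that should be forwarded to speech-to-text."""
    for utt in utterances:
        if utt.addressed_prob < threshold:
            continue  # not spoken to this device: drop before STT
        if require_wakeword and not utt.wakeword_heard:
            continue  # optional second gate, e.g. for multi-AI, multi-human rooms
        yield utt.audio


if __name__ == "__main__":
    stream = [
        Utterance(b"hi robot", 0.9),
        Utterance(b"pass the salt", 0.2),  # human-to-human chatter
        Utterance(b"robot, stop", 0.8, wakeword_heard=True),
    ]
    print(list(gate_for_stt(stream)))
    print(list(gate_for_stt(stream, require_wakeword=True)))
```

The point of gating before STT, rather than after, is that speech not aimed at the device is never transcribed or answered, which is what keeps multiple agents from replying to everything they hear.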
There are two models: one that uses video + audio and one that uses audio only. Overall, it works by looking for shifts in attention patterns (changes in body language, vocal patterns). It's a tough problem to nail, since every person interacts with people and devices differently.
Let me know how it is!