Source: NVIDIA blog post. (Photo: The Otter team at AISense.)
US-based AISense has launched Otter, a GPU-powered app that records speech and produces both audio recordings and transcriptions of conversations involving multiple people. Most speech-recognition software available today must be trained to recognise its owner's voice and does not accurately transcribe speech from multiple people.
Human-to-human interactions are much more difficult to capture than human-to-machine interactions, such as simple commands to Amazon's Alexa, Apple's Siri or Google Assistant, AISense co-founder and CEO Sam Liang points out in a blog post. While those assistants typically handle short queries or commands from a single speaker, Otter can manage longer conversations involving multiple people.
AISense's proprietary Ambient Voice Intelligence technology allows people to store, search, share and analyse voice conversations. Otter, which is built on this technology, lets users scroll through a transcript in which each passage is labelled by speaker, and also play back the original audio. The company says the app transcribes speech with better than 90% accuracy.
Otter has to handle complicated interactions between people and the nuances of conversation, and it can be tripped up by accents, said Liang. "This is a pretty deep technology. It's extremely difficult," he said. "We had to do pretty sophisticated supervised learning, and we had to get a lot of labelled data, with hundreds of thousands of hours of recordings."
The 15-person team at AISense trained Otter on 50 NVIDIA Tesla GPUs using terabytes of freely available audio and transcripts, drawn from the archives of National Public Radio (NPR) programmes and from Supreme Court proceedings held at the US Library of Congress. As a result, Otter works best with US accents.
Said Liang, “It’s a startup and we have to spend money very frugally. But we have to spend some resources on GPUs — it’s just a must.”
The company is targeting Otter at enterprise customers who might use it in meetings. AISense plans to release a premium version that will require a subscription, and it already licenses some of its technology to enterprise customers.
AISense is also working with partners to deliver similar results. It recently partnered with Zoom Video Communications to process voice data, which is automatically transcribed using AISense's technology.