Sound - The Next Frontier

Suryaprakash Konanuru, CTO, Ideaspring Capital

Ideaspring Capital is focused on investing in early-stage technology product companies in India.

Speech is the most ancient form of communication humans know. Over time it has remained the preferred, and arguably the most efficient, medium of communication for a majority of the population. Even today, speech is the easiest and most natural way humans communicate. As people look for more natural ways to interact with machines, voice is going to be the next big step in UX.

In addition to communication, humans have long used sound for diagnosis as well. Hunters listen for sounds to track and lure prey. Doctors listen to your heartbeat and breathing to evaluate your health. Mechanics listen to your vehicle and say, 'this doesn't sound right'.

There has been a lot of activity in the voice domain lately. In recent years, we have seen a rapid rise in voice-activated interfaces and audio-based content such as podcasts, audiobooks and Alexa briefings. Voice-based human-machine interfaces (voice assistants) are at the forefront of voice technology. Hands-free in-car voice assistants and home assistants have seen great adoption in the consumer market, and will continue to dominate.

However, moving beyond just voice assistants, sound-based solutions are being built and increasingly deployed in retail, healthcare, IIoT and various other areas. Advances in microphone technology, digital signal processing, machine learning, deep learning and the like are fuelling the growth of these new innovations.

Sound is ubiquitous: everything, human or machine, generates it. It characteristically carries more information (emotion, context, loudness, etc.) than text and is easy to collect. This rich information enables analysis that would not otherwise be possible.

Systems trained to listen to and classify different sounds could reshape a number of industries. For example, a very practical use-case is diagnosing tuberculosis from the sound of a cough. For context, tuberculosis can currently be diagnosed only by a range of tests performed after an initial skin or blood test confirms the presence of the bacteria that cause it.
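
To make the idea concrete, a sound classifier boils down to reducing a clip to acoustic features and comparing them against labelled examples. The sketch below is a deliberately toy version: the signals are synthetic, the two hand-rolled features (spectral centroid and RMS energy) stand in for the richer features (e.g. MFCCs) and trained deep models real cough-analysis systems use, and nothing here is a medical tool.

```python
import numpy as np

def spectral_features(signal, sr=16000, frame=512):
    """Very crude per-clip features: mean spectral centroid and mean RMS energy."""
    n = len(signal) // frame * frame
    frames = signal[:n].reshape(-1, frame)
    spectra = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    centroid = (spectra * freqs).sum(axis=1) / (spectra.sum(axis=1) + 1e-9)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    return np.array([centroid.mean(), rms.mean()])

# Synthetic stand-ins: a "dry cough" as a decaying high-frequency burst,
# a "wet cough" as a lower, noisier burst (purely illustrative labels).
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
dry = np.sin(2 * np.pi * 1200 * t) * np.exp(-5 * t)
wet = np.sin(2 * np.pi * 300 * t) * np.exp(-5 * t) + 0.3 * rng.standard_normal(sr)

f_dry, f_wet = spectral_features(dry), spectral_features(wet)

# A new clip is labelled by its nearest class centroid in feature space.
query = np.sin(2 * np.pi * 1100 * t) * np.exp(-5 * t)
f_q = spectral_features(query)
label = "dry" if np.linalg.norm(f_q - f_dry) < np.linalg.norm(f_q - f_wet) else "wet"
print(label)  # dry
```

With real recordings, one labelled example per class is of course nowhere near enough; the point is only the shape of the pipeline: features in, nearest labelled signature out.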

Here are some interesting use-cases that go beyond just voice and focus on 'sound' - and they are just the tip of the iceberg:
1. Industrial IoT (IIoT)
Even though a lot of sensors and IoT platforms are available, deploying IoT solutions in manufacturing is not frictionless, for several reasons: sensors need to be connected or mounted on existing machines, sometimes wired to them, and maintaining these systems is not easy, particularly if machines are moved around. Legacy machines may not have any mechanism for collecting data at all. Sound-based sensors, however, would provide a simple, non-touch and non-intrusive solution to these problems. This would enable industries to adopt solutions without making drastic changes to their existing setup.
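
One way such a non-intrusive acoustic monitor might work: record the machine while it is known to be healthy, reduce each clip to an acoustic signature, and flag clips whose signature drifts too far from that baseline. The signals, the signature and the threshold below are all illustrative assumptions, not a production design.

```python
import numpy as np

def signature(clip, frame=256):
    """Mean magnitude spectrum of a clip -- a crude acoustic signature."""
    n = len(clip) // frame * frame
    frames = clip[:n].reshape(-1, frame)
    return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)

rng = np.random.default_rng(1)
sr = 8000
t = np.arange(2 * sr) / sr
# Healthy machine: a steady 50 Hz hum plus mild broadband noise.
healthy = np.sin(2 * np.pi * 50 * t) + 0.05 * rng.standard_normal(t.size)
baseline = signature(healthy)

def anomaly_score(clip):
    """How far a clip's signature drifts from the healthy baseline."""
    return float(np.linalg.norm(signature(clip) - baseline))

# Calibrate a threshold on another known-healthy recording ...
threshold = 3 * anomaly_score(healthy + 0.05 * rng.standard_normal(t.size))
# ... then a worn bearing adds a high-frequency whine on top of the hum.
faulty = healthy + 0.5 * np.sin(2 * np.pi * 2500 * t)
print(anomaly_score(faulty) > threshold)  # True: the fault is flagged
```

Because the microphone only listens, retrofitting it onto a legacy machine needs no wiring into the machine itself, which is exactly the deployment advantage described above.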

2. Medical Diagnostics
Historically, audible sound has had two primary uses in healthcare: stethoscopes and medical transcription. The stethoscope is perhaps the most recognizable of all medical diagnostic devices, used to listen to the heart, the lungs and even blood flow in blood vessels. It has, however, traditionally been an analog device. With the advent of electronic stethoscopes, sound quality has improved, resulting in better diagnosis, and recordings can be stored for further analysis, for consultations with other doctors and, more importantly, for training interns, junior doctors and even machine learning models. This presents an opportunity to mine the large data sets that accumulate over time and train models that can diagnose diseases early, without the significant human involvement currently required.

Medical transcription is also getting a tech overhaul: there are startups attempting to transcribe doctors' voice notes into text in real time, instead of having a team do so manually.

3. Content Creation Tools
With the increase in consumption of voice-based content, especially podcasts and audiobooks, there is great demand for voice-based content creation and editing tools. Since the available tools are not very user-friendly, there is a lot of scope for innovation in this area; for example, editing audio files through their textual representation.
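
The text-based editing idea can be sketched simply: if a speech recognizer with forced alignment has already produced word-level timestamps, deleting a word from the transcript just means dropping the corresponding sample range from the audio. The timestamps below are made up for illustration.

```python
import numpy as np

def cut_words(audio, sr, words, to_delete):
    """Delete the sample ranges of the given words; return edited audio and transcript.

    `words` is a list of (word, start_sec, end_sec) tuples as an ASR
    forced aligner would produce (hypothetical values here).
    """
    keep = [w for w in words if w[0] not in to_delete]
    pieces = [audio[int(s * sr):int(e * sr)] for _, s, e in keep]
    return np.concatenate(pieces), " ".join(w for w, _, _ in keep)

sr = 16000
audio = np.zeros(3 * sr)  # 3 seconds of (silent) placeholder audio
words = [("hello", 0.0, 0.8), ("um", 0.8, 1.2), ("world", 1.2, 3.0)]

edited, transcript = cut_words(audio, sr, words, {"um"})
print(transcript)        # hello world
print(len(edited) / sr)  # 2.6 -- the 0.4 s filler word is gone
```

A real editor would also crossfade across each cut to avoid clicks, but the core operation is exactly this mapping from transcript edits to sample ranges.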

With voice synthesis technology picking up, another use-case can be the creation of audio content with minimal effort from the speaker.

4. Sound-Based Ads Personalization
The sound around you gives away a lot of contextual information that can be used to deliver a personalized ad experience. One interesting solution for advertisers is serving relevant ads on a secondary device (for example, your mobile phone) based on the sounds generated by the content you are consuming on your TV or laptop.
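
Matching what the phone hears against known ad audio is essentially audio fingerprinting. A toy version reduces each frame to its dominant frequency bin and checks how many frames line up; production systems use far more robust schemes (Shazam-style peak-pair hashing, for instance), and everything below is a simplified illustration.

```python
import numpy as np

def fingerprint(signal, frame=1024):
    """Crude fingerprint: the dominant FFT bin of each frame."""
    n = len(signal) // frame * frame
    frames = signal[:n].reshape(-1, frame)
    return np.abs(np.fft.rfft(frames, axis=1)).argmax(axis=1)

rng = np.random.default_rng(2)
sr = 8000
t = np.arange(4 * sr) / sr
# A "TV ad jingle": a tone that steps up each second.
jingle = np.sin(2 * np.pi * (440 + 220 * np.floor(t)) * t)
reference = fingerprint(jingle)

# The phone hears the same jingle through the room, with noise on top.
heard = jingle + 0.2 * rng.standard_normal(t.size)
match = (fingerprint(heard) == reference).mean()
print(match > 0.9)  # True: most frames' dominant bins still line up
```

The fraction of matching frames acts as a confidence score; when it is high enough, the ad server knows which content is playing nearby and can pick a related ad.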

5. Customer Call Analysis
With enterprises increasingly investing in keeping their customers happy, it is imperative to analyze customer interactions in a detailed and methodical way rather than on a sampling basis. However, the volume of customer calls is simply too large for manual analysis.

Automating customer call analysis using transcription and then identifying the intent and mood of the customer would go a long way in improving customer happiness. Doing it in real time can also enable customer support representatives to be guided during the call.
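
As a minimal illustration of the second step, intent and mood can be pulled from a call transcript once transcription is done. Real systems use trained NLP models on the ASR output; the keyword lists below are purely illustrative stand-ins.

```python
# Toy mood/intent tagger over a call transcript. The vocabularies are
# hypothetical examples, not a real taxonomy.
NEGATIVE = {"angry", "cancel", "refund", "broken", "terrible", "waiting"}
INTENTS = {"cancel": "cancellation", "refund": "refund", "bill": "billing"}

def analyze(transcript):
    """Return a (mood, intent) pair from simple keyword matching."""
    words = [w.strip(".,!?") for w in transcript.lower().split()]
    mood = "unhappy" if sum(w in NEGATIVE for w in words) >= 2 else "neutral"
    intent = next((INTENTS[w] for w in words if w in INTENTS), "unknown")
    return mood, intent

print(analyze("I am angry, I have been waiting a week and I want a refund!"))
# ('unhappy', 'refund')
```

Run in real time on a streaming transcript, even a signal this coarse could prompt a supervisor alert or suggest a retention script to the representative mid-call.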

There are many challenges in developing solutions that are real-time, highly accurate and, more importantly, privacy-preserving. But with advances in technology, especially edge computing, we are bound to see solutions that address all these concerns.