Artificial audio intelligence: making the home truly smart

Author: Dr Chris Mitchell, Founder and CEO of Audio Analytic

05 December 2017


This article explores how artificial audio intelligence technology is enabling smart home (and beyond) applications, making the devices around us more human-like and useful – like a Shazam for everyday, real-world sounds.

This piece originally appeared in the December 2017 issue of Electronic Product Design & Test.

Consumer tech giants are in a race to make digital assistants and our homes smarter. Google, Amazon, Apple, Nest, Hive, Vivint, Philips and Ring lead this rapidly evolving sector, but many more are expected to compete for space in our homes.

Through ‘smart speakers’, Google, Amazon and Apple combine artificial intelligence, connectivity and speech recognition to deliver information, entertainment and other services – and can even manage other devices within the home. Other companies provide a wide range of technology such as thermostats, cameras and doorbells, which can work either with these digital assistants or independently. But with the exception of a new generation of cameras that offer motion sensors and facial recognition, in most cases these devices are connected, rather than truly smart.

No trigger word, no response

Voice technology has been with us for many years, and with advances in AI that have improved its accuracy, it is now becoming a viable, natural user interface. But while voice-controlled devices can be remotely operated by voice communication or smartphones, they cannot assess and respond to a situation on their own.

Important events in the home, such as a window breaking, a smoke alarm sounding, a baby crying or a dog barking, will all be ignored by the current crop of home assistants. Unless the appropriate trigger word is used, they simply wait for instruction from us.

For a device to be able to react intelligently to events around it, it needs to assess its environment through human-like senses to gain context-awareness. Hearing is one of the most powerful of human senses, as it is not limited by line of sight – and sounds carry many nuances. Audio Analytic is teaching technology to hear, so that it can gather that additional layer of context, enabling it to correctly evaluate and respond to audio events on its own.

How do we give devices a human-like sense of hearing?

Our software platform, ai3™, contains a library of sound profiles which are all tuned to identify particular sounds. We had a zero data problem to start with, because while speech recognition is now a mature technology and the use/structure of phonemes is well understood, sound recognition proved to be a much more complex task.

To assemble this library, therefore, we had to record thousands of sounds ourselves, in order to collate real-world data to which we could expose our machine learning technology. For example, to collect enough data to teach our technology to reliably recognise the sound of a window breaking, we had to smash hundreds of windows, of many different types, with various implements, in our dedicated sound lab – as well as in people’s homes (with their permission, of course!).

Think about a window being smashed, and all the different ways glass shards can hit the floor randomly, without any particular intent or style. Capturing the diversity of sounds and their sources is more important than the number of audio recordings; this is to ensure that the end device will be able to reliably recognise a given sound when it hears it again in a different scenario.

The data collected through these recordings is labelled, organised and analysed in a way that has never been done before. Once you understand the structure of sound in this way, you can move on to the machine learning process.

The data collected is supplied to our proprietary machine listening framework, which extracts hundreds of ideophonic features from sounds. Through encoding, decoding and introspection, the technology is able to analyse and describe sound ‘profiles’ based on the understanding of these ideophonics. These individual profiles are then embedded into ai3™, which can be integrated into virtually any consumer device equipped with a microphone.
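As a rough illustration of the general idea of turning labelled recordings into features and then into per-sound profiles, here is a minimal Python sketch. It is not the ai3™ framework, whose features and models are proprietary and not described here. It assumes two generic stand-in features (RMS energy and zero-crossing rate), a simple nearest-profile match, and synthetic tones in place of real recordings. All function names and labels are hypothetical.

```python
import math

def extract_features(samples):
    """Return (rms_energy, zero_crossing_rate) for one clip of audio samples."""
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # Count sign changes between neighbouring samples.
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return (rms, crossings / (n - 1))

def build_profile(clips):
    """Average the feature vectors of labelled example clips into one profile."""
    feats = [extract_features(c) for c in clips]
    return tuple(sum(col) / len(col) for col in zip(*feats))

def classify(samples, profiles):
    """Return the label of the profile closest to the clip's features."""
    feats = extract_features(samples)
    return min(profiles, key=lambda label: math.dist(feats, profiles[label]))

def tone(freq_hz, amplitude, rate=8000, n=800):
    """Synthetic sine tone, used here instead of real recordings."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]

# Hypothetical labels, each built from two slightly different examples.
profiles = {
    "low_hum": build_profile([tone(100, 0.5), tone(120, 0.5)]),
    "high_beep": build_profile([tone(3000, 0.5), tone(3200, 0.5)]),
}

print(classify(tone(3100, 0.5), profiles))  # prints "high_beep"
```

A real system would need far richer features and far more varied training data than two hand-picked measurements; the sketch only shows the shape of the pipeline: label, extract, summarise into a profile, then match.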

Devices in which ai3™ is embedded are therefore able to recognise a range of important sounds, as well as scenes that one would expect to find in the home. This enables smart home technology to detect and interpret these everyday sounds – making the home more responsive, even when residents aren’t around.

Sound recognition technology in daily life

Right now, our customers are focused on the smart home market, because this is where there is currently both significant industry investment and demand from consumers. As consumer tech companies are under pressure to offer ever more compelling and differentiated products, embedded artificial audio intelligence represents a huge opportunity for competitive advantage and additional revenue. A number of recent announcements and press reports have already revealed devices that will employ advanced audio alerts to enhance their functionality.

The trend for sound recognition is part of what Mark Weiser, widely considered to be the father of ‘ubiquitous computing’, called ‘calm technology’: technology “which informs but doesn’t demand our focus or attention”. By enabling devices to react to audio events independently from human interaction, our sound recognition technology can deliver a more seamless, smarter experience across a range of everyday devices.

For example, in the near future, a consumer’s sneezing pattern could be detected throughout the day via their personal assistant deployed in the home, in the car, on their headphones and via their mobile phone while out and about. If these devices communicated with one another and the personal assistant understood that the user was prone to hay fever, the technology could turn on the HEPA filter at home before the consumer arrives, and the vehicle could suggest an alternative route home to collect appropriate medication – all based on this perceived allergic response.

The technology has applications that go far beyond the home. The hearables and wearables industry is an obvious target, with sound recognition helping to improve user experiences. The automotive industry is another that could benefit. We expect to see digital assistants integrated into cars, and car manufacturers making their vehicles more context-aware for safety purposes – such as hearing emergency vehicle sirens so that appropriate action can be taken, or adjusting Driver Assist if the driver is distracted.

Edge-based intelligence has obvious advantages for the above applications where connectivity may be an issue. ai3™ has been designed to run locally on devices: it is not a cloud-based solution. As a result, devices do not need to stream audio to the internet for analysis, meaning that no permanent connection is required, and power budgets and privacy concerns can be reduced.
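To make the edge-based pattern concrete, the following Python sketch shows one way a device could analyse audio locally, frame by frame, and emit only a small event notification instead of streaming sound anywhere. It is an illustrative assumption, not a description of how ai3™ works. The scoring function is a simple loudness measure standing in for a real on-device model, and the frame size, threshold and function names are arbitrary.

```python
from collections import deque
import math

def frame_score(frame):
    """Stand-in for an on-device model: here simply the frame's RMS level."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def detect_events(stream, frame_size=160, threshold=0.3, min_frames=3):
    """Yield the index of each frame at which a sustained loud event is confirmed.

    Audio is consumed frame by frame and then discarded; only frame indices
    (never samples) are emitted, so nothing has to leave the device.
    """
    recent = deque(maxlen=min_frames)  # rolling window of per-frame decisions
    frame, index, active = [], 0, False
    for sample in stream:
        frame.append(sample)
        if len(frame) < frame_size:
            continue
        recent.append(frame_score(frame) >= threshold)
        frame = []  # drop the raw audio once it has been scored
        confirmed = len(recent) == min_frames and all(recent)
        if confirmed and not active:
            yield index  # report the event once, when it is first confirmed
        active = confirmed
        index += 1

# Five quiet frames, four loud frames, five quiet frames.
quiet = [0.0] * (160 * 5)
loud = [0.5, -0.5] * (80 * 4)
print(list(detect_events(quiet + loud + quiet)))  # prints [7]
```

Requiring several consecutive frames above the threshold is one simple way to avoid reacting to a single brief click; the event is reported at frame 7, the third loud frame in a row.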

Devices can also readily perform other functions while listening out for specific sounds, enabling this technology to fulfil its role in any situation where it might be useful to respond to the changing audio environment.

In the near future, sound recognition will become commonplace in the same way that voice activation is now a familiar feature. Analyst firm Ovum predicts that, by 2021, the number of smart homes will increase to 463 million globally. IHS Markit lends further support to this trend: it predicts that by the same year 60% of all smart home devices will be integrated or embedded with voice control/assistants. This means the necessary infrastructure of microphones and processors for audio event detection will already exist, so giving products ‘ears’ – differentiating them by making them smarter – is an opportunity not to be missed.
