Transfer function measurements address test challenges for smart speakers
01 October 2019
Transfer function measurement simplifies the measurement of frequency response in smart speakers & other speech communication devices, especially those that are difficult to test using sinusoidal signals, explains Joe Begin, Director of Applications & Technical Support at audio test experts, Audio Precision.
This tutorial was originally featured in the October 2019 issue of EPDT magazine.
Transfer function analysis provides a means of measuring the frequency response of a device using any broadband signal. It can simplify the measurement of frequency response for smart speakers, smartphones or headset microphones, many of which incorporate digital signal processing (DSP) algorithms that require the use of speech signals.
In system analysis, the transfer function characterises the response of a system to a time-varying stimulus. The system is stimulated with an input signal and the output signal response is measured.
It is often easier to characterise a system in the frequency domain than in the time domain, and the Fourier transform makes this possible. Figure 1 illustrates how the Fourier transform (F) is used to transform signals and the system response from the time domain to the frequency domain.
The system response in the frequency domain, or the transfer function, is denoted by H(f). The transfer function (also known as frequency response function, or FRF) is complex, meaning it has both magnitude and phase. Its magnitude represents the system’s output per unit input (the gain of the system) as a function of frequency and its phase response represents the phase between the output and input as a function of frequency. Note that the transfer function can also be used to derive the impulse response, which represents the system’s response in the time domain.
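As a minimal sketch of these ideas, the following Python example (standard library only) applies a DFT to a made-up system, a gain of 0.5 with a one-sample circular delay, and recovers its magnitude, phase and impulse response. The signal, the block length and the helper names `dft`, `idft` and `transfer_function` are illustrative assumptions, not part of any analyser's software.

```python
import cmath
import math
import random

def dft(x):
    """Forward DFT, O(N^2): adequate for a short illustrative block."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT, returning complex samples."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def transfer_function(x, y):
    """H(f) = Y(f) / X(f) for a single noise-free block."""
    return [Yk / Xk for Xk, Yk in zip(dft(x), dft(y))]

N = 16
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(N)]    # broadband stimulus
y = [0.5 * x[(t - 1) % N] for t in range(N)]   # hypothetical system: gain 0.5, 1-sample delay

H = transfer_function(x, y)
magnitude = [abs(h) for h in H]                # output per unit input at each bin
phase = [cmath.phase(h) for h in H]            # radians, output relative to input
impulse_response = [h.real for h in idft(H)]   # time-domain view of the same system

print([round(m, 3) for m in magnitude[:4]])
print([round(v, 3) for v in impulse_response[:4]])
```

The delay is circular so that the single-block ratio is exact; a real measurement would use windowed, averaged spectra instead of one block.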
Measuring system transfer functions
A transfer function analyser measures system transfer functions using the complex discrete Fourier transform (DFT). Typically, a broadband signal – noise, music, speech or continuously swept sines (chirps) – is generated and applied as a stimulus to the system input.
The stimulus signal from the generator is looped back and measured on one of the analyser input channels and the system output is acquired on a second input channel.
The analyser acquires the system input and output signals simultaneously with the same sample clock and calculates the system’s transfer function (magnitude and phase).
The APx500 transfer function measurement implementation is distinguished by the fact that, in many applications, the stimulus signal does not have to be acquired on one of the analyser input channels, because the analogue input and output systems are precisely calibrated. Both analogue and digital IO systems are time aligned so, in many cases, the signal loaded into the generator, or an audio file on disk, can be used as the input signal, freeing up one analyser input channel for measurement.
This is also beneficial for smart devices, tablets and media players that must be tested in an open-loop configuration. Open loop describes test scenarios where the audio signal is not generated, passed through the device under test (DUT) and analysed in one continuous operation. For some DUTs, including smart devices, open-loop testing is required, since either the known audio signal or the output signal may be a file, whether on the DUT or a server connected to the DUT.
How to derive a transfer function
The transfer function is derived from two kinds of spectra. The cross spectrum, Gxy, estimates the relationship (including phase) between x(t) and y(t) (see Figure 1) as a function of frequency.
The auto spectrum, or power spectrum, Gxx or Gyy represents the power (level-squared) of x(t) or y(t) as a function of frequency. Gxx and Gyy are real-valued: they have magnitude, but no phase.
There are three calculation modes in the APx transfer function measurement: H1, H2, and Magnitude Only (see Equations).
In H1 mode, the transfer function is calculated as the cross spectrum between the output and input divided by the auto spectrum of the input. This minimises the effect of noise introduced at the system output.
In H2 mode, the transfer function is calculated as the auto spectrum of the output divided by the cross spectrum between the output and input. This minimises the effect of noise introduced at the system input.
For most audio test applications, H1 is more appropriate, because the input signal is from a precision signal generator and is likely to have a higher signal-to-noise ratio (SNR) than the DUT output signal.
In Magnitude Only mode, the transfer function magnitude is calculated as the magnitude of the output spectrum divided by the magnitude of the input spectrum. This mode is useful when the input and output signals have sample clocks that are not precisely synchronised (for instance, Bluetooth devices).
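The equations for the three modes are not reproduced in this text. In standard FRF notation, and assuming the cross spectrum is defined as G_xy = E[X* Y] (a convention chosen here, which may differ in form from the APx documentation), the descriptions above can be written as:

```latex
H_1(f) = \frac{G_{xy}(f)}{G_{xx}(f)}, \qquad
H_2(f) = \frac{G_{yy}(f)}{G_{yx}(f)}, \qquad
|H(f)| = \sqrt{\frac{G_{yy}(f)}{G_{xx}(f)}},
\qquad \text{where } G_{yx}(f) = G_{xy}^{*}(f).
```

For a noise-free linear system with Y = HX, substituting gives G_xy = H G_xx and G_yy = |H|^2 G_xx, so H1 and H2 both reduce to H and the Magnitude Only expression reduces to |H|.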
Using the H1 or H2 calculation mode, the coherence function (C2) is also calculated. This indicates the relationship, or coherence, between the output Y(f) and the input X(f) at each frequency. The value is between 0 (no relationship) and 1.0 (perfect relationship). Low coherence is often caused by a poor SNR.
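The following Python sketch (standard library only) shows how H1, H2 and coherence could be formed from averaged spectra. It assumes the usual magnitude-squared coherence definition, |Gxy|^2 / (Gxx Gyy), and a made-up system (gain 0.5, one-sample circular delay) with optional noise added at the output. The function names `estimate_frf` and `simulate` are hypothetical, and this is not the APx implementation. In practice a library routine such as `scipy.signal.csd` or `scipy.signal.coherence` would normally replace the hand-written DFT.

```python
import cmath
import math
import random

def dft(block):
    """Forward DFT, O(N^2): adequate for short demonstration blocks."""
    n = len(block)
    return [sum(block[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def estimate_frf(x_blocks, y_blocks):
    """Accumulate auto and cross spectra over blocks, then form H1, H2, coherence."""
    n = len(x_blocks[0])
    gxx, gyy, gxy = [0.0] * n, [0.0] * n, [0j] * n
    for xb, yb in zip(x_blocks, y_blocks):
        X, Y = dft(xb), dft(yb)
        for k in range(n):
            gxx[k] += abs(X[k]) ** 2             # input auto spectrum
            gyy[k] += abs(Y[k]) ** 2             # output auto spectrum
            gxy[k] += X[k].conjugate() * Y[k]    # cross spectrum (keeps phase)
    h1 = [gxy[k] / gxx[k] for k in range(n)]
    h2 = [gyy[k] / gxy[k].conjugate() for k in range(n)]
    coh = [abs(gxy[k]) ** 2 / (gxx[k] * gyy[k]) for k in range(n)]
    return h1, h2, coh

def simulate(noise_rms, num_blocks=16, n=32, gain=0.5, seed=1):
    """Hypothetical DUT: gain plus 1-sample circular delay, with noise at the output."""
    rng = random.Random(seed)
    x_blocks, y_blocks = [], []
    for _ in range(num_blocks):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [gain * x[(t - 1) % n] + rng.gauss(0.0, noise_rms) for t in range(n)]
        x_blocks.append(x)
        y_blocks.append(y)
    return estimate_frf(x_blocks, y_blocks)

for noise_rms in (0.0, 0.5):
    h1, h2, coh = simulate(noise_rms)
    k = 4  # an arbitrary frequency bin
    print(f"noise={noise_rms}: |H1|={abs(h1[k]):.3f} "
          f"|H2|={abs(h2[k]):.3f} coherence={coh[k]:.3f}")
```

Without noise, both estimates match the true response and the coherence is 1. With output noise, |H2| can never be smaller than |H1| (their ratio is the reciprocal of the coherence), which is consistent with H1 being the safer choice when the noise is at the DUT output.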
Transfer function measurement in a loudspeaker
Graphs in Figure 3 show the coherence measurements, the phase response and the magnitude response of a loudspeaker in an acoustic test box (a single driver intended for speech measurements in the frequency range from 100 Hz to about 10 kHz). The reference was the voltage of a pink noise signal applied to the loudspeaker and the output signal was the sound pressure as measured at the test position in the box.
Note that the FRF magnitude has units of Pa/Vrms. This is the gain or sensitivity of the loudspeaker as a function of frequency.
Note that coherence drops significantly at frequencies below 30 Hz. This could be due to low output from the driver, the presence of higher ambient noise levels or the need for longer measurement times.
It is also worth noting that there is little difference between H1 and H2 calculation results, except at very low frequencies and at 16 to 18 kHz, where the coherence is significantly less than 1.
Measurement example: smart speaker input
A smart speaker input system records commands on a server, where they are processed and interpreted. In a test environment, a .wav file is used to feed a command to the smart speaker. The recorded command is then retrieved from the server and used, along with the original command’s .wav file, to determine the frequency response using the transfer function measurement.
There is no standard for smart speakers, but ETSI TS 103 738, which covers speaker-phones, recommends that tests are conducted in an anechoic chamber, with the DUT positioned on a table 40 cm from the table edge. A mouth simulator at the table edge and 30 cm above the DUT’s microphone system generates the input signal to the DUT (Figure 4).
A measurement microphone placed at the mouth reference point (MRP) is used to calibrate the mouth simulator. The microphone is itself calibrated using a sound level calibrator, so that the analogue input system displays results in dBSPL.
The mouth simulator’s frequency response should be flat to within ±0.5 dB over the frequency range of interest.
An audio waveform editor can be used to prepend the smart speaker’s wake-up word, followed by a short period of silence, to the speech stimulus signal (Figure 5).
The stimulus signal, with the wake-up word prepended, is resampled to match the sample rate of the audio file, saved as a .wav file and loaded as a generator waveform to measure the overall RMS level of the signal. The RMS level at the MRP is measured with the reference microphone and the generator level is adjusted until the target sound pressure level (SPL) is measured (the ETSI standard for speaker-phones specifies -4.7 dBPa).
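For readers more used to dB SPL, the short sketch below converts the -4.7 dBPa target to dB SPL (about 89.3 dB SPL, using the standard 20 µPa reference) and shows one way a level correction could be computed. The function names, the starting generator level and the example measured level are hypothetical, and the correction assumes the mouth simulator behaves linearly.

```python
import math

P_REF_PA = 20e-6  # reference pressure for dB SPL: 20 micropascals

def dbpa_to_dbspl(level_dbpa):
    """Convert a level in dB re 1 Pa to dB SPL (dB re 20 uPa)."""
    return level_dbpa + 20.0 * math.log10(1.0 / P_REF_PA)

def dbpa_to_pascal(level_dbpa):
    """Convert a level in dB re 1 Pa to RMS pressure in pascals."""
    return 10.0 ** (level_dbpa / 20.0)

def adjusted_generator_level(current_vrms, measured_dbpa, target_dbpa=-4.7):
    """Scale the generator so the MRP level reaches the target.

    Assumes a linear mouth simulator: a change of x dB in drive voltage
    produces a change of x dB in sound pressure.
    """
    return current_vrms * 10.0 ** ((target_dbpa - measured_dbpa) / 20.0)

print(round(dbpa_to_dbspl(-4.7), 2))    # target in dB SPL
print(round(dbpa_to_pascal(-4.7), 3))   # target in Pa RMS
# Hypothetical case: 0.100 Vrms drive measured 6 dB below the target.
print(round(adjusted_generator_level(0.100, measured_dbpa=-10.7), 4))
```

In practice the adjustment would be repeated after re-measuring, since a real mouth simulator is only approximately linear.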
After generating the signal and retrieving the smart speaker recording, the transfer function measurement can be conducted. Setting the overlap to 75% and adjusting the number of averages so that the acquisition length is close to the length of the file makes the most of the short recording.
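The relationship between overlap, averages and acquisition length can be sketched as follows. The formula assumes each average is one FFT block and that successive blocks advance by a hop of FFT length × (1 − overlap); the analyser's exact bookkeeping may differ. The 10-second file, 48 kHz sample rate and 16384-point FFT are illustrative values only, and `averages_for_file` is a hypothetical helper.

```python
def averages_for_file(duration_s, sample_rate, fft_length, overlap=0.75):
    """Return (number of averages, acquisition length in seconds) that fit in a file."""
    hop = int(fft_length * (1.0 - overlap))    # samples between block starts
    total = int(duration_s * sample_rate)      # samples available in the file
    if total < fft_length:
        return 0, 0.0                          # file too short for even one block
    averages = (total - fft_length) // hop + 1
    acquired = fft_length + (averages - 1) * hop
    return averages, acquired / sample_rate

for overlap in (0.0, 0.75):
    count, seconds = averages_for_file(10.0, 48000, 16384, overlap)
    print(f"overlap={overlap:.0%}: {count} averages over {seconds:.3f} s")
```

With these example values, 75% overlap yields 114 averages from the same file that gives only 29 averages with no overlap, which is why a high overlap helps when the recording is short.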
Figure 6 shows the spectra of the reference signal and DUT signal on the same graph, and Figure 7 shows the frequency response magnitude, which in this case is calculated as the ratio of the two measured spectra. The measured FRF magnitude becomes progressively noisier above 1 kHz because of slight sample-rate variances, but the smoothed curve appears to be a good representation of the mean value.
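The article does not say which smoothing algorithm was applied. One common choice is fractional-octave smoothing, sketched below as an assumption: each point is replaced by the mean of the dB values within a 1/3-octave band centred on it. The function name and the synthetic ripple data are hypothetical.

```python
def smooth_fractional_octave(freqs_hz, mags_db, fraction=3):
    """Average dB values within a 1/fraction-octave band around each frequency."""
    half_band = 2.0 ** (1.0 / (2.0 * fraction))   # ratio from centre to band edge
    smoothed = []
    for f0 in freqs_hz:
        lo, hi = f0 / half_band, f0 * half_band
        window = [m for f, m in zip(freqs_hz, mags_db) if lo <= f <= hi]
        smoothed.append(sum(window) / len(window))  # window always contains f0 itself
    return smoothed

# Synthetic "noisy" response: a flat 0 dB curve with +/-1 dB bin-to-bin ripple.
freqs = [100.0 * (i + 1) for i in range(100)]       # 100 Hz to 10 kHz
ripple = [1.0 if i % 2 == 0 else -1.0 for i in range(100)]
smoothed = smooth_fractional_octave(freqs, ripple)

print(ripple[49], round(smoothed[49], 3))           # raw and smoothed value at 5 kHz
```

Because the band is a fixed fraction of an octave, it spans more bins at high frequencies, so the smoothing is strongest where the measured magnitude is noisiest.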
When the frequency response of the same input system is measured using a logarithmically swept sine chirp with sweep lengths of 0.35, 1.0 and 4.0 seconds, the DUT’s frequency response appears quite different as the sweep length increases (Figure 8). This is probably caused by the DSP in the device recognising the chirp as a sinusoidal signal and attempting to block it. This highlights the importance of using a real-speech stimulus.
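For reference, a logarithmically swept sine of the kind described here can be generated as in the sketch below. The start and stop frequencies (20 Hz and 20 kHz) and the 48 kHz sample rate are assumptions, since the article gives only the sweep lengths, and the function names are hypothetical.

```python
import math

def log_sweep(duration_s, sample_rate=48000, f_start=20.0, f_stop=20000.0):
    """Sine sweep whose frequency rises exponentially from f_start to f_stop."""
    n = int(round(duration_s * sample_rate))
    rate = math.log(f_stop / f_start)
    scale = 2.0 * math.pi * f_start * duration_s / rate
    # Phase is the integral of the instantaneous frequency defined below.
    return [math.sin(scale * (math.exp(rate * (i / sample_rate) / duration_s) - 1.0))
            for i in range(n)]

def instantaneous_frequency(t, duration_s, f_start=20.0, f_stop=20000.0):
    """Frequency of the sweep at time t (seconds)."""
    return f_start * (f_stop / f_start) ** (t / duration_s)

octaves = math.log2(20000.0 / 20.0)
for duration in (0.35, 1.0, 4.0):
    sweep = log_sweep(duration)
    print(f"{duration} s sweep: {len(sweep)} samples, "
          f"{1000.0 * duration / octaves:.1f} ms per octave")
```

A longer sweep dwells longer in each octave, which gives any tone-detecting DSP in the device more time to react. That is one plausible reading of why the measured response changes with sweep length.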
Smart speaker output system
The transfer function measurement uses a matching algorithm to time-align the output signal from the DUT with the reference waveform. This is useful because, if the smart speaker system recognises a command and has access to the requested song or reference audio file, it will respond with a confirmation message before playing the audio. The transfer function measurement allows the measurement to be started at any time before the response and the matching algorithm will be used to time-align the music signal with the reference waveform.
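The matching algorithm itself is not described in the article. As a stand-in, the sketch below uses plain cross-correlation to find where a reference waveform starts inside a longer recording that begins with an unrelated "confirmation message". The signals are synthetic noise, and `find_offset` is a hypothetical helper, not an APx function.

```python
import random

def find_offset(recording, reference):
    """Return the lag (in samples) at which the reference best matches the recording."""
    n = len(reference)
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(recording) - n + 1):
        score = sum(r * recording[lag + i] for i, r in enumerate(reference))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(256)]        # stands in for the audio file
confirmation = [0.3 * rng.gauss(0.0, 1.0) for _ in range(300)]  # unrelated speech before playback
playback = [0.8 * r + 0.05 * rng.gauss(0.0, 1.0) for r in reference]  # DUT output plus noise
recording = confirmation + playback + [0.0] * 50

offset = find_offset(recording, reference)
aligned = recording[offset:offset + len(reference)]           # segment to analyse
print(offset, len(aligned))
```

Once the offset is known, the aligned segment and the reference can be passed to the transfer function calculation as output and input.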
Prepending a short burst of a maximum length sequence (MLS) signal to the stimulus signal provides a waveform with properties that make it easy for the match algorithm to detect (.wav files containing short MLS signals at different sample rates are available at www.AP.com).
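The property that makes an MLS easy to detect is its autocorrelation: a single sharp peak at zero lag and a constant, very low value everywhere else. The sketch below generates one period with a linear feedback shift register and checks this. The register length and taps (10 bits, taps 10 and 7) are a commonly tabulated choice used here as an assumption; this is not the signal in the .wav files mentioned above.

```python
def mls(nbits=10, taps=(10, 7)):
    """One period (2**nbits - 1 samples) of a +/-1 maximum length sequence."""
    state = [1] * nbits                      # any non-zero starting state works
    out = []
    for _ in range(2 ** nbits - 1):
        out.append(1.0 if state[-1] else -1.0)
        feedback = 0
        for tap in taps:                     # XOR of the tapped register bits
            feedback ^= state[tap - 1]
        state = [feedback] + state[:-1]      # shift the register by one place
    return out

def circular_autocorrelation(seq, lag):
    """Correlation of the sequence with a circularly shifted copy of itself."""
    n = len(seq)
    return sum(seq[i] * seq[(i + lag) % n] for i in range(n))

burst = mls()
print(len(burst))
print(circular_autocorrelation(burst, 0))    # peak at zero lag
print(circular_autocorrelation(burst, 1))    # off-peak value
```

For a ±1 sequence of length 1023, the peak equals the sequence length and every other circular lag gives -1, so a correlation-based matcher sees an unambiguous maximum.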
The test setup for the smart speaker’s output path is similar to that of the input path, but replaces the mouth simulator with a measurement microphone (Figure 9).
For this example, the reference audio file was uploaded to a location accessible to the smart speaker, and the smart speaker was directed to play the file. When the measurement was initiated, the analyser triggered when it found a signal match and acquired the data. The resulting measurement data from the transfer function is shown in the following figures.
Despite the ‘peaky’ nature of the reference and response signals (Figures 10 to 12), the transfer function magnitude is a relatively smooth curve. Additional smoothing could be applied using a Smooth derived result.
As the use of smart connected audio products proliferates, audio analysis using transfer function measurement is particularly useful for devices that require a speech stimulus or that must be tested open loop. Developers can use it to validate and characterise performance, and accelerate time-to-market, for a variety of smart audio products.