
Enhancing Stereo

Werner Van Belle (1) - werner@yellowcouch.org, werner.van.belle@gmail.com

(1) Yellowcouch

Abstract :  This short note covers how to use a Hilbert transform to decorrelate a mono signal, how to use a Linkwitz-Riley crossover to improve stereo perception, and how to normalize the head-related transfer functions measured at MIT in the 1990s.

Keywords:  stereo widening
Reference:  Werner Van Belle; Enhancing Stereo; Audio Processing; YellowCouch; November 2011


Table Of Contents
Easy strategies
    Strategy 1: difference between left and right
    Strategy 2: Cross Channel Delay
    Strategy 3: Delay only 1 channel
    Strategy 4: Decorrelate using a variable delay line
More complex strategies
    Strategy 5: Inter-delay on high frequencies
    Strategy 6: Decorrelation using a 90 degree phase shift
    Strategy 7: Head related transfer functions
Conclusion
Bibliography

Sometimes people want to widen the stereo experience of audio tracks that were originally mono. Below we address a number of techniques that can be used to do this. Before delving into the details, it is worth noting some considerations that affect 'stereo enhancers', as we will call them.

Easy strategies

Strategy 1: difference between left and right

The first strategy compares the stereo signal against the mono signal. We first calculate the center signal center=(left+right)/2 and then calculate the difference of each channel from the center: new_left=left-center and new_right=right-center. The center signal can then be added back to new_left and new_right, scaled according to how much 'stereo' we want.
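The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `widen` and the parameter `mid_gain` are made up here, and the default of 0.25 simply mirrors the middle gain used for the demonstrations.

```python
def widen(left, right, mid_gain=0.25):
    """Keep each channel's difference from the center and add back
    only a fraction (mid_gain, a hypothetical name) of the center."""
    out_left, out_right = [], []
    for l, r in zip(left, right):
        center = (l + r) / 2.0
        out_left.append((l - center) + mid_gain * center)
        out_right.append((r - center) + mid_gain * center)
    return out_left, out_right
```

With `mid_gain=1.0` the input is returned unchanged; with `mid_gain=0.0` only the difference signal remains, so a purely mono input becomes silent.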

In practice there are three problems with this strategy.

Below some demonstrations of this strategy with a middle gain of 0.25.

In summary, a naive idea that will not work as expected.

Strategy 2: Cross Channel Delay

Another strategy one often reads about is to delay the extracted left and right channels and feed each of them into the other channel. This would lead to the perception that material that was originally on the left bounces off a wall and arrives somewhat later in the right ear. In this case a delay of 1 millisecond corresponds to a path-length difference of roughly 34 cm (sound travels at about 343 m/s). A delay of 25 milliseconds corresponds to roughly 8.5 meters of extra path, which is perceived as a reflection from a distant wall.

Frequency response when mixing a delayed and original signal

The main problem with this strategy is that it creates a comb filter [1]. This happens because the part shared by the left and right channels is added back (with a delay) to the original signal. That results in an audible filter, which is often perceived as a flanging effect. The above picture shows the frequency response for a delay line of 1 millisecond and of 20 milliseconds.
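To see where the notches of such a comb filter fall, the response of a signal mixed with a delayed copy of itself can be evaluated directly. The following is a minimal sketch, assuming a simple feedforward mix y(t) = x(t) + mix * x(t - delay); the function names and the `mix` parameter are illustrative and not taken from the original.

```python
import cmath
import math

def comb_gain_db(freq_hz, delay_s, mix=1.0):
    """Gain in dB of y(t) = x(t) + mix * x(t - delay_s) at freq_hz."""
    h = 1.0 + mix * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return 20.0 * math.log10(max(abs(h), 1e-12))  # clamp to avoid log(0)

def notch_frequencies(delay_s, max_hz):
    """Frequencies (2k+1) / (2 * delay) where an equal-gain mix cancels."""
    notches = []
    k = 0
    while (2 * k + 1) / (2.0 * delay_s) <= max_hz:
        notches.append((2 * k + 1) / (2.0 * delay_s))
        k += 1
    return notches
```

The notches are spaced 1/delay apart, so a 1 millisecond delay gives wide lobes (one notch every 1000 Hz), while a 20 millisecond delay gives a dense comb (one notch every 50 Hz).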

The other problem with this strategy is that a signal is bounced to another channel. Imagine a signal where the left channel has a trumpet, the right doesn't. After 'widening' the stereo, the left and right channel will suddenly sound similar, and both have the trumpet playing.

Strategy 3: Delay only 1 channel

Instead of bouncing the signal to the other channel, we could also consider delaying one channel only.

Delaying one channel more than the other.

Observations:

Strategy 4: Decorrelate using a variable delay line

Another technique that might be attempted is to reduce the correlation between the left and right channel. Here we tune the delay of one channel such that the 'correlation' between the left and right channel is minimal.

To implement this strategy one can work with 5 parallel delay lines (with delays ranging from 0 to 50 milliseconds). Based on the output of each of those delay lines, we choose the delay that gives the best stereo image. How 'stereo' the signal is can easily be determined by calculating the absolute distance between the delayed left signal and the delayed right signal. The delay line that produces the highest number is the one with the best stereo image.
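The selection step can be sketched as follows. This is only an illustration under simplifying assumptions: delays are expressed in whole samples, only one channel is delayed, the score is computed on raw samples over the whole buffer, and the names `stereo_score`, `delayed` and `best_delay` are invented for this example.

```python
def delayed(signal, samples):
    """Delay a signal by a whole number of samples, keeping its length."""
    return ([0.0] * samples + list(signal))[:len(signal)]

def stereo_score(left, right):
    """Sum of absolute differences: larger means 'more stereo'."""
    return sum(abs(l - r) for l, r in zip(left, right))

def best_delay(left, right, candidate_delays):
    """Pick the candidate delay on the right channel with the highest score."""
    return max(candidate_delays,
               key=lambda d: stereo_score(left, delayed(right, d)))
```

For a mono sine with a period of 20 samples, the score peaks at a delay of 10 samples (half a period), where the two channels are in opposite phase.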

The output of the delay line selector can then steer a secondary, variable delay line through which the original sound signal is routed. The change to this variable delay cannot be too fast. When we lengthen the delay line, the sound comes out somewhat slower until the required delay length is reached; every time we shorten the delay, the sound plays slightly faster. The maximum pitch change we might notice is thus given by the ratio of maximum to minimum speed. At a delay-change speed of 1%, we have a maximum pitch change of 2% (1.01/0.99=1.0202), or about 35 cents. Essentially, we should change the delay line at a maximum speed of 0.01 sample per output sample.
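The pitch arithmetic and the rate limit can be checked with a short sketch. The function names and the `max_rate` parameter are hypothetical; the conversion uses the standard definition of 1200 cents per octave.

```python
import math

def pitch_change_cents(rate):
    """Worst-case pitch spread, in cents, when the delay length changes
    by at most `rate` samples per output sample in either direction."""
    return 1200.0 * math.log2((1.0 + rate) / (1.0 - rate))

def step_delay(current, target, max_rate=0.01):
    """Move the delay length (in samples) toward target, rate-limited.
    Call once per output sample."""
    change = max(-max_rate, min(max_rate, target - current))
    return current + change
```

For `rate=0.01` the first function gives roughly 34.6 cents, in line with the figure of about 35 cents quoted above.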

A second aspect of the variable delay line strategy is that we look at the energy content of the signal, as opposed to the individual samples. In front of all the delay lines we place an envelope measurement with window sizes of around 50 milliseconds. The latency of the envelope window should also be added to the variable delay and the static delay line of the left and right channels.

A last remark is that we should limit the maximum position difference between the left and right channels to 50 milliseconds. That leads to a natural repositioning of high frequencies, while the lower frequencies (which typically have a less accurate time position) will not be affected as strongly.

More complex strategies

Strategy 5: Inter-delay on high frequencies

The biggest problem of strategy 3 was that low frequencies were affected negatively. To improve upon that, we will need the ability to split the spectrum into lower frequencies and higher frequencies. That makes it possible to delay the hi-left and hi-right frequencies differently while keeping the lo-left and lo-right frequencies aligned in the middle of the two hi-delays.

A stereo enhancer using a Linkwitz-Riley splitter to divide the incoming spectra. The high frequencies of the two channels are delayed differently. The low frequencies are not delayed.
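The recombination stage of such an enhancer can be sketched as below, assuming the four bands (lo-left, hi-left, lo-right, hi-right) have already been produced by a crossover. The function names and the use of whole-sample delays are assumptions made for this illustration; the lows are placed at the average of the two high-band delays, as described above.

```python
def delayed(signal, samples):
    """Delay a signal by a whole number of samples, keeping its length."""
    return ([0.0] * samples + list(signal))[:len(signal)]

def enhance(lo_left, hi_left, lo_right, hi_right, delay_left, delay_right):
    """Delay the two high bands differently and keep both low bands
    aligned in the middle of the two high-band delays."""
    lo_delay = (delay_left + delay_right) // 2
    out_left = [a + b for a, b in zip(delayed(lo_left, lo_delay),
                                      delayed(hi_left, delay_left))]
    out_right = [a + b for a, b in zip(delayed(lo_right, lo_delay),
                                       delayed(hi_right, delay_right))]
    return out_left, out_right
```

At a sample rate of 48 kHz, for example, an inter-delay of 20 milliseconds would correspond to `delay_left=0` and `delay_right=960`, with the low bands delayed by 480 samples.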

A good method to split the spectrum in two is a Linkwitz-Riley crossover [2, 3, 4, 5]. It is made by placing two Butterworth filters of even order in series. The resulting high-pass and low-pass sections are complementary: the sum of their outputs has a gain of 0 dB across the spectrum, yet the split allows us to delay the high frequencies differently from the low frequencies. Nevertheless, because a Linkwitz-Riley crossover is built from infinite impulse response filters, it is necessary to briefly look at the group delay in order to understand when and how flanging effects might occur.
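As an illustration of the complementary property, the sketch below builds a 4th-order Linkwitz-Riley pair by cascading two identical 2nd-order Butterworth sections. This is a lower order than the filters used for the figures and demos in this note; the biquad coefficients follow the widely used bilinear-transform "cookbook" form with Q = 1/sqrt(2), and all function names are made up for this example.

```python
import cmath
import math

def butterworth2(fc, fs):
    """2nd-order Butterworth low/high-pass biquads sharing one denominator."""
    w0 = 2.0 * math.pi * fc / fs
    cw = math.cos(w0)
    alpha = math.sin(w0) / math.sqrt(2.0)      # sin(w0) / (2 Q), Q = 1/sqrt(2)
    a0 = 1.0 + alpha
    a = [1.0, -2.0 * cw / a0, (1.0 - alpha) / a0]
    low = [(1.0 - cw) / 2.0 / a0, (1.0 - cw) / a0, (1.0 - cw) / 2.0 / a0]
    high = [(1.0 + cw) / 2.0 / a0, -(1.0 + cw) / a0, (1.0 + cw) / 2.0 / a0]
    return low, high, a

def biquad(b, a, x):
    """Direct form I biquad filter (a[0] is assumed to be 1)."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y

def lr4_split(x, fc, fs):
    """Split x into low and high bands with two cascaded Butterworths each."""
    low_b, high_b, a = butterworth2(fc, fs)
    low = biquad(low_b, a, biquad(low_b, a, x))
    high = biquad(high_b, a, biquad(high_b, a, x))
    return low, high

def response(b, a, f, fs):
    """Complex frequency response of one biquad at frequency f."""
    z1 = cmath.exp(-2j * math.pi * f / fs)     # z^-1 on the unit circle
    return (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
```

Squaring the response of one section gives the response of the cascade. The summed low and high bands then have unit magnitude at every frequency (only the phase changes), and each band sits at half amplitude, about -6 dB, at the crossover frequency.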

A 14th order splitting of the frequency band. The top panel shows the dB gain for the two extracted channels. The bottom panel shows the group delay in milliseconds. As can be seen, even at this extraordinarily steep splitting, the group delay remains below 2.5 milliseconds.

When do flanging effects occur?

Because we potentially have flanging effects starting at the transition band (when listening to the stereo-enhanced signal in mono), we are interested in understanding what kind of delay causes what effect above the transition band.

Flanging effects above a reasonable transition band

As can be seen in the above image, the flanging effects are 'strongest' when the delay is small. By 'strongest' we mean that the lobes are relatively wide, which is bad because a specific sound might fall completely within a valley. This cannot happen with the larger delay of 10 milliseconds. There the lobes are very narrow; the effect will of course still be audible, but it will be more or less the same across the entire high section.

Below are some audio demos using an LR crossover consisting of two 4th-order Butterworth filters, with a crossover frequency at octave 6 and high-frequency inter-delays of 20 and 2 milliseconds. We find that this strategy works best on drums (e.g. Africa; beware the difference in loudness, which was not yet compensated for).

Summarized:

Strategy 6: Decorrelation using a 90 degree phase shift

A Hilbert transform setup to bring a mono channel 90 degrees out of phase.

Another strategy to ensure that a mono signal is 'maximally' decorrelated is to phase shift all frequencies of one channel by 90 degrees. To implement this we can fall back on a Hilbert transformer [6], which does exactly this.

The result of a Hilbert transform on a mono signal.

The above picture shows the result of a Hilbert transform on a mono signal. The resulting left and right signals are 90 degrees out of phase. An interesting observation can be made here, namely that the lower frequencies are delayed longer than the high frequencies. This is something unachievable with ordinary delay lines.
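A 90 degree phase shifter of this kind can be approximated with a windowed FIR filter. The sketch below is one possible construction, not necessarily the one used for the figures and demos in this note: it uses a short Hamming-windowed filter of 101 taps, delays the untouched channel by the filter latency, and all function names are invented for this example.

```python
import math

def hilbert_taps(num_taps=101):
    """Hamming-windowed FIR approximation of a Hilbert transformer.
    num_taps is assumed to be odd."""
    mid = num_taps // 2
    taps = []
    for i in range(num_taps):
        n = i - mid
        ideal = 0.0 if n % 2 == 0 else 2.0 / (math.pi * n)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * i / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def fir(taps, x):
    """Plain convolution, truncated to the input length."""
    return [sum(t * x[n - k] for k, t in enumerate(taps) if n - k >= 0)
            for n in range(len(x))]

def mono_to_quadrature(mono, num_taps=101):
    """Return (left, right): left is the mono signal delayed by the filter
    latency, right is the same signal shifted by 90 degrees."""
    mid = num_taps // 2
    left = ([0.0] * mid + list(mono))[:len(mono)]
    right = fir(hilbert_taps(num_taps), mono)
    return left, right
```

For a sine input, the right channel approaches a negated cosine once the filter has filled, so the product of the two channels averages out to roughly zero. A short filter like this one only approximates the 90 degree shift away from the very low and very high ends of the spectrum; longer filters extend the usable range.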

Some audio demos, when using a Hilbert transformer of 2048 samples.

Strategy 7: Head related transfer functions

Another possibility that one can consider is the use of the KEMAR head model. In 1994 an extensive set of head-related transfer functions was measured at MIT [7]. This was done using a dummy head (called a KEMAR) with two microphones inserted in its ears. An impulse was then generated, and the impulse response of the head was measured at various azimuths and elevation angles.

The KEMAR data was used by the original authors to create a 3D perception field, but the result did not live up to my expectations. The main reason is that the transfer functions were used literally as measured, which is unwanted. After all, we don't want to modify the sound according to an extra ear channel. Instead, it is more interesting to understand how the sound changes when listened to at different angles. That means that we want to normalize the dataset toward a standard position. Typically this is chosen such that the sound source is in front of the listener.

KEMAR setup. The angles in the dataset are given as the rotation of the turntable. At 0 degrees the left ear points directly at the speaker. The 90 degree angle is a 90 degree clockwise rotation of the head, which thus corresponds to a -90 degree angle of the sound source. At a turntable angle of -90 degrees the head is staring directly into the speaker.

We first normalized each spectrum such that the average energy received in the ear was 0 for all impulse responses. Once that was done, we ensured that the filter modifications would be smooth when turning our head. The function that describes the gain of a specific frequency under a certain angle of our head is denoted freqresp(angle). To obtain smooth behavior we band-limited freqresp(angle) using a sinc interpolator.
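The two steps can be sketched as follows, under several assumptions that are not stated in the original: gains are expressed in dB, "average energy 0" is read as a mean power of 0 dB, the angles are evenly spaced around a full circle, and the band-limiting is done by keeping only the lowest harmonics of the Fourier series over the angle (which, for periodic data, is equivalent to a periodic sinc interpolation). The function names are hypothetical.

```python
import math

def normalise_energy(gains_db):
    """Shift one response (band gains in dB) so its mean power is 0 dB."""
    mean_power = sum(10.0 ** (g / 10.0) for g in gains_db) / len(gains_db)
    level_db = 10.0 * math.log10(mean_power)
    return [g - level_db for g in gains_db]

def bandlimit_over_angle(values, max_harmonic):
    """Smooth freqresp(angle) for one frequency band, sampled at evenly
    spaced angles, by keeping harmonics 0..max_harmonic only.
    max_harmonic is assumed to be below len(values) / 2."""
    n = len(values)
    smooth = [sum(values) / n] * n
    for k in range(1, max_harmonic + 1):
        cos_part = 2.0 / n * sum(v * math.cos(2.0 * math.pi * k * m / n)
                                 for m, v in enumerate(values))
        sin_part = 2.0 / n * sum(v * math.sin(2.0 * math.pi * k * m / n)
                                 for m, v in enumerate(values))
        for m in range(n):
            angle = 2.0 * math.pi * k * m / n
            smooth[m] += cos_part * math.cos(angle) + sin_part * math.sin(angle)
    return smooth
```

A lower `max_harmonic` gives a gain curve that varies more slowly as the head turns, at the cost of flattening sharp directional features.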

The perceived gain in our left ear of a number of frequency bands as we turn our head.

Below we demonstrate the effect such a head-related transfer function has on our perception. In both cases a sound source rotates counterclockwise around our head, starting at the left ear. In the first, the KEMAR data is used to filter the signal appropriately. In the second, only panning takes place.

Conclusion

Currently the cheapest and most efficient way to build a good stereo enhancer is strategy 5: a Linkwitz-Riley crossover on both channels, after which each of the high-frequency channels is delayed differently. The low frequencies of both channels are delayed by the average of the two high-frequency delays. The crossover should have an appropriate order (4 or 8). The delay between the high frequencies should be larger than 5 milliseconds but lower than 30 milliseconds. This works especially well on drums.

Decorrelating the signals (by bringing them 90 degrees out of phase) does not perform as well as we would want. This is probably because we need to decorrelate the perceived information (energies) and not only the waves themselves. Decorrelation has the advantage that it will bring signals so far out of sync that even a 180 degree phase shift of one of the two signals will not lead to a cancellation. This might sometimes be wanted. I also suspect that decorrelation will work well on signals that have a lot of echo on them.

The KEMAR-related experiment is a small success. It is a pity that it requires over 4000 operations per sample; otherwise it would certainly be useful.

If computational power is not a limitation, then a combination of the 3 mentioned techniques might constitute a good stereo enhancer.

Bibliography

1. Julius Orion Smith III; Feedforward Comb Filters; Physical Audio Signal Processing; W3K Publishing, 2010; ISBN 978-0-9745607-2-4; https://ccrma.stanford.edu/~jos/waveguide/Feedforward_Comb_Filters.html
2. Siegfried Linkwitz; Crossovers; Linkwitz Lab; http://www.linkwitzlab.com/crossovers.htm
3. Dennis Bohn; Linkwitz-Riley Crossovers: A Primer; RaneNote 160; written October 2005; http://www.rane.com/note160.html
4. S. H. Linkwitz; Active Crossover Networks for Non-coincident Drivers; J. Audio Eng. Soc., vol. 24, pp. 2-8; Jan/Feb 1976
5. S. H. Linkwitz; 12 dB/oct Linkwitz-Riley crossover; Linkwitz Lab; http://www.linkwitzlab.com/filters.htm#2
6. Julius Orion Smith III; Analytic Signals and Hilbert Transform Filters; Mathematics of the Discrete Fourier Transform (DFT), with Audio Applications, 2nd edition; W3K Publishing, 2007; ISBN 978-0-9745607-4-8; https://ccrma.stanford.edu/~jos/r320/Analytic_Signals_Hilbert_Transform.html
7. Bill Gardner, Keith Martin; HRTF Measurements of a KEMAR Dummy-Head Microphone; MIT Media Lab Perceptual Computing, Technical Report #280; May 1994; http://sound.media.mit.edu/resources/KEMAR.html
