Further improvements to the voice conversion algorithm

Friday, October 27th, 2006

I wasn’t entirely happy with the DAFX samples. There were a number of artifacts in those voice samples that I have now reduced. (You will need good speakers to hear the differences properly; they certainly won’t sound right through laptop speakers.)

My goal is to convert a high-effort voice into a breathy voice.

Here is the breathy voice that I’m using as my target: wav.

Here is the high-effort voice that I’m trying to transform: wav.

And here is the transformed voice, with the pre-emphasis modified to simulate reduced effort and pulsed noise added to simulate breathiness: wav.
Here is the same thing with more aspiration noise added: wav.

The transformed high-effort voice sounds more relaxed and breathy even if it has not been fully transformed into the target voice.

The main problem with the DAFX voice samples is that there is too much gain and spectrum modulation in the LPC filter (the LPC filter bounces around). With just one LPC filter, this problem is less severe. However, my algorithm has two LPC filters in series. The modulation of the pre-emphasis filter (low-order LPC) exacerbates the modulation of the vocal tract filter that follows it (high-order LPC), making the artifacts worse.

I reduced the artifacts by keeping the pre-emphasis filter constant over each of the short voice segments that I am synthesizing. The pre-emphasis still differs from one voice sample to the next. The next step would be to use time-varying pre-emphasis within a segment, but to smooth the filter coefficients in time.
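Both strategies can be pictured on a list of per-frame filter coefficients. The following is only a minimal sketch: the function names, the example frame values and the smoothing constant `alpha` are assumptions made for illustration, not the code used for the samples. For a first- or second-order filter, averaging direct-form coefficients keeps the filter stable because the stable region is convex; for higher orders it would be safer to smooth reflection coefficients instead.

```python
def hold_constant(frames):
    """Replace every frame's coefficients with the mean over the segment."""
    n = len(frames)
    order = len(frames[0])
    mean = [sum(f[i] for f in frames) / n for i in range(order)]
    return [list(mean) for _ in frames]


def smooth_in_time(frames, alpha=0.8):
    """One-pole (exponential) smoothing of each coefficient track.

    alpha close to 1 gives heavier smoothing; alpha = 0 leaves the
    frames unchanged. The value 0.8 is an arbitrary illustrative choice.
    """
    out = [list(frames[0])]
    for frame in frames[1:]:
        prev = out[-1]
        out.append([alpha * p + (1.0 - alpha) * c for p, c in zip(prev, frame)])
    return out


def max_jump(frames):
    """Largest frame-to-frame change of any coefficient."""
    return max(abs(b - a)
               for f0, f1 in zip(frames, frames[1:])
               for a, b in zip(f0, f1))


# Hypothetical first-order pre-emphasis coefficients that "bounce around".
frames = [[-0.9], [-0.5], [-0.95], [-0.4]]
held = hold_constant(frames)
smoothed = smooth_in_time(frames, alpha=0.8)
print(max_jump(frames), max_jump(held), max_jump(smoothed))
```

Holding the filter constant removes the modulation entirely, while smoothing keeps a slow trend and only reduces the frame-to-frame jumps.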

I also reshaped the added noise to make it more similar to the breathy noise in the target voice.

PS: You can hear even more recent sound samples here.

DAFX06 samples

Thursday, March 30th, 2006

This post provides sound samples of a new technique to improve linear predictive coding (LPC). This technique can also be used to modify the perception of vocal effort.

What happens when we use LPC to estimate formant filters from voice samples with two different voice qualities while keeping all other variables constant?

Here we have three pairs of voice samples. In each pair, the same voice is singing the same note but one sample is breathy and the other sample exhibits higher vocal effort. These are the original samples: popeil, low, hi.

LPC was carried out on these samples. New voices were resynthesized using an artificial excitation that remains constant across the two samples in the pair. Since the artificial excitation remains the same, the perceived differences between the samples are due to the LPC formant filters. If you listen to the pairs, you will find that the resynthesis from the breathy formant filter still sounds breathier and the resynthesis from the high-effort formant filter still sounds like it has more effort: popeil, low, hi. In other words, LPC captures in the formant filter some of the differences between a high-effort voice and a breathy voice. Ideally, these differences would not end up in the formant filter.
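The resynthesis step can be sketched as follows: one fixed excitation is passed through two different all-pole filters, so any difference in the output comes from the filters alone. This is a minimal illustration under stated assumptions. The excitation is taken to be a plain impulse train, and the two coefficient sets are hypothetical second-order filters, not values estimated from the samples above.

```python
def pulse_train(n_samples, period):
    """Artificial excitation: one unit impulse per pitch period."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def all_pole(a, excitation):
    """Filter the excitation through 1 / A(z),
    where A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p and a[0] == 1."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for k in range(1, min(len(a), n + 1)):
            acc -= a[k] * y[n - k]
        y.append(acc)
    return y


# The same excitation is used for both members of the pair.
excitation = pulse_train(800, period=80)

# Hypothetical, stable formant filters standing in for the LPC results.
a_breathy = [1.0, -1.2, 0.72]
a_effort = [1.0, -1.3, 0.8]

out_breathy = all_pole(a_breathy, excitation)
out_effort = all_pole(a_effort, excitation)
```

A real comparison would use a higher filter order and frame-by-frame coefficients, but the principle is the same: hold the source fixed and vary only the filter.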

I am working on a variable pre-emphasis algorithm as an extension of LPC to remove vocal-effort variability from the formant filter. Variable pre-emphasis LPC (VPLPC) results in formant filters that are more uniform across varying voice qualities. VPLPC was carried out on the original samples. New voices were resynthesized using an artificial excitation that remains constant across the two samples in the pair. Since the artificial excitation remains the same, the perceived differences between the samples are due to the VPLPC formant filters. If you listen to the pairs, you will find that the breathy formant filter sounds similar to the high-effort formant filter: popeil, low, hi. The formant filters derived by VPLPC sound more neutral with respect to voice quality than the formant filters derived by standard LPC.
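The general idea of two LPC stages in series can be sketched like this: a low-order fit absorbs the overall spectral slope, the signal is inverse-filtered with it, and a higher-order fit is then run on the flattened signal. This is a rough illustration and not the VPLPC algorithm itself. The filter orders, the autocorrelation method, the function names and the toy signal are all assumptions.

```python
import math


def autocorr(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion. Returns (a, err) with a[0] == 1 and
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def inverse_filter(a, x):
    """Apply the FIR filter A(z), removing what 1 / A(z) explains."""
    return [sum(a[k] * x[n - k] for k in range(min(len(a), n + 1)))
            for n in range(len(x))]


def two_stage_lpc(x, vp_order=1, tract_order=10):
    """Low-order fit for the spectral slope, then a high-order fit
    on the slope-compensated signal."""
    a_vp, _ = levinson(autocorr(x, vp_order), vp_order)
    flattened = inverse_filter(a_vp, x)
    a_tract, _ = levinson(autocorr(flattened, tract_order), tract_order)
    return a_vp, a_tract, flattened


# Toy signal: a slowly decaying low-frequency component plus one resonance.
x = [0.9 ** n + 0.85 ** n * math.cos(0.7 * n) for n in range(300)]
a_vp, a_tract, flattened = two_stage_lpc(x, vp_order=1, tract_order=4)
print(a_vp, a_tract)
```

Because the first stage has already taken out the slope, the second-stage filter has less of it left to model, which is the property the listening examples above are meant to demonstrate.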

The VPLPC algorithm uses a variable pre-emphasis (VP) filter to capture variation in the spectral envelope. The variation in the spectral envelope primarily relates to the perception of vocal effort. By manipulating the VP filter, it is possible to increase or decrease the perception of vocal effort. The following samples have been modified solely by changing the VP filter. (It will be easier to hear the differences if you have high-quality speakers or headphones.)
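One simple way to picture such a manipulation, assuming a first-order VP filter of the form 1 / (1 + a z^-1), is to undo the estimated slope and impose a different one. The function name and the coefficient values below are hypothetical, and the actual modification used for the samples may differ. A coefficient closer to -1 gives a steeper high-frequency roll-off, which would be expected to sound like lower effort.

```python
def retilt(signal, a_old, a_new):
    """Swap a first-order VP filter 1 / (1 + a_old z^-1)
    for 1 / (1 + a_new z^-1), sample by sample."""
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in signal:
        flat = x + a_old * prev_in     # inverse filter: remove the old slope
        y = flat - a_new * prev_out    # all-pole filter: apply the new slope
        out.append(y)
        prev_in, prev_out = x, y
    return out


# Hypothetical values: move from a shallow slope to a steeper one.
impulse = [1.0, 0.0, 0.0, 0.0]
less_effort = retilt(impulse, a_old=-0.5, a_new=-0.9)
print(less_effort)
```

Passing the same coefficient for `a_old` and `a_new` returns the input unchanged (up to rounding), which is a quick sanity check that the two steps cancel.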

Reduce vocal effort:
original popeil_higheffort, popeil_lesseffort
original low_higheffort, low_lesseffort
original hi_higheffort, hi_lesseffort

Increase vocal effort:
original popeil_breathy, popeil_moreeffort
original low_breathy, low_moreeffort
original hi_breathy, hi_moreeffort

Manipulation of the VP filter does not fully transform the perception of vocal effort because our ears expect to hear simultaneous changes to the mix of harmonic and noise content. Our ears expect to hear less aspiration noise in voices with high effort. This makes the VP filter transformation less effective when the original voice has significant aspiration noise.

When reducing the perception of vocal effort, our ears expect to hear more aspiration noise. The following VP filter transformation also adds aspiration noise in an attempt to make the sample sound more natural: original popeil_higheffort, popeil_lesseffort_plusnoise.
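A minimal sketch of adding aspiration noise is shown below. It assumes the noise is white Gaussian noise whose amplitude is modulated once per pitch period and then mixed in at a fixed gain. The raised-cosine envelope, the `depth` and `noise_gain` parameters and the function names are illustrative assumptions, and no spectral shaping of the noise is attempted here.

```python
import math
import random


def pulsed_noise(n_samples, period, depth=0.8, seed=0):
    """White noise whose amplitude pulses once per pitch period.

    depth = 0 gives steady noise; depth = 1 lets the envelope fall
    to zero halfway through each period.
    """
    rng = random.Random(seed)
    out = []
    for n in range(n_samples):
        phase = (n % period) / period
        envelope = (1.0 - depth) + depth * 0.5 * (1.0 + math.cos(2.0 * math.pi * phase))
        out.append(envelope * rng.gauss(0.0, 1.0))
    return out


def add_aspiration(voice, noise, noise_gain):
    """Mix the noise into the voice at the given linear gain."""
    return [v + noise_gain * w for v, w in zip(voice, noise)]


# Toy "voice": a sine wave with an 80-sample period.
period = 80
voice = [math.sin(2.0 * math.pi * n / period) for n in range(800)]
noise = pulsed_noise(len(voice), period, depth=0.8, seed=1)
breathier = add_aspiration(voice, noise, noise_gain=0.1)
```

Raising `noise_gain` corresponds to the "more aspiration noise" variant; in practice the noise would also need to be aligned with the glottal pulses of the real voice.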

In summary, VPLPC produces formant filters that are more resistant to changes in voice quality, and the VP filter has some influence on the perception of vocal effort. For a fuller transformation, more work needs to go into finding an appropriate way to modify the mix of harmonics and noise in the residual.

This is the first attempt to use the VP filter to manipulate the perceived voice quality. More sophisticated techniques could provide more effective control.