Comparing 1st-, 2nd- and 3rd-order pre-emphasis filters
Below are some samples that demonstrate how the output of the voice conversion algorithm changes with the order of the adaptive pre-emphasis filter. The goal of the algorithm is to reduce the perceived vocal effort of a voice and to increase its perceived breathiness.
In the adaptive pre-emphasis algorithm, you have to choose the order of the pre-emphasis filter. If the order is too low, the pre-emphasis filter does not have enough dynamic range. If the order is too high, the pre-emphasis filter starts to capture formant information.
I’m using LPC to estimate the pre-emphasis filter. As long as the filter is first order, its single pole is real and, for voiced speech, sits at zero hertz, so the pre-emphasis filter looks like a spectral tilt. This is the typical configuration for a pre-emphasis filter. At orders higher than one, LPC can estimate a pre-emphasis filter with pole(s) at higher frequencies in the voice spectrum (a second-order filter, for example, can have a complex-conjugate pole pair). This happens with high-effort voices, and the resulting pre-emphasis looks like a spectral tilt plus a mid-range resonance. You can find a plot of this result in the DAFX paper.
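To make the order trade-off concrete, here is a minimal sketch in Python (standard library only) that fits an LPC model with the autocorrelation method and the Levinson-Durbin recursion, then locates the poles of the all-pole model 1/A(z). It is not the code used for the samples: the synthetic test frame (white noise through a single 1 kHz resonator at a 16 kHz sample rate), the helper names, and the restriction to orders 1 and 2 are assumptions made for illustration. On the same frame, the order-1 fit should give one real pole at 0 Hz (a pure tilt), while the order-2 fit should place a complex-conjugate pair near the 1 kHz resonance, which is the formant capture mentioned above.

```python
import cmath
import math
import random


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return [1, a1, ..., ap] for A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                    # reflection coefficient for stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                # prediction-error power shrinks each stage
    return a


def poles_with_freqs(a, fs):
    """Poles of the all-pole model 1/A(z) and their frequencies in Hz (orders 1-2 only)."""
    if len(a) == 2:                       # 1 + a1 z^-1 = 0  ->  z = -a1 (always real)
        poles = [complex(-a[1], 0.0)]
    elif len(a) == 3:                     # z^2 + a1 z + a2 = 0
        d = cmath.sqrt(a[1] ** 2 - 4 * a[2])
        poles = [(-a[1] + d) / 2, (-a[1] - d) / 2]
    else:
        raise ValueError("this sketch only handles orders 1 and 2")
    return [(p, abs(cmath.phase(p)) * fs / (2 * math.pi)) for p in poles]


def resonant_frame(fs, f0=1000.0, radius=0.95, n=4000, seed=0):
    """Hypothetical test frame: white noise through a 2nd-order resonator at f0 Hz."""
    rng = random.Random(seed)
    c1 = 2 * radius * math.cos(2 * math.pi * f0 / fs)
    c2 = -radius ** 2
    y1 = y2 = 0.0
    out = []
    for _ in range(n):
        y = rng.gauss(0.0, 1.0) + c1 * y1 + c2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out


fs = 16000
r = autocorrelation(resonant_frame(fs), 2)
for order in (1, 2):
    a = levinson_durbin(r, order)
    print(order, [(round(abs(p), 3), round(f)) for p, f in poles_with_freqs(a, fs)])
```

Running it should print a single real pole at 0 Hz for order 1 and a conjugate pair near 1000 Hz for order 2. Swapping in a general polynomial root finder (for example `numpy.roots`) would extend the sketch to order 3.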
I could make a number of arguments about whether or not the pre-emphasis should contain a resonance. I won’t go into them now, except to say that perceived vocal effort results from changes to both the voice source and the vocal tract filter.
That explanation was very brief. Whether or not it made sense to you, you can listen to the resulting samples below:
We want to make the high-effort voice sound like this target breathy voice: wav.
Here is the original high-effort voice: wav.
One common way to try to simulate breathiness is to add aspiration noise to the LPC residual. This is what it sounds like when we do that with the high-effort voice: wav.
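The noise-addition baseline can be sketched in a few lines: inverse-filter the voice with A(z) to get the LPC residual, mix in noise, and resynthesize through 1/A(z). This is a minimal illustration under assumptions, not the code behind the samples: the LPC coefficients `a` are taken as given, the noise is plain Gaussian white noise (real aspiration noise is often shaped or modulated with the glottal cycle), and `noise_gain`, scaled relative to the residual RMS, is a hypothetical parameter.

```python
import math
import random


def lpc_residual(x, a):
    """Inverse-filter x with A(z): e[n] = sum_k a[k] x[n-k] (zero initial state)."""
    p = len(a) - 1
    return [sum(a[k] * x[n - k] for k in range(p + 1) if n - k >= 0) for n in range(len(x))]


def lpc_synthesize(e, a):
    """All-pole synthesis 1/A(z): y[n] = e[n] - sum_{k>=1} a[k] y[n-k]."""
    p = len(a) - 1
    y = []
    for n, en in enumerate(e):
        y.append(en - sum(a[k] * y[n - k] for k in range(1, p + 1) if n - k >= 0))
    return y


def add_aspiration_noise(x, a, noise_gain=0.5, seed=0):
    """Hypothetical sketch: add white noise to the LPC residual, then resynthesize.

    noise_gain is an assumed parameter, relative to the residual RMS.
    """
    rng = random.Random(seed)
    e = lpc_residual(x, a)
    rms = math.sqrt(sum(v * v for v in e) / len(e)) or 1.0
    noisy = [v + noise_gain * rms * rng.gauss(0.0, 1.0) for v in e]
    return lpc_synthesize(noisy, a)
```

With `noise_gain=0.0` the residual and synthesis filters cancel, so the input comes back unchanged; any breathiness comes only from the added noise, which is filtered by the same vocal-tract envelope as the original excitation.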
The voice conversion algorithm uses adaptive pre-emphasis LPC to reduce the perceived vocal effort in the voice before adding noise to simulate breathiness. Here is the transformed high-effort voice:
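One way a two-stage adaptive pre-emphasis analysis could be arranged is sketched below, purely as an illustration and not as the algorithm from the DAFX paper. A low-order LPC fit estimates the pre-emphasis (source tilt), the frame is inverse-filtered by it, a higher-order LPC fit on the flattened frame models the vocal tract, noise is added to the residual, and the frame is resynthesized with a first-order tilt. The sketch assumes that a steeper tilt (a pole closer to 1) lowers perceived effort; the single-frame processing, the function names, `vt_order=12`, the `target_tilt` pole, and the `noise_gain` scaling are all hypothetical choices, and a practical system would work on overlapping windowed frames.

```python
import math
import random


def lpc(x, order):
    """Autocorrelation-method LPC via Levinson-Durbin; returns [1, a1, ..., ap]."""
    n = len(x)
    r = [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]
    a, err = [1.0] + [0.0] * order, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        # Update a[1..i-1] from the previous stage; a[i] becomes the new reflection coefficient.
        a = [a[j] + k * a[i - j] if 0 < j < i else a[j] for j in range(order + 1)]
        a[i] = k
        err *= 1.0 - k * k
    return a


def fir(x, a):
    """Apply A(z): y[n] = sum_k a[k] x[n-k] (zero initial state)."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n >= k) for n in range(len(x))]


def iir(e, a):
    """Apply 1/A(z): y[n] = e[n] - sum_{k>=1} a[k] y[n-k] (zero initial state)."""
    y = []
    for n, v in enumerate(e):
        y.append(v - sum(a[k] * y[n - k] for k in range(1, len(a)) if n >= k))
    return y


def convert_frame(x, pre_order=1, vt_order=12, target_tilt=0.95, noise_gain=0.3, seed=0):
    """Hypothetical two-stage sketch; NOT the author's published algorithm.

    target_tilt: assumed first-order pole for the output tilt (None keeps the estimate).
    noise_gain: assumed noise level relative to the residual RMS.
    """
    a_pre = lpc(x, pre_order)          # stage 1: adaptive pre-emphasis (tilt) estimate
    x_flat = fir(x, a_pre)             # pre-emphasize: remove the estimated tilt
    a_vt = lpc(x_flat, vt_order)       # stage 2: vocal-tract LPC on the flattened frame
    e = fir(x_flat, a_vt)              # LPC residual
    rms = math.sqrt(sum(v * v for v in e) / len(e)) or 1.0
    rng = random.Random(seed)
    e = [v + noise_gain * rms * rng.gauss(0.0, 1.0) for v in e]  # breathiness noise
    y = iir(e, a_vt)                   # restore the vocal-tract envelope
    tilt = a_pre if target_tilt is None else [1.0, -target_tilt]
    return iir(y, tilt)                # impose the (possibly steeper) source tilt
```

With `target_tilt=None` and `noise_gain=0.0` both analysis filters are undone exactly, so the output should reproduce the input. That identity is a quick sanity check before listening to modified settings.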
I have my own opinions about these samples, but I’m curious about yours. Which sample do you think gets closest to the target breathy voice? Which sample sounds the most natural to you? Which sample sounds the most unnatural?