This post provides sound samples of a new technique to improve linear predictive coding (LPC). This technique can also be used to modify the perception of vocal effort.
What happens when we use LPC to estimate formant filters from voice samples with two different voice qualities while keeping all other variables constant?
Here we have three pairs of voice samples. In each pair, the same voice sings the same note, but one sample is breathy and the other exhibits higher vocal effort. These are the original samples: popeil, low, hi.
LPC was carried out on these samples. New voices were resynthesized using an artificial excitation that remains constant across the two samples in the pair. Since the artificial excitation remains the same, the perceived differences between the samples are due to the LPC formant filters. If you listen to the pairs, you will find that the voice resynthesized from the breathy formant filter still sounds breathier and the voice resynthesized from the high-effort formant filter still sounds like it has more effort: popeil, low, hi. In other words, LPC captures some of the differences between a high-effort voice and a breathy voice in the formant filter. Ideally, these differences would not appear in the formant filter at all.
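For readers who want to try this kind of comparison, here is a minimal sketch of standard autocorrelation-method LPC followed by resynthesis with a fixed impulse-train excitation. It is not the code used to produce the samples above. The function names, the Hamming window, the impulse-train excitation, and the parameter choices are all assumptions made for illustration.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC. Returns a with a[0] == 1,
    where the formant filter is 1 / A(z) and A(z) = sum_k a[k] z^-k."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    full = np.correlate(windowed, windowed, mode="full")
    r = full[len(windowed) - 1 : len(windowed) + order]  # lags 0..order
    # Toeplitz normal equations: R a[1:] = -r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)  # tiny regularization for numerical safety
    a = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate(([1.0], a))

def all_pole_filter(excitation, a):
    """Apply 1 / A(z): y[n] = x[n] - sum_{k>=1} a[k] * y[n-k]."""
    order = len(a) - 1
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, min(order, n) + 1):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y

def impulse_train(n_samples, f0, sample_rate):
    """Artificial excitation: one unit impulse per pitch period."""
    period = int(round(sample_rate / f0))
    x = np.zeros(n_samples)
    x[::period] = 1.0
    return x

def resynthesize(frame, order, f0, sample_rate):
    """Estimate the formant filter from `frame`, then drive it with an
    excitation that depends only on f0, not on the original voice."""
    a = lpc_coefficients(frame, order)
    excitation = impulse_train(len(frame), f0, sample_rate)
    return all_pole_filter(excitation, a)
```

Calling `resynthesize` on the breathy and the high-effort sample of a pair with the same `order`, `f0`, and `sample_rate` gives two outputs that differ only through their estimated formant filters, which is the comparison described above. A real analysis would normally work frame by frame rather than on one long frame.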
I am working on a variable preemphasis algorithm as an extension of LPC to remove variation in perceived vocal effort from the formant filter. Variable preemphasis LPC (VPLPC) results in formant filters that are more uniform across varying voice qualities. VPLPC was carried out on the original samples. New voices were resynthesized using an artificial excitation that remains constant across the two samples in the pair. Since the artificial excitation remains the same, the perceived differences between the samples are due to the VPLPC formant filters. If you listen to the pairs, you will find that the breathy formant filter sounds similar to the high-effort formant filter: popeil, low, hi. The formant filters derived by VPLPC sound more neutral with respect to voice quality than the formant filters derived by standard LPC.
The VPLPC algorithm uses a variable preemphasis (VP) filter to capture variation in the spectral envelope. This variation primarily relates to the perception of vocal effort, so manipulating the VP filter makes it possible to increase or decrease the perceived vocal effort. The following samples have been modified solely by changing the VP filter. (It will be easier to hear the differences if you have high-quality speakers or headphones.)
Reduce vocal effort:
original popeil_higheffort, popeil_lesseffort
original low_higheffort, low_lesseffort
original hi_higheffort, hi_lesseffort
Increase vocal effort:
original popeil_breathy, popeil_moreeffort
original low_breathy, low_moreeffort
original hi_breathy, hi_moreeffort
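To give a feel for how a preemphasis setting shifts the balance between low and high frequencies, here is a minimal sketch. It assumes a plain first-order preemphasis filter, which is a simplification and not the VP filter itself. The names `preemphasis` and `high_to_low_energy_db` and the 1 kHz split frequency are illustrative choices.

```python
import numpy as np

def preemphasis(signal, alpha):
    """First-order filter y[n] = x[n] - alpha * x[n-1].
    Positive alpha boosts high frequencies; negative alpha attenuates them."""
    x = np.asarray(signal, dtype=float)
    out = x.copy()
    out[1:] -= alpha * x[:-1]
    return out

def high_to_low_energy_db(signal, sample_rate, split_hz=1000.0):
    """Crude spectral-balance measure: energy at or above split_hz
    relative to energy below it, in dB."""
    x = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    low = spectrum[freqs < split_hz].sum()
    high = spectrum[freqs >= split_hz].sum()
    return 10.0 * np.log10(high / low)
```

Sweeping `alpha` on the same input and comparing `high_to_low_energy_db` before and after shows the overall spectral slope moving while the fine structure of the signal stays in place. That is the kind of envelope change these samples rely on, although the actual VP filter may be shaped differently.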
Manipulation of the VP filter does not fully transform the perception of vocal effort because our ears expect to hear simultaneous changes to the mix of harmonic and noise content. Our ears expect to hear less aspiration noise in voices with high effort. This makes the VP filter transformation less effective when the original voice has significant aspiration noise.
When reducing the perception of vocal effort, our ears expect to hear more aspiration noise. The following VP filter transformation also adds aspiration noise in an attempt to make the sample sound more natural: original popeil_higheffort, popeil_lesseffort_plusnoise.
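As a rough illustration of mixing noise into a voice at a controlled level, the sketch below adds white Gaussian noise at a chosen signal-to-noise ratio. This is an assumption made for illustration, not the method used for the sample above. Real aspiration noise is not white and varies over the glottal cycle, and the function name and `snr_db` parameter are hypothetical.

```python
import numpy as np

def add_aspiration_noise(voiced, snr_db, seed=0):
    """Add white Gaussian noise so that the ratio of voice power to
    noise power equals snr_db (a simplified stand-in for aspiration noise)."""
    voiced = np.asarray(voiced, dtype=float)
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(len(voiced))
    signal_power = np.mean(voiced ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that signal_power / scaled_noise_power == 10^(snr_db / 10).
    gain = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return voiced + gain * noise
```

Lower values of `snr_db` give a noisier result. Choosing a level that sounds natural would have to be done by ear.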
In summary, VPLPC produces formant filters that are more resistant to changes in voice quality, and the VP filter has some influence on the perception of vocal effort. For a fuller transformation, more work needs to go into finding an appropriate way to modify the mix of harmonics and noise in the residual.
This is the first attempt to use the VP filter to manipulate the perceived voice quality. More sophisticated techniques could provide more effective control.