In this paper, we investigate the performance of multi-taper spectral estimation relative to conventional single-taper estimation for emotion recognition from speech signals. Typically, a single taper (window) reduces the bias of the spectral estimate, but the estimate retains a high variance, so the resulting spectral features tend to give poor recognition performance. A weighted average of the uncorrelated multi-taper eigen-spectra lowers this variance and yields more discriminative spectral features, thus increasing overall recognition performance.
We demonstrate that six multi-peak multi-tapers combined with a support vector machine yield 81% classification accuracy on seven emotions from the Berlin emotion database using only spectral features, compared to 72% with the conventional Hamming window method.
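The weighted averaging of eigen-spectra described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from this work: sine tapers stand in for the multi-peak tapers, the weights are uniform, and the function names (`sine_tapers`, `multitaper_psd`, `hamming_psd`) as well as the 400-sample frame length are hypothetical choices made for illustration only.

```python
import numpy as np


def sine_tapers(n_samples, n_tapers):
    """Return an (n_tapers, n_samples) array of orthonormal sine tapers.

    Assumption: sine tapers are used here as a simple stand-in for the
    multi-peak tapers named in the text.
    """
    n = np.arange(1, n_samples + 1)
    k = np.arange(1, n_tapers + 1)[:, None]
    return np.sqrt(2.0 / (n_samples + 1)) * np.sin(np.pi * k * n / (n_samples + 1))


def multitaper_psd(frame, n_tapers=6, weights=None):
    """Weighted average of the eigen-spectra of one signal frame."""
    frame = np.asarray(frame, dtype=float)
    tapers = sine_tapers(frame.size, n_tapers)
    if weights is None:
        # Assumption: uniform weights; other weightings are possible.
        weights = np.full(n_tapers, 1.0 / n_tapers)
    # One eigen-spectrum per taper: squared magnitude of the tapered DFT.
    eigenspectra = np.abs(np.fft.rfft(tapers * frame, axis=1)) ** 2
    return weights @ eigenspectra


def hamming_psd(frame):
    """Single-taper baseline using a unit-energy Hamming window."""
    frame = np.asarray(frame, dtype=float)
    window = np.hamming(frame.size)
    window = window / np.linalg.norm(window)
    return np.abs(np.fft.rfft(window * frame)) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(400)  # hypothetical 400-sample frame of white noise
    print("single-taper variance across bins:", np.var(hamming_psd(frame)))
    print("multi-taper variance across bins: ", np.var(multitaper_psd(frame)))
```

For a white-noise frame the true spectrum is flat, so the spread of each estimate across frequency bins gives a rough view of its variance. The resulting spectra would then be passed to a feature extraction stage and a classifier, which this sketch does not cover.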