Speech vs music discrimination using Empirical Mode Decomposition

This work explores the use of Empirical Mode Decomposition (EMD) for discriminating speech regions from music regions in audio recordings. The Intrinsic Mode Functions (IMFs) obtained from EMD of the audio signal, each representing a different frequency scale, are found to contain discriminatory evidence for separating the two.

Statistical measures such as the mean, absolute mean, variance, skewness and kurtosis are computed from the various IMFs and investigated for speech vs music discrimination. When these features are used with Support Vector Machine (SVM) and k-Nearest Neighbour (k-NN) classifiers on the Scheirer and Slaney database, the best overall classification accuracy is 90.83% for the SVM and 85.33% for the k-NN.
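The per-IMF statistics named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `imf_statistics` and `feature_vector` are hypothetical, the moments are the population (biased) estimates, kurtosis is the non-excess form, and two toy sinusoids stand in for IMFs, since the EMD step itself is not shown here.

```python
import math


def imf_statistics(imf):
    """Return [mean, absolute mean, variance, skewness, kurtosis] for one IMF.

    Assumption: population (biased) central moments and non-excess kurtosis;
    the source does not state which estimators were used.
    """
    n = len(imf)
    mean = sum(imf) / n
    abs_mean = sum(abs(x) for x in imf) / n
    m2 = sum((x - mean) ** 2 for x in imf) / n  # variance
    m3 = sum((x - mean) ** 3 for x in imf) / n
    m4 = sum((x - mean) ** 4 for x in imf) / n
    if m2 == 0:
        # A constant signal has no defined skewness or kurtosis; use 0 here.
        skewness, kurtosis = 0.0, 0.0
    else:
        skewness = m3 / m2 ** 1.5
        kurtosis = m4 / m2 ** 2
    return [mean, abs_mean, m2, skewness, kurtosis]


def feature_vector(imfs):
    """Concatenate the five statistics of every IMF into one flat vector."""
    features = []
    for imf in imfs:
        features.extend(imf_statistics(imf))
    return features


# Toy stand-ins for two IMFs (hypothetical data): a fast and a slow sinusoid
# sampled at 8 kHz for 0.1 s. Real IMFs would come from an EMD routine.
fs = 8000
t = [i / fs for i in range(800)]
toy_imfs = [
    [math.sin(2 * math.pi * 1000 * x) for x in t],
    [0.5 * math.sin(2 * math.pi * 100 * x) for x in t],
]

vec = feature_vector(toy_imfs)
print(len(vec))  # five statistics for each of the two toy IMFs
```

A vector like `vec`, computed for each audio segment, is the kind of input that an SVM or k-NN classifier would then be trained on.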
