Vocal and non-vocal regions from audio songs using SF and PV

In this work, an effort has been made to identify vocal and non-vocal regions in a given song using signal processing techniques and a machine learning algorithm. Initially, spectral features such as mel-frequency cepstral coefficients (MFCCs) are used to develop the baseline system. Statistical values of pitch, jitter, and shimmer are then added to improve the system's performance. Artificial neural networks (ANNs) are used to capture the characteristics of the vocal and non-vocal segments of the songs.
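
As an illustration of the pitch-variation features mentioned above, the sketch below computes jitter and shimmer under the commonly used "local" definition: the mean absolute difference between consecutive cycles divided by the mean value. The abstract does not state which definition the authors used, so this definition, the function names, and the sample values are all assumptions made for illustration.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to the mean.

    Assumed 'local' definition; the source does not specify one.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def jitter(periods):
    """Cycle-to-cycle variation of the pitch period (in seconds)."""
    return local_perturbation(periods)


def shimmer(amplitudes):
    """Cycle-to-cycle variation of the peak amplitude."""
    return local_perturbation(amplitudes)


# Hypothetical pitch periods and peak amplitudes for four glottal cycles.
periods = [0.0100, 0.0102, 0.0098, 0.0100]
amplitudes = [0.50, 0.55, 0.45, 0.50]
features = [jitter(periods), shimmer(amplitudes)]
```

In a system like the one described, such values would be appended to the MFCC vector of each segment before it is passed to the classifier.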

The experiment is conducted on 60 vocal and 60 non-vocal clips extracted from Telugu albums. An 11-point moving window is used to ensure the continuity of vocal and non-vocal segments, thus improving the accuracy of the system. With this approach, the system achieves 85.59% accuracy for vocal and 88.52% for non-vocal segment classification.
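
The moving-window step can be sketched as a majority vote over frame-level decisions. The abstract gives only the window length (11 points); the majority-vote rule, the 1 = vocal / 0 = non-vocal encoding, and the handling of edges are assumptions made for this example.

```python
def smooth_labels(labels, window=11):
    """Smooth a sequence of 0/1 frame labels with a centred majority vote.

    Assumes 1 = vocal and 0 = non-vocal. Near the edges the window is
    truncated to the available frames; ties fall back to 0.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        lo = max(0, i - half)
        hi = min(len(labels), i + half + 1)
        votes = sum(labels[lo:hi])
        smoothed.append(1 if 2 * votes > (hi - lo) else 0)
    return smoothed


# An isolated non-vocal frame inside a vocal region is relabelled as vocal.
raw = [1] * 10 + [0] + [1] * 10
clean = smooth_labels(raw)
```

An isolated misclassified frame is outvoted by its neighbours, which is how a window of this kind enforces continuity within vocal and non-vocal regions.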
