Fast audio retrieval is crucial for many important applications, yet it is demanding because of the high dimensionality and ever-growing volume of audio on the Internet. Although audio fingerprinting greatly reduces the dimension while keeping audio identifiable, the dimension of audio fingerprints is still too high to scale to big audio data. The tradeoff between accuracy and efficiency prevents further reduction of the fingerprint dimension. This paper proposes a multi-stage filtering strategy for audio retrieval: the beginning stages focus on speed, using a much smaller middle fingerprint to quickly select the most likely audio recordings, and the ending stages emphasize accuracy, applying an accurate and robust fingerprint to the small set of the most likely audio recordings.
A notion called the middle fingerprint, which has a considerably smaller dimension, is devised to quickly filter out most irrelevant audio recordings. A matching algorithm is developed to reduce computational complexity by comparing samples of two audio recordings at fixed intervals against thresholds. With the middle fingerprint, audio retrieval is on average 12 times faster than Fibonacci hashing retrieval. By combining the Fibonacci hashing algorithm with middle filtering retrieval and the matching algorithm, we propose an efficient cascaded filtering retrieval method, which further speeds up retrieval by a factor of 250 on average. After MP3 conversion, resampling, and random shearing are applied, the recall rates of the method are all above 99.47%, and the theoretical accuracy is close to 100%.
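The cascaded idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes fingerprints are stored as equal-length bit lists, uses a simple mismatch count for the coarse stage and a bit error rate for the fine stage, and omits the Fibonacci hashing step entirely. The names `strided_match`, `bit_error_rate`, `cascaded_search`, and the parameters `step`, `max_mismatch`, and `max_ber` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def strided_match(a: Sequence[int], b: Sequence[int],
                  step: int, max_mismatch: int) -> bool:
    """Coarse test: compare only every `step`-th element of two fingerprints.

    Returns False as soon as the mismatch count exceeds `max_mismatch`,
    so clearly irrelevant candidates are rejected after very few comparisons.
    """
    mismatches = 0
    for i in range(0, min(len(a), len(b)), step):
        if a[i] != b[i]:
            mismatches += 1
            if mismatches > max_mismatch:
                return False
    return True


def bit_error_rate(a: Sequence[int], b: Sequence[int]) -> float:
    """Fine test: fraction of differing positions over the common length."""
    n = min(len(a), len(b))
    if n == 0:
        return 1.0
    return sum(x != y for x, y in zip(a, b)) / n


def cascaded_search(query_mid: Sequence[int],
                    query_full: Sequence[int],
                    database: Dict[str, Tuple[Sequence[int], Sequence[int]]],
                    step: int = 4,
                    max_mismatch: int = 1,
                    max_ber: float = 0.2) -> List[str]:
    """Two-stage retrieval over `database`, which maps a track name to
    (middle fingerprint, full fingerprint). Assumed layout, for illustration."""
    # Stage 1 (speed): cheap strided comparison on the small middle fingerprint.
    candidates = [name for name, (mid, _full) in database.items()
                  if strided_match(query_mid, mid, step, max_mismatch)]
    # Stage 2 (accuracy): full comparison on the few surviving candidates only.
    scored = [(bit_error_rate(query_full, database[name][1]), name)
              for name in candidates]
    return [name for ber, name in sorted(scored) if ber <= max_ber]
```

In this sketch the expensive full comparison runs only on entries that survive the coarse stage, which is the source of the speed gain in any cascade of this kind. The thresholds trade recall against the number of candidates passed on, and the values shown are placeholders rather than values from the paper.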