This paper describes a parallel implementation of a promising similarity search algorithm for an audio fingerprinting system. An efficient parallel implementation on a GPU accelerates the search over a dataset of more than 61 million audio fingerprints. The similarity between two fingerprints is defined by the intersection of their elements, that is, by the elements the two fingerprints have in common. We evaluate GPU implementations of two intersection algorithms on this dataset. We show that using the GPU memory spaces judiciously, shared memory in particular, so as to maximize the number of concurrent threads has a significant impact on overall compute time when the fingerprints vary in dimension.
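To make the similarity measure concrete, the sketch below scores fingerprints by the size of their intersection and runs a brute-force search for the best match. It is an illustration under stated assumptions, not either of the two intersection algorithms evaluated here: we assume each fingerprint is a sorted, duplicate-free list of integers, and the function names `intersection_size` and `best_match` are hypothetical.

```python
from typing import List, Sequence, Tuple


def intersection_size(a: Sequence[int], b: Sequence[int]) -> int:
    """Count elements shared by two fingerprints.

    Assumes (hypothetically) that both fingerprints are sorted and
    duplicate-free, so a single merge-style walk suffices: O(len(a) + len(b)).
    """
    i = j = count = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Shared element: count it and advance both cursors.
            count += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return count


def best_match(query: Sequence[int], dataset: List[Sequence[int]]) -> Tuple[int, int]:
    """Return (index, score) of the dataset fingerprint with the largest
    intersection with the query. Fingerprints may have different lengths."""
    best_idx, best_score = -1, -1
    for idx, fp in enumerate(dataset):
        score = intersection_size(query, fp)
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx, best_score


if __name__ == "__main__":
    query = [3, 8, 15, 42, 77]
    dataset = [[1, 3, 9], [3, 8, 42, 100], [8, 15, 42, 77, 90]]
    print(best_match(query, dataset))  # prints (2, 4)
```

Each iteration of the loop in `best_match` is independent of the others, which is what makes this kind of search a natural fit for a GPU; one plausible mapping, assumed here rather than taken from the paper, is one thread (or thread block) per dataset fingerprint.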
With simple modifications that use GPU memory to maximize the number of concurrent threads, we obtain up to 4 times better GPU performance. Compared with the CPU-only implementations, the proposed GPU implementation reduces run times by a factor of up to 150 for one intersection algorithm and up to 379 for the other.