An i-vector is a low-dimensional, fixed-length representation of a variable-length speech utterance, defined as the posterior mean of a latent variable conditioned on the utterance's observed feature sequence. The prior for the latent variable is assumed to be non-informative, since for homogeneous datasets an informative prior offers no gain in generality. This work shows that, for a heterogeneous dataset containing speech samples recorded from multiple sources, i-vectors can instead be extracted using informative priors, and that doing so leads to favorable results.
Experiments on the NIST 2008 and 2010 Speaker Recognition Evaluation (SRE) datasets show that the proposed method outperforms three baselines. On the short2-short3 core task of SRE’08, the baselines were beaten in five of the eight common conditions for female speakers and in six of the eight for male speakers; on the core-core task of SRE’10, they were beaten in five of the nine common conditions for both genders.
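
To make the definition above concrete, the following is a minimal sketch of the posterior-mean computation under a Gaussian prior. It assumes the usual total-variability model with diagonal UBM covariances and precomputed Baum-Welch statistics. The function name `extract_ivector`, its argument layout, and the toy dimensions are hypothetical and not taken from this work, and how the informative prior's mean and covariance are estimated is not shown here. With a zero-mean, identity-covariance prior the function reduces to the conventional i-vector.

```python
import numpy as np


def extract_ivector(N, F, T, Sigma_inv, prior_mean=None, prior_cov=None):
    """Posterior mean of the latent variable w for one utterance.

    Assumed (hypothetical) argument layout:
      N         : (C,)      zero-order statistics per UBM component
      F         : (C, D)    first-order statistics, centered on the UBM means
      T         : (C, D, R) total-variability matrix, one D x R block per component
      Sigma_inv : (C, D)    diagonal UBM precisions
      prior_mean: (R,)      prior mean (defaults to zero)
      prior_cov : (R, R)    prior covariance (defaults to identity)
    """
    C, D, R = T.shape
    if prior_mean is None:
        prior_mean = np.zeros(R)
    if prior_cov is None:
        prior_cov = np.eye(R)
    prior_prec = np.linalg.inv(prior_cov)

    # Sigma_c^{-1} T_c for every component (covariances are diagonal).
    TS = T * Sigma_inv[:, :, None]

    # Posterior precision: prior precision + sum_c N_c T_c^T Sigma_c^{-1} T_c
    precision = prior_prec + np.einsum("c,cdr,cds->rs", N, TS, T)

    # Linear term: prior precision * prior mean + sum_c T_c^T Sigma_c^{-1} F_c
    linear = prior_prec @ prior_mean + np.einsum("cdr,cd->r", TS, F)

    # Posterior mean = precision^{-1} * linear term
    return np.linalg.solve(precision, linear)


if __name__ == "__main__":
    # Toy, randomly generated statistics (illustration only, not real data).
    rng = np.random.default_rng(0)
    C, D, R = 8, 5, 3
    T = rng.standard_normal((C, D, R))
    Sigma_inv = np.ones((C, D))
    N = rng.uniform(1.0, 10.0, size=C)
    F = rng.standard_normal((C, D))

    w_standard = extract_ivector(N, F, T, Sigma_inv)
    w_informative = extract_ivector(
        N, F, T, Sigma_inv,
        prior_mean=np.full(R, 0.5),   # hypothetical source-dependent prior mean
        prior_cov=0.1 * np.eye(R),    # hypothetical source-dependent prior covariance
    )
    print("standard prior   :", w_standard)
    print("informative prior:", w_informative)
```

The tighter the prior covariance, the more the extracted vector is pulled toward the prior mean, and with no observed frames at all the posterior mean equals the prior mean.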