A probabilistic framework for dynamic k estimation in kNN classifiers with certainty factor

The accuracy of the well-known k-nearest neighbor (kNN) classifier depends heavily on the choice of k. Estimating a suitable k for a given test point is difficult because of several factors, such as the local distribution of training points around that test point, the presence of outliers in the dataset, and the dimensionality of the feature space. In this paper, we propose a dynamic k estimation algorithm based on the neighbor density function of a test point, together with the class variance and certainty factor information of the training points.

Performance of the kNN algorithm with the proposed choice of k is evaluated on UCI benchmark datasets without any dimensionality reduction or explicit learning method. Experimental results show that the proposed method outperforms the existing kNN algorithm with k = 1, 3, 5 and [√N] (where N denotes the number of training points) for different distance measures.
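For orientation, the fixed-k baselines named above can be sketched as follows. This is a minimal illustration of standard kNN with majority voting, not the proposed dynamic k estimation. The function names, the toy data, the tie-breaking rule, and the reading of [√N] as rounding to the nearest integer are all assumptions made for this example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k, dist=euclidean):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank training indices by distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    nearest = order[:k]
    votes = Counter(train_y[i] for i in nearest)
    top = max(votes.values())
    # Assumption: ties are broken in favour of the closest tied neighbour.
    for i in nearest:
        if votes[train_y[i]] == top:
            return train_y[i]


def baseline_ks(n_train):
    """Fixed baseline values of k: 1, 3, 5 and [sqrt(N)].

    Assumption: [sqrt(N)] is read as rounding to the nearest integer.
    """
    return [1, 3, 5, max(1, round(math.sqrt(n_train)))]


# Hypothetical toy data: two well-separated clusters.
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
train_y = ["a", "a", "a", "a", "b", "b", "b", "b"]

for k in baseline_ks(len(train_X)):
    for dist in (euclidean, manhattan):
        label = knn_predict(train_X, train_y, (0.5, 0.5), k, dist)
        print(k, dist.__name__, label)
```

A dynamic scheme would replace the fixed list returned by `baseline_ks` with a value of k chosen separately for each test point.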
