Many machine learning frameworks, such as resource-allocating networks, kernel-based methods, Gaussian processes, and radial-basis-function networks, require a sparsification scheme in order to operate in the online learning setting. To this end, several online sparsification criteria have been proposed to restrict the model to a subset of the samples. The best-known criterion is the (linear) approximation criterion, which discards any sample that can be well approximated by the samples already contributing to the model, an operation with excessive computational complexity. Computationally efficient alternatives have been introduced in the literature, most notably the distance and coherence criteria.
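To make the two efficient criteria concrete, the following is a minimal sketch of online dictionary construction under the coherence and distance criteria. All names, thresholds, and the choice of a unit-norm Gaussian kernel are illustrative assumptions, not details from the paper; for a unit-norm kernel the feature-space distance reduces to ||phi(x) - phi(d)||^2 = 2 - 2*kappa(x, d).

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    # Unit-norm Gaussian (RBF) kernel: kappa(x, x) = 1.
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * bandwidth ** 2))

def coherence_ok(x, dictionary, mu0=0.5):
    # Coherence criterion (sketch): accept x only if its largest kernel value
    # against the dictionary stays below the coherence threshold mu0.
    return all(gaussian_kernel(x, d) <= mu0 for d in dictionary)

def distance_ok(x, dictionary, delta=0.7):
    # Distance criterion (sketch): accept x only if its feature-space distance
    # to every dictionary element exceeds delta.  For a unit-norm kernel,
    # ||phi(x) - phi(d)||^2 = 2 - 2 * kappa(x, d).
    return all(np.sqrt(2.0 - 2.0 * gaussian_kernel(x, d)) >= delta
               for d in dictionary)

def sparsify(stream, criterion):
    # Online pass: each sample is tested once against the current dictionary.
    dictionary = []
    for x in stream:
        if not dictionary or criterion(x, dictionary):
            dictionary.append(x)
    return dictionary

rng = np.random.default_rng(0)
stream = rng.normal(size=(200, 2))
print(len(sparsify(stream, coherence_ok)),
      len(sparsify(stream, distance_ok)))
```

Both criteria cost only one kernel evaluation per dictionary element, in contrast to the approximation criterion, which requires solving a least-squares problem at each step.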
This paper provides a unified framework connecting these sparsification criteria in terms of how well they approximate samples, by establishing theoretical bounds on the approximation errors. Furthermore, the error in approximating an arbitrary pattern is investigated, with upper bounds on the approximation error derived for each of the aforementioned sparsification criteria. Two classes of fundamental patterns are examined in detail: the centroid (i.e., the empirical mean) and the principal axes in kernel principal component analysis. Experimental results confirm the relevance of the theoretical findings.