Quantitative histomorphometry (QH) refers to the process of computationally modeling disease appearance on digital pathology images. This procedure typically involves extracting hundreds of features from digitized images of tissue slides; these features may then be used to predict disease presence, aggressiveness, or outcome. Because of the “curse of dimensionality”, constructing a robust and interpretable classifier is very challenging when the feature space is high-dimensional. Dimensionality reduction (DR) is one way to shrink the feature space and thereby facilitate classifier construction. When DR is performed, however, it can be difficult to quantify the contribution of each of the original features to the final classification or prediction result. In QH it is often important not only to create an accurate classifier of disease presence and aggressiveness, but also to identify the features that contribute most substantially to class separability. This feature transparency is often a prerequisite for the adoption of clinical decision support tools, since physicians are typically resistant to opaque “black box” prediction models.
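To make the attribution problem concrete, here is a minimal sketch in Python with NumPy (an assumed implementation language; none is specified here) that projects a feature matrix onto its top principal components. For linear PCA, each embedding coordinate is an explicit weighted sum of the original features, so a per-feature score can be read off the loading matrix. The scoring rule in the hypothetical helper `loading_importance` (squared loadings weighted by each component's share of the retained variance) is an unsupervised illustration only: it ignores class labels and is not the classification-based scoring method described in this work.

```python
import numpy as np


def pca_embed(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components.

    Returns the embedding Y (n x k), the loading matrix W (d x k), and the
    corresponding eigenvalues (variances along each component).
    """
    Xc = X - X.mean(axis=0)                     # center each feature
    cov = Xc.T @ Xc / (X.shape[0] - 1)          # d x d sample covariance
    evals, evecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]         # indices of the k largest
    W = evecs[:, order]
    return Xc @ W, W, evals[order]


def loading_importance(W, evals):
    """Hypothetical, unsupervised score: squared loadings weighted by each
    component's share of the retained variance (ignores class labels)."""
    share = evals / evals.sum()
    return (W ** 2) @ share                     # one score per original feature


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 0] *= 10.0                                 # feature 0 has much larger variance
Y, W, evals = pca_embed(X, k=2)
scores = loading_importance(W, evals)
print(Y.shape, W.shape)                         # (100, 2) (5, 2)
print(scores.argmax())                          # 0
```

Because the columns of `W` are orthonormal, these scores sum to one across features. After a nonlinear embedding no such loading matrix exists, which is the gap that motivates extending feature scoring beyond linear DR.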
We have previously presented a method for scoring features based on their importance for classification on an embedding derived via principal components analysis (PCA). However, nonlinear DR (NLDR), which is more useful for many biomedical problems, involves the eigendecomposition of a kernel matrix rather than of the data itself, further compounding the problem of classifier interpretability. In this paper we extend our PCA-based feature scoring method to kernel PCA (KPCA). We demonstrate that our KPCA approach for evaluating feature importance in nonlinear embeddings (FINE) applies to several popular NLDR algorithms that can be cast as variants of KPCA, such as Isomap and Laplacian eigenmaps. FINE is applied to four digital pathology datasets with 53–243 features to identify key QH features describing nuclear or glandular arrangements for predicting the risk of recurrence of breast and prostate cancers. Measures of nuclear and glandular architecture and clusteredness were found to play an important role in predicting the likelihood of recurrence of both breast and prostate cancers. Additionally, FINE identified a stable set of features that provide good classification accuracy on four publicly available datasets from the NIPS 2003 Feature Selection Challenge. Compared with the t-test, Fisher score, and Gini index, FINE yielded more stable feature subsets that achieve higher classification accuracy for most of the datasets considered.
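Kernel PCA replaces the d × d feature covariance with an n × n kernel matrix over samples, so its eigenvectors weight samples rather than features and there is no loading matrix to inspect directly. The following minimal sketch (Python/NumPy, assumed; the RBF kernel and its `gamma` parameter are illustrative choices, not taken from this work) double-centers a kernel matrix and embeds the samples. Other NLDR methods fit the same template by swapping in a different K; Isomap, for instance, corresponds to K = −½ D, where D holds squared geodesic distances, with the double-centering applied inside `kernel_pca`. For contrast, the sketch also computes the Fisher score, one of the univariate baselines named above, which ranks each feature independently by its between-class versus within-class variance.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negative round-off


def kernel_pca(K, k):
    """Embed n samples into k dimensions from an n x n kernel matrix K."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Kc = J @ K @ J                           # center the kernel in feature space
    evals, evecs = np.linalg.eigh(Kc)        # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]      # keep the k largest
    lam, alpha = evals[order], evecs[:, order]
    # Projection of training sample i onto component m is alpha[i, m] * sqrt(lam[m]).
    return alpha * np.sqrt(np.maximum(lam, 0.0)), lam


def fisher_score(X, y, eps=1e-12):
    """Univariate Fisher score per feature: between-class over within-class variance."""
    mu = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xk = X[y == c]                       # samples belonging to class c
        num += len(Xk) * (Xk.mean(axis=0) - mu) ** 2
        den += len(Xk) * Xk.var(axis=0)
    return num / (den + eps)


rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 5))
X[y == 1, 3] += 3.0                  # only feature 3 separates the classes
K = rbf_kernel(X, gamma=0.1)
Y, lam = kernel_pca(K, k=2)
print(K.shape, Y.shape)              # (100, 100) (100, 2)
print(fisher_score(X, y).argmax())   # 3
```

The printed shapes show the difficulty: the embedding `Y` has one row per sample, and nothing in `kernel_pca` maps its axes back to the five original features. The Fisher score does produce one value per feature, but it evaluates each feature in isolation and ignores the embedding entirely.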