Analysis of correlation structure of data set for efficient pattern classification

Pattern classification or clustering plays an important role in a wide variety of applications in areas such as psychology and other social sciences, biology and medical sciences, pattern recognition, and data mining. Many algorithms for supervised and unsupervised classification have been developed to achieve high classification accuracy at lower computational cost. However, a given method may work well on some data sets and perform poorly on others. For any particular data set, it is difficult to identify the most suitable algorithm without trial and error. This suggests that the characteristics of a data set influence which classification algorithm suits it.

In this work, data set characteristics are studied in terms of the relationships among attributes, and a measure called MVS (multivariate score) is proposed to quantify the correlation structure and group data sets into four categories: strong independent, weak independent, weak correlated, and strong correlated. The performance of different feature selection algorithms on each group is studied through simulation experiments with 63 publicly available benchmark data sets. The results verify that univariate methods give a significant performance gain over multivariate methods on strong independent data sets, while multivariate methods perform better on strong correlated data sets.
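
The abstract does not give the definition of MVS, so the sketch below is only an illustration of the general idea of scoring a data set by its correlation structure and sorting it into one of the four groups. It uses the mean absolute pairwise Pearson correlation as a stand-in score, which is an assumption and not the MVS measure itself. The function names `mean_abs_correlation` and `correlation_group` and the cut-off values in `cuts` are hypothetical and chosen only for the example.

```python
import numpy as np


def mean_abs_correlation(X):
    """Average absolute Pearson correlation over all distinct attribute pairs.

    Stand-in score for illustration only; this is NOT the MVS measure.
    X has one row per sample and one column per attribute.
    """
    corr = np.corrcoef(X, rowvar=False)        # attribute-by-attribute matrix
    upper = np.triu_indices_from(corr, k=1)    # each pair once, no diagonal
    return float(np.mean(np.abs(corr[upper])))


def correlation_group(score, cuts=(0.1, 0.3, 0.6)):
    """Map a score in [0, 1] to one of the four group names.

    The cut-offs are hypothetical placeholders, not values from the paper.
    """
    labels = (
        "strong independent",
        "weak independent",
        "weak correlated",
        "strong correlated",
    )
    return labels[int(np.searchsorted(cuts, score, side="right"))]


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Five attributes drawn independently of each other.
    independent = rng.normal(size=(500, 5))

    # Five attributes that are noisy copies of one shared signal.
    shared = rng.normal(size=(500, 1))
    correlated = shared + 0.1 * rng.normal(size=(500, 5))

    for name, data in (("independent", independent), ("correlated", correlated)):
        score = mean_abs_correlation(data)
        print(name, round(score, 3), correlation_group(score))
```

Under these assumptions, a data set that lands in an independent group would be a candidate for univariate feature selection, and one that lands in a correlated group for multivariate selection, in line with the finding reported above.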
