James M. Keller
IEEE Members: Free
Non-members: Free
Duration: 00:53:15
Abstract: Many of us participate in clustering research as a means of exploration aimed at understanding the structure and organization of vague and imprecise data. Most papers focus on the creation of new approaches to perform clustering. But just how good are the results of clustering algorithms? There are several well-known measures of cluster validity that are routinely utilized. Most focus on balancing the criteria of compactness and separation. We present here a method for comparing crisp and soft partitions (i.e., probabilistic, fuzzy, and possibilistic) to a known crisp reference partition. Many of the classical indices that have been used with outputs of crisp clustering algorithms are generalized so that they are applicable to candidate partitions of any type. In particular, focus is placed on generalizations of the Rand index. Additionally, we extend these partition comparison methods by (1) investigating the behavior of the soft Rand index for comparing non-crisp, specifically possibilistic, partitions and (2) demonstrating how the possibilistic Rand index and the visual assessment of (cluster) tendency (VAT) algorithm can be used to discover the number of actual clusters and coincident clusters in outputs from the possibilistic c-means (PCM) algorithm.
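For readers unfamiliar with the Rand index, the following is a minimal sketch of one way to compute it from membership matrices, and is not taken from the talk itself. It assumes each partition is stored as a matrix with one row per cluster and one column per data point, and it builds the contingency matrix as the product N = U V^T. For crisp (one-hot) matrices this reproduces the classical Rand index. Feeding in a fuzzy or probabilistic U whose columns sum to 1 is shown only as an illustration of how such a formula could accept soft input; the talk's exact generalization, and any normalization needed for possibilistic partitions, may differ. The names `one_hot` and `rand_index` are hypothetical helpers.

```python
def one_hot(labels, k):
    """Build a crisp k x n membership matrix from integer labels 0..k-1."""
    return [[1.0 if lab == i else 0.0 for lab in labels] for i in range(k)]


def rand_index(U, V):
    """Rand index between two partitions given as membership matrices.

    U is r x n (candidate), V is c x n (reference). Assumes every column
    of U and of V sums to 1 (crisp, fuzzy, or probabilistic) and n >= 2.
    """
    n = len(U[0])
    # Contingency matrix N = U V^T (r x c).
    N = [[sum(u * v for u, v in zip(urow, vrow)) for vrow in V] for urow in U]
    sum_sq = sum(x * x for row in N for x in row)
    row_sq = sum(sum(row) ** 2 for row in N)
    col_sq = sum(sum(col) ** 2 for col in zip(*N))
    a = 0.5 * (sum_sq - n)                        # paired in both partitions
    b = 0.5 * (row_sq - sum_sq)                   # paired in U only
    c = 0.5 * (col_sq - sum_sq)                   # paired in V only
    d = 0.5 * (n * n + sum_sq - row_sq - col_sq)  # paired in neither
    return (a + d) / (a + b + c + d)


# Crisp reference partition of four points into two clusters.
reference = one_hot([0, 0, 1, 1], 2)

# Identical crisp candidate: full agreement.
print(rand_index(one_hot([0, 0, 1, 1], 2), reference))

# Fuzzy candidate (assumed illustrative memberships, columns sum to 1).
fuzzy = [[0.9, 0.8, 0.1, 0.2],
         [0.1, 0.2, 0.9, 0.8]]
print(rand_index(fuzzy, reference))
```

Working from the contingency matrix rather than counting pairs directly is a design choice that lets the same four quantities a, b, c, d be defined when memberships are not 0 or 1.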