Categorical Data Visualization and Clustering with Subjective Factor
Author: Zhi-Kai Ding (丁智凱)
Publish Year: 2003-07
Updated: March 30, 2025
Abstract
Clustering is a useful method for exploring the structure of complex data sets. However, how to determine the appropriate number of clusters is still an open problem. The answer involves a human factor, because different people may have different points of view. Integrating visualization with this subjective factor to help users explore a data set is therefore a practical approach. Unfortunately, most visualization methods handle only numeric data, and categorical data visualization is still an unresolved issue. In this paper, a new clustering approach called CDCS is introduced. Its central idea is a strategy for extracting the subjective factor. In the first step, CDCS employs a single-pass clustering approach with a classification-based similarity function to cluster the data strictly. The clusters discovered in this step are called s-clusters because they are usually small. Next, users can use our interactive visualization tool to observe how the s-clusters are grouped under a given merge threshold. Finally, users choose an appropriate merge threshold and merge the s-clusters accordingly. By extracting the subjective factor, this approach can increase the reliability of the clustering result, and our experiment also shows that CDCS generates better-quality clusters than other typical algorithms.
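The two-stage idea (strict single-pass clustering into small s-clusters, then merging them under a user-chosen threshold) can be illustrated with a minimal Python sketch. It is not the CDCS implementation: the record-to-cluster score (average fraction of members sharing the record's value on each attribute), the cluster-to-cluster score (overlap of per-attribute value distributions), and the names `SCluster`, `single_pass`, `cluster_similarity`, `merge`, and `strict` are all hypothetical stand-ins, since the abstract does not define the actual classification-based similarity function.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Tuple[str, ...]  # one categorical record, e.g. ("red", "small", "round")


class SCluster:
    """Summary of one s-cluster: size, member indices, per-attribute value counts."""

    def __init__(self, n_attrs: int) -> None:
        self.size = 0
        self.members: List[int] = []
        self.counts = [Counter() for _ in range(n_attrs)]

    def add(self, idx: int, rec: Record) -> None:
        self.size += 1
        self.members.append(idx)
        for j, v in enumerate(rec):
            self.counts[j][v] += 1

    def similarity(self, rec: Record) -> float:
        # HYPOTHETICAL score: average over attributes of the fraction of
        # members that share the record's value (a stand-in, not CDCS's function).
        return sum(c[v] / self.size for c, v in zip(self.counts, rec)) / len(rec)


def single_pass(data: Sequence[Record], strict: float) -> List[SCluster]:
    """Stage 1: one scan; join the best cluster only if it is very similar."""
    clusters: List[SCluster] = []
    for i, rec in enumerate(data):
        best, best_sim = None, -1.0
        for cl in clusters:
            s = cl.similarity(rec)
            if s > best_sim:
                best, best_sim = cl, s
        if best is not None and best_sim >= strict:
            best.add(i, rec)
        else:  # a high `strict` value keeps s-clusters small
            new = SCluster(len(rec))
            new.add(i, rec)
            clusters.append(new)
    return clusters


def cluster_similarity(a: SCluster, b: SCluster) -> float:
    # HYPOTHETICAL score: per-attribute overlap of value distributions, averaged.
    total = 0.0
    for ca, cb in zip(a.counts, b.counts):
        total += sum(min(ca[v] / a.size, cb[v] / b.size) for v in ca.keys() & cb.keys())
    return total / len(a.counts)


def merge(clusters: List[SCluster], threshold: float) -> List[List[int]]:
    """Stage 2: union every pair of s-clusters whose similarity >= threshold."""
    parent = list(range(len(clusters)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if cluster_similarity(clusters[i], clusters[j]) >= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for k, cl in enumerate(clusters):
        groups.setdefault(find(k), []).extend(cl.members)
    return [sorted(g) for g in groups.values()]


if __name__ == "__main__":
    data = [("red", "small", "round"), ("red", "small", "round"),
            ("red", "large", "round"), ("blue", "large", "square"),
            ("blue", "large", "square"), ("blue", "small", "square")]
    s = single_pass(data, strict=0.9)
    print([cl.members for cl in s])  # → [[0, 1], [2], [3, 4], [5]]
    for t in (0.9, 0.6, 0.3):       # a user would compare these groupings
        print(t, merge(s, t))
```

Running `merge` at several thresholds mirrors the user's role described above: the s-clusters are computed once, and only the inexpensive merge step is repeated while the user judges which grouping looks right. On the toy data, a threshold of 0.6 yields two groups, while 0.3 collapses everything into one.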