Improvement of Kernel Dependency Estimation and Case Study on Skewed Data
Author: 陳慶治
Publish Year: 2013-07
Updated: March 31, 2025
Abstract
Kernel dependency estimation (KDE) is a learning framework for finding dependencies between two general classes of objects. Although it has already succeeded in many kinds of applications, its properties have not been fully studied. In this paper we discuss two practical issues. The first concerns its real-valued output for each label, which differs from the ultimate binary target of the one-of-k coding scheme; as a result, there is usually a gap between the real values predicted by KDE and the ground-truth binary values. One common practice to reduce this gap is to apply a threshold strategy. We provide an alternative approach that combines a second-level classifier through a special degenerate form of stacked generalization. The second issue concerns how performance degrades when KDE is applied to classification with skewed data. Our experiments show that KDE is not an appropriate approach for skewed data, and we provide a remedy to handle such data.