Chat-log Disentanglement via Same-Thread Classification and Direct-Reply Prediction

Authors: Chia-Hui Chang, Zhi-Xian Liu, Yu-Ching Liao, Yu-Hao Wu, Thamolwan Poopradubsil

Publication Date: 2022-10

Update by: March 26, 2025

Abstract

The original purpose of chat-log (conversation) disentanglement is to separate intermingled messages into distinct conversations, making it easier to follow discussions and retrieve relevant information from simultaneous messages. The problem has therefore been modeled as predicting whether two messages come from the same thread. Although the previous study by Jiang et al. (2018) appears to perform well on same-thread prediction, we find that this is because the data were randomly split into training and test sets, resulting in overlapping topics between the two sets. When the data are split in time order, the performance of existing models drops significantly. In this study, we consider the direct-reply prediction task and study different message-pair classification models for it. We argue that independent message encoders can represent messages and capture their interaction better than shared message encoders, especially for the direct-reply prediction task. We also find that the BERT model performs well with small datasets, while other models may outperform BERT with large datasets.
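The difference between a random split and a time-ordered split can be illustrated with a minimal sketch. This is not the authors' code: the `(timestamp, thread_id, text)` message layout, the function names, and the toy data are all assumptions made for illustration only.

```python
import random
from typing import List, Set, Tuple

# Hypothetical message layout (an assumption): (timestamp, thread_id, text)
Message = Tuple[int, str, str]


def random_split(messages: List[Message], test_ratio: float, seed: int = 0):
    """Shuffle messages, then cut; messages from one thread can land in both sets."""
    shuffled = list(messages)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


def chronological_split(messages: List[Message], test_ratio: float):
    """Sort by timestamp, then cut; every test message is later than every training message."""
    ordered = sorted(messages, key=lambda m: m[0])
    cut = int(len(ordered) * (1 - test_ratio))
    return ordered[:cut], ordered[cut:]


def shared_threads(train: List[Message], test: List[Message]) -> Set[str]:
    """Thread ids that appear in both the training and the test set."""
    return {m[1] for m in train} & {m[1] for m in test}


# Toy log: threads A and B interleave early, threads C and D interleave later.
toy = [(t, "A" if t % 2 == 0 else "B", f"msg {t}") for t in range(6)]
toy += [(t, "C" if t % 2 == 0 else "D", f"msg {t}") for t in range(6, 12)]

rand_train, rand_test = random_split(toy, test_ratio=0.5)
time_train, time_test = chronological_split(toy, test_ratio=0.5)

print("threads shared after random split:", shared_threads(rand_train, rand_test))
print("threads shared after time-ordered split:", shared_threads(time_train, time_test))
```

In this toy log the time-ordered split leaves no thread on both sides of the cut, whereas a random split generally places messages from the same thread (and hence the same topic) in both sets. On real logs a long thread can still straddle a chronological cut, so the sketch shows only the general idea of the evaluation setup.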