Information Extraction in Legal Texts: Investigating LLMs' Performance on Traffic Accident Verdicts

Authors: Huai-Hsuan Huang, Chia-Hui Chang, Kuo-Chun Chien, Jo-Chi Kung

Publication Date: 2025-06

Updated: June 25, 2025

Abstract

Traffic accident compensation cases constitute one of the largest categories of civil litigation in Taiwan. Although judicial decisions are publicly available, they exist predominantly as unstructured text, and the annotated datasets needed for practical legal NLP applications are lacking. Because judicial documents often contain complex language and diverse structures and require cross-paragraph reasoning, they pose significant challenges for information extraction. To evaluate how well large language models (LLMs) understand judicial documents, this study presents a benchmark dataset of 1,000 manually annotated Taiwanese traffic accident rulings, covering 18 key compensation-related fields. We evaluated multiple LLMs, including GPT-4o and Meta LLaMA-3-8B, across three prompt engineering strategies (Basic, Advanced, and One-Shot) to assess their effectiveness in information extraction tasks. Our evaluation covers both structured fields and semantically complex, context-dependent extraction tasks, taking into account prompt design, reasoning capabilities, and input-length constraints. Experimental results reveal that traditional regular expressions perform reliably on string and numerical fields (0.80 accuracy) but fail in contextually ambiguous scenarios. In contrast, LLMs, particularly when guided by well-designed prompts, significantly outperform traditional methods in complex extraction tasks (0.81 and 0.84 accuracy). Prompt engineering and parameter-efficient fine-tuning methods such as QLoRA are more flexible and scalable than full-model fine-tuning, especially in low-resource environments.
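
To make the regex baseline and the accuracy metric concrete, the following is a minimal Python sketch, not the authors' implementation. The pattern, the helper names (`extract_amount`, `field_accuracy`), the exact-match scoring, and the sample sentences are all assumptions made for illustration; none of them is taken from the dataset or the paper. It shows how a regular expression can recover an explicitly stated monetary amount but returns nothing when the amount is given only by reference to another part of the ruling.

```python
import re

# Hypothetical pattern for an amount written like "新臺幣500,000元" (NT$500,000).
# The real annotation fields and patterns used in the study are not shown here.
AMOUNT_RE = re.compile(r"新[臺台]幣\s*(\d[\d,]*)\s*元")


def extract_amount(text):
    """Return the first NT$ amount found in the text as an int, or None."""
    match = AMOUNT_RE.search(text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def field_accuracy(predictions, gold):
    """Fraction of records whose prediction exactly matches the annotation."""
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Made-up example sentences (assumptions, not dataset content).
texts = [
    "被告應給付原告新臺幣500,000元。",
    "原告請求醫療費用新台幣 12,345 元。",
    "原告請求之金額如附表所示。",  # amount given only by reference to an appendix
]
gold = [500000, 12345, 80000]  # hypothetical annotations

preds = [extract_amount(t) for t in texts]
accuracy = field_accuracy(preds, gold)
```

In this toy setting the third sentence has no literal amount to match, which is the kind of context-dependent case where a pattern-based extractor fails and a model that can read across paragraphs may do better.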