Annotating Phonetic Component of Chinese Characters Using Constrained Optimization and Pronunciation Distribution

Authors: C.-H. Chang, S.-Y. Lin, S.-Y. Li, M.-F. Tsai, S.-P. Li, H.-M. Liao, C.-W. Sun, N. E. Huang

Publish Year: 2010-06

Updated: March 26, 2025

Abstract

Generally speaking, Chinese characters are graphic characters that cannot be pronounced on sight unless they are accompanied by Mandarin phonetic symbols (zhuyin) or another phonetic notation (e.g., a romanization system). In fact, about 80 to 90 percent of Chinese characters are pictophonetic characters, which are composed of a phonetic component and a semantic component. Therefore, even if one has not seen a character before, one can make a logical guess at its pronunciation and meaning from its phonetic and semantic components. To analyze such relations, we start by analyzing the characteristics of phonetic components. We found two interesting features that can be used to automatically identify the phonetic components of Chinese characters: one is pronunciation similarity, the other is pronunciation distribution. Experiments show that these two methods achieve high accuracy (90.8% and 98.1% on 9,593 pictophonetic characters) in predicting the phonetic components of pictophonetic characters. These methods can save considerable time and effort in the early stage of annotating phonetic symbols.