MapMarker: Extraction of Postal Addresses And Associated Information for General Web Pages
Author: C.-H. Chang, S.-Y. Lee
Publish Year: 2010-09-01
Update by: March 26, 2025
Abstract
Address information is essential to people’s daily lives. People often need to look up the address of an unfamiliar location on the Web and then use a map service to mark the location and get directions. Although both address information and map services are available online, they are not well integrated. Users usually need to copy each address from one Web site and paste it into another Web site that offers map services in order to locate it. Such copy-and-paste operations have to be repeated when multiple addresses are listed on a single page, such as a public school list or an apartment list. Furthermore, the information associated with each address has to be copied and attached to each marker for better comprehension.

Our research aims to automate the above process and make this combination easier for users. The main techniques applied are postal address extraction and associated information extraction. We apply a sequence labeling algorithm based on Conditional Random Fields (CRFs) to train models for address extraction. Meanwhile, using the extracted addresses as landmarks, we apply pattern mining to identify the boundaries of address blocks and extract the information associated with each individual address. The experimental results show a high F-score of 91% for postal address extraction and 87% accuracy for associated information extraction.
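
To make the sequence labeling formulation concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a BIO tagging scheme (`B-ADDR`, `I-ADDR`, `O`) and a handful of token features; the tag names, the feature set, the functions `token_features` and `decode_bio`, and the sample tokens are all hypothetical and are not taken from the paper. The labels are written by hand here, standing in for the output of a trained CRF model.

```python
import re


def token_features(tokens, i):
    """Hypothetical per-token features that a CRF tagger could consume."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_digit": tok.isdigit(),
        "is_capitalized": tok[:1].isupper(),
        # Assumption: a five-digit token is treated as a postal-code cue.
        "looks_like_zip": bool(re.fullmatch(r"\d{5}", tok)),
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def decode_bio(tokens, labels):
    """Collect runs of B-ADDR / I-ADDR tokens into address strings."""
    addresses, current = [], []
    for tok, lab in zip(tokens, labels):
        if lab == "B-ADDR":
            # A new address starts; flush any address still being built.
            if current:
                addresses.append(" ".join(current))
            current = [tok]
        elif lab == "I-ADDR" and current:
            current.append(tok)
        else:
            # Any other label closes the address being built, if there is one.
            if current:
                addresses.append(" ".join(current))
            current = []
    if current:
        addresses.append(" ".join(current))
    return addresses


# Made-up page text and hand-written labels, for illustration only.
tokens = "Lincoln School 12 Main Street Springfield 62701 Phone 555-0100".split()
labels = ["O", "O", "B-ADDR", "I-ADDR", "I-ADDR", "I-ADDR", "I-ADDR", "O", "O"]

features = [token_features(tokens, i) for i in range(len(tokens))]
addresses = decode_bio(tokens, labels)
```

In this sketch the decoded addresses would then serve as the landmarks around which associated information (here, the school name and phone number) is looked for; the pattern mining step that finds block boundaries is not shown.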