Extracting Points of Interest from the Web and Verifying Their Relations

Author: 莊秀敏

Publish Year: 2016-07

Update by: March 31, 2025

Abstract

With the popularity of mobile devices and smartphones, we have witnessed rapid growth in mobile applications and services, especially location-based services (LBS). According to a 2014 mobile marketing survey, map and location searches are among the most used services on smartphones. Points of interest (POIs), such as stores, gas stations, and parking lots, are common targets of map and local searches. Existing map services such as Google Maps and Wikimapia are constructed manually, either by professionals or through crowd-sourcing. However, manual annotation is costly, and it limits current POI search services. Given the abundance of information on the Web, many POIs can instead be extracted from the Web. At the same time, because POI relations change over time, it is critical to ensure the accuracy of POI data. When stores close or move, the result is often a one-to-many address-to-store-name pairing. Effectively identifying outdated POI relations is therefore both important and challenging for improving database quality.

We focus on two problems: (1) POI database construction and search on maps, and (2) POI relation verification. The first study comprises three tasks: POI extraction, POI pairing, and POI search. We adopt a query-based crawler to find address-bearing pages, which contain addresses and POI names, and use a pairing model to couple each address with its POI name. To enable POI search, we integrate multiple search results for POI ranking. In the second study, a verification model detects outdated POIs in the database using weakly labeled Web data. We also analyze performance with respect to different classifiers and scenarios.

We crawled 1.25 million distinct POIs from the Web and implemented a POI search service on the Apache Solr platform. The results show that our service outperformed Wikimapia and a commercial app called "What's the Number?" and came close to Google Maps. POI pairing achieves an F1-measure of 91.1%. In addition, tri-training improves the accuracy of outdated-POI detection to 72.8%.
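
To make the one-to-many address-to-store-name situation concrete, the following Python sketch groups extracted (address, store name) pairs by address and flags addresses that map to more than one distinct name. It is only an illustration of how candidates for verification might be collected, not the verification model described in this work; the function name `find_conflicting_addresses` and the sample data are hypothetical.

```python
from collections import defaultdict


def find_conflicting_addresses(pairs):
    """Return addresses paired with more than one distinct store name.

    `pairs` is an iterable of (address, store_name) tuples. The result maps
    each conflicting address to a sorted list of its store names.
    (Hypothetical helper for illustration; not taken from the thesis.)
    """
    names_by_address = defaultdict(set)
    for address, name in pairs:
        # Strip surrounding whitespace so trivially different strings match.
        names_by_address[address.strip()].add(name.strip())
    return {
        address: sorted(names)
        for address, names in names_by_address.items()
        if len(names) > 1
    }


# Hypothetical sample data: one address is shared by two store names.
sample_pairs = [
    ("1 Example Rd.", "Cafe A"),
    ("1 Example Rd.", "Bookstore B"),
    ("2 Example Rd.", "Gas Station C"),
    ("2 Example Rd.", "Gas Station C"),
]

candidates = find_conflicting_addresses(sample_pairs)
print(candidates)  # {'1 Example Rd.': ['Bookstore B', 'Cafe A']}
```

An address flagged this way is only a candidate: deciding which of its store names is outdated would still require a verification step such as the one summarized above.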