FiVaTech: Page-Level Web Data Extraction from Template Pages
Authors: M. Kayed, C.-H. Chang
Publish Date: 2010-10-28
Updated: March 26, 2025
Abstract
In this paper, we propose a new approach, called FiVaTech, for the problem of Web data extraction. FiVaTech is a page-level data extraction system that deduces the data schema and templates for input pages generated from a CGI program. FiVaTech uses tree templates to model the generation of dynamic Web pages. It can deduce the schema and templates for each individual Deep Web site, whose pages each contain either a single data record or multiple data records. FiVaTech applies tree matching, tree alignment, and mining techniques to achieve this challenging task. The experiments show encouraging results on the test pages used in many state-of-the-art Web data extraction works.
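To give a concrete sense of the tree matching step named in the abstract, the following is a minimal sketch of the classic Simple Tree Matching algorithm applied to DOM-like trees. It is an illustration only, not FiVaTech's actual procedure: the abstract does not specify the matching algorithm, and the names `Node`, `simple_tree_matching`, `tree_size`, and `similarity` are hypothetical and introduced here for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A minimal DOM-like node: a tag name plus ordered children (hypothetical type)."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Size of the maximum top-down, order-preserving matching between two trees.

    Roots must carry the same tag; children are then matched with a dynamic
    program analogous to longest common subsequence, where each child pair
    is weighted by the recursive matching score of its subtrees.
    """
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # dp[i][j]: best score using the first i children of a and first j of b
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            paired = dp[i - 1][j - 1] + simple_tree_matching(
                a.children[i - 1], b.children[j - 1]
            )
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], paired)
    return dp[m][n] + 1  # +1 for the matched roots


def tree_size(t: Node) -> int:
    """Number of nodes in the tree."""
    return 1 + sum(tree_size(c) for c in t.children)


def similarity(a: Node, b: Node) -> float:
    """Normalized score in [0, 1]; 1.0 means the trees are identical in shape and tags."""
    return 2 * simple_tree_matching(a, b) / (tree_size(a) + tree_size(b))


def record() -> Node:
    """One repeated data record: <li><a/></li>."""
    return Node("li", [Node("a")])


# Two pages from the same assumed template, with 2 and 3 data records.
page_one = Node("ul", [record(), record()])
page_two = Node("ul", [record(), record(), record()])
print(simple_tree_matching(page_one, page_two), similarity(page_one, page_two))
```

A high similarity between subtrees is one plausible signal that they were generated by the same part of a template, which is why matching scores of this kind are a common building block before alignment in template-based extraction.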