Computer Systems Applications, 2010, 19(9): 1-4

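Research and Realization of a Web Information Extraction and Knowledge Presentation System

TAN Shou-Biao¹, XU Chao¹, JIANG Yuan¹, NING Ren-Xia²

(1. School of Electronic Science and Technology, Anhui University, Hefei, Anhui 230039; 2. Department of Electronic Information Engineering, Huangshan University, Huangshan, Anhui 245021)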

Received: January 6, 2010; Revised: February 26, 2010

Abstract: This paper studies the problem of automatically extracting structured data from data-intensive web pages and building a knowledge presentation system from it. The proposed Web Information Extraction and Knowledge Presentation System retrieves dynamic web pages with the help of a knowledge database and converts them to XML documents after preprocessing. It then discovers repeated patterns automatically with a PAT-array-based pattern discovery algorithm, combines those patterns with an ontology-based keyword library to recognize the data display structure model of each page, and stores the extracted data in the knowledge database using the object-relational mapping technology of XML. In this way web data is extracted automatically, and knowledge already in the database is used to extract new knowledge from the Internet, so that the knowledge database expands itself. Experiments with a system for automatic traffic information extraction and for generating and presenting mixed-traffic travel schemes showed that the system has high extraction precision and adapts well to web pages from different domains with different structures.

Keywords: web information extraction; knowledge presentation; data-intensive web pages; ontology-based keyword library
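The abstract names a PAT-array-based pattern discovery step but this page does not give its details. The following is only a minimal sketch of the general idea, assuming that a page has already been reduced to a flat sequence of tag tokens and treating the PAT array as a suffix array over that sequence. The function names, the `min_len` threshold, and the token list are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def build_suffix_array(tokens: List[str]) -> List[int]:
    """Naive suffix array: start positions sorted by the suffix they begin.
    Quadratic-ish, but adequate for the token stream of a single page."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def common_prefix_len(tokens: List[str], i: int, j: int) -> int:
    """Length of the longest common prefix of the suffixes at i and j."""
    n = 0
    while i + n < len(tokens) and j + n < len(tokens) and tokens[i + n] == tokens[j + n]:
        n += 1
    return n


def count_occurrences(tokens: List[str], pattern: Tuple[str, ...]) -> int:
    """Count start positions (overlaps allowed) where the pattern occurs."""
    m = len(pattern)
    return sum(1 for i in range(len(tokens) - m + 1) if tuple(tokens[i:i + m]) == pattern)


def repeated_patterns(tokens: List[str], min_len: int = 2) -> Dict[Tuple[str, ...], int]:
    """Candidate repeated patterns: the shared prefix of every pair of
    neighbouring suffixes in the suffix array, if at least min_len long."""
    sa = build_suffix_array(tokens)
    found: Dict[Tuple[str, ...], int] = {}
    for a, b in zip(sa, sa[1:]):
        n = common_prefix_len(tokens, a, b)
        if n >= min_len:
            pattern = tuple(tokens[a:a + n])
            if pattern not in found:
                found[pattern] = count_occurrences(tokens, pattern)
    return found


if __name__ == "__main__":
    # Hypothetical token stream for a three-row table (illustrative only).
    page = ["table"] + ["tr", "td", "td", "/tr"] * 3 + ["/table"]
    patterns = repeated_patterns(page)
    # Prefer the most frequent pattern, breaking ties by length.
    best = max(patterns, key=lambda p: (patterns[p], len(p)))
    print(best, patterns[best])
```

In this sketch the most frequent, longest candidate corresponds to one table row, which is the kind of repeated unit a data record would occupy. How the actual system ranks candidates, and how it combines them with the ontology-based keyword library, is not stated on this page.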

Funding: Natural Science Foundation of the Anhui Provincial Department of Education (2005KJ004ZD)

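Author Name	Affiliation
TAN Shou-Biao	School of Electronic Science and Technology, Anhui University, Hefei, Anhui 230039
XU Chao	School of Electronic Science and Technology, Anhui University, Hefei, Anhui 230039
JIANG Yuan	School of Electronic Science and Technology, Anhui University, Hefei, Anhui 230039
NING Ren-Xia	Department of Electronic Information Engineering, Huangshan University, Huangshan, Anhui 245021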

Cite as:
TAN Shou-Biao, XU Chao, JIANG Yuan, NING Ren-Xia. Research and Realization of a Web Information Extraction and Knowledge Presentation System. Computer Systems Applications, 2010, 19(9): 1-4