Received: November 24, 2009 Revised: December 29, 2009
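Chinese abstract (English translation): Data validation is an important step in data mining and knowledge discovery. Because of station observers' limited Internet access, the inconvenience of recording at observation sites, and the need for some data preprocessing, soil observation data in China cannot be entered into the database online. Intermediate results are generally recorded with software such as Excel and then submitted to the Soil Sub-Center, a process that often introduces unnecessary errors. This paper proposes a soil data validation model based on a customizable rule base. The model mainly comprises a data format conversion module, a privilege management module, a metadata management module, a duplicate record removal module, a data validation module, and a rule customization and parsing module. Its low-intrusion, lightweight design greatly reduces the workload of data validation staff without requiring changes to the existing data reporting workflow. Customizable rules make …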
Abstract: Data validation is one of the most important phases in knowledge discovery and data mining (KDD). Because Internet access and computers are unavailable at some observation stations, and because data preprocessing is necessary, most soil observation data in China cannot be entered into the database online. Most of the data are stored and preprocessed in software such as Microsoft Excel before being reported to the Soil Sub-Center, and these steps often introduce unexpected errors. This paper presents a customizable rule-based model consisting of several modules: a data format transformation module, a privilege management module, a metadata management module, a record de-duplication module, a data cleansing module, and a rule customization and parser module. The low-invasive, lightweight design lets the model validate data without affecting the existing data entry system, and the customizable rules make the model much easier to extend.
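To make the idea of a customizable rule base concrete, the following is a minimal sketch of how declarative rules might be parsed into checks and applied after duplicate removal. It is not the paper's implementation: the abstract does not give the rule syntax, so the rule types (`required`, `range`), the field names (`site_id`, `ph`), and the functions `parse_rule`, `deduplicate`, and `validate` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# All rule types, field names, and thresholds here are hypothetical;
# the abstract does not specify the paper's actual rule format.


@dataclass(frozen=True)
class Rule:
    field: str
    description: str
    check: Callable[[object], bool]


def parse_rule(spec):
    """Turn a declarative rule spec (a dict) into an executable Rule."""
    field, kind = spec["field"], spec["type"]
    if kind == "required":
        return Rule(field, f"{field} is required",
                    lambda v: v is not None and v != "")
    if kind == "range":
        lo, hi = spec["min"], spec["max"]

        def in_range(v, lo=lo, hi=hi):
            # Missing values are left to a separate "required" rule.
            if v is None or v == "":
                return True
            try:
                return lo <= float(v) <= hi
            except (TypeError, ValueError):
                return False

        return Rule(field, f"{field} must be in [{lo}, {hi}]", in_range)
    raise ValueError(f"unknown rule type: {kind}")


def deduplicate(records):
    """Drop exact duplicate records, keeping first occurrences in order."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def validate(records, rules):
    """Return a (record_index, message) pair for every rule violation."""
    errors = []
    for i, rec in enumerate(records):
        for rule in rules:
            if not rule.check(rec.get(rule.field)):
                errors.append((i, rule.description))
    return errors


# Hypothetical rule base and records, e.g. rows exported from a spreadsheet.
RULE_SPECS = [
    {"field": "site_id", "type": "required"},
    {"field": "ph", "type": "range", "min": 0, "max": 14},
]
records = [
    {"site_id": "A01", "ph": "6.8"},
    {"site_id": "A01", "ph": "6.8"},  # exact duplicate of the first row
    {"site_id": "", "ph": "15.2"},    # missing site and out-of-range pH
]
rules = [parse_rule(spec) for spec in RULE_SPECS]
unique = deduplicate(records)
errors = validate(unique, rules)
```

Because the rules are plain data rather than code, a new check could be added by extending `RULE_SPECS` without touching the validation loop, which is one plausible reading of what "customizable" means here.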
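Funding: Chinese Academy of Sciences 11th Five-Year Plan special project; Chinese Academy of Sciences Knowledge Innovation Program Important Direction Project (KZCX2-YW-433-03)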
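Author Name | Affiliation |
ZHANG Ren | Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190 |
SHEN Zhi-Hong | Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190 |
LI Jian-Hui | Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190 |
SHI Jian-Ping | Institute of Soil Science, Chinese Academy of Sciences, Nanjing, Jiangsu 210008 |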
Citation:
张仁, 沈志宏, 黎建辉, 施建平. 基于规则的土壤数据校验模型研究与实现. 计算机系统应用, 2010, 19(8): 78-81
ZHANG Ren, SHEN Zhi-Hong, LI Jian-Hui, SHI Jian-Ping. Research and Implementation of Rule-Based Data Cleaning Model for Soil Data. COMPUTER SYSTEMS APPLICATIONS, 2010, 19(8): 78-81