Computer Systems Applications (English edition): 2010, 19(8): 78-81
Research and Implementation of Rule-Based Data Cleaning Model for Soil Data
(1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190; 2. Institute of Soil Science, Chinese Academy of Sciences, Nanjing, Jiangsu 210008)
Received:November 24, 2009    Revised:December 29, 2009
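Abstract (translated from the Chinese): Data validation is an important step in data mining and knowledge discovery. Because of limited Internet access for station observers, the inconvenience of recording data in the field, and the need for some data preprocessing, soil observation data in China cannot be entered into the database online. Intermediate results are generally recorded in software such as Excel and then submitted to the Soil Sub-Center, and this recording process often introduces unnecessary errors. This paper proposes a soil data validation model based on a customizable rule base. The model mainly comprises a data format conversion module, a privilege management module, a metadata management module, a duplicate record removal module, a data validation module, and a rule customization and parsing module. Its low-intrusion, lightweight design greatly reduces the workload of data validation staff without requiring any change to the existing data reporting workflow. Customizable rules make ...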
Abstract: Data validation is one of the most important phases in KDD (Knowledge Discovery and Data Mining). Because Internet access and computers are unavailable at some observation stations and data preprocessing is necessary, most soil observation data in our country cannot be entered into the database online. Most of the data are stored and preprocessed with software such as Microsoft Excel before being reported to the Soil Sub-Center. These steps often introduce unexpected errors. We present a customizable rule-based model in this paper. The model consists of several modules: a data format transformation module, a privilege management module, a metadata management module, a record de-duplication module, a data cleansing module, and a rule customization and parser module. Its low-invasive, lightweight design lets the model validate data without affecting the existing data entry system. At the same time, customizable rules make the model much easier to extend.
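To make the idea of a customizable rule base concrete, the following is a minimal Python sketch of how declarative rules might be parsed and applied after duplicate records are removed. It is an illustration only, not the authors' implementation: the rule format, the field names (`site_id`, `depth_cm`, `ph`), and the functions `parse_rule`, `deduplicate`, and `validate` are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """An executable check bound to one field (hypothetical structure)."""
    name: str
    field: str
    check: Callable[[Any], bool]


def parse_rule(spec: Dict[str, Any]) -> Rule:
    """Turn a declarative rule spec (assumed format) into an executable Rule."""
    kind, field = spec["type"], spec["field"]
    if kind == "required":
        return Rule(f"{field}:required", field, lambda v: v not in (None, ""))
    if kind == "range":
        lo, hi = spec["min"], spec["max"]
        return Rule(
            f"{field}:range",
            field,
            lambda v: isinstance(v, (int, float)) and lo <= v <= hi,
        )
    raise ValueError(f"unknown rule type: {kind}")


def deduplicate(records: List[Record], key_fields: List[str]) -> List[Record]:
    """Keep the first record for each distinct combination of key fields."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(rec.get(f) for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def validate(records: List[Record], rules: List[Rule]) -> List[Tuple[int, str]]:
    """Return (row index, rule name) for every failed check."""
    errors = []
    for i, rec in enumerate(records):
        for rule in rules:
            if not rule.check(rec.get(rule.field)):
                errors.append((i, rule.name))
    return errors


# Hypothetical rules and records, standing in for rows exported from Excel.
rule_specs = [
    {"type": "required", "field": "site_id"},
    {"type": "range", "field": "ph", "min": 0.0, "max": 14.0},
]
records = [
    {"site_id": "A01", "depth_cm": 20, "ph": 6.8},
    {"site_id": "A01", "depth_cm": 20, "ph": 6.8},  # duplicate row
    {"site_id": "", "depth_cm": 40, "ph": 15.2},    # two rule violations
]

rules = [parse_rule(spec) for spec in rule_specs]
unique = deduplicate(records, ["site_id", "depth_cm"])
errors = validate(unique, rules)
```

Because the rules are plain data rather than code, a new check can be added by appending a spec to `rule_specs`, which is one plausible reading of how customizable rules would make such a model easier to extend.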
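Funding: Special Project of the Chinese Academy of Sciences under the Eleventh Five-Year Plan; Important Direction Project of the Knowledge Innovation Program of the Chinese Academy of Sciences (KZCX2-YW-433-03)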
Cite this article:
ZHANG Ren, SHEN Zhi-Hong, LI Jian-Hui, SHI Jian-Ping. Research and Implementation of Rule-Based Data Cleaning Model for Soil Data. COMPUTER SYSTEMS APPLICATIONS, 2010, 19(8): 78-81