Abstract:Data validation is one of the most important phases in KDD (Knowledge Discovery and Data Mining). Since Internet and computer are unavailable in some observation station and data preprocessing is necessary, most soil observation data in our country could not be included in database online. Most of the data are stored and preprocessed by software like Microsoft Excel before they are reported to Soil Sub-Center. These steps often lead to some uncxpected errors. We present a customizable rule based model in this paper. The model consists of several modules: Data format transformation module, Privilege management module, Metadata management module, Record De-duplication module, Data Cleansing module and Rule customization & parser module. Low-invasive and light-weight design make the model validatc data successfully while without affecting the old data entry system. At the same time, Customizable Rule makes the model much easier to extend.