Abstract:Data repairing based on editing rules and master data can automatically and exactly fix inconsistent data, but editing rules mainly relies on the definition by professional staff at present. To achieve data cleaning automatically in the whole process, the techniques for discovering data rules become a hot research topic in recent years. The algorithms for mining CFDs mainly involve CFDMiner, CTANE, FastCFD. Based on the above techniques, we provide a mining algorithm for editing rule, which is based on sample inputs and master data under the extension definition of CFD and the definition of edit rules. The main ideas is as below: Mining CFD from sample inputs firstly; then according to the domain similarity between input samples and master data, we can get the corresponding properties of input samples from the master data, forming editing rules with pattern group. The algorithm can effectively discover edit rules. And the mined edit rules can effectively repair the data in accordance with the semantic of the rules.