Abstract: With the development of cloud computing, cloud storage technology brings a large variety of network storage devices together to work collaboratively through cluster applications, virtualization, and distributed file systems, alleviating the pressure on traditional data center storage. Data de-duplication is a technology that reduces the storage space required and lowers the volume of data transmitted over the network, and it is expected to become applicable to cloud storage systems. Combining these two technologies will bring real benefits to the IT storage industry. This paper designs a de-duplication architecture based on cloud storage and proposes a scheme that runs at the client in an in-line manner, eliminating duplicate data at the chunk level before the data is placed in the cloud. Under this architecture, HDFS stores the mass data while HBase stores the hash values of the data blocks.
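
The client-side, in-line, chunk-level idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the fixed chunk size, the use of SHA-256, and the class and method names (`InlineDeduplicator`, `upload`, `download`) are assumptions made for illustration, and in-memory Python structures stand in for the HBase hash index and the HDFS chunk store.

```python
import hashlib


class InlineDeduplicator:
    """Toy client-side, in-line, chunk-level de-duplicator (hypothetical names)."""

    def __init__(self, chunk_size=4):
        # A tiny fixed chunk size is assumed purely so the example is easy to follow.
        self.chunk_size = chunk_size
        self.hash_index = {}   # stand-in for HBase: chunk digest -> position in chunk_store
        self.chunk_store = []  # stand-in for HDFS: unique chunk payloads
        self.recipes = {}      # file name -> ordered list of chunk digests

    def upload(self, name, data):
        """Split data into chunks, store only unseen chunks, return bytes actually stored."""
        recipe = []
        stored_bytes = 0
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # In-line check: the index is consulted before the chunk is sent to storage.
            if digest not in self.hash_index:
                self.hash_index[digest] = len(self.chunk_store)
                self.chunk_store.append(chunk)
                stored_bytes += len(chunk)
            recipe.append(digest)
        self.recipes[name] = recipe
        return stored_bytes

    def download(self, name):
        """Rebuild a file from its recipe of chunk digests."""
        return b"".join(
            self.chunk_store[self.hash_index[digest]] for digest in self.recipes[name]
        )


if __name__ == "__main__":
    dedup = InlineDeduplicator(chunk_size=4)
    first = dedup.upload("a.bin", b"AAAABBBBAAAA")
    second = dedup.upload("b.bin", b"BBBBCCCC")
    print(first, second, len(dedup.chunk_store))
```

In this sketch a chunk is written to the store only when its digest is missing from the index, so repeated chunks within a file or across files cost an index lookup rather than a second copy. A real deployment would replace the dictionary with HBase lookups and the list with HDFS writes, and might use a different chunking strategy.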