本文已被:浏览 512次 下载 1174次
Received:January 07, 2023 Revised:March 01, 2023
Received:January 07, 2023 Revised:March 01, 2023
中文摘要: 开源数据集加速了深度学习的发展, 但存在许多不合理使用数据集的现象. 为保护数据集的知识产权, 近期工作提出数据集水印算法, 在数据集发布前预先植入水印, 当模型在此数据集上训练时该水印会被附着在模型中, 之后通过验证可疑模型是否存在水印来追溯数据集的非法使用. 但已有数据集水印算法无法在小扰动下提供有效并且隐蔽的黑盒水印验证. 为解决这一问题, 本文首次提出利用独立于图像内容与标签的风格属性来植入水印, 并限制对原数据集的扰动不涉及标签的修改. 通过不引入图像内容与标签的不一致性和额外的代理模型保证水印隐蔽性和有效性. 在水印验证阶段仅使用可疑模型的预测结果通过假设检验给出判断. 本文在CIFAR-10数据集上与现有5种方法相比较, 实验结果验证了本文提出的基于风格的数据集水印算法的有效性与功能不变性. 此外, 本文开展的消融实验验证了本文所提的风格优化模块的必要性, 算法在不同超参设定以及不同数据集下的有效性.
Abstract:Open-sourced datasets accelerate the development of deep learning, while unauthorized data usage frequently happens. To protect the dataset copyright, this study proposes the dataset watermarking algorithm. The watermark is embedded into the dataset before it is released. When the model is trained on this dataset, the watermark is attached to the model, which allows illegal dataset usage to be traced by verifying whether the watermark exists in a suspect model. However, existing dataset watermarking algorithms cannot provide effective and covert black-box verification under small perturbations. Given this problem, the method of embedding the watermark by a style attribute independent of the image content and label is proposed for the first time in this study, and the perturbation on the original dataset is constrained to avoid the modification of labels. The covertness and validity of the watermark are ensured without introducing the inconsistency between the image content and label or extra surrogate model. In the watermark verification stage, only the prediction results of the suspected model are applied to give the judgment via a hypothesis test. The proposed method is compared with the existing five methods on the CIFAR-10 dataset. The experimental results validate the effectiveness and fidelity of the proposed algorithm. Besides, the ablation experiments conducted in this study verify the necessity of the proposed style refinement module and the effectiveness of the proposed algorithm under various hyper-parameter settings and datasets.
keywords: dataset watermarking|dataset copyright protection|image style|style transfer|hypothesis test
文章编号: 中图分类号: 文献标志码:
基金项目:
引用文本:
盛钡娜,潘旭东,张谧.基于风格的数据集水印算法.计算机系统应用,2023,32(8):140-150
SHENG Bei-Na,PAN Xu-Dong,ZHANG Mi.Style-based Dataset Watermarking Algorithm.COMPUTER SYSTEMS APPLICATIONS,2023,32(8):140-150
盛钡娜,潘旭东,张谧.基于风格的数据集水印算法.计算机系统应用,2023,32(8):140-150
SHENG Bei-Na,PAN Xu-Dong,ZHANG Mi.Style-based Dataset Watermarking Algorithm.COMPUTER SYSTEMS APPLICATIONS,2023,32(8):140-150