
Computer Systems Applications, 2023, 32(8): 140-150


Style-based Dataset Watermarking Algorithm

SHENG Bei-Na, PAN Xu-Dong, ZHANG Mi

(School of Computer Science, Fudan University, Shanghai 200438, China)

Received: January 07, 2023; Revised: March 01, 2023

Abstract: Open-source datasets have accelerated the development of deep learning, but they are also frequently used without authorization. To protect dataset copyright, recent work has proposed dataset watermarking: a watermark is embedded into the dataset before it is released, the watermark is carried into any model trained on that dataset, and illegal use of the dataset can then be traced by verifying whether a suspect model contains the watermark. However, existing dataset watermarking algorithms cannot provide effective and covert black-box verification under small perturbations. To address this problem, this study is the first to embed the watermark through a style attribute that is independent of both image content and label, and it restricts the perturbation of the original dataset so that no labels are modified. Because the method introduces neither an inconsistency between image content and label nor an extra surrogate model, the watermark remains both covert and effective. In the verification stage, only the prediction results of the suspect model are used, and the decision is made by a hypothesis test. The proposed method is compared with five existing methods on the CIFAR-10 dataset, and the experimental results confirm its effectiveness and fidelity. In addition, ablation experiments verify the necessity of the proposed style refinement module and the effectiveness of the algorithm under different hyper-parameter settings and on different datasets.

Keywords: dataset watermarking|dataset copyright protection|image style|style transfer|hypothesis test
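The abstract states only that verification applies a hypothesis test to the suspect model's predictions; it does not name the statistic. As a minimal sketch, assume the verifier queries the suspect model on the same images with and without the watermark style and records the probability the model assigns to some class of interest. An exact one-sided sign test, a stand-in chosen here and not necessarily the test used in the paper, then asks whether the style systematically raises that probability. All function and parameter names (`sign_test_pvalue`, `verify_watermark`, `styled`, `plain`, `alpha`) are hypothetical.

```python
from math import comb
from typing import Sequence, Tuple


def sign_test_pvalue(styled: Sequence[float], plain: Sequence[float]) -> float:
    """Exact one-sided sign test on paired scores.

    H0: applying the (hypothetical) watermark style does not tend to raise
        the suspect model's score, i.e. P(styled > plain) <= 0.5.
    H1: the style raises the score, as assumed for a model trained on the
        watermarked dataset.
    Tied pairs carry no sign information and are discarded.
    """
    if len(styled) != len(plain):
        raise ValueError("styled and plain must contain paired scores")
    wins = sum(1 for s, p in zip(styled, plain) if s > p)
    losses = sum(1 for s, p in zip(styled, plain) if s < p)
    n = wins + losses
    if n == 0:
        return 1.0  # no informative pairs: cannot reject H0
    # Upper tail P(X >= wins) for X ~ Binomial(n, 0.5), computed exactly.
    tail = sum(comb(n, k) for k in range(wins, n + 1))
    return tail / 2 ** n


def verify_watermark(
    styled: Sequence[float], plain: Sequence[float], alpha: float = 0.01
) -> Tuple[bool, float]:
    """Return (watermark_detected, p_value) at significance level alpha."""
    p_value = sign_test_pvalue(styled, plain)
    return p_value < alpha, p_value


if __name__ == "__main__":
    # Hypothetical scores: the style lifts every query's score.
    detected, p = verify_watermark([0.9] * 20, [0.1] * 20)
    print(detected, p)
```

For example, `verify_watermark([0.9] * 20, [0.1] * 20)` rejects the null hypothesis with p = 2^-20 (about 9.5e-7). The sign test is used in this sketch because it needs only paired predictions and makes no distributional assumption; a paired t-test is a common alternative when the score differences are roughly normal.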


Author Name	Affiliation	E-mail
SHENG Bei-Na	School of Computer Science, Fudan University, Shanghai 200438, China	bnsheng20@fudan.edu.cn
PAN Xu-Dong	School of Computer Science, Fudan University, Shanghai 200438, China
ZHANG Mi	School of Computer Science, Fudan University, Shanghai 200438, China


Citation:
SHENG Bei-Na, PAN Xu-Dong, ZHANG Mi. Style-based Dataset Watermarking Algorithm. Computer Systems Applications, 2023, 32(8): 140-150