Cross-domain object detection is an emerging research direction that aims to close the generalization gap between the training set and the test set. Among existing methods, applying image style transfer and then training the detector on the translated dataset is effective, but this pipeline cannot be trained end to end, is inefficient, and involves a tedious multi-stage process. We therefore propose a new cross-domain object detection algorithm based on image style transfer, which combines style transfer and object detection into a single end-to-end training procedure and greatly simplifies training. Results on several common datasets demonstrate the effectiveness of the model.
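To make the end-to-end coupling concrete, the sketch below jointly optimizes a style-transfer generator and a detector in one backward pass, so the detection loss also updates the generator. It is a minimal sketch assuming PyTorch with torchvision; the toy generator, the single-class dummy sample, and the optimizer settings are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of end-to-end joint training of a style-transfer generator
# and a detector (assumed setup: PyTorch + torchvision; not the paper's
# exact configuration).
import torch
import torch.nn as nn
import torchvision

class Generator(nn.Module):
    """Toy residual translator standing in for a CycleGAN-style generator."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, x):
        # Residual connection preserves content; the conv stack shifts style.
        return torch.sigmoid(x + self.net(x))

generator = Generator()
# Off-the-shelf detector; 2 classes = background + one object class.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=2)

# One optimizer over BOTH modules, so the detection loss trains the
# generator as well -- this is the end-to-end coupling described above.
optimizer = torch.optim.SGD(
    list(generator.parameters()) + list(detector.parameters()), lr=1e-3)

# Dummy labeled source-domain image (illustrative data, not a real dataset).
image = torch.rand(3, 256, 256)
target = {"boxes": torch.tensor([[30.0, 30.0, 120.0, 120.0]]),
          "labels": torch.tensor([1])}

detector.train()
styled = generator(image.unsqueeze(0))     # translate toward target style
losses = detector(list(styled), [target])  # dict of detection losses
loss = sum(losses.values())
optimizer.zero_grad()
loss.backward()                            # gradients flow into the generator
optimizer.step()
```

In a multi-stage pipeline, the generator would be trained separately and the translated images saved to disk before detector training; here a single backward pass replaces both stages, which is the simplification the abstract claims.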