Cross-domain object detection is an emerging research direction that aims to close the generalization gap between the training set and the test set. Among existing methods, applying image style transfer and then training the detector on the translated dataset is effective, but this pipeline cannot be trained end to end, is inefficient, and involves a tedious multi-stage process. We therefore propose a new cross-domain object detection algorithm based on image style transfer, which combines style transfer and object detection into a single end-to-end training procedure and greatly simplifies training. Results on several common datasets demonstrate the effectiveness of the model.
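To make the end-to-end coupling concrete, the sketch below jointly optimizes a style-transfer generator and a detector in one backward pass, so the detection loss also updates the generator. It is a minimal sketch assuming PyTorch with torchvision; the toy generator, the single-class dummy sample, and the optimizer settings are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of end-to-end joint training of a style-transfer generator
# and a detector (assumed setup: PyTorch + torchvision; not the paper's
# exact configuration).
import torch
import torch.nn as nn
import torchvision

class Generator(nn.Module):
    """Toy residual translator standing in for a CycleGAN-style generator."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, x):
        # Residual connection preserves content; the conv stack shifts style.
        return torch.sigmoid(x + self.net(x))

generator = Generator()
# Off-the-shelf detector; 2 classes = background + one object class.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=2)

# One optimizer over BOTH modules, so the detection loss trains the
# generator as well -- this is the end-to-end coupling described above.
optimizer = torch.optim.SGD(
    list(generator.parameters()) + list(detector.parameters()), lr=1e-3)

# Dummy labeled source-domain image (illustrative data, not a real dataset).
image = torch.rand(3, 256, 256)
target = {"boxes": torch.tensor([[30.0, 30.0, 120.0, 120.0]]),
          "labels": torch.tensor([1])}

detector.train()
styled = generator(image.unsqueeze(0))     # translate toward target style
losses = detector(list(styled), [target])  # dict of detection losses
loss = sum(losses.values())
optimizer.zero_grad()
loss.backward()                            # gradients flow into the generator
optimizer.step()
```

In a multi-stage pipeline, the generator would be trained separately and the translated images saved to disk before detector training; here a single backward pass replaces both stages, which is the simplification the abstract claims.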