Efficient Tracking for UAVs Based on Siamese Network
Authors: Wang Jianhao, Ye Ming, Yao Jiafeng
Funding: General Program of the National Natural Science Foundation of China (62271251)
Abstract:

In the field of visual tracking, most deep learning-based trackers overemphasize accuracy while overlooking speed, which hinders their deployment on mobile platforms such as unmanned aerial vehicles (UAVs). This paper proposes a deep cross-guidance Siamese tracker (SiamDCG). For better deployment on edge computing devices, a distinctive backbone is designed on top of MobileNetV3-small. Moreover, given the complexity of UAV scenarios, the conventional practice of modeling the target box as a Dirac δ distribution has significant drawbacks. To overcome the blurring effects inherent in bounding boxes, SiamDCG converts the regression branch to predicting a distribution over box offsets and uses the learned distribution to guide classification accuracy. Strong results on multiple aerial tracking benchmarks demonstrate the method's robustness and efficiency. On a 12th-generation Intel Core i5 CPU, SiamDCG runs 167 times faster than SiamRPN++ while using 98 times fewer parameters and 410 times fewer FLOPs.
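To make the architecture concrete, here is a minimal PyTorch sketch, assuming torchvision's stock MobileNetV3-small and the depthwise cross-correlation common in SiamRPN++-style trackers; the crop sizes and the unmodified backbone are illustrative assumptions, not SiamDCG's actual design.

```python
# Sketch: Siamese feature extraction with a MobileNetV3-small backbone,
# matched by depthwise cross-correlation. Illustrative only; the paper
# devises its own backbone variant on top of MobileNetV3-small.
import torch
import torch.nn.functional as F
from torchvision.models import mobilenet_v3_small

class SiameseBackbone(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Keep only the convolutional feature extractor; drop the classifier.
        self.features = mobilenet_v3_small(weights=None).features

    def forward(self, x):
        return self.features(x)

def depthwise_xcorr(search, template):
    """Slide template features over search features, one channel at a time."""
    b, c, h, w = search.shape
    kernel = template.reshape(b * c, 1, *template.shape[2:])
    out = F.conv2d(search.reshape(1, b * c, h, w), kernel, groups=b * c)
    return out.reshape(b, c, out.shape[-2], out.shape[-1])

backbone = SiameseBackbone().eval()
with torch.no_grad():
    z = backbone(torch.randn(1, 3, 127, 127))  # template crop
    x = backbone(torch.randn(1, 3, 255, 255))  # search crop
    response = depthwise_xcorr(x, z)           # per-channel similarity map
print(response.shape)
```

Depthwise correlation adds no learned parameters to the matching step, which is one reason this family of trackers stays light enough for edge CPUs.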
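The offset-distribution idea can also be sketched. The following minimal example follows the spirit of Generalized Focal Loss, which the abstract's "Dirac δ versus learned distribution" contrast echoes: each box side is predicted as a discrete distribution whose expectation gives the offset, and whose sharpness can serve as a confidence signal that guides the classification branch. The bin count, the top-k statistic, and the final fusion are assumed here, not taken from the paper.

```python
# Sketch: distribution-based box regression with a quality signal that
# guides classification. Shapes and hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def distribution_to_offset(logits):
    """logits: (..., 4, n_bins) raw scores for the 4 box sides (l, t, r, b)."""
    probs = F.softmax(logits, dim=-1)                    # per-side distribution
    bins = torch.arange(logits.shape[-1], dtype=probs.dtype,
                        device=probs.device)             # bin positions 0..n-1
    return (probs * bins).sum(dim=-1)                    # expectation = offset

def distribution_quality(logits, k=4):
    """Sharp distributions (confident edges) score high; flat ones score low."""
    probs = F.softmax(logits, dim=-1)
    topk = probs.topk(k, dim=-1).values                  # (..., 4, k)
    return topk.mean(dim=-1).mean(dim=-1)                # average over k and sides

logits = torch.randn(2, 4, 17)               # batch of 2, 4 sides, 17 bins
offsets = distribution_to_offset(logits)     # (2, 4) expected offsets
quality = distribution_quality(logits)       # (2,) localization confidence
cls_score = torch.sigmoid(torch.randn(2)) * quality  # quality-guided score
```

In Generalized Focal Loss V2 the top-k statistics feed a small subnetwork that predicts localization quality; multiplying that estimate into the classification score is one simple way such cross guidance between the two branches can work.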

Cite this article: Wang JH, Ye M, Yao JF. Efficient tracking for UAVs based on Siamese network. 计算机系统应用 (Computer Systems & Applications), 2025, 34(1): 69–79.

History: Received 2024-06-17; Revised 2024-07-10; Published online 2024-11-15