Abstract: In visual tracking, most deep learning-based trackers prioritize accuracy over efficiency, which hinders their deployment on mobile platforms such as drones. This study proposes a deep cross guidance Siamese network (SiamDCG). To ease deployment on edge computing devices, a lightweight backbone based on MobileNetV3-small is designed. In complex drone scenarios, the conventional approach of regressing target boxes as a Dirac δ distribution has significant drawbacks, since a single point estimate cannot express uncertainty about a box boundary. To overcome the ambiguity inherent in bounding boxes, the regression branch instead predicts a distribution over box offsets, and the learned distribution is used to guide classification. Strong performance on multiple aerial tracking benchmarks demonstrates the robustness and efficiency of the proposed approach. On a 12th-generation Intel i5 CPU, SiamDCG runs 167 times faster than SiamRPN++ while using 98 times fewer parameters and 410 times fewer FLOPs.
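
As a rough illustration of the distribution-based regression idea summarized above, the following minimal sketch decodes one box-side offset as the expectation of a discrete distribution over candidate offsets, and derives a simple sharpness score from the same distribution. It is not the SiamDCG implementation: the bin layout, the function names `expected_offset` and `peak_confidence`, and the use of the peak probability as a guidance signal are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_offset(logits, bin_values):
    """Decode an offset as the mean of the predicted discrete distribution.

    A Dirac delta regression would output one number directly; here the
    network is assumed to output one logit per candidate offset instead.
    """
    probs = softmax(logits)
    return sum(p * v for p, v in zip(probs, bin_values))


def peak_confidence(logits):
    """Hypothetical guidance signal: the largest bin probability.

    A sharp distribution (clear boundary) gives a value near 1, while a
    flat distribution (ambiguous boundary) gives a much lower value.
    """
    return max(softmax(logits))


# Hypothetical offset bins, e.g. in units of feature-map stride.
bins = [0.0, 1.0, 2.0, 3.0, 4.0]

sharp = [-10.0, -10.0, 10.0, -10.0, -10.0]   # confident boundary
ambiguous = [0.0, 1.0, 1.0, 1.0, 0.0]        # blurred boundary

for name, logits in (("sharp", sharp), ("ambiguous", ambiguous)):
    print(name, expected_offset(logits, bins), peak_confidence(logits))
```

Both example distributions decode to nearly the same offset, yet their sharpness differs markedly. That difference is the kind of information a point estimate discards and that a classification branch could, under the assumptions above, use as a quality cue.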