Abstract: Yak husbandry on the Qinghai-Tibet Plateau in China still relies mainly on traditional manual grazing. Because manual herding cannot track and count yaks quickly, this study proposes a yak tracking method that combines an improved YOLOv5 detector with the Bytetrack tracker to detect and track yaks rapidly in video input. The deep-learning-based YOLOv5 object detection network is augmented with coordinate attention, cross-scale feature fusion, and atrous spatial pyramid pooling to reduce the detection difficulty and misdetections caused by occlusion, so that yak targets in videos are detected accurately. The Bytetrack tracker then associates objects between frames using Kalman filtering and the Hungarian algorithm and assigns an ID to each target. The model is trained on a subset of the yak images in the ImageNet dataset together with yak sample images collected in the Yushu region of Qinghai. Experimental results show that the improved model achieves an average detection accuracy of 98.7%, which is 1.1, 1.89, 8.33, and 0.4 percentage points higher than the original YOLOv5s, SSD, YOLOX, and Faster RCNN models, respectively; it also converges quickly and has the best detection performance among the compared models. Combining the improved YOLOv5s with Bytetrack gives the best tracking results, with MOTA increased by 7.1646%. The improved model detects and tracks yaks more quickly and accurately, providing technical support for the intelligent development of animal husbandry in the Qinghai region.
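The inter-frame association step mentioned above can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names `iou` and `associate`, the `min_iou` threshold of 0.3, and the box format `(x1, y1, x2, y2)` are assumptions made for illustration. The predicted track boxes are assumed to come from a Kalman filter, which is omitted here, and an exhaustive search over assignments stands in for the Hungarian algorithm; it finds the same optimal assignment but is only practical for a handful of boxes.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(predicted, detections, min_iou=0.3):
    """Match predicted track boxes to detections by maximizing total IoU.

    predicted:  dict mapping track ID -> box (assumed Kalman-predicted)
    detections: list of boxes from the detector for the current frame
    Returns (matches, unmatched) where matches maps track ID -> detection
    index and unmatched lists detection indices left without a track.
    """
    track_ids = list(predicted)
    # Each track takes either a distinct detection index or None (no match).
    slots = list(range(len(detections))) + [None] * len(track_ids)

    def gain(track_id, j):
        # Pairs below the IoU gate contribute nothing and stay unmatched.
        if j is None:
            return 0.0
        value = iou(predicted[track_id], detections[j])
        return value if value >= min_iou else 0.0

    # Exhaustive search: a stand-in for the Hungarian algorithm (assumption).
    best = max(
        permutations(slots, len(track_ids)),
        key=lambda perm: sum(gain(t, j) for t, j in zip(track_ids, perm)),
    )
    matches = {t: j for t, j in zip(track_ids, best) if gain(t, j) > 0.0}
    used = set(matches.values())
    unmatched = [j for j in range(len(detections)) if j not in used]
    return matches, unmatched
```

In a tracker built along these lines, matched detections would update the corresponding track and keep its ID, while unmatched detections would be candidates for new track IDs.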