Computer Systems & Applications, 2019, Vol. 28, Issue (9): 271-277

Research on Target Tracking Algorithm Based on YOLO and Camshift
HAN Peng, SHEN Jian-Xin, JIANG Jun-Jia, ZHOU Zhe
College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Foundation item: Graduate Innovation Program of Jiangsu Province (KYCX18_0317)
Abstract: To solve the problem that traditional target tracking algorithms lose the target after occlusion, a tracking algorithm combining YOLO and Camshift is proposed. A target detection model is built on the YOLO network structure. Before detection, each video frame is preprocessed with an image enhancement method that preserves sufficient image information while improving image quality and reducing the time complexity of the YOLO algorithm. The YOLO algorithm localizes the target and initializes the tracker. Using the target's position information, the Camshift algorithm processes subsequent video frames and updates the target in each frame, continuously adjusting the position of the search window to adapt to the target's motion. Experimental results show that the proposed method effectively overcomes tracking loss after the target is occluded and exhibits good robustness.
Key words: YOLO algorithm; Camshift algorithm; image enhancement; target tracking; occlusion

1 Video Frame Preprocessing and Detection Algorithm

1.1 Image Enhancement

The Retinex algorithm regards an image as the product of an illumination image and a reflectance image:

 $S(x,y) = R(x,y) \times L(x,y)$ (1)

 $\log S = \log \left( {R \times L} \right) = \log R + \log L$ (2)

Let $s = \log S$, $r = \log R$, $l = \log L$; then:

 $r = s - l$ (3)

In single-scale Retinex, the illumination component is estimated by convolving the input image $S$ with a Gaussian surround function $F$, giving:

 $r\left( {x,y} \right) = \log S\left( {x,y} \right) - \log \left[ {F\left( {x,y} \right)*S\left( {x,y} \right)} \right]$ (4)

 $F\left( {x,y} \right) = K{e^{ - \left( {{x^2} + {y^2}} \right)/{c^2}}}$ (5)

where $c$ is the scale of the Gaussian surround and the constant $K$ is chosen so that $F$ normalizes to:

 $\iint {F\left( {x,y} \right)dxdy} = 1$ (6)
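As an illustration, the single-scale Retinex computation of Eqs. (3)-(6) can be sketched in NumPy. The scale parameter `c`, the kernel radius, and the offset that avoids $\log(0)$ are illustrative choices, not values given in the paper:

```python
import numpy as np

def gaussian_kernel(c, radius):
    """1-D Gaussian surround kernel, normalized so it sums to 1 (Eq. 6)."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / c ** 2)
    return k / k.sum()

def single_scale_retinex(img, c=80.0, radius=None):
    """r(x, y) = log S(x, y) - log[F(x, y) * S(x, y)]  (Eqs. 3 and 4)."""
    if radius is None:
        radius = int(3 * c)
    s = img.astype(float) + 1.0  # offset to avoid log(0)
    k = gaussian_kernel(c, radius)
    # The 2-D Gaussian surround F is separable: blur rows, then columns.
    blurred = np.apply_along_axis(lambda row: np.convolve(row, k, mode='same'), 1, s)
    blurred = np.apply_along_axis(lambda col: np.convolve(col, k, mode='same'), 0, blurred)
    return np.log(s) - np.log(blurred)
```

On a uniform region the output is zero (the illumination estimate equals the image), while pixels brighter than their surround get positive $r$, which is the dynamic-range compression that makes Retinex useful as a preprocessing step.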

 Fig. 1 Comparison before and after image enhancement

1.2 Target Detection with the YOLO Algorithm

The YOLO detection algorithm treats target detection as a single regression problem and finds target bounding boxes directly in the image. The image is fed into a convolutional neural network for feature extraction. Feature maps from different convolutional layers are shown in Fig. 2.

 Fig. 2 Feature maps of different convolutional layers

 Fig. 3 YOLO detection framework

The YOLO algorithm divides the image into an S×S grid, as shown in Fig. 4; if the center of a target falls into a grid cell, that cell is responsible for detecting the target.

 Fig. 4 YOLO detection steps

 $Confidence = Pr(Object) \times IOU_{pred}^{truth}$ (7)

 $IOU_{pred}^{truth} = \frac{{(box(pred) \cap box(truth))}}{{(box(pred) \cup box(truth))}}$ (8)
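For concreteness, Eqs. (7) and (8) can be computed as follows. The `(x1, y1, x2, y2)` box convention and the function names are illustrative, not from the paper:

```python
def iou(box_pred, box_truth):
    """IOU of two axis-aligned boxes given as (x1, y1, x2, y2)  (Eq. 8)."""
    ix1 = max(box_pred[0], box_truth[0])
    iy1 = max(box_pred[1], box_truth[1])
    ix2 = min(box_pred[2], box_truth[2])
    iy2 = min(box_pred[3], box_truth[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # intersection area
    area_pred = (box_pred[2] - box_pred[0]) * (box_pred[3] - box_pred[1])
    area_truth = (box_truth[2] - box_truth[0]) * (box_truth[3] - box_truth[1])
    return inter / (area_pred + area_truth - inter)  # union = sum - intersection

def confidence(pr_object, box_pred, box_truth):
    """Confidence = Pr(Object) x IOU  (Eq. 7)."""
    return pr_object * iou(box_pred, box_truth)
```

For example, two unit-overlap 2×2 boxes give IOU = 1/7, and identical boxes give IOU = 1.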

The network structure of the YOLO algorithm is shown in Fig. 5. The network has 24 convolutional layers and two fully connected layers. Its design borrows from GoogleNet, connecting a 3×3 convolutional layer after each 1×1 dimensionality-reduction layer in place of the Inception structure [11]. Since target tracking only needs to distinguish foreground from background, and no target-class judgment is required, the fully connected layers can be removed and replaced with a Softmax [12] classifier for simplification, as shown in Fig. 6: the output of the last convolutional layer is fed to the Softmax classifier, which classifies detections into foreground and background, and the regions detected as foreground serve as candidate regions for the subsequent target tracking.

 Fig. 5 Network structure of the YOLO algorithm

 Fig. 6 Simplified network structure of the YOLO algorithm

2 Joint Tracking Algorithm

2.1 Camshift Tracking Algorithm

The Camshift algorithm, also known as the Continuously Adaptive Meanshift algorithm, applies the Meanshift algorithm to every frame of the video, taking the result from the previous frame (the center and size of the tracking window) as the initial value for the current frame, and iterating in this way to achieve target tracking.

The Camshift algorithm is shown in Fig. 7.

 Fig. 7 Flowchart of the Camshift algorithm

For target tracking, the Camshift algorithm matches using the back-projection of the hue-component histogram in the HSV color space, in the following steps: (1) determine the initial target and region; (2) compute the hue-component histogram of the target region; (3) compute the back-projection of the input image from the histogram; (4) iterate the Meanshift algorithm on the back-projection until it converges or reaches the maximum number of iterations, saving the zeroth-, first-, and second-order moments of the window; (5) compute the new window center and size from step (4) and use them as the initial values for tracking in the next frame (i.e., return to step (2)).
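Step (4), the Meanshift iteration on the back-projection, can be sketched as follows. This is a minimal NumPy version; the convergence threshold, iteration cap, and window layout `(x, y, w, h)` are my own illustrative choices:

```python
import numpy as np

def meanshift(backproj, window, max_iter=10, eps=1.0):
    """Shift the search window toward the centroid of the back-projection
    (step (4) above). window is (x, y, w, h) in pixel coordinates."""
    x, y, w, h = window
    for _ in range(max_iter):
        patch = backproj[y:y + h, x:x + w]
        m00 = patch.sum()
        if m00 == 0:  # window contains no probability mass
            break
        ys, xs = np.mgrid[0:h, 0:w]
        cx = (xs * patch).sum() / m00  # centroid within the window
        cy = (ys * patch).sum() / m00
        dx = int(round(cx - (w - 1) / 2))  # shift from window center to centroid
        dy = int(round(cy - (h - 1) / 2))
        x = min(max(x + dx, 0), backproj.shape[1] - w)  # keep window in the image
        y = min(max(y + dy, 0), backproj.shape[0] - h)
        if abs(dx) < eps and abs(dy) < eps:  # converged
            break
    return (x, y, w, h)
```

Starting from a window that partially overlaps the target, the window climbs the probability density until it is centered on the target region.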

Building on Meanshift, Camshift updates the center position and window size of the tracking window so that it adapts to the target. The procedure is as follows:

1) Compute the color probability distribution I(x,y) of the initial tracking window from the back-projection;

2) Compute the zeroth-, first-, and second-order moments of the tracking window:

 ${M_{00}} = \sum\limits_x {\sum\limits_y {I(x,y)}}$ (9)
 ${M_{10}} = \sum\limits_x {\sum\limits_y {xI(x,y)}}$ (10)
 ${M_{01}} = \sum\limits_x {\sum\limits_y {yI(x,y)}}$ (11)
 ${M_{20}} = \sum\limits_x {\sum\limits_y {{x^2}I(x,y)}}$ (12)
 ${M_{02}} = \sum\limits_x {\sum\limits_y {{y^2}I(x,y)}}$ (13)
 ${M_{11}} = \sum\limits_x {\sum\limits_y {xyI(x,y)}}$ (14)

3) Compute the center position:

 ${x_c} = \frac{{{M_{10}}}}{{{M_{00}}}}$ (15)
 ${y_c} = \frac{{{M_{01}}}}{{{M_{00}}}}$ (16)

4) Compute the length and width of the tracking window:

 $l = 2\sqrt {\frac{{\left( {a + c} \right) + \sqrt {{b^2} + {{\left( {a - c} \right)}^2}} }}{2}}$ (17)
 $h = 2\sqrt {\frac{{\left( {a + c} \right) - \sqrt {{b^2} + {{\left( {a - c} \right)}^2}} }}{2}}$ (18)

where

 $a = \frac{{{M_{20}}}}{{{M_{00}}}} - x_c^2,\;b = 2\left( {\frac{{{M_{11}}}}{{{M_{00}}}} - {x_c}{y_c}} \right),\;c = \frac{{{M_{02}}}}{{{M_{00}}}} - y_c^2$
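Steps 2)-4) can be computed directly in NumPy. This is a minimal illustration in which I(x,y) is taken as a 2-D back-projection patch:

```python
import numpy as np

def camshift_window(I):
    """Centroid (Eqs. 15-16) and window size (Eqs. 17-18) from the moments
    (Eqs. 9-14) of a 2-D back-projection patch I."""
    ys, xs = np.mgrid[0:I.shape[0], 0:I.shape[1]]
    M00 = I.sum()
    M10 = (xs * I).sum()
    M01 = (ys * I).sum()
    M20 = (xs ** 2 * I).sum()
    M02 = (ys ** 2 * I).sum()
    M11 = (xs * ys * I).sum()
    xc, yc = M10 / M00, M01 / M00            # Eqs. (15)-(16)
    a = M20 / M00 - xc ** 2                  # central second moments
    b = 2 * (M11 / M00 - xc * yc)
    c = M02 / M00 - yc ** 2
    root = np.sqrt(b ** 2 + (a - c) ** 2)
    length = 2 * np.sqrt(((a + c) + root) / 2)   # Eq. (17)
    height = 2 * np.sqrt(((a + c) - root) / 2)   # Eq. (18)
    return (xc, yc), (length, height)
```

For a uniform 4×4 patch the centroid is (1.5, 1.5) and, since the distribution is isotropic (b = 0, a = c), the window degenerates to a square with l = h.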
2.2 Occlusion Judgment Using Target-Region Pixel Values

The ratio of the pixel values of the target region in the current frame, $I_t$, to those in the initial frame, $I_0$, is used as the occlusion metric:

 ${\alpha _t} = \frac{{{I_t}}}{{{I_0}}}$ (19)

2.3 Algorithm Steps

1) Initialization: the YOLO algorithm initializes the first frame of the video;

2) Camshift tracking: the Camshift algorithm tracks the target while judging whether it is occluded;

① Occluded: if ${\alpha _t} < \beta$, where $\beta$ is the occlusion threshold, the target is considered occluded. Since the target's velocity does not change abruptly (it generally moves at constant velocity or with constant acceleration), the positions tracked over the n frames before the occlusion are used to fit a quadratic relationship between position and frame number, and this relationship estimates the target's position during the occlusion. Once the target is detected again, the YOLO algorithm updates the target.

② Not occluded: if ${\alpha _t} \ge \beta$, the target is considered unoccluded and the Camshift algorithm continues tracking.
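A minimal sketch of the occlusion test of Eq. (19) and the quadratic position prediction described in step ①. The function names and the default threshold $\beta = 0.5$ are my own illustrative choices, not values from the paper:

```python
import numpy as np

def is_occluded(I_t, I_0, beta=0.5):
    """Eq. (19): the target is judged occluded when alpha_t = I_t / I_0 < beta."""
    return (I_t / I_0) < beta

def predict_position(frames, positions, next_frame):
    """Fit a quadratic between position and frame number over the n frames
    before the occlusion, then extrapolate to estimate the occluded position."""
    coeffs = np.polyfit(frames, positions, deg=2)
    return float(np.polyval(coeffs, next_frame))
```

For a target with constant acceleration the quadratic fit is exact; e.g., positions 0, 1, 4, 9, 16 over frames 0-4 extrapolate to 25 at frame 5.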

 Fig. 8 Flowchart of the joint YOLO and Camshift algorithm

3 Experimental Comparison

 Fig. 9 Tracking interface

3.1 Qualitative Results Comparison

 Fig. 10 Tracking results of the traditional Camshift algorithm (Experiment 1)

 Fig. 11 Tracking results of the KCF algorithm (Experiment 1)

 Fig. 12 Tracking results of the joint YOLO and Camshift algorithm (Experiment 1)

 Fig. 13 Tracking results of the traditional Camshift algorithm (Experiment 2)

 Fig. 14 Tracking results of the KCF algorithm (Experiment 2)

 Fig. 15 Tracking results of the joint YOLO and Camshift algorithm (Experiment 2)

The results of the KCF algorithm in Experiments 1 and 2 are shown in Fig. 11 and Fig. 14, respectively. When the remote-controlled car approaches the obstruction, as in Fig. 14(b), tracking is unaffected; but when the car is interfered with by the obstruction, as in Fig. 14(c), tracking degrades markedly and the target is lost.

3.2 Quantitative Results Comparison

 Fig. 16 Comparison of the X-axis coordinate of the target center in Experiment 1

 Fig. 17 Comparison of the X-axis coordinate of the target center in Experiment 2

4 Conclusion

[1] 罗建豪, 吴建鑫. A survey of fine-grained image classification based on deep convolutional features. Acta Automatica Sinica, 2017, 43(8): 1306-1318.
[2] 葛宝义, 左宪章, 胡永江. A survey of visual object tracking methods. Journal of Image and Graphics, 2018, 23(8): 1091-1107.
[3] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA. 2014.
[4] Girshick R. Fast R-CNN. 2015 IEEE International Conference on Computer Vision. Santiago, Chile. 2015.
[5] Ren SQ, He KM, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv preprint arXiv:1506.01497, 2015.
[6] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[7] Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector. In: Leibe B, Matas J, Sebe N, et al., eds. Computer Vision - ECCV 2016. 2016. 21-37.
[8] Bradski GR. Real time face and object tracking as a component of a perceptual user interface. Proceedings Fourth IEEE Workshop on Applications of Computer Vision. Princeton, NJ, USA. 1998. 214-219.
[9] 郝志成, 吴川, 杨航, et al. Image detail enhancement method based on bilateral texture filtering. Chinese Optics, 2016, 9(4): 423-431.
[10] Land EH, McCann J. Lightness and retinex theory. Journal of the Optical Society of America, 1971, 61(1): 1-11. DOI:10.1364/JOSA.61.000001
[11] He KM, Zhang XY, Ren SQ, et al. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016. 770-778.
[12] Yegnanarayana B. Artificial neural networks for pattern recognition. Sadhana, 1994, 19(2): 189-238. DOI:10.1007/BF02811896
[13] Henriques JF, Caseiro R, Martins P, et al. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583-596. DOI:10.1109/TPAMI.2014.2345390