Vision tracking is one of the core functions of smart robots and is widely used in autonomous driving, intelligent elderly care, and other fields. In this work, a low-cost Raspberry Pi serves as the slave-computer robot platform. Object detection and visual tracking of human hands are implemented by running a pre-trained deep learning SSD model on the host computer. The SSD model is trained with Google's TensorFlow deep learning framework on Indiana University's EgoHands dataset. The software for both the robot and the host computer is written in Python and runs on Linux. The video stream and the tracking control commands are exchanged between the robot and the host over WiFi. Practical tests show that the vision tracking function of the developed smart robot is stable and performs well.
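The host-side step that turns a hand detection into a tracking control command can be illustrated with a minimal sketch. It is not the implementation used in this work: the command names (`LEFT`, `RIGHT`, `FORWARD`, `BACKWARD`, `STOP`), the function `tracking_command`, and all thresholds are hypothetical, and the box is assumed to be given as normalized `(ymin, xmin, ymax, xmax)` coordinates, as is common for TensorFlow-based SSD detectors.

```python
from typing import Optional, Tuple

# Assumed box layout: (ymin, xmin, ymax, xmax), each normalized to [0, 1].
Box = Tuple[float, float, float, float]


def tracking_command(box: Optional[Box],
                     score: float = 1.0,
                     min_score: float = 0.5,
                     dead_zone: float = 0.1,
                     target_area: float = 0.15,
                     area_tolerance: float = 0.05) -> str:
    """Map one hand detection to a hypothetical drive command.

    All thresholds are illustrative values, not taken from the paper.
    """
    # No detection, or a detection that is too uncertain: do not move.
    if box is None or score < min_score:
        return "STOP"

    ymin, xmin, ymax, xmax = box

    # Horizontal offset of the box center from the image center.
    offset = (xmin + xmax) / 2.0 - 0.5
    if offset < -dead_zone:
        return "LEFT"
    if offset > dead_zone:
        return "RIGHT"

    # Use the box area as a rough proxy for distance to the hand.
    area = (xmax - xmin) * (ymax - ymin)
    if area < target_area - area_tolerance:
        return "FORWARD"   # hand appears small, so move closer
    if area > target_area + area_tolerance:
        return "BACKWARD"  # hand appears large, so back away
    return "STOP"
```

In a setup like the one described, the host would call such a function once per received video frame and send the resulting string to the robot over the WiFi link; the dead zone keeps the robot from oscillating when the hand is already near the image center.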