In this article, we implement and compare object tracking algorithms from the OpenCV Python library. We first explain what object tracking is, then walk through code examples for several OpenCV trackers: KCF, CSRT, Mean Shift, and Cam Shift.
What is Object Tracking?
Object tracking is a computer vision task: locating a predefined object and following its position as it moves across the frames of a video.
Object Tracking vs Object Detection
Beginners sometimes confuse object tracking with object detection and use the two terms interchangeably, but the two tasks differ:
In object detection, we identify an object in a single frame or scene, which may be just a static image. In object tracking, we follow an object that is moving through a video. Running object detection on every frame of a video produces an effect similar to tracking, although detection alone does not link an object in one frame to the same object in the next.
Applications of Object Tracking
Object tracking has many interesting and useful applications, including:
- Human-computer interaction
- Security and surveillance
- Augmented reality
- Traffic control
- Medical imaging
- Video editing
Types of Object Tracking Algorithms
i) Single Object Tracking
Single object tracking refers to the process of selecting a region of interest (in the initial frame of a video) and tracking the position (i.e. coordinates) of the object in the upcoming frames of the video. We will be covering some of the algorithms used for single object tracking in this article.
ii) Multiple Object Tracking (MOT)
Multiple object tracking is the task of tracking more than one object in a video. The algorithm assigns a unique ID to each object detected in a frame, then identifies and tracks all of these objects across the following frames.
A video may contain a large number of objects, the footage may be unclear, and the direction of an object's motion can be ambiguous. This makes multiple object tracking a difficult task, and it therefore relies on single-frame object detection.
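The ID assignment described above can be sketched in plain Python. This is only an illustration under simple assumptions, not how any particular OpenCV tracker works: boxes are (x, y, w, h) tuples, detections are matched greedily to existing tracks by intersection over union (IoU), and tracks without a match are dropped. The names iou, assign_ids, and min_iou are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def assign_ids(tracks, detections, next_id, min_iou=0.3):
    """Greedily match detections to existing tracks by IoU.

    tracks maps an ID to the last known box of that object.
    Returns the updated mapping and the next unused ID.
    """
    updated = {}
    unmatched = list(detections)
    for track_id, box in tracks.items():
        if not unmatched:
            break
        best = max(unmatched, key=lambda d: iou(box, d))
        if iou(box, best) >= min_iou:
            updated[track_id] = best      # same object, new position
            unmatched.remove(best)
    for det in unmatched:                 # leftover detections become new tracks
        updated[next_id] = det
        next_id += 1
    return updated, next_id


# Frame 1: two new objects. Frame 2: both moved slightly, listed in another order.
tracks, next_id = assign_ids({}, [(0, 0, 10, 10), (50, 50, 10, 10)], next_id=0)
tracks, next_id = assign_ids(tracks, [(52, 51, 10, 10), (1, 0, 10, 10)], next_id)
print(tracks)  # {0: (1, 0, 10, 10), 1: (52, 51, 10, 10)}
```

Each object keeps its ID across the two frames even though the detections arrive in a different order. Real multi-object trackers use more robust matching and keep unmatched tracks alive for a while.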
Installing the libraries
i) Installing OpenCV
We install the opencv-contrib-python package. It is a community-maintained OpenCV Python package that includes extra modules and implementations not found in the regular OpenCV Python package.
pip install opencv-contrib-python
ii) Installing Numpy
NumPy is an important prerequisite for almost any computer vision task in Python, and it can be installed as shown below.
pip install numpy
iii) Importing the libraries
Let us import these libraries as shown below.
import cv2
import numpy as np
i) KCF Object Tracking
KCF stands for Kernelized Correlation Filter. It builds on ideas from two earlier trackers (BOOSTING and MIL). It uses circular shifts of the image patch around the object to train and evaluate a correlation filter efficiently, and translates the bounding box (the position of the object) to the location of the strongest response. In simple words, the KCF tracker looks at how the image changes between frames (motion, extension, or orientation) and estimates the most likely position of the object being tracked.
KCF Object Tracking in OpenCV Python
KCF object tracking is available through the TrackerKCF_create() function of OpenCV Python. The code is given below, followed by an explanation.
tracker = cv2.TrackerKCF_create()
video = cv2.VideoCapture('video.mp4')
ok, frame = video.read()

bbox = cv2.selectROI(frame)  # draw the box with the mouse, then press Enter or Space

ok = tracker.init(frame, bbox)

while True:
    ok, frame = video.read()
    if not ok:
        break
    ok, bbox = tracker.update(frame)
    if ok:
        (x, y, w, h) = [int(v) for v in bbox]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2, 1)
    else:
        cv2.putText(frame, 'Error', (100, 80), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)  # y=80 keeps the text inside the frame
    cv2.imshow('Tracking', frame)
    if cv2.waitKey(1) & 0xFF == 27:  # 27 is the Escape key
        break
cv2.destroyAllWindows()
Line 1-3: We first create the ‘KCF’ tracker object. Next, we open the video and use the ‘read()’ function to fetch its first frame.
Line 5: We call the ‘selectROI’ function on the first frame (fetched on line 3), which lets us draw a box around the object, and store the result in the ‘bbox’ variable.
Line 7: We initialize the tracker (using ‘init’) with the frame in which we selected our region of interest and the position (‘bbox’) of the object to be tracked.
Line 9: Start a while loop that runs over the frames of our video.
Line 10: Use the ‘read()’ function on the video object to fetch the next frame along with a flag (‘ok’) that tells us whether the frame was read successfully.
Line 11-12: If the flag is false, the loop stops. This happens when the video has ended or a frame could not be read.
Line 13: We pass each new frame to the tracker's ‘update’ function. It returns two values: a flag that tells us whether tracking succeeded, and the position of the tracked object in the frame, which is only meaningful when the flag is true.
Line 14-16: This block runs if the ‘ok’ flag is true. We unpack the position in ‘bbox’ into the x, y coordinates and the width and height, converting them to integers. Next, we use the OpenCV ‘rectangle’ function to draw a bounding box around the tracked object.
Line 17-18: If the tracker cannot find the selected ROI in this frame, this block writes ‘Error’ on the video frame.
Line 19: Show the video frame in a separate window using the ‘cv2.imshow’ function.
Line 20-21: If the user presses the Escape key, execution stops.
Line 22: Use the OpenCV ‘destroyAllWindows()’ function to close any windows that are still open.
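The walkthrough above assumes the user always selects a usable region. A minimal sketch of a sanity check that could run before ‘init’ is given below. The helper name is_valid_roi is hypothetical, and the assumption that a cancelled selection yields an all-zero box should be verified against your OpenCV version.

```python
def is_valid_roi(bbox, frame_width, frame_height):
    """Return True if an (x, y, w, h) box has positive size and lies inside the frame."""
    x, y, w, h = bbox
    if w <= 0 or h <= 0:
        return False  # e.g. (0, 0, 0, 0), assumed to mean the selection was cancelled
    return x >= 0 and y >= 0 and x + w <= frame_width and y + h <= frame_height


print(is_valid_roi((0, 0, 0, 0), 640, 480))        # False
print(is_valid_roi((100, 50, 80, 120), 640, 480))  # True
```

With a real frame, the width and height would come from frame.shape[1] and frame.shape[0].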
ii) CSRT Object Tracking
CSRT is the OpenCV implementation of CSR-DCF (Discriminative Correlation Filter with Channel and Spatial Reliability). It is a more advanced algorithm that copes with changes such as the object growing in size, and with non-rectangular objects. Essentially, it uses HoG features along with spatial reliability maps (SRM) for object localization and tracking.
CSRT Object Tracking in OpenCV Python
CSRT object tracking is available through the TrackerCSRT_create() function of OpenCV Python. It is used on videos exactly as in the previous section: just create the CSRT tracker instead of the KCF one and you are good to go.
tracker = cv2.TrackerCSRT_create()
video = cv2.VideoCapture('video.mp4')
ok, frame = video.read()

bbox = cv2.selectROI(frame)  # draw the box with the mouse, then press Enter or Space

ok = tracker.init(frame, bbox)

while True:
    ok, frame = video.read()
    if not ok:
        break
    ok, bbox = tracker.update(frame)
    if ok:
        (x, y, w, h) = [int(v) for v in bbox]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2, 1)
    else:
        cv2.putText(frame, 'Error', (100, 80), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)  # y=80 keeps the text inside the frame
    cv2.imshow('Tracking', frame)
    if cv2.waitKey(1) & 0xFF == 27:  # 27 is the Escape key
        break
cv2.destroyAllWindows()
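Since KCF and CSRT share the same ‘update’ interface, one way to compare them is to time ‘update’ over the same frames. The sketch below is an assumption about how such a comparison could be set up, not part of OpenCV: measure_fps is a hypothetical helper, the tracker is expected to be initialized already, and the frames are expected to be held in a list.

```python
import time


def measure_fps(tracker, frames):
    """Run tracker.update() on every frame; return (frames per second, success rate)."""
    successes = 0
    start = time.perf_counter()
    for frame in frames:
        ok, _bbox = tracker.update(frame)
        successes += bool(ok)
    elapsed = time.perf_counter() - start
    n = len(frames)
    fps = n / elapsed if elapsed > 0 else float("inf")
    return fps, (successes / n if n else 0.0)
```

Calling it once with a KCF tracker and once with a CSRT tracker, each initialized on the same first frame and ROI, gives a rough speed and reliability comparison for that video.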
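iii) Other Object Tracking Algorithms in OpenCV
The OpenCV object tracking API provides a variety of trackers. You can try some of the other tracking algorithms by simply changing the value of the tracker variable. Note that GOTURN is a deep-learning tracker and additionally needs its pretrained model files (goturn.prototxt and goturn.caffemodel) in the working directory.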
tracker = cv2.TrackerGOTURN_create()
tracker = cv2.TrackerMIL_create()
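Which factory functions exist depends on the installed OpenCV build, and some 4.x builds expose older trackers under cv2.legacy instead of cv2. The sketch below looks in both places. The helper create_tracker is hypothetical, and it takes the module as an argument so that it makes no assumption about what your build contains.

```python
def create_tracker(name, cv_module):
    """Look up Tracker<name>_create in cv_module, then in cv_module.legacy."""
    factory_name = f"Tracker{name}_create"
    for namespace in (cv_module, getattr(cv_module, "legacy", None)):
        factory = getattr(namespace, factory_name, None)
        if callable(factory):
            return factory()
    raise ValueError(f"{factory_name} is not available in this OpenCV build")
```

Typical usage would be tracker = create_tracker('CSRT', cv2), which raises a clear error instead of an AttributeError when the tracker is missing.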
iv) Histogram Density Algorithms for Object Tracking
i) Object tracking using Mean Shift algorithm
Mean Shift is an object tracking algorithm that works on pixel density. It uses a histogram of the object to score how well each pixel of a new frame matches the object, then moves the search window toward the region where those scores are densest. At each step the window is shifted to the centroid of the scores inside it, and this repeats until the window stops moving (the shift falls below a threshold) or a maximum number of iterations is reached. If no exact match exists, the window settles on the area with the best match.
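The core iteration can be sketched in plain Python on a small grid of scores. This is a simplified illustration, not the OpenCV implementation: the function mean_shift and its parameters are hypothetical, the window is an (x, y, w, h) tuple, and the grid stands in for a per-pixel match score.

```python
def mean_shift(weights, window, max_iter=10, eps=1.0):
    """Shift an (x, y, w, h) window toward the weighted centroid of its contents.

    weights is a 2D list of non-negative scores, one per pixel.
    """
    x, y, w, h = window
    rows, cols = len(weights), len(weights[0])
    for _ in range(max_iter):
        total = sum_x = sum_y = 0.0
        for r in range(max(0, y), min(rows, y + h)):
            for c in range(max(0, x), min(cols, x + w)):
                total += weights[r][c]
                sum_x += c * weights[r][c]
                sum_y += r * weights[r][c]
        if total == 0:
            break  # nothing to follow inside the window
        # New top-left corner that centres the window on the centroid.
        new_x = round(sum_x / total - (w - 1) / 2)
        new_y = round(sum_y / total - (h - 1) / 2)
        shift = max(abs(new_x - x), abs(new_y - y))
        x, y = new_x, new_y
        if shift < eps:
            break  # the window has stopped moving
    return x, y, w, h


# A 10x10 grid with a 3x3 blob of high scores at rows 5-7, columns 5-7.
grid = [[0] * 10 for _ in range(10)]
for r in range(5, 8):
    for c in range(5, 8):
        grid[r][c] = 1

print(mean_shift(grid, (2, 2, 5, 5)))  # (4, 4, 5, 5)
```

The window starts with only a corner of the blob inside it and ends up centred on the blob after two iterations.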
Mean Shift Object Tracker Implementation in OpenCV Python
cap = cv2.VideoCapture('video.mp4')
ret, frame = cap.read()
x, y, w, h = cv2.selectROI(frame)
track_window = (x, y, w, h)
roi = frame[y:y+h, x:x+w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv_roi, np.array((0., 60., 32.)), np.array((180., 255., 255.)))
roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])  # histogram of the hue channel
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
while True:
    ret, frame = cap.read()
    if ret:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
        ret, track_window = cv2.meanShift(dst, track_window, term_crit)
        x, y, w, h = track_window
        img2 = cv2.rectangle(frame, (x, y), (x + w, y + h), 255, 2)
        cv2.imshow('img2', img2)
        k = cv2.waitKey(30) & 0xff
        if k == 27:  # 27 is the Escape key
            break
    else:
        break
cv2.destroyAllWindows()
Line 1-3: Load the video (on which tracking is to be performed) and fetch the first frame into the ‘frame’ variable. Then call the ‘selectROI’ function on the first frame and store the position of the object in x, y, w, h. This lets us select our ROI manually instead of hard-coding it.
Line 4-5: Here we set up the initial location of the tracking window from the ROI values that the user provided. We also create the ‘roi’ variable, which holds the part of the image to be tracked.
Line 6-9: First we convert our ROI to the ‘HSV’ colorspace, then build a mask that keeps only pixels whose saturation and value are high enough, since dark or washed-out pixels give unreliable hue. We compute a histogram of the hue channel of the ‘roi’ under that mask, and lastly we normalize the histogram to the range 0 to 255.
Line 10: We define the termination criterion that is passed to the ‘meanShift’ function: stop after 10 iterations, or when the window moves by less than 1 pixel, whichever comes first.
Line 11: Run a while loop to go through the video.
Line 12: Use the ‘read()’ function of the capture object to fetch the next frame.
Line 13: If the flag variable ‘ret’ is true, execute the code inside the ‘if’ statement.
Line 14-15: As we did on line 6, we convert each new frame to the HSV colorspace. Next, we use the ‘calcBackProject()’ function to score how well each pixel of the frame matches the ROI histogram.
Line 16: We call the ‘meanShift’ function, which takes the back-projected image, the current tracking window, and the termination criteria. It returns the number of iterations it performed (‘ret’) and the updated window (‘track_window’).
Line 17-19: First we unpack the updated position of the tracked object into x, y, w, h. Next, we draw a bounding box around the object (using the ‘rectangle’ function) and finally show the image (using ‘imshow’).
Line 20-22: This code block makes sure that execution stops if the user presses the Escape key.
Line 23-24: If the flag variable ‘ret’ is false, the execution flow breaks out of the loop.
Line 25: Closes all windows.
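To make the histogram and back-projection steps above more concrete, here is a plain-Python sketch that works on flat lists of hue values instead of images. It is a simplified stand-in for ‘calcHist’, ‘normalize’, and ‘calcBackProject’ that ignores the mask. The function names hue_histogram and back_project are hypothetical.

```python
def hue_histogram(hues, bins=180):
    """Count hue values (0 to bins-1) and rescale the counts to the range 0-255."""
    hist = [0] * bins
    for hue in hues:
        hist[hue] += 1
    lo, hi = min(hist), max(hist)
    if hi == lo:
        return [0] * bins
    # Min-max scaling: the rarest bin maps to 0, the most common to 255.
    return [round(255 * (count - lo) / (hi - lo)) for count in hist]


def back_project(hues, hist):
    """Replace each hue with its histogram score: high means 'looks like the ROI'."""
    return [hist[hue] for hue in hues]


roi_hues = [10, 10, 10, 20]           # the object is mostly hue 10
hist = hue_histogram(roi_hues)
print(back_project([10, 20, 30], hist))  # [255, 85, 0]
```

Pixels with the object's dominant hue get the highest score, and it is this score image that mean shift climbs.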
ii) Object Tracking using Cam Shift algorithm
The problem with the mean shift algorithm is that the size of the bounding box always remains the same, even when the object approaches the camera and appears larger (or moves away and appears smaller). The Continuously Adaptive Mean Shift (CAMShift) algorithm solves this problem. It first applies mean shift; once mean shift converges, it updates the size of the window as s = 2 × sqrt(M00 / 256), where M00 is the sum of the back-projection values inside the window, and then applies mean shift again with the new scaled search window. It also finds the best ellipse that fits our object and returns its center, axes, and orientation.
Cam Shift Object Tracker Implementation in OpenCV Python
cap = cv2.VideoCapture('video.mp4')
ret, frame = cap.read()
x, y, w, h = cv2.selectROI(frame)
track_window = (x, y, w, h)
roi = frame[y:y+h, x:x+w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv_roi, np.array((0., 60., 32.)), np.array((180., 255., 255.)))
roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])  # histogram of the hue channel
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
while True:
    ret, frame = cap.read()
    if ret:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
        ret, track_window = cv2.CamShift(dst, track_window, term_crit)  # ret is a rotated rectangle
        pts = cv2.boxPoints(ret)
        pts = np.int32(pts)  # np.int0 was removed in NumPy 2.0
        img2 = cv2.polylines(frame, [pts], True, 255, 2)
        cv2.imshow('img2', img2)
        k = cv2.waitKey(30) & 0xff
        if k == 27:  # 27 is the Escape key
            break
    else:
        break
cv2.destroyAllWindows()
This implementation of Cam Shift is similar to the Mean Shift implementation. We only replace the ‘meanShift’ function with the ‘CamShift’ function and change how the result is drawn.
Line 16-19: This time the tracked region need not be an upright rectangle, so we use the first return value ‘ret’, which holds a rotated rectangle. We convert it to its four corner points with ‘boxPoints’ and cast them to integers before passing them to the ‘polylines’ function.
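To show what turning a rotated rectangle into corner points involves, here is a plain-Python sketch of the geometry. It assumes the rectangle is given as ((cx, cy), (w, h), angle in degrees). The helper rotated_rect_corners is hypothetical, and its corner order and angle convention may differ from those of ‘cv2.boxPoints’.

```python
import math


def rotated_rect_corners(rect):
    """Corners of a ((cx, cy), (w, h), angle_degrees) rotated rectangle."""
    (cx, cy), (w, h), angle = rect
    a = math.radians(angle)
    cos_a, sin_a = math.cos(a), math.sin(a)
    corners = []
    # Start from the corners of an upright box centred on the origin,
    # rotate each one by the angle, then move it to the real centre.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        corners.append((cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a))
    return corners


print(rotated_rect_corners(((10, 20), (4, 2), 0)))
# [(8.0, 19.0), (12.0, 19.0), (12.0, 21.0), (8.0, 21.0)]
```

With an angle of 0 the result is an ordinary upright box. Any other angle tilts the same box around its centre, which is why ‘polylines’ is used to draw it instead of ‘rectangle’.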