Simple object tracking with OpenCV

Today’s tutorial kicks off a new series of blog posts on object tracking, arguably one of the most requested topics here on PyImageSearch.

Object tracking is the process of:

  1. Taking an initial set of object detections (such as an input set of bounding box coordinates)
  2. Creating a unique ID for each of the initial detections
  3. And then tracking each of the objects as they move around frames in a video, maintaining the assignment of unique IDs

Furthermore, object tracking allows us to apply a unique ID to each tracked object, making it possible for us to count unique objects in a video. Object tracking is paramount to building a person counter (which we’ll do later in this series).

An ideal object tracking algorithm will:

  • Only require the object detection phase once (i.e., when the object is initially detected)
  • Be extremely fast — much faster than running the actual object detector itself
  • Be able to handle when the tracked object “disappears” or moves outside the boundaries of the video frame
  • Be robust to occlusion
  • Be able to pick up objects it has “lost” in between frames

This is a tall order for any computer vision or image processing algorithm and there are a variety of tricks we can play to help improve our object trackers.

But before we can build such a robust method we first need to study the fundamentals of object tracking.

In today’s blog post, you will learn how to implement centroid tracking with OpenCV, an easy-to-understand yet highly effective tracking algorithm.

In future posts in this object tracking series, I’ll start going into more advanced kernel-based and correlation-based tracking algorithms.

To learn how to get started building your first object tracker with OpenCV, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Simple object tracking with OpenCV

In the remainder of this post, we’ll be implementing a simple object tracking algorithm using the OpenCV library.

This object tracking algorithm is called centroid tracking as it relies on the Euclidean distance between (1) existing object centroids (i.e., objects the centroid tracker has already seen before) and (2) new object centroids between subsequent frames in a video.

We’ll review the centroid algorithm in more depth in the following section. From there we’ll implement a Python class to contain our centroid tracking algorithm and then create a Python script to actually run the object tracker and apply it to input videos.

Finally, we’ll run our object tracker and examine the results, noting both the positives and the drawbacks of the algorithm.

The centroid tracking algorithm

The centroid tracking algorithm is a multi-step process. We will review each of the tracking steps in this section.

Step #1: Accept bounding box coordinates and compute centroids

Figure 1: To build a simple object tracking algorithm using centroid tracking, the first step is to accept bounding box coordinates from an object detector and use them to compute centroids.

The centroid tracking algorithm assumes that we are passing in a set of bounding box (x, y)-coordinates for each detected object in every single frame.

These bounding boxes can be produced by any type of object detector you would like (color thresholding + contour extraction, Haar cascades, HOG + Linear SVM, SSDs, Faster R-CNNs, etc.), provided that they are computed for every frame in the video.

Once we have the bounding box coordinates we must compute the “centroid”, or more simply, the center (x, y)-coordinates of the bounding box. Figure 1 above demonstrates accepting a set of bounding box coordinates and computing the centroid.

Since this is the initial set of bounding boxes presented to our algorithm, we will assign each of them a unique ID.
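For example, if a detector hands us a bounding box in (startX, startY, endX, endY) form, the centroid is just the midpoint of the box along each axis (the coordinates below are made up purely for illustration):

```python
# a hypothetical bounding box in (startX, startY, endX, endY) form
(startX, startY, endX, endY) = (10, 20, 110, 220)

# the centroid is the midpoint of the box along each axis
cX = int((startX + endX) / 2.0)  # 60
cY = int((startY + endY) / 2.0)  # 120
```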

Step #2: Compute Euclidean distance between new bounding boxes and existing objects

Figure 2: Three objects are present in this image for simple object tracking with Python and OpenCV. We need to compute the Euclidean distances between each pair of original centroids (red) and new centroids (green).

For every subsequent frame in our video stream we apply Step #1 of computing object centroids; however, instead of assigning a new unique ID to each detected object (which would defeat the purpose of object tracking), we first need to determine if we can associate the new object centroids (yellow) with the old object centroids (purple). To accomplish this process, we compute the Euclidean distance (highlighted with green arrows) between each pair of existing object centroids and input object centroids.

From Figure 2 you can see that we have this time detected three objects in our image. The two pairs of centroids that lie close together correspond to the two existing objects.

We then compute the Euclidean distances between each pair of original centroids (purple) and new centroids (yellow). But how do we use the Euclidean distances between these points to actually match them and associate them?

The answer is in Step #3.

Step #3: Update (x, y)-coordinates of existing objects

Figure 3: Our simple centroid object tracking method has associated objects with minimized object distances. What do we do about the object in the bottom left though?

The primary assumption of the centroid tracking algorithm is that a given object will potentially move in between subsequent frames, but the distance between the centroids for frames F_t and F_{t + 1} will be smaller than all other distances between objects.

Therefore, if we choose to associate centroids with minimum distances between subsequent frames we can build our object tracker.

In Figure 3 you can see how our centroid tracker algorithm chooses to associate centroids that minimize their respective Euclidean distances.

But what about the lonely point in the bottom-left?

It didn’t get associated with anything — what do we do with it?

Step #4: Register new objects

Figure 4: In our object tracking with Python and OpenCV example, we have a new object that wasn’t matched with an existing object, so it is registered as object ID #3.

In the event that there are more input detections than existing objects being tracked, we need to register the new object. “Registering” simply means that we are adding the new object to our list of tracked objects by:

  1. Assigning it a new object ID
  2. Storing the centroid of the bounding box coordinates for that object

We can then go back to Step #2 and repeat the pipeline of steps for every frame in our video stream.

Figure 4 demonstrates the process of using the minimum Euclidean distances to associate existing object IDs and then registering a new object.

Step #5: Deregister old objects

Any reasonable object tracking algorithm needs to be able to handle when an object has been lost, disappeared, or left the field of view.

Exactly how you handle these situations is really dependent on where your object tracker is meant to be deployed, but for this implementation, we will deregister old objects when they cannot be matched to any existing objects for a total of N subsequent frames.

Object tracking project structure

To see today’s project structure in your terminal, simply use the tree  command:

Our pyimagesearch  module is not pip-installable — it is included with today’s “Downloads” (which you’ll find at the bottom of this post). Inside you’ll find the centroidtracker.py  file which contains the CentroidTracker  class.

The CentroidTracker  class is an important component used in the object_tracker.py  driver script.

The remaining .prototxt  and .caffemodel  files are part of the OpenCV deep learning face detector. They are necessary for today’s face detection + tracking method, but you could easily use another form of detection (more on that later).

Be sure that you have NumPy, SciPy, and imutils installed before you proceed:

…in addition to having OpenCV 3.3+ installed. If you follow one of my OpenCV install tutorials, be sure to replace the tail end of the wget  command to grab at least OpenCV 3.3 (and update the paths in the CMake command). You’ll need 3.3+ to ensure you have the DNN module.

Implementing centroid tracking with OpenCV

Before we can apply object tracking to our input video streams, we first need to implement the centroid tracking algorithm. While you’re digesting this centroid tracker script, just keep in mind Steps 1-5 above and review the steps as necessary.

As you’ll see, the translation of steps to code requires quite a bit of thought, and while we perform all steps, they aren’t linear due to the nature of our various data structures and code constructs.

I would suggest

  1. Reading the steps above
  2. Reading the code explanation for the centroid tracker
  3. And finally reading the steps above once more

This process will bring everything full circle and allow you to wrap your head around the algorithm.

Once you’re sure you understand the steps in the centroid tracking algorithm, open up the centroidtracker.py  inside the pyimagesearch  module and let’s review the code:
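The complete file is included with the “Downloads”; the sketch below shows roughly what the imports and constructor look like (the line numbers cited in the walkthrough refer to the full file):

```python
# import the necessary packages
from scipy.spatial import distance as dist
from collections import OrderedDict
import numpy as np

class CentroidTracker():
    def __init__(self, maxDisappeared=50):
        # initialize the next unique object ID along with two ordered
        # dictionaries used to keep track of mapping a given object
        # ID to its centroid and number of consecutive frames it has
        # been marked as "disappeared", respectively
        self.nextObjectID = 0
        self.objects = OrderedDict()
        self.disappeared = OrderedDict()

        # store the number of maximum consecutive frames a given
        # object is allowed to be marked as "disappeared" until we
        # need to deregister the object from tracking
        self.maxDisappeared = maxDisappeared
```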

On Lines 2-4 we import our required packages and modules — distance , OrderedDict , and numpy .

Our CentroidTracker  class is defined on Line 6. The constructor accepts a single parameter, the maximum number of consecutive frames a given object has to be lost/disappeared for until we remove it from our tracker (Line 7).

Our constructor builds four class variables:

  • nextObjectID : A counter used to assign unique IDs to each object (Line 12). In the case that an object leaves the frame and does not come back for maxDisappeared  frames, a new (next) object ID would be assigned.
  • objects : A dictionary that utilizes the object ID as the key and the centroid (x, y)-coordinates as the value (Line 13).
  • disappeared : Maintains the number of consecutive frames (value) a particular object ID (key) has been marked as “lost” for (Line 14).
  • maxDisappeared : The number of consecutive frames an object is allowed to be marked as “lost/disappeared” until we deregister the object.

Let’s define the register  method which is responsible for adding new objects to our tracker:
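A sketch of the register method:

```python
    def register(self, centroid):
        # when registering an object we use the next available object
        # ID to store the centroid
        self.objects[self.nextObjectID] = centroid
        self.disappeared[self.nextObjectID] = 0
        self.nextObjectID += 1
```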

The register  method is defined on Line 21. It accepts a centroid  and then adds it to the objects  dictionary using the next available object ID.

The number of times an object has disappeared is initialized to 0  in the disappeared  dictionary (Line 25).

Finally, we increment the nextObjectID  so that if a new object comes into view, it will be associated with a unique ID (Line 26).

Similar to our register  method, we also need a deregister  method:
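A sketch of the deregister method:

```python
    def deregister(self, objectID):
        # to deregister an object ID we delete the object ID from
        # both of our respective dictionaries
        del self.objects[objectID]
        del self.disappeared[objectID]
```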

Just like we can add new objects to our tracker, we also need the ability to remove old ones that have been lost or have disappeared from the input frames.

The deregister  method is defined on Line 28. It simply deletes the objectID  in both the objects  and disappeared  dictionaries, respectively (Lines 31 and 32).

The heart of our centroid tracker implementation lives inside the update  method:
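Here is a sketch of how update begins (the keys are wrapped in list() so the dictionary isn’t mutated while we iterate over it, a small fix a reader also points out in the comments below):

```python
    def update(self, rects):
        # check to see if the list of input bounding box rectangles
        # is empty
        if len(rects) == 0:
            # loop over any existing tracked objects and mark them
            # as disappeared
            for objectID in list(self.disappeared.keys()):
                self.disappeared[objectID] += 1

                # if we have reached a maximum number of consecutive
                # frames where a given object has been marked as
                # missing, deregister it
                if self.disappeared[objectID] > self.maxDisappeared:
                    self.deregister(objectID)

            # return early as there are no centroids or tracking info
            # to update
            return self.objects
```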

The update method, defined on Line 34, accepts a list of bounding box rectangles, presumably from an object detector (Haar cascade, HOG + Linear SVM, SSD, Faster R-CNN, etc.). Each entry in the rects parameter is assumed to be a tuple with the structure (startX, startY, endX, endY) .

If there are no detections, we’ll loop over all object IDs and increment their disappeared  count (Lines 37-41). We’ll also check if we have reached the maximum number of consecutive frames a given object has been marked as missing. If that is the case we need to remove it from our tracking systems (Lines 46 and 47). Since there is no tracking info to update, we go ahead and return  early on Line 51.

Otherwise, we have quite a bit of work to do over the next seven code blocks in the update  method:
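Continuing inside update, a sketch of the centroid computation:

```python
        # initialize an array of input centroids for the current frame
        inputCentroids = np.zeros((len(rects), 2), dtype="int")

        # loop over the bounding box rectangles
        for (i, (startX, startY, endX, endY)) in enumerate(rects):
            # use the bounding box coordinates to derive the centroid
            cX = int((startX + endX) / 2.0)
            cY = int((startY + endY) / 2.0)
            inputCentroids[i] = (cX, cY)
```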

On Line 54 we’ll initialize a NumPy array to store the centroids for each rect .

Then, we loop over bounding box rectangles (Line 57) and compute the centroid and store it in the inputCentroids  list (Lines 59-61).

If there are currently no objects we are tracking, we’ll register each of the new objects:
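A sketch of that case:

```python
        # if we are currently not tracking any objects take the input
        # centroids and register each of them
        if len(self.objects) == 0:
            for i in range(0, len(inputCentroids)):
                self.register(inputCentroids[i])
```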

Otherwise, we need to update any existing object (x, y)-coordinates based on the centroid location that minimizes the Euclidean distance between them:
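A sketch of the matching setup inside the else branch:

```python
        # otherwise, we are currently tracking objects so we need to
        # try to match the input centroids to existing object
        # centroids
        else:
            # grab the set of object IDs and corresponding centroids
            objectIDs = list(self.objects.keys())
            objectCentroids = list(self.objects.values())

            # compute the distance between each pair of existing
            # object centroids and input centroids; our goal will be
            # to match an input centroid to an existing object centroid
            D = dist.cdist(np.array(objectCentroids), inputCentroids)

            # in order to perform this matching we must (1) find the
            # smallest value in each row and then (2) sort the row
            # indexes based on their minimum values so that the row
            # with the smallest value is at the *front* of the index
            # list
            rows = D.min(axis=1).argsort()

            # next, find the column index of the smallest value in
            # each row, then order those column indexes using the
            # previously computed (sorted) row index list
            cols = D.argmin(axis=1)[rows]
```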

The updates to existing tracked objects take place beginning at the else  on Line 72. The goal is to track the objects and to maintain correct object IDs — this process is accomplished by computing the Euclidean distances between all pairs of objectCentroids  and inputCentroids , followed by associating object IDs that minimize the Euclidean distance.

Inside of the else block beginning on Line 72, we will:

  • Grab objectIDs  and objectCentroid  values (Lines 74 and 75).
  • Compute the distance between each pair of existing object centroids and new input centroids (Line 81). The output NumPy array shape of our distance map D  will be (# of object centroids, # of input centroids) .
  • To perform the matching we must (1) find the smallest value in each row, and (2) sort the row indexes based on those minimum values (Line 88). We then perform a similar process for the columns, finding the column index of the smallest value in each row and ordering those column indexes using the sorted rows (Line 93). Our goal is to have the index values with the smallest corresponding distance at the front of the lists.

The next step is to use the distances to see if we can associate object IDs:
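A sketch of the association loop:

```python
            # in order to determine if we need to update, register,
            # or deregister an object we need to keep track of which
            # of the row and column indexes we have already examined
            usedRows = set()
            usedCols = set()

            # loop over the combination of the (row, column) index
            # tuples
            for (row, col) in zip(rows, cols):
                # if we have already examined either the row or
                # column value before, ignore it
                if row in usedRows or col in usedCols:
                    continue

                # otherwise, grab the object ID for the current row,
                # set its new centroid, and reset the disappeared
                # counter
                objectID = objectIDs[row]
                self.objects[objectID] = inputCentroids[col]
                self.disappeared[objectID] = 0

                # indicate that we have examined each of the row and
                # column indexes, respectively
                usedRows.add(row)
                usedCols.add(col)
```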

Inside the code block above, we:

  • Initialize two sets to determine which row and column indexes we have already used (Lines 98 and 99). Keep in mind that a set is similar to a list but it contains only unique values.
  • Then we loop over the combinations of (row, col)  index tuples (Line 103) in order to update our object centroids:
    • If we’ve already used either this row or column index, ignore it and continue  to loop (Lines 107 and 108).
    • Otherwise, we have found an input centroid that:
      • 1. Has the smallest Euclidean distance to an existing centroid
      • 2. And has not been matched with any other object
      • In that case, we update the object centroid (Lines 113-115) and make sure to add the row  and col  to their respective usedRows  and usedCols  sets

There may be row and column indexes that we have NOT yet examined (i.e., indexes absent from our usedRows and usedCols sets):
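A sketch of that bookkeeping:

```python
            # compute both the row and column indexes we have NOT yet
            # examined
            unusedRows = set(range(0, D.shape[0])).difference(usedRows)
            unusedCols = set(range(0, D.shape[1])).difference(usedCols)
```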

So we must determine which centroid indexes we haven’t examined yet and store them in two new convenient sets ( unusedRows  and unusedCols ) on Lines 124 and 125.

Our final check handles any objects that have become lost or if they’ve potentially disappeared:
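A sketch of that check:

```python
            # in the event that the number of object centroids is
            # equal to or greater than the number of input centroids
            # we need to check and see if some of these objects have
            # potentially disappeared
            if D.shape[0] >= D.shape[1]:
                # loop over the unused row indexes
                for row in unusedRows:
                    # grab the object ID for the corresponding row
                    # index and increment the disappeared counter
                    objectID = objectIDs[row]
                    self.disappeared[objectID] += 1

                    # check to see if the number of consecutive
                    # frames the object has been marked "disappeared"
                    # for warrants deregistering the object
                    if self.disappeared[objectID] > self.maxDisappeared:
                        self.deregister(objectID)
```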

To finish up:

  • If the number of object centroids is greater than or equal to the number of input centroids (Line 131):
    • We need to verify if any of these objects are lost or have disappeared by looping over unused row indexes if any (Line 133).
    • In the loop, we will:
      • 1. Increment their disappeared  count in the dictionary (Line 137).
      • 2. Check if the disappeared  count exceeds the maxDisappeared  threshold (Line 142), and, if so we’ll deregister the object (Line 143).

Otherwise, the number of input centroids is greater than the number of existing object centroids, so we have new objects to register and track:
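A sketch of the final branch along with the return value:

```python
            # otherwise, if the number of input centroids is greater
            # than the number of existing object centroids we need to
            # register each new input centroid as a trackable object
            else:
                for col in unusedCols:
                    self.register(inputCentroids[col])

        # return the set of trackable objects
        return self.objects
```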

We loop over the unusedCols  indexes (Line 149) and we register each new centroid (Line 150). Finally, we’ll return the set of trackable objects to the calling method (Line 153).

Understanding the centroid tracking distance relationship

Our centroid tracking implementation was quite long, and admittedly, the most confusing aspect of the algorithm is Lines 81-93.

If you’re having trouble following along with what that code is doing you should consider opening a Python shell and performing the following experiment:
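Here is a sketch of that experiment (the specific matrix values and indexes discussed below come from the post’s original run; run the snippet yourself to see the printed output):

```python
# reproduce the distance/matching experiment in a Python shell
from scipy.spatial import distance as dist
import numpy as np

# seed the random number generator, then generate 2 existing object
# centroids and 3 new input centroids, each in the range [0, 1]
np.random.seed(42)
objectCentroids = np.random.uniform(size=(2, 2))
inputCentroids = np.random.uniform(size=(3, 2))

# D has shape (# of existing object centroids, # of input centroids)
D = dist.cdist(objectCentroids, inputCentroids)
print(D)

# find the minimum value in each row, then sort the row indexes by
# those minimums so the closest existing object comes first
rows = D.min(axis=1).argsort()
print(rows)

# for each row, grab the column index of its smallest value, ordered
# by the sorted rows
cols = D.argmin(axis=1)[rows]
print(cols)

# each (row, col) tuple is a candidate match between an existing
# object and a new input centroid
print(list(zip(rows, cols)))
```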

Once you’ve started a Python shell in your terminal with the python command, import distance and numpy as shown on Lines 1 and 2.

Then, set a seed for reproducibility (Line 3) and generate 2 (random) existing objectCentroids  (Line 4) and 3 inputCentroids  (Line 5).

From there, compute the Euclidean distance between the pairs (Line 6) and display the results (Lines 7-9). The result is a matrix D  of distances with two rows (# of existing object centroids) and three columns (# of new input centroids).

Just like we did earlier in the script, let’s find the minimum distance in each row and sort the indexes based on this value:

First, we find the minimum value for each row, allowing us to figure out which existing object is closest to the new input centroid (Lines 10 and 11). By then sorting on these values (Line 12) we can obtain the indexes of these rows  (Lines 13 and 14).

In this case, the second row (index 1 ) has the smallest value and then the first row (index 0 ) has the next smallest value.

Using a similar process for the columns:

…we first find, for each row, the column index containing the smallest value (Lines 15 and 16).

We then sort these values using our existing rows  (Lines 17-19).

Let’s print the results and analyze them:

The final step is to combine them using zip (Line 20). The resulting list is printed on Line 21.

Analyzing the results, we find that:

  • D[1, 2]  has the smallest Euclidean distance implying that the second existing object will be matched against the third input centroid.
  • And D[0, 1]  has the next smallest Euclidean distance which implies that the first existing object will be matched against the second input centroid.

I’d like to reiterate here that now that you’ve reviewed the code, you should go back and review the steps to the algorithm in the previous section. From there you’ll be able to associate the code with the more linear steps outlined here.

Implementing the object tracking driver script

Now that we have implemented our CentroidTracker  class, let’s put it to work with an object tracking driver script.

The driver script is where you can use your own preferred object detector, provided that it produces a set of bounding boxes. This could be a Haar Cascade, HOG + Linear SVM, YOLO, SSD, Faster R-CNN, etc. For this example script, I’m making use of OpenCV’s deep learning face detector, but feel free to make your own version of the script which implements a different detector.

Inside this script, we will:

  • Work with a live VideoStream  object to grab frames from your webcam
  • Load and utilize OpenCV’s deep learning face detector
  • Instantiate our CentroidTracker  and use it to track face objects in the video stream
  • And display our results which includes bounding boxes and object ID annotations overlaid on the frames

When you’re ready, open up object_tracker.py  from today’s “Downloads” and follow along:
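A sketch of the top of object_tracker.py (the help strings are paraphrased):

```python
# import the necessary packages
from pyimagesearch.centroidtracker import CentroidTracker
from imutils.video import VideoStream
import numpy as np
import argparse
import imutils
import time
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
    help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
    help="path to Caffe pre-trained model")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
    help="minimum probability to filter weak detections")
args = vars(ap.parse_args())
```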

First, we specify our imports. Most notably we’re using the CentroidTracker  class that we just reviewed. We’re also going to use VideoStream  from imutils  and OpenCV.

We have three command line arguments which are all related to our deep learning face detector:

  • --prototxt : The path to the Caffe “deploy” prototxt.
  • --model : The path to the pre-trained Caffe model.
  • --confidence : Our probability threshold to filter weak detections. I found that a default value of 0.5  is sufficient.

The prototxt and model files come from OpenCV’s repository and I’m including them in the “Downloads” for your convenience.

Note: In case you missed it at the start of this section, I’ll repeat that you can use any detector you wish. As an example, we’re using a deep learning face detector which produces bounding boxes. Feel free to experiment with other detectors, just be sure that you have capable hardware to keep up with the more complex ones (some may run best with a GPU, but today’s face detector can easily run on a CPU).

Next, let’s perform our initializations:
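A sketch of those initializations (the [INFO] log messages are just illustrative):

```python
# initialize our centroid tracker and frame dimensions
ct = CentroidTracker()
(H, W) = (None, None)

# load our serialized face detector model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

# initialize the video stream and allow the camera sensor to warm up
print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
time.sleep(2.0)
```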

In the above block, we:

  • Instantiate our CentroidTracker , ct  (Line 21). Recall from the explanation in the previous section that this object has three methods: (1) register , (2) deregister , and (3) update . We’re only going to use the update  method as it will register and deregister objects automatically. We also initialize H  and W  (our frame dimensions) to None  (Line 22).
  • Load our serialized deep learning face detector model from disk using OpenCV’s DNN module (Line 26).
  • Start our VideoStream , vs  (Line 30). With vs  handy, we’ll be able to capture frames from our camera in our next while  loop. We’ll allow our camera 2.0  seconds to warm up (Line 31).

Now let’s begin our while  loop and start tracking face objects:
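A sketch of the top of the loop (the resize width of 400 pixels and the blob’s mean-subtraction values are typical settings for this face detector; treat them as assumptions):

```python
# loop over the frames from the video stream
while True:
    # read the next frame from the video stream and resize it to a
    # fixed width, preserving the aspect ratio
    frame = vs.read()
    frame = imutils.resize(frame, width=400)

    # if the frame dimensions are None, grab them
    if W is None or H is None:
        (H, W) = frame.shape[:2]

    # construct a blob from the frame, pass it through the network,
    # and obtain the face detections
    blob = cv2.dnn.blobFromImage(frame, 1.0, (W, H),
        (104.0, 177.0, 123.0))
    net.setInput(blob)
    detections = net.forward()

    # initialize our list of bounding box rectangles
    rects = []
```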

We loop over frames and resize  them to a fixed width (while preserving aspect ratio) on Lines 34-47. Our frame dimensions are grabbed as needed (Lines 40 and 41).

Then we pass the frame through the CNN object detector to obtain predictions and object locations (Lines 46-49).

We initialize a list of rects , our bounding box rectangles on Line 50.

From there, let’s process the detections:
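A sketch of the detection loop:

```python
    # loop over the detections
    for i in range(0, detections.shape[2]):
        # filter out weak detections by ensuring the predicted
        # probability is greater than a minimum threshold
        if detections[0, 0, i, 2] > args["confidence"]:
            # compute the (x, y)-coordinates of the bounding box for
            # the object, then update the bounding box rectangles list
            box = detections[0, 0, i, 3:7] * np.array([W, H, W, H])
            rects.append(box.astype("int"))

            # draw a bounding box surrounding the object so we can
            # visualize it
            (startX, startY, endX, endY) = box.astype("int")
            cv2.rectangle(frame, (startX, startY), (endX, endY),
                (0, 255, 0), 2)
```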

We loop over the detections beginning on Line 53. If the detection exceeds our confidence threshold, indicating a valid detection, we:

  • Compute the bounding box coordinates and append them to the rects  list (Lines 59 and 60)
  • Draw a bounding box around the object (Lines 64-66)

Finally, let’s call update  on our centroid tracker object, ct :
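A sketch of the update call, the visualization, and the cleanup (the window name and drawing color are just illustrative):

```python
    # update our centroid tracker using the computed set of bounding
    # box rectangles
    objects = ct.update(rects)

    # loop over the tracked objects
    for (objectID, centroid) in objects.items():
        # draw both the ID of the object and the centroid of the
        # object on the output frame
        text = "ID {}".format(objectID)
        cv2.putText(frame, text, (centroid[0] - 10, centroid[1] - 10),
            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
        cv2.circle(frame, (centroid[0], centroid[1]), 4, (0, 255, 0), -1)

    # show the output frame and check for a quit keypress
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF

    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()
```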

The ct.update  call on Line 70 handles the heavy lifting in our simple object tracker with Python and OpenCV script.

We would be done here and ready to loop back to the top if we didn’t care about visualization.

But that’s no fun!

On Lines 73-79 we display the centroid as a filled in circle and the unique object ID number text. Now we’ll be able to visualize the results and check to see if our CentroidTracker  properly keeps track of our objects by associating the correct IDs with the objects in the video stream.

We’ll display the frame on Line 82 until the quit key (“q”) has been pressed (Lines 83-87). If the quit key is pressed, we simply  break  and perform cleanup (Lines 87-91).

Centroid object tracking results

To see our centroid tracker in action, use the “Downloads” section of this blog post to download the source code and OpenCV face detector. From there, open up a terminal and execute the following command:

Below you can see an example of a single face (my face) being detected and tracked:

This second example includes two objects being correctly detected and tracked:

Notice how even though the second face is “lost” once I move the book cover outside the view of the camera, our object tracker is able to pick the face back up again when it comes back into view. If the face had existed outside the field of view for more than 50 frames, the object would have been deregistered.

The final example animation here demonstrates tracking three unique objects:

Again, despite object ID #2 being unsuccessfully detected between some frames, our object tracking algorithm is able to find it again and associate it with its original centroid.

For a more detailed demonstration of our object tracker, including commentary, be sure to refer to the video below:

Limitations and drawbacks

While our centroid tracker worked great in this example, there are two primary drawbacks of this object tracking algorithm.

The first is that it requires the object detection step to be run on every frame of the input video.

  • For very fast object detectors (e.g., color thresholding and Haar cascades) having to run the detector on every input frame is likely not an issue.
  • But if you are (1) using a significantly more computationally expensive object detector such as HOG + Linear SVM or deep learning-based detectors on (2) a resource-constrained device, your frame processing pipeline will slow down tremendously, as you will spend most of it running a very slow detector.

The second drawback is related to the underlying assumptions of the centroid tracking algorithm itself — centroids must lie close together between subsequent frames.

  • This assumption typically holds, but keep in mind we are representing our 3D world with 2D frames — what happens when an object overlaps with another one?
  • The answer is that object ID switching could occur.
  • If two or more objects overlap to the point where their centroids intersect, such that each centroid is actually closest to the other object’s previous centroid, the algorithm may (unknowingly) swap the object IDs.
  • It’s important to understand that the overlapping/occluded object problem is not specific to centroid tracking — it happens for many other object trackers as well, including advanced ones.
  • However, the problem is more pronounced with centroid tracking as we rely strictly on the Euclidean distances between centroids, with no additional metrics, heuristics, or learned patterns.

As long as you keep these assumptions and limitations in mind when using centroid tracking the algorithm will work wonderfully for you.

Summary

In today’s blog post you learned how to perform simple object tracking with OpenCV using an algorithm called centroid tracking.

The centroid tracking algorithm works by:

  1. Accepting bounding box coordinates for each object in every frame (presumably by some object detector).
  2. Computing the Euclidean distance between the centroids of the input bounding boxes and the centroids of existing objects that we already have examined.
  3. Updating the tracked object centroids to their new centroid locations based on the new centroid with the smallest Euclidean distance.
  4. And if necessary, marking objects as either “disappeared” or deregistering them completely.

Our centroid tracker performed well in this example tutorial but has two primary downsides:

  1. It requires that we run an object detector for each frame of the video — if your object detector is computationally expensive to run you would not want to utilize this method.
  2. It does not handle overlapping objects well and due to the nature of the Euclidean distance between centroids, it’s actually possible for our centroids to “swap IDs” which is far from ideal.

Despite its downsides, centroid tracking can be used in quite a few object tracking applications provided (1) your environment is somewhat controlled and you don’t have to worry about potentially overlapping objects and (2) your object detector itself can be run in real-time.

If you enjoyed today’s blog post, be sure to download the code using the form below. I’ll be back next week with another object tracking tutorial!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!


64 Responses to Simple object tracking with OpenCV

  1. fdkssdks July 23, 2018 at 11:18 am #

    Thanks, Adrian for this post. I was looking forward to it

    I am going to implement it soon and see how it goes.

    Also, opencv released this tracker just for the info, you may already know about it.

    GOTURN : Deep Learning based Object Tracking

    • Adrian Rosebrock July 23, 2018 at 2:21 pm #

      I saw! It’s a purely deep learning-based object detector. I’m sure I will utilize it in a future post and demo how to use it 🙂

      • smit August 21, 2018 at 6:10 am #

        Any update on the demo by using GOTURN @Adrian

        • Adrian Rosebrock August 21, 2018 at 6:43 am #

          I haven’t had any time to play around with GOTURN yet. When I do I’ll be writing a dedicated blog post.

  2. Zubair Ahmed July 23, 2018 at 12:04 pm #

    Excell…excellent post!

    I read totally understood it in the first go for reasons I have shared in my email 🙂

    There is a little typo ‘First, we find the minimum value for each row, allowing [is]->[us] to figure out which existing object is closest to the new input centroid’

    Looking forward to the whole series

    • Adrian Rosebrock July 23, 2018 at 2:23 pm #

      Thanks Zubair! Typo fixed 🙂

  3. Yassine Benhajali July 23, 2018 at 1:36 pm #

    Hi Adrian, big fan of your blog, you do an awesome job.

    I guess it’s also possible to use your code for other than faces tracking (let say dogs or cat).
    do you know to get that working.
    Thanks,

    • Adrian Rosebrock July 23, 2018 at 2:22 pm #

      You would swap out the face detector and utilize your own object detector. This post will help you get started.

      Otherwise, you will want to wait until next week where I’ll have a specific example of using a different object detector (such as a dog or cat one).

  4. Jay Abrams July 23, 2018 at 1:38 pm #

    thank you for your time and effort Adrian!

    • Adrian Rosebrock July 23, 2018 at 2:20 pm #

      Thanks Jay!

  5. satinder singh July 23, 2018 at 2:56 pm #

    Thank you for your blogs adrian, I enjoy reading them. Looking forward for the next blogs. Thank you again.

    • Adrian Rosebrock July 23, 2018 at 4:22 pm #

      Thank you Satinder 🙂

  6. Osmin July 23, 2018 at 9:42 pm #

    Thank you very much…

    I am getting a lot of help!!

    Can I use your code in my APP ??

    • Adrian Rosebrock July 24, 2018 at 8:27 am #

      Yes, you can use the code in your app. I would appreciate a credit or link back to the PyImageSearch blog though 🙂

  7. zhao July 23, 2018 at 10:40 pm #

    Thanks, Adrian for this post. I want to use your blog to track fish. How do you train fish detectors?
    Thank you again.

  8. Stalon July 23, 2018 at 11:43 pm #

    Adrian, thank you very much for the sharing.
    I really learned a lot from this post.

    I have a question about the occlusion solution.
    What is the mechanism of solving occlusion problem while tracking?
    Is that similar as Kalman Filter or something?

    • Adrian Rosebrock July 24, 2018 at 8:26 am #

      There is no true “solution” to occlusion. There are methods and heuristics that we can apply (based on the project specifications) that can help with occlusion but there is no one true “solution” to occlusion.

  9. sohail July 24, 2018 at 3:39 am #

    Great Work.

    • Adrian Rosebrock July 24, 2018 at 8:26 am #

      Thanks Sohail 🙂

  10. ki2rin July 24, 2018 at 3:40 am #

    Thank you for another great tutorial, Adrian.

    I have only used Haar + Adaboost in the past(around 2011 or 2012) for face detection.
    What I remember is 1) it was super fast, but 2) it accepts a lot of false positives.
    So, in my natural thought, the vanilla implementation of Adaboost can hardly be used for the tracking system, am I right?
    Otherwise, the centroid tracking seems to be much promising for this purpose.
    One thing I wonder is that how robust is the centroid tracking when the object moves fast?

    • Adrian Rosebrock July 24, 2018 at 8:29 am #

      Haar cascades are face detectors, they are not object trackers. You use Haar cascades to compute the bounding box of a face, then the resulting centroid, and then finally pass that info to your object tracker.

      That said, you are correct that if your detector is not accurate than you cannot expect the tracker to be accurate as well. The detector must be working accurately — that is why I choose to use a more accurate deep learning-based object detector in this post.

  11. David July 24, 2018 at 5:53 am #

    Hi Adrian!
    I may have find an error in your code. If I initialize the CentroidTracker() class with a lower maxDisappeared (10 for example) I end up with this error in line 40 of centroidtracker.py : “RuntimeError: OrderedDict mutated during iteration”.
    I solved it by replacing line 40 with the following code: “for objectID in list(self.disappeared.keys()):”

    I am looking forward for your blog post in this series, keep going you rock!

    David

    • Adrian Rosebrock July 24, 2018 at 8:30 am #

      Thanks for sharing, David!

  12. Paul Zikopoulos July 24, 2018 at 7:12 am #

    No biggie, slight typo in comment of code line 86 of 1st script …

    # with the smallest value as at the *front* of the index

    as > is

    GREAT ARTICLE … between this and my amazing course I have a full time learner’s job

    • Adrian Rosebrock July 24, 2018 at 8:30 am #

      Thanks Paul 🙂

  13. Stefan Karos July 24, 2018 at 10:28 am #

    Thank you Adrian for your clarity and devotion to this subject. Without saying the words, you even inspire novices like me to think about use cases.Would this approach be appropriate to use to count people walking in a street? For example I live on a ‘dead end’ (cul de sac for the masses). I would like to count the people coming and going into this ‘gated community’. Walking from left to right would be entering. Walking from right to left would be exiting. My camera can be positioned from 5 to 30 feet above the street looking down on its entire width (20 feet). So I suspect I would need to track object motion and and assign a vector value to each centroid. (Positive values for entry and negative values for exit). Summing the signs would give me the total counts. Looking at vector magnitude would also give me their speed (how much sauntering, walking, running, etc)

    Such a project (especially if implementable on a raspberry Pi) has enormous utility. You could even make and sell a ‘people counter’ ! Knowing how many people enter and leave a building, a museum (or museum wing), a graduation a theme park adds significant security in addition to providing useful information to improve facilities.

    Forgive me if this is not the right place to post this but I did not find your forums on this site. But perhaps that’s intentional-when would you find time to moderate forums?!

    • Adrian Rosebrock July 24, 2018 at 10:45 am #

      Hey Stefan — I’ll actually be covering “people counting” later in this series of blog posts. Stay tuned as I believe the implementation I’ll be sharing will help you solve your project. Outside of the actual blog posts go, you should take a look at the PyImageSearch Gurus course which includes a dedicated, private community forums.

    • Yash July 27, 2018 at 12:49 am #

      Please have a look at optical flow (Dense) and you will find a better and robust solution of your problem.
      Though if using moving/shaky camera to record then don’t forgot to stabilize the video using template matching before applying optical flow as flow could give an wrong information in case of relative motion due to the motion of the camera.
      Trust me result would be mind blowing.

  14. Prajesh Sanghvi July 25, 2018 at 2:35 am #

    Hey Adrian, thank you so much for all this info. I am a beginner with python, and i keep on getting the error: object_tracker.py: error: argument -p/–prototxt is required

    even after assigning the path in the “help” part of the argument section
    i.e:

    ap = argparse.ArgumentParser()
    ap.add_argument(“-p”, “–prototxt”, required=True,
    help=”/home/pi/CamProject/FaceTrack/simple-object-tracking/deploy.prototxt”)

    can you please help me out?
    also i cant figure out which of the files is for the MODEL section.

    Thank you

  15. Farooq July 25, 2018 at 10:53 am #

    Thanks for the awesome post Adrian.
    I think you have a typo somewhere in step two;
    “… Euclidean distances between each pair of original centroids (purple) and new centroids (purple)…” I think it should be new centroid(yellow)

    • Adrian Rosebrock July 27, 2018 at 5:35 am #

      You are correct, thank you for reporting this typo — it’s now been fixed.

  16. Navaneeth Krishnan July 25, 2018 at 11:28 am #

    adrian i am a big fan of yours. can you make a post on “openface” a library for face recognition. i have seen your face recognition post and i have tried but some times the face name was wrongly mentioned. i we combine face recognition and tracking then the occlusion problem will be solved

    • Adrian Rosebrock July 27, 2018 at 5:33 am #

      Hey Navaneeth, I will consider doing a blog post dedicated to OpenFace but I cannot guarantee if/when that may be. As far as your project goes, if the face is being incorrectly identified you should spend more time playing with the parameters to the “face_recognition” library, in particular the distance parameters, to help reduce incorrect classifications. You may also want to look into fine-tuning the network as well.

  17. Prajesh Sanghvi July 25, 2018 at 11:29 am #

    Thank you so much!!

  18. Ian Carr-de Avelon July 26, 2018 at 4:53 am #

    What about using the hardware to reduce the CPU processing?
    If you can only do a full search and identify in a fraction of the frames, the faces or objects could be tracked if you have MPEG encoding of all the video. MPEG only sends a fraction of the frames and then vectors saying how bits should be moved around to recreate the following frames. Calculating those vectors is highly efficiently coded or even available in hardware. Eg. you can pay extra for a code number to uncriple the hardware encoders in the PI.
    This is an idea I’ve had for a while and occasionally searched for useful libraries. I’ve just come across this:
    http://picamera.readthedocs.io/en/release-1.12/recipes2.html#recording-motion-vector-data
    which gives Python code to get the vectors from a PIcam.
    Yours
    Ian

    • Adrian Rosebrock July 27, 2018 at 5:32 am #

      You could certainly develop heuristics and assumptions as well to help reduce computational burden. But keep in mind that those assumptions may only be relevant to a specific project and may not be able to be used across all projects.

      Secondly, deep learning-based object detectors included an “objectness” component that quickly determines which areas of the image warrant additional computation, including applying the object detector. They are still slower (but more accurate) than your traditional Haar cascades but they yield higher accuracy as well. As hardware progresses in the future these models will run faster.

  19. Cristian Benglenok July 26, 2018 at 10:11 am #

    very good tutorial, we have implemented an object counter using deep learning to identify the object and using OpenCV an algorithm to perform the count. It’s pretty slow, we’ll try this method to improve the speed.

    Thanks

    • Adrian Rosebrock July 27, 2018 at 5:30 am #

      Hey Christian — congrats on implementing a successful project! As far as deep learning goes, have you tried pushing the inference to the GPU? That would speedup the pipeline.

  20. amir taherkhani July 27, 2018 at 1:22 am #

    hi adrian i am iranian user on your website i can not buy your books 🙁 but your free tutorials is very usefull for me Thank you very much for sharing …

    • Adrian Rosebrock July 27, 2018 at 5:28 am #

      I am happy that you are enjoying the free blog posts Amir. It’s wonderful to have you as part are of the PyImageSearch family 🙂 Keep learning from them!

  21. Rob July 27, 2018 at 8:19 am #

    HI Adrian, your blog is totally awesome!

    Prior to stumbling upon your site, using an RPI and a windows pc I was able to create a robotic camera which streams wirelessy (almost no latency) and is able to be aimed via servos and the arrow controls on the laptop. Moving forward, I want to add object tracking as it applies to human entities. I recently received this blog and I thought that it would be a good jump off point for the upgrades. WIth that said, I am curious as to whether you have blogged about implementing the face detector (or a general human detector) and where it can be found?

    Thus far, the RPI is a streaming slave and the majority of the heavy lifting is done on the laptop. On that note, the computational resources are limited… so I am really hoping there is an efficient algorithm/implementation that already exists.

    I would appreciate any advice you can provide.

    Thank you.

  22. Sai Teja July 28, 2018 at 5:05 pm #

    Hi Adrian,

    Thanks for the amazing post. You are awesome man. I am eagerly waiting and would request you to do a tutorial on object tracking using advanced methods as you mentioned in the post that you will be doing(kernel-based and correlation-based tracking algorithms)

    • Adrian Rosebrock July 31, 2018 at 9:57 am #

      You can find the first of the more advanced methods in this post. Stay tuned for more object tracking posts coming soon!

  23. Zayyana July 29, 2018 at 9:37 am #

    So much thanks mr. I usually use
    “…
    for(x,y,w,h) in object:
    (x_centre,y_centre)=(x+w/2,y+h/2)
    …”
    because i confused when use numpy to get the centre. computing (x,y,w,h) much easier to understand

  24. Z Abrams July 31, 2018 at 10:50 am #

    Hi,
    As usual, a great post (though the simplicity of your next post from July 30th really blew me away – once I got all the Python/Opencv Versions to work [updated to 3.4.2 + updated contrib package]
    However, one thing I couldn’t figure out was what the indices of the “detections” are: e.g. detections[0, 0, i, 3:7]. I get that i is the iterator, and 3:7 apparently give the bounding box coordinates, but what data type is this, and what other things are there?
    I couldn’t figure it out online – not even sure what data structure I’m looking at (net? forward?). Unfortunately the OpenCV documentation can sometimes be confusing [it’s actually usually a bit easier for C++].

    Also, on another note, I actually do everything in Windows (10), and usually, it’s even easier to do than on Linux (Ubuntu). Windows packages are usually pretty easy to install – no Cmake or whatnot, though there ARE some tricks you must learn. So perhaps you can update your “how to install python/opencv/etc” tutorials. [The one exception is dlib – which is a pain, but it works via my Anaconda version of python 3.6 and its built in virtual environment system]. [Also, this statement is true regarding your Deep Learning book, where I’m using a GPU as well, and got it up and running very quickly – faster than Ubuntu]

    • Adrian Rosebrock July 31, 2018 at 10:57 am #

      1. The “detections” variable is a bit hard to fully understand. I don’t have a fully-documented example on it (I’ll consider it for the future) but for the time being be sure to refer to my source code on examples of how to extract the bounding box coordinates, labels, and corresponding probabilities.

      2. I don’t officially support Windows here on the PyImageSearch blog. You are more than welcome to use whatever OS you feel more comfortable with. In general, I’ve found that most readers struggle significantly more with Windows than Ubuntu or macOS for computer vision or deep learning.

  25. alibigdeli August 7, 2018 at 2:50 am #

    i simply love your passion and mind blowing ideas
    from old times i remember there was a windows application called flutter and it was a simple detection of hand poses to play or stop music or going left or right. that idea was bugging me for so long cause i was a kid but now i am aware of what was happening back there.
    my struggle is to write a code to detect and track hand in a frame video which is from my web cam and then use it as a mouse. like moving around and clicking.
    the problem is i know the whole story like i have been coding it with a haar and kalman filter to follow it and simply move the mouse too but clicking and other stuffs are the hard part.
    after seeing this post it inspires me with other ideas again but still need time to learn. can you point me in right direction or even a post for this matter cause i know lots of people are after this project to simply use it for ai products but note commercial simply at home

    • Adrian Rosebrock August 7, 2018 at 6:29 am #

      Just so I understand correctly — you are able to track the hand via Haar cascades and Kalman filters but you need a suggestion for the GUI interaction library? I don’t have a ton of experience in that area but I really like pyautogui.

      • alibigdeli August 7, 2018 at 5:54 pm #

        no actually its far deeper
        i am using pynput for that purpose
        the problem is tracking hand with the pose hand like simply making a fist to click and following the fist to move the mouse too
        there are several ways to do it but my code was to get the resolution of the screen and adapt the point i am showing ,to follow like a point in the middle of the rectangle as (x,y) of the web camera feed and get the simulated point on another resolution, so that i could be covering the wider screens dynamically.
        1- the main problem is double hands detection (which i am working on it but not successful yet)
        2- another problem is false detections rate is high cause of haarcascade usage
        lots of these had been used with ir cameras like kinect depth and so on…
        but i cannot afford them all i have is my webcam and normal knowledge of image proccessing
        3- another problem is when hand is gone the pointer still follows the dot prediction which is annoying for me (as a matter of fact seeing your post gave me an idea of reseting point after not detection of hand in specific time)
        4- covering edges of the screen is almost not possible i know i have to come up with adding extra numbers to resolution to cover it
        5- i have no idea if this is the right path i am going or not
        like using haar should be changed with something else or even kalman filter should be combined with others too (this is the real question!)

  26. Sanjay August 10, 2018 at 4:53 am #

    Hello Adrian,

    Thank you for this awesome project.I wanted to ask you like there are moments when people usually are engaged in mobile and hence there head is tilted down most of the time.I have used same face detector which you have used.The problem is that with little head tilted straight down it is not able to detect the face.Can you please suggest me how to overcome this problem?

    P.S:- Have been following you since long like for me you are god of CV.Thank you.

    • Adrian Rosebrock August 10, 2018 at 6:04 am #

      Hey Sanjay, thank you for the kind words, I appreciate it. As far as the head titling, I assume by the “same model I’m using” you are referring to the deep learning face detector? If so, the model should be able to handle slight head tilts, but if the head is tilted too much it won’t work. You may want to consider training a face detector on head tilted images.

      Secondly, the dlib library includes another deep learning-based face detector. You may want to give it a try and see if it gives you better detection accuracy. An example of how to use it can be found in this post.

  27. Dhruvin .p. patel August 13, 2018 at 2:12 am #

    Hey How to Implement on Keras base tensorflow lib.

    • Adrian Rosebrock August 15, 2018 at 8:54 am #

      Hm, you don’t need either Keras or TensorFlow for this example. What specific functionality do you want Keras/TensorFlow for?

  28. Sanjay August 13, 2018 at 2:26 am #

    Hello Adrian,

    Currently I am running tracking on CPU using opencv dnn face detector.It misses slightly bounding box in some frame.My Question is If i shift to GPU does it make detection in every frame better? Also Is there any relation with distance the object is standing from the camera? Also does result changes if the object is standing on left or right side of the frame?

    • Adrian Rosebrock August 15, 2018 at 8:54 am #

      Running inference on the GPU will make the predictions faster, but not more accurate. Right or left side of the frame won’t matter but typically objects closer to the camera are easier to detect.

  29. Gery August 13, 2018 at 11:00 am #

    Nice article, enjoyed reading it as usual. I used the centroid tracking algorithm with YOLO face detector and it works fine. Thank you Adrian.

    • Adrian Rosebrock August 15, 2018 at 8:48 am #

      Awesome, I’m glad to hear it Gery! 🙂

  30. Sanjay August 16, 2018 at 3:07 am #

    Hello Adrian,

    Can we define detection distance?For example it detects people at particular distance only or object that are more closer to camera?

    • Adrian Rosebrock August 17, 2018 at 7:28 am #

      Absolutely, but you would need the modify the code to perform that action. A general rule is that smaller objects are farther away and larger objects are closer to the camera. Combined with this code you can determine in which direction they are moving as well.

  31. absolom August 16, 2018 at 4:33 am #

    thanks, good explanation
