OpenCV People Counter

In this tutorial you will learn how to build a “people counter” with OpenCV and Python. Using OpenCV, we’ll count the number of people who are heading “in” or “out” of a department store in real-time.

Building a person counter with OpenCV has been one of the most-requested topics here on PyImageSearch, and I’ve been meaning to do a blog post on people counting for a year now — I’m incredibly thrilled to be publishing it and sharing it with you today.

Enjoy the tutorial and let me know what you think in the comments section at the bottom of the post!

To get started building a people counter with OpenCV, just keep reading!

OpenCV People Counter with Python

In the first part of today’s blog post, we’ll be discussing the required Python packages you’ll need to build our people counter.

From there I’ll provide a brief discussion on the difference between object detection and object tracking, along with how we can leverage both to create a more accurate people counter.

Afterwards, we’ll review the directory structure for the project and then implement the entire person counting project.

Finally, we’ll examine the results of applying people counting with OpenCV to actual videos.

Required Python libraries for people counting

In order to build our people counting application, we’ll need a number of different Python libraries, including:

  • NumPy
  • OpenCV
  • dlib
  • imutils

Additionally, you’ll also want to use the “Downloads” section of this blog post to download my source code which includes:

  1. My special pyimagesearch  module which we’ll implement and use later in this post
  2. The Python driver script used to start the people counter
  3. All example videos used here in the post

I’m going to assume you already have NumPy, OpenCV, and dlib installed on your system.

If you don’t have OpenCV installed, you’ll want to head to my OpenCV install page and follow the relevant tutorial for your particular operating system.

If you need to install dlib, you can use this guide.

Finally, you can install/upgrade your imutils via the following command:
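The install/upgrade is done with pip:

```shell
pip install --upgrade imutils
```

If you’re using a Python virtual environment, make sure it is active before running the command.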

Understanding object detection vs. object tracking

There is a fundamental difference between object detection and object tracking that you must understand before we proceed with the rest of this tutorial.

When we apply object detection we are determining where in an image/frame an object is. An object detector is also typically more computationally expensive, and therefore slower, than an object tracking algorithm. Examples of object detection algorithms include Haar cascades, HOG + Linear SVM, and deep learning-based object detectors such as Faster R-CNNs, YOLO, and Single Shot Detectors (SSDs).

An object tracker, on the other hand, will accept the input (x, y)-coordinates of where an object is in an image and will:

  1. Assign a unique ID to that particular object
  2. Track the object as it moves around a video stream, predicting the new object location in the next frame based on various attributes of the frame (gradient, optical flow, etc.)

Examples of object tracking algorithms include MedianFlow, MOSSE, GOTURN, kernelized correlation filters, and discriminative correlation filters, to name a few.

If you’re interested in learning more about the object tracking algorithms built into OpenCV, be sure to refer to this blog post.

Combining both object detection and object tracking

Highly accurate object trackers will combine the concept of object detection and object tracking into a single algorithm, typically divided into two phases:

  • Phase 1 — Detecting: During the detection phase we are running our computationally more expensive object detector to (1) detect if new objects have entered our view, and (2) see if we can find objects that were “lost” during the tracking phase. For each detected object we create or update an object tracker with the new bounding box coordinates. Since our object detector is more computationally expensive we only run this phase once every N frames.
  • Phase 2 — Tracking: When we are not in the “detecting” phase we are in the “tracking” phase. For each of our detected objects, we create an object tracker to track the object as it moves around the frame. Our object tracker should be faster and more efficient than the object detector. We’ll continue tracking until we’ve reached the N-th frame and then re-run our object detector. The entire process then repeats.

The benefit of this hybrid approach is that we can apply highly accurate object detection methods without as much of the computational burden. We will be implementing such a tracking system to build our people counter.

Project structure

Let’s review the project structure for today’s blog post. Once you’ve grabbed the code from the “Downloads” section, you can inspect the directory structure with the tree  command:

Zeroing in on the most-important two directories, we have:

  1. pyimagesearch/ : This module contains the centroid tracking algorithm. The centroid tracking algorithm is covered in the “Combining object tracking algorithms” section below, but the code is not. For a review of the centroid tracking code you should refer to the first post in the series.
  2. mobilenet_ssd/ : Contains the Caffe deep learning model files. We’ll be using a MobileNet Single Shot Detector (SSD) which is covered at the top of this blog post in the section, “Single Shot Detectors for object detection”.

The heart of today’s project is contained within the  script — that’s where we’ll spend most of our time. We’ll also review the  script today.

Combining object tracking algorithms

Figure 1: An animation demonstrating the steps in the centroid tracking algorithm.

To implement our people counter we’ll be using both OpenCV and dlib. We’ll use OpenCV for standard computer vision/image processing functions, along with the deep learning object detector for people counting.

We’ll then use dlib for its implementation of correlation filters. We could use OpenCV here as well; however, the dlib object tracking implementation was a bit easier to work with for this project.

I’ll be including a deep dive into dlib’s object tracking algorithm in next week’s post.

Along with dlib’s object tracking implementation, we’ll also be using my implementation of centroid tracking from a few weeks ago. Reviewing the entire centroid tracking algorithm is outside the scope of this blog post, but I’ve included a brief overview below.

At Step #1 we accept a set of bounding boxes and compute their corresponding centroids (i.e., the center of the bounding boxes):

Figure 2: To build a simple object tracking via centroids script with Python, the first step is to accept bounding box coordinates and use them to compute centroids.

The bounding boxes themselves can be provided by either:

  1. An object detector (such as HOG + Linear SVM, Faster R-CNN, SSDs, etc.)
  2. Or an object tracker (such as correlation filters)

In the above image you can see that we have two objects to track in this initial iteration of the algorithm.
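A centroid here is simply the midpoint of the bounding box corners. As a minimal sketch (the helper name is mine, not from the post):

```python
def bbox_centroid(box):
    # a bounding box is (startX, startY, endX, endY);
    # its centroid is the midpoint between the two corners
    (startX, startY, endX, endY) = box
    cX = int((startX + endX) / 2.0)
    cY = int((startY + endY) / 2.0)
    return (cX, cY)
```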

During Step #2 we compute the Euclidean distance between any new centroids (yellow) and existing centroids (purple):

Figure 3: Three objects are present in this image. We need to compute the Euclidean distance between each pair of original centroids (red) and new centroids (green).

The centroid tracking algorithm makes the assumption that pairs of centroids with minimum Euclidean distance between them must be the same object ID.

In the example image above we have two existing centroids (purple) and three new centroids (yellow), implying that a new object has been detected (since there is one more new centroid vs. old centroid).

The arrows then represent computing the Euclidean distances between all purple centroids and all yellow centroids.

Once we have the Euclidean distances we attempt to associate object IDs in Step #3:

Figure 4: Our simple centroid object tracking method has associated objects with minimized object distances. What do we do about the object in the bottom-left though?

In Figure 4 you can see that our centroid tracker has chosen to associate centroids that minimize their respective Euclidean distances.

But what about the point in the bottom-left?

It didn’t get associated with anything — what do we do?

To answer that question we need to perform Step #4, registering new objects:

Figure 5: In our object tracking example, we have a new object that wasn’t matched with an existing object, so it is registered as object ID #3.

Registering simply means that we are adding the new object to our list of tracked objects by:

  1. Assigning it a new object ID
  2. Storing the centroid of the bounding box coordinates for the new object

In the event that an object has been lost or has left the field of view, we can simply deregister the object (Step #5).

Exactly how you handle an object being “lost” or “no longer visible” really depends on your exact application, but for our people counter, we will deregister people IDs when they cannot be matched to any existing person objects for 40 consecutive frames.

Again, this is only a brief overview of the centroid tracking algorithm.
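To make Steps #1-#5 concrete, here is a self-contained sketch of a centroid tracker. The class name and the greedy minimum-distance matching are my own simplifications; the full implementation from the earlier post handles the matching step more carefully:

```python
from collections import OrderedDict
import math

class SimpleCentroidTracker:
    """Minimal sketch of the five centroid tracking steps described above."""

    def __init__(self, maxDisappeared=40):
        self.nextObjectID = 0
        self.objects = OrderedDict()       # objectID -> centroid
        self.disappeared = OrderedDict()   # objectID -> consecutive missed frames
        self.maxDisappeared = maxDisappeared

    def register(self, centroid):
        # Step #4: assign a new ID and store the centroid
        self.objects[self.nextObjectID] = centroid
        self.disappeared[self.nextObjectID] = 0
        self.nextObjectID += 1

    def deregister(self, objectID):
        # Step #5: forget an object that has been gone too long
        del self.objects[objectID]
        del self.disappeared[objectID]

    def update(self, rects):
        # no detections: mark every tracked object as disappeared
        if len(rects) == 0:
            for objectID in list(self.disappeared.keys()):
                self.disappeared[objectID] += 1
                if self.disappeared[objectID] > self.maxDisappeared:
                    self.deregister(objectID)
            return self.objects

        # Step #1: compute input centroids from the bounding boxes
        inputCentroids = [((sx + ex) // 2, (sy + ey) // 2)
                          for (sx, sy, ex, ey) in rects]

        if len(self.objects) == 0:
            for c in inputCentroids:
                self.register(c)
            return self.objects

        # Steps #2 and #3: greedily match existing centroids to the
        # closest new centroids by Euclidean distance
        objectIDs = list(self.objects.keys())
        pairs = sorted(
            (math.dist(self.objects[oid], c), oid, j)
            for oid in objectIDs for (j, c) in enumerate(inputCentroids))
        usedIDs, usedInputs = set(), set()
        for (_, oid, j) in pairs:
            if oid in usedIDs or j in usedInputs:
                continue
            self.objects[oid] = inputCentroids[j]
            self.disappeared[oid] = 0
            usedIDs.add(oid)
            usedInputs.add(j)

        # unmatched existing objects disappear; unmatched inputs register
        for oid in objectIDs:
            if oid not in usedIDs:
                self.disappeared[oid] += 1
                if self.disappeared[oid] > self.maxDisappeared:
                    self.deregister(oid)
        for j, c in enumerate(inputCentroids):
            if j not in usedInputs:
                self.register(c)  # Step #4
        return self.objects
```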

Note: For a more detailed review, including an explanation of the source code used to implement centroid tracking, be sure to refer to this post.

Creating a “trackable object”

In order to track and count an object in a video stream, we need an easy way to store information regarding the object itself, including:

  • Its object ID
  • Its previous centroids (so we can easily compute the direction the object is moving)
  • Whether or not the object has already been counted

To accomplish all of these goals we can define an instance of TrackableObject  — open up the  file and insert the following code:

The TrackableObject  constructor accepts an objectID  + centroid  and stores them. The centroids variable is a list because it will contain an object’s centroid location history.

The constructor also initializes counted  as False , indicating that the object has not been counted yet.
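Since the code listing itself isn’t reproduced here, the class described above boils down to the following sketch (reconstructed from the description; the file in the “Downloads” may differ slightly):

```python
class TrackableObject:
    def __init__(self, objectID, centroid):
        # store the object ID, then initialize a list of centroids
        # using the current centroid; the list will hold the object's
        # full centroid location history
        self.objectID = objectID
        self.centroids = [centroid]

        # a boolean indicating whether the object has been counted yet
        self.counted = False
```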

Implementing our people counter with OpenCV + Python

With all of our supporting Python helper tools and classes in place, we are now ready to build our OpenCV people counter.

Open up your  file and insert the following code:

We begin by importing our necessary packages:

  • From the pyimagesearch  module, we import our custom CentroidTracker  and TrackableObject  classes.
  • The VideoStream  and FPS  modules from imutils.video  will help us to work with a webcam and to calculate the estimated Frames Per Second (FPS) throughput rate.
  • We need imutils  for its OpenCV convenience functions.
  • The dlib  library will be used for its correlation tracker implementation.
  • OpenCV will be used for deep neural network inference, opening video files, writing video files, and displaying output frames to our screen.

Now that all of the tools are at our fingertips, let’s parse command line arguments:

We have six command line arguments which allow us to pass information to our people counter script from the terminal at runtime:

  • --prototxt : Path to the Caffe “deploy” prototxt file.
  • --model : The path to the Caffe pre-trained CNN model.
  • --input : Optional input video file path. If no path is specified, your webcam will be utilized.
  • --output : Optional output video path. If no path is specified, a video will not be recorded.
  • --confidence : With a default value of 0.4 , this is the minimum probability threshold which helps to filter out weak detections.
  • --skip-frames : The number of frames to skip before running our DNN detector again on the tracked object. Remember, object detection is computationally expensive, but it does help our tracker to reassess objects in the frame. By default we skip 30  frames between detecting objects with the OpenCV DNN module and our CNN single shot detector model.
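Put together, the parser looks something like this (the long option names and defaults come from the list above; the short flags are my assumption):

```python
import argparse

def build_argparser():
    # construct the argument parser with the six arguments described above
    ap = argparse.ArgumentParser()
    ap.add_argument("-p", "--prototxt", required=True,
        help="path to Caffe 'deploy' prototxt file")
    ap.add_argument("-m", "--model", required=True,
        help="path to Caffe pre-trained model")
    ap.add_argument("-i", "--input", type=str,
        help="path to optional input video file")
    ap.add_argument("-o", "--output", type=str,
        help="path to optional output video file")
    ap.add_argument("-c", "--confidence", type=float, default=0.4,
        help="minimum probability to filter weak detections")
    ap.add_argument("-s", "--skip-frames", type=int, default=30,
        help="# of frames skipped between detections")
    return ap
```

Note that argparse stores `--skip-frames` under the key `skip_frames` (hyphens become underscores).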

Now that our script can dynamically handle command line arguments at runtime, let’s prepare our SSD:

First, we’ll initialize CLASSES — the list of classes that our SSD supports. This list should not be changed if you’re using the model provided in the “Downloads” section. We’re only interested in the “person” class, but you could count other moving objects as well (however, if your “pottedplant”, “sofa”, or “tvmonitor” grows legs and starts moving, you should probably run out of your house screaming rather than worrying about counting them! 😋 ).

On Line 38 we load our pre-trained MobileNet SSD used to detect objects (but again, we’re just interested in detecting and tracking people, not any other class). To learn more about MobileNet and SSDs, please refer to my previous blog post.
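The CLASSES list is the standard set of 21 PASCAL VOC labels the MobileNet SSD was trained on. Here is a sketch of that list, plus the “person”-only filtering applied to the SSD’s raw 1×1×N×7 output (the helper function is mine for illustration):

```python
import numpy as np

# the list of class labels the MobileNet SSD was trained to detect
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
    "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
    "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
    "sofa", "train", "tvmonitor"]

def person_boxes(detections, W, H, conf_thresh=0.4):
    """Filter SSD output (shape 1x1xNx7) down to 'person' bounding boxes.

    Each detection row is (_, class index, confidence, startX, startY,
    endX, endY), with coordinates normalized to [0, 1].
    """
    boxes = []
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        idx = int(detections[0, 0, i, 1])
        # drop weak detections and anything that isn't a person
        if confidence < conf_thresh or CLASSES[idx] != "person":
            continue
        # scale the normalized coordinates back to pixel coordinates
        box = detections[0, 0, i, 3:7] * np.array([W, H, W, H])
        boxes.append(box.astype("int"))
    return boxes
```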

From there we can initialize our video stream:

First we handle the case where we’re using a webcam video stream (Lines 41-44). Otherwise, we’ll be capturing frames from a video file (Lines 47-49).

We still have a handful of initializations to perform before we begin looping over frames:

The remaining initializations include:

  • writer : Our video writer. We’ll instantiate this object later if we are writing to video.
  • W  and H : Our frame dimensions. We’ll need to plug these into cv2.VideoWriter .
  • ct : Our CentroidTracker . For details on the implementation of CentroidTracker , be sure to refer to my blog post from a few weeks ago.
  • trackers : A list to store the dlib correlation trackers. To learn about dlib correlation tracking stay tuned for next week’s post.
  • trackableObjects : A dictionary which maps an objectID  to a TrackableObject .
  • totalFrames : The total number of frames processed.
  • totalDown  and totalUp : The total number of objects/people that have moved either down or up. These variables measure the actual “people counting” results of the script.
  • fps : Our frames per second estimator for benchmarking.

Note: If you get lost in the while  loop below, you should refer back to this bulleted listing of important variables.

Now that all of our initializations are taken care of, let’s loop over incoming frames:

We begin looping on Line 76. At the top of the loop we grab the next frame  (Lines 79 and 80). In the event that we’ve reached the end of the video, we’ll break  out of the loop (Lines 84 and 85).

Preprocessing the frame  takes place on Lines 90 and 91. This includes resizing and swapping color channels as dlib requires an rgb  image.

We grab the dimensions of the frame  for the video writer  (Lines 94 and 95).

From there we’ll instantiate the video writer  if an output path was provided via command line argument (Lines 99-102). To learn more about writing video to disk, be sure to refer to this post.

Now let’s detect people using the SSD:

We initialize a status  as “Waiting” on Line 107. Possible status  states include:

  • Waiting: In this state, we’re waiting on people to be detected and tracked.
  • Detecting: We’re actively in the process of detecting people using the MobileNet SSD.
  • Tracking: People are being tracked in the frame and we’re counting the totalUp  and totalDown .

Our rects  list will be populated either via detection or tracking. We go ahead and initialize rects  on Line 108.

It’s important to understand that deep learning object detectors are very computationally expensive, especially if you are running them on your CPU.

To avoid running our object detector on every frame, and to speed up our tracking pipeline, we’ll skip the detector for N frames at a time (set by command line argument --skip-frames , where 30  is the default). Only on every N-th frame will we exercise our SSD for object detection. Otherwise, we’ll simply be tracking moving objects in between.

Using the modulo operator on Line 112 we ensure that we’ll only execute the code in the if-statement every N frames.

Assuming we’ve landed on a multiple of skip_frames , we’ll update the status  to “Detecting” (Line 114).

Then we initialize our new list of trackers  (Line 115).
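The frame-skipping decision itself is just a modulo check. A minimal sketch (the helper is mine; in the actual script this is an inline if-statement):

```python
def phase_for_frame(totalFrames, skip_frames=30):
    # run the expensive object detector only on every N-th frame;
    # rely on the cheaper correlation trackers in between
    if totalFrames % skip_frames == 0:
        return "Detecting"
    return "Tracking"
```

With the default of 30 skip-frames, detection runs on frames 0, 30, 60, and so on.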

Next, we’ll perform inference via object detection. We begin by creating a blob  from the image, followed by passing the blob  through the net to obtain detections  (Lines 119-121).

Now we’ll loop over each of the detections  in hopes of finding objects belonging to the “person” class:

Looping over detections  on Line 124, we proceed to grab the confidence  (Line 127) and filter out weak results + those that don’t belong to the “person” class (Lines 131-138).

Now we can compute a bounding box for each person and begin correlation tracking:

Computing our bounding box  takes place on Lines 142 and 143.

Then we instantiate our dlib correlation tracker  on Line 148, followed by passing in the object’s bounding box coordinates to dlib.rectangle , storing the result as rect  (Line 149).

Subsequently, we start tracking on Line 150 and append the tracker  to the trackers  list on Line 154.

That’s a wrap for all operations we do every N skip-frames!

Let’s take care of the typical operations where tracking is taking place in the else  block:

Most of the time, we aren’t landing on a skip-frame multiple. During this time, we’ll utilize our trackers  to track our object rather than applying detection.

We begin looping over the available trackers  on Line 160.

We proceed to update the status  to “Tracking” (Line 163) and grab the object position (Lines 166 and 167).

From there we extract the position coordinates (Lines 170-173) followed by populating the information in our rects  list.

Now let’s draw a horizontal visualization line (that people must cross in order to be tracked) and use the centroid tracker to update our object centroids:

On Line 181 we draw the horizontal line which we’ll be using to visualize people “crossing” — once people cross this line we’ll increment our respective counters.

Then on Line 185, we utilize our CentroidTracker  instantiation to accept the list of rects , regardless of whether they were generated via object detection or object tracking. Our centroid tracker will associate object IDs with object locations.

In this next block, we’ll review the logic which counts if a person has moved up or down through the frame:

We begin by looping over the updated bounding box coordinates of the object IDs (Line 188).

On Line 191 we attempt to fetch a TrackableObject  for the current objectID .

If the TrackableObject  doesn’t exist for the objectID , we create one (Lines 194 and 195).

Otherwise, there is already an existing TrackableObject , so we need to figure out if the object (person) is moving up or down.

To do so, we grab the y-coordinate value for all previous centroid locations for the given object (Line 204). Then we compute the direction  by taking the difference between the current centroid location and the mean of all previous centroid locations (Line 205).

The reason we take the mean is to ensure our direction tracking is more stable. If we stored just the previous centroid location for the person we leave ourselves open to the possibility of false direction counting. Keep in mind that object detection and object tracking algorithms are not “magic” — sometimes they will predict bounding boxes that may be slightly off what you may expect; therefore, by taking the mean, we can make our people counter more accurate.

If the TrackableObject  has not been counted  (Line 209), we need to determine if it’s ready to be counted yet (Lines 213-222), by:

  1. Checking if the direction  is negative (indicating the object is moving Up) AND the centroid is Above the centerline. In this case we increment totalUp .
  2. Or checking if the direction  is positive (indicating the object is moving Down) AND the centroid is Below the centerline. If this is true, we increment totalDown .

Finally, we store the TrackableObject  in our trackableObjects  dictionary (Line 225) so we can grab and update it when the next frame is captured.

We’re on the home-stretch!

The next three code blocks handle:

  1. Display (drawing and writing text to the frame)
  2. Writing frames to a video file on disk (if the --output  command line argument is present)
  3. Capturing keypresses
  4. Cleanup

First we’ll draw some information on the frame for visualization:

Here we overlay the following data on the frame:

  • ObjectID : Each object’s numerical identifier.
  • centroid  : The center of the object will be represented by a “dot” which is created by filling in a circle.
  • info  : Includes totalUp , totalDown , and status

For a review of drawing operations, be sure to refer to this blog post.

Then we’ll write the frame  to a video file (if necessary) and handle keypresses:

In this block we:

  • Write the  frame , if necessary, to the output video file (Lines 249 and 250)
  • Display the frame  and handle keypresses (Lines 253-258). If “q” is pressed, we break  out of the frame processing loop.
  • Update our fps  counter (Line 263)

We didn’t make too much of a mess, but now it’s time to clean up:

To finish out the script, we display the FPS info to the terminal, release all pointers, and close any open windows.

Just 283 lines of code later, we are now done 😎.

People counting results

To see our OpenCV people counter in action, make sure you use the “Downloads” section of this blog post to download the source code and example videos.

From there, open up a terminal and execute the following command:

Here you can see that our person counter is counting the number of people who:

  1. Are entering the department store (down)
  2. And the number of people who are leaving (up)

At the end of the first video you’ll see there have been 7 people who entered and 3 people who have left.

Furthermore, examining the terminal output you’ll see that our person counter is capable of running in real-time, obtaining a 34 FPS throughput rate. This is despite the fact that we are using a deep learning object detector for more accurate person detections.

Our 34 FPS throughput rate is made possible through our two-phase process of:

  1. Detecting people once every 30 frames
  2. And then applying a faster, more efficient object tracking algorithm in all frames in between.

Another example of people counting with OpenCV can be seen below:

I’ve included a short GIF below to give you an idea of how the algorithm works:

Figure 7: An example of an OpenCV people counter in action.

A full video of the demo can be seen below:

This time there have been 2 people who have entered the department store and 14 people who have left.

You can see how useful this system would be to a store owner interested in foot traffic analytics.

The same type of system for counting foot traffic with OpenCV can be used to count automobile traffic with OpenCV and I hope to cover that topic in a future blog post.

Additionally, a big thank you to David McDuffee for recording the example videos used here today! David works here with me at PyImageSearch and if you’ve ever emailed PyImageSearch before, you have very likely interacted with him. Thank you for making this post possible, David! Also a thank you to BenSound for providing the music for the video demos included in this post.

What are the next steps?

Congratulations on building your person counter with OpenCV!

If you’re interested in learning more about OpenCV, including building other real-world applications such as face detection, object recognition, and more, I would suggest reading through my book, Practical Python and OpenCV + Case Studies.

Practical Python and OpenCV is meant to be a gentle introduction to the world of computer vision and image processing. This book is perfect if you:

  • Are new to the world of computer vision and image processing
  • Have some past image processing experience but are new to Python
  • Are looking for some great example projects to get your feet wet


Learn OpenCV fundamentals in a single weekend!

If you’re looking for a more detailed dive into computer vision, I would recommend working through the PyImageSearch Gurus course. The PyImageSearch Gurus course is similar to a college survey course, and many students report that they learn more than they would in a typical university class.

Inside you’ll find over 168 lessons, starting with the fundamentals of computer vision, all the way up to more advanced topics, including:

  • Face recognition
  • Automatic license plate recognition
  • Training your own custom object detectors
  • …and much more!

You’ll also find a thriving community of like-minded individuals who are itching to learn about computer vision. Each day in the community forums we discuss:

  • Your burning questions about computer vision
  • New project ideas and resources
  • Kaggle and other competitions
  • Development environment and code issues
  • …among many other topics!

Master computer vision inside PyImageSearch Gurus!


Summary

In today’s blog post we learned how to build a people counter using OpenCV and Python.

Our implementation is:

  • Capable of running in real-time on a standard CPU
  • Utilizes deep learning object detectors for improved person detection accuracy
  • Leverages two separate object tracking algorithms, including both centroid tracking and correlation filters for improved tracking accuracy
  • Applies both a “detection” and “tracking” phase, making it capable of (1) detecting new people and (2) picking up people that may have been “lost” during the tracking phase

I hope you enjoyed today’s post on people counting with OpenCV!

To download the code to this blog post (and apply people counting to your own projects), just enter your email address in the form below!


181 Responses to OpenCV People Counter

  1. Jay August 13, 2018 at 10:53 am #

    Hi Adrian ! the tutorial is really great and it’s very helpful to me . however, I was wandering that is this kind of people counting can implement on raspberry pi3 ?

    • Adrian Rosebrock August 13, 2018 at 10:58 am #

      If you want to use just the Raspberry Pi you need to use a more efficient object detection routine. Possible methods may include:

      1. Background subtraction, such as the method used in this post.
      2. Haar cascades (which are less accurate, but faster than DL-based object detectors)
      3. Leveraging something like the Movidius NCS to help you reach a faster FPS throughput

      Additionally, for your object tracking you may want to look into using MOSSE (covered in this post) which is faster than correlation filters. Another option could be to explore using Kalman filters.

      I hope that helps!

      • 蘇鉉 August 14, 2018 at 1:21 am #

thank you so much! another question, is it possible to combine this people counting algorithm with the method you have posted before which was talking about Raspberry Pi: Deep learning object detection with OpenCV

        • Adrian Rosebrock August 15, 2018 at 8:37 am #

          Yes, you can, but keep in mind that the FPS throughput rate is going to be very, very low since you’re trying to apply deep learning object detection on the Pi.

      • Wang August 21, 2018 at 6:15 pm #

        Adrian, to get better performance with raspberry pi3, do you need to use all of these methods? Or just a few? For example, you can join background subtraction with Haar Cascade?

        Thank you very much!

        • Adrian Rosebrock August 22, 2018 at 9:25 am #

          You can join background subtraction in with a Haar cascade and then only apply the Haar cascade to the ROI regions. But realistically Haar cascades are pretty fast anyway so that may be overkill.

      • Lafleur August 22, 2018 at 3:02 am #

        Thank you so much for your work and for sharing it. It’s great.
        May you detail a bit more what we are suppose to do to use the software on Raspberry. I’m not very used to it so I don’t understant everything you wrote.

        • Adrian Rosebrock August 22, 2018 at 9:22 am #

          I’ll likely have to write another blog post on Raspberry Pi people counting — it’s too much to detail in a blog post comment.

          • Lafleur August 22, 2018 at 10:02 am #

            Seems logic…
            Could you give me the URL of a trusted blog where you use to go on which I will be able to find informations ?
            I’ve tried the software “Footfall” but it doesn’t work.
            And many blogs are just outdated concerning this subject.
            Thank you for all 🙂

          • Adrian Rosebrock August 22, 2018 at 10:17 am #

            I don’t know of one, which is why I would have to do one here on PyImageSearch 😉

          • Ben Bartling August 22, 2018 at 5:00 pm #

            Looking forward to the Rasberry Pi people counting!

          • AJ August 30, 2018 at 12:14 am #

            Hi, firstly, thank you for your blog it’s so awesome! Im wondering when that Raspberry Pi counter will be posted? Also can it be made into vehicles? Thank you!

          • Adrian Rosebrock August 30, 2018 at 8:54 am #

            Yes, you can do the same for vehicles, just swap out the “person” class for any other objects you want to detect and track. I’m honestly not sure when the Pi people counter will be posted. I have a number of other blog posts and projects I’m working on though. I hope it will be soon but I cannot guarantee it.

          • Cheeriocheng September 11, 2018 at 3:40 am #

            yes please!

      • Andres September 12, 2018 at 3:29 pm #

        Hi Adrian. I have a question about Kalman filters. I wanna implement people counter on a Raspberry PI3B and I use background substraction for detection and FindCountours to enclosing in a rectangle the person position and for tracking I need to implement MOSSE o Kalman filter but here is my question. How can I track a person with those algorithms? Because each of those algorithm need to receive the position of the object but I’m detect multiple object so it will be an issue to send the correct coordinate for each object that I need to track

      • clarence chhoa November 11, 2018 at 8:16 pm #

        can this code deals with live streaming?

  2. issaiass August 13, 2018 at 11:08 am #

    Great! Awesome job as always. I was trying to improve my tracking part. This is a good reference point for my application.

    Thankyou Adrian!

  3. Sukhmeet SIngh August 13, 2018 at 11:19 am #

    Hi Adrian,
    This is by far my Favorite blog post from you.
    I was wondering if you could also do a blog/tutorial on people counting in an image and show the gender of the people. That would make up for a really interesting blog and tutorial.

  4. rvjenya August 13, 2018 at 11:36 am #

    I really liked your blog lesson.. Thanks so much. I’m going to convers caffe model to NCS Movidius and go to Store my friend. Hi is going to count people and recognize (age, gender and maybe emotion). I really like your Blog. I plan to buy your book. Thanks for motivation and good practic.

    • Adrian Rosebrock August 13, 2018 at 12:46 pm #

      Thank you for the kind words, I’m happy you liked the post. I wish the best of luck to you and your friend implementing your own person counter for their store!

  5. anirban August 13, 2018 at 12:04 pm #

    Sir ,

    Great Work. Thanks for Sharing.



    • Adrian Rosebrock August 13, 2018 at 12:46 pm #

      Thanks Anirban!

  6. Anand Simmy August 13, 2018 at 12:42 pm #

    Hi Adrian, is there any specifc reason to use dlib correlation tracker instead of opencv’s 8 inbuilt trackers.Will any of those trackers will be more precise than dlib tracker?

    • Adrian Rosebrock August 13, 2018 at 12:46 pm #

      To quote the blog post:

      “We’ll then use dlib for its implementation of correlation filters. We could use OpenCV here as well; however, the dlib object tracking implementation was a bit easier to work with for this project.”

      OpenCV’s CSRT tracker may be more accurate but it will be slower. Similarly, OpenCV’s MOSSE tracker will be faster but potentially less accurate.

  7. Bilal August 13, 2018 at 2:08 pm #

    Loved your post, and with the level of explanation you have given, hats off to you, Sir! I was wondering: what if we have to implement it on multiple cameras, or we have a separate door and a separate camera for the entrance and exit? Would like to have your words on these too. Thanks in advance.

    • Adrian Rosebrock August 13, 2018 at 2:20 pm #

      This tutorial assumes you’re using a single camera. If you’re using multiple cameras it becomes more challenging. If the viewpoint changes then your object tracker won’t be able to associate the objects. Instead, you might want to look into face recognition or even gait recognition, enabling you to associate a person with a unique ID based on more than just appearance alone.

      • Bilal August 13, 2018 at 4:55 pm #

        Yes, the viewpoint does change, as the cameras will be placed in different places. We would like to tag each person with a face ID and recognize them across all the cameras using face recognition and that ID. Thank you once again.

        • Adrian Rosebrock August 15, 2018 at 8:45 am #

          Yeah, if the viewpoints are changing you’ll certainly want to explore face recognition and gait recognition instead.

  8. Michael Gerasimov August 13, 2018 at 2:12 pm #

    I liked the article very much. In new shopping centers, cameras could be placed at all the entrances, with a computer collecting the information to confirm that all the people have come out and no one is hiding inside.

  9. Dakhoo August 13, 2018 at 2:36 pm #

    Thanks for sharing this tutorial – last week I was trying to do something similar – do you think you can make a comment/answer on ?!

    • Jaca September 4, 2018 at 9:25 am #

      You can try to “place” a blank region on already detected car. Since the tracking method gives you location of the object in every frame, you could just move the blank region accordingly. Then you can use it to prevent Haar cascade from finding a car there. If you’re worried about overlapping cars, I suggest you adjust the size of blank region.

  10. Krishna August 13, 2018 at 2:52 pm #

    Does this algorithm work well for Raspberry Pi based projects? If not, please suggest an effective algorithm for detecting human presence, sir. I have tried the cascade method but the results were not satisfactory.

    Thank you sir, I am awaiting your reply

    • Adrian Rosebrock August 15, 2018 at 8:45 am #

      Make sure you’re reading the comments. I’ve already answered your question in my reply to Jay — it’s actually the very first comment on this post.

  11. ando August 13, 2018 at 3:21 pm #

    Thanks. Good job. How can I improve the code to detect people very close?

    • Adrian Rosebrock August 15, 2018 at 8:44 am #

      Hey Ando — can you clarify what you mean by “very close”? Do you have any examples?

  12. Jeff August 13, 2018 at 5:56 pm #

    Thank you very much for these tutorials. I am new to this and I seem to be having issues getting webcam video from Can you provide a short test script to open the video stream from the pi camera using imutils?

    • Adrian Rosebrock August 15, 2018 at 8:43 am #

      Just set:

      vs = VideoStream(usePiCamera=True).start()

  13. Rohit August 14, 2018 at 12:25 am #

    Thanks for the wonderful explanation. It is always a pleasure to read your posts. I ran your people-counting tracker but am getting some random object IDs during detection. For me, on the 2nd example video there were 20 people going up and 2 people coming down. What do you recommend to remove these ambiguities?

    • Adrian Rosebrock August 15, 2018 at 8:39 am #

      Hey Rohit — that is indeed strange behavior. What version of OpenCV and dlib are you using?

      • kaisar khatak August 18, 2018 at 9:59 pm #

        Running the downloaded scripts with the default parameter values on the same input videos, I was UNABLE to match the sample output videos. I ran into the same issue as Rohit.

        I played around with the confidence values and still could NOT match the results. The code is missing some detections and looks like it is overstating others (false positive detections?). Any ideas???

        Nvidia TX1
        OpenCV 3.3
        Python 3.5 (virtual environment)

        The videos can be viewed on my google drive:

        Video 1: (My Result = Up 3, Down 8) [Actual (ground truth) Up 3 Down 7]

        Video 2: (My Result = Up 20, Down 2) [Actual (ground truth) Up 14 Down 3]

        • Adrian Rosebrock August 22, 2018 at 10:09 am #

          Upgrade from OpenCV 3.3 to OpenCV 3.4 or better and it will work for you 🙂 (which I believe you also found from the other comments, but I wanted to make sure)

      • kaisar khatak August 19, 2018 at 6:05 pm #

        Comment Updated (4/19): I encountered the same issue using OpenCV 3.3, but after I upgraded to OpenCV 3.4.1, my results now match the video on this blog post. I recommend upgrading to OpenCV 3.4 for anyone encountering similar detection/tracking behavior…

    • kaisar khatak August 19, 2018 at 6:04 pm #

      Rohit – I encountered the same issue using OpenCV 3.3, but after I upgraded to OpenCV 3.4.1, my results now match the video on this blog post. I recommend upgrading to OpenCV 3.4…

  14. Sourabh Mane August 14, 2018 at 1:08 am #

    Hi Adrian,

    Thanks for the great post!!!!. I have few questions..

    1. Will this people counter work in crowded places like airports or railway stations? Will it give an accurate count?

    2. Can we use it for mass (crowd) counting? Does it count pets and babies?

    • Adrian Rosebrock August 15, 2018 at 8:38 am #

      1. Provided you can detect the full body and there isn’t too much occlusion, yes, the method will work. If you cannot detect the full body you might want to switch to face detection instead.

      2. See my note above. You should also read the code to see how we filter out non-person detections 🙂

      • kaisar khatak August 19, 2018 at 10:08 pm #

        I have come across some app developers using what look to be custom-trained head detection models. Sometimes the back of the head can be seen; other times the frontal view can be seen. I think the “head count” approach makes sense, since that is how humans think when taking class attendance, for example. Is head counting a better method for people counting??? Is this even possible, and will the method be accurate for the backs of heads???

        Examples: (VION VISION)

        • Adrian Rosebrock August 22, 2018 at 10:04 am #

          I’m reluctant to say which is “better” as that’s entirely dependent on your application and what exactly you’re trying to do. You could argue that in dense areas a “head detector” would be better than a “body detector” since the full body might not be visible. But on the other hand, having a full body detected can reduce false positives as well. Again, it’s dependent on your application.

  15. Anthony The Koala August 14, 2018 at 3:15 am #

    Dear Dr Adrian,
    I need a clarification please on object detection. How does the object detector distinguish between human and non-human objects.
    Thank you,
    Anthony of exciting Sydney

    • Adrian Rosebrock August 15, 2018 at 8:34 am #

      The object detector has been pre-trained to recognize a variety of classes. Line 137 also filters out non-person detections.
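
      In sketch form, that filtering amounts to something like the following. The row layout matches MobileNet SSD's output and class index 15 is "person" in the VOC label list, but the threshold, helper name, and mocked rows here are illustrative, not the post's exact code:

```python
# Hedged sketch of "keep only confident person detections".
# MobileNet SSD detection rows look like
# [batch_id, class_id, confidence, startX, startY, endX, endY]
# with coordinates relative to the frame (0..1).

PERSON_CLASS_ID = 15   # "person" in the VOC label list
MIN_CONFIDENCE = 0.4   # illustrative threshold, not the post's value

def filter_person_detections(rows, w, h):
    """Keep confident 'person' rows, scaled to pixel boxes."""
    boxes = []
    for (_, class_id, conf, x1, y1, x2, y2) in rows:
        if int(class_id) != PERSON_CLASS_ID:
            continue  # skip cars, dogs, etc.
        if conf < MIN_CONFIDENCE:
            continue  # skip weak detections
        boxes.append((int(x1 * w), int(y1 * h), int(x2 * w), int(y2 * h)))
    return boxes

# Mocked rows: one confident person, one dog, one weak person
rows = [
    (0, 15, 0.92, 0.10, 0.20, 0.30, 0.90),
    (0, 12, 0.88, 0.50, 0.10, 0.70, 0.80),
    (0, 15, 0.15, 0.00, 0.00, 0.10, 0.10),
]
print(filter_person_detections(rows, 400, 300))  # [(40, 60, 120, 270)]
```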

  16. qiang92 August 14, 2018 at 4:23 am #

    Thanks for your sharing.

  17. David August 14, 2018 at 6:38 am #

    Hi Adrian,

    For the detection part, I wanted to try another network, so I went for the TensorFlow version of ssd_mobilenet_v2_coco_2018_03_29 (see here: and here: ).

    The problem is that I had too many detection boxes, so I used an NMS function to help me sort things out, but even after that I had too many results, even with the confidence at 0.3 and the NMS threshold at 0.2; see an example here: (network detection boxes are in red, NMS output boxes are in green).
    Do you know why I get so many results? Is it because I used a TensorFlow model instead of Caffe? Or is it because the network was trained with other parameters? Did something change in SSD MobileNet v2 compared to chuanqi305’s SSD MobileNet?
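
    (For reference, the NMS step I mean is the standard greedy IoU procedure; here is a minimal pure-Python sketch of it, with helper names and example boxes of my own rather than the actual function I used:)

```python
# Minimal greedy non-maximum suppression sketch (pure Python, no numpy).
# Boxes are (startX, startY, endX, endY); names and the 0.2 threshold
# are illustrative, not taken from any particular library.

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    if inter == 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, thresh=0.2):
    """Greedily keep the highest-scoring box, drop overlaps above thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.6, 0.8]
print(nms(boxes, scores))  # [0, 2]: the second box overlaps the first
```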


    • Adrian Rosebrock August 15, 2018 at 8:25 am #

      Hey David — I haven’t tested the TensorFlow model you are referring to so I’m honestly not sure why it would be throwing so many false positives like that. Try to increase your minimum confidence threshold to see if that helps resolve the issue.

  18. Christian August 14, 2018 at 11:22 am #

    Thanks Adrian, great work!!!

    please can you tell us what version of Python and OpenCV you used ????

    Do you think this code can works with a raspberry PI 3 with streaming from an IP camera?

    • Adrian Rosebrock August 15, 2018 at 8:23 am #

      I used OpenCV 3.4 for this example. As for using the Raspberry Pi, make sure you read my reply to Jay.

  19. Roald August 15, 2018 at 5:40 am #

    Hi Adrian,

    You write “we utilize our CentroidTracker instantiation to accept the list of rects , regardless of whether they were generated via object detection or object tracking”; however, as far as I can see, in the object detection phase you don’t actually seem to populate the rects[] variable? I’ve downloaded the source as well, and couldn’t find it there either.
    Am I missing something?

    Very valuable post throughout, looks a lot like what I am trying to achieve for my cat tracker (which you may recall from earlier correspondence).

    • Adrian Rosebrock August 15, 2018 at 8:15 am #

      Hey Roald — we don’t actually have to populate the list during the object detection phase. We simply create the tracker and then allow the tracker to update “rects” during the tracking phase. Perhaps that point was not clear.

  20. kumar August 15, 2018 at 8:26 am #

    Great article. I have a doubt, though; it could potentially be a noob question, so please bear with me.
    Say I use this in my shop for tracking foot count. All the new objects are stored in a dictionary, right? If I leave the code running perpetually, won’t it cause memory errors?

    • Adrian Rosebrock August 15, 2018 at 8:31 am #

      If you left it running perpetually, yes, the dictionary could inflate. It’s up to you to add any “business logic” code to update the dictionary. Some people may want to store that information in a proper database as well — it’s not up to me to make those decisions for people. This code is a starting point for tracking foot count.

  21. Abkul August 15, 2018 at 9:53 am #

    Great blog!!! It’s amazing how you simplify difficult concepts.

    I am working on ways to identify each individual going through the entrance from images captured in real time by a camera (we have their passport-size photos plus other labels, e.g. personal identification number, department, etc.). Kindly advise on how to include these multi-class labels rather than the ID notation you used in the example.

    Will you be covering storing the counted individuals in a database for later retrieval?

  22. Juan LP August 15, 2018 at 12:01 pm #

    For those who had the following error when running the script:

    Traceback (most recent call last):
    File “”, line 160, in
    rect = dlib.rectangle(startX, startY, endX, endY)
    Boost.Python.ArgumentError: Python argument types in
    rectangle.__init__(rectangle, numpy.int32, numpy.int32, numpy.int32, numpy.int32)
    did not match C++ signature:
    __init__(struct _object * __ptr64, long left, long top, long right, long bottom)
    __init__(struct _object * __ptr64)

    please update line 160 of to

    rect = dlib.rectangle(int(startX), int(startY), int(endX), int(endY))

    • Adrian Rosebrock August 15, 2018 at 1:23 pm #

      Thanks for sharing, Juan! Could you let us know which version of dlib you were using as well just so we have it documented for other readers who may run into the problem?

      • Durian August 23, 2018 at 11:24 pm #

        i have the same problem with him and my version of dlib is 19.6.0

        • lenghonglin September 16, 2018 at 9:52 am #

          my dlib version is 19.8.1

        • Mou October 23, 2018 at 11:32 am #

          I have the same problem; I have tried 19.18.0 and 19.6.0, and neither of them works.

    • gunan August 16, 2018 at 2:06 am #


    • Aysenur September 13, 2018 at 4:55 am #

      thanks 🙂

    • lenghonglin September 16, 2018 at 9:48 am #

      Hi, I ran into the same issue. Did you solve it?

      • Adrian Rosebrock September 17, 2018 at 2:17 pm #

        As Juan said, you change Line 160 to:

        rect = dlib.rectangle(int(startX), int(startY), int(endX), int(endY))

  23. Kibeom Kwon August 15, 2018 at 9:15 pm #


    Your wonderful work is a priceless textbook. Unfortunately, my understanding is still not enough to follow the whole code. I tried to execute the Python files, but got an error.

    Can you tell me how to solve it? Thank you so much

    python --prototxt mobilenet_ssd/MobileNetSSD_deploy.prototxt \
    usage: [-h] -p PROTOTXT -m MODEL [-i INPUT] [-o OUTPUT]
    [-c CONFIDENCE] [-s SKIP_FRAMES] error: argument -m/--model is required

    • Adrian Rosebrock August 16, 2018 at 5:32 am #

      Your error can be solved by properly providing the command line arguments to the script. If you’re new to command line arguments, that’s fine, but you should read up on them first.

    • m October 9, 2018 at 6:49 am #

      remove the ‘\’ and the newline between the arguments and provide the 3 lines as a single one-line command

  24. Jan August 16, 2018 at 12:09 pm #

    Hi Adrian,
    thanks for sharing this great article! It really helps me a lot to understand object tracking.

    The CentroidTracker uses two parameters: MaxDisappeared and MaxDistance.
    I understand the reason for MaxDistance, but I cannot find the implementation in the source code.

    I am running this algorithm on vehicle detection in traffic and the same ID sometimes jumps between different objects.
    How can I use MaxDistance to avoid that?

    Thanks in advance! I really appreciate your work!!

    • Adrian Rosebrock August 16, 2018 at 3:54 pm #

      Hey Jan — have you used the “Downloads” section of the blog post to download the source code? If so, take a look at the implementation. You will find both variables being used inside the file.

    • Misbah September 18, 2018 at 8:18 am #

      Kindly help me too. Have you resolved the error?

  25. Mattia August 16, 2018 at 12:56 pm #

    Hi Adrian,
    do you think it’s worth training a deep learning object detector with only the classes I’m interested in (about 15), instead of filtering classes on a pre-trained model, to run it on devices with limited resources (BeagleBoard X-15 or similar SBCs)?

    • Adrian Rosebrock August 16, 2018 at 3:54 pm #

      If you train on just the classes you are interested in you may be able to achieve higher accuracy, but keep in mind it’s not going to necessarily improve your inference time that much.

  26. David August 17, 2018 at 11:49 am #

    Hi Adrian,

    Does this implement the multi-processing you were talking about the week before in ?

    • Adrian Rosebrock August 17, 2018 at 12:41 pm #

      It doesn’t use OpenCV’s implementation of multi-object tracking, but it uses my implementation of how to turn dlib’s object trackers into multi-object trackers.

  27. sau August 19, 2018 at 1:36 am #

    thank you very much dear adrian for best blog post

  28. senay August 20, 2018 at 8:28 am #

    This is really nice thank you….
    I have developed a people counter using the dlib tracker and an SSD detector. You skip 30 frames between detector runs to save computation, but in my case the detection and the tracker run on every frame. When there is no detection (when the detector loses the object) I re-initialize the tracker with its previous bounding box (for only two frames). The problem is that when there is no longer any object in the video (the object was not lost by the detector but has simply passed), the tracker’s bounding box stays stuck on the screen, and it causes a problem when another object comes into the view of the video. Is there any way to delete the tracker when I need to?

    • Adrian Rosebrock August 22, 2018 at 9:54 am #

      I would suggest applying another layer of tracking, this time via centroid tracking like I do in this guide. If the maximum distance between the old object and new object is greater than N pixels, delete the tracker.
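
      A rough sketch of that deletion rule (the dict layout, the MAX_JUMP value, and the helper name are illustrative, not code from the post):

```python
# Sketch of "drop the tracker if the centroid jumps too far": if the
# distance between an object's old and new centroid exceeds a maximum,
# assume the tracker latched onto something else and delete it.
from math import hypot

MAX_JUMP = 50  # pixels; tune for your frame size

def prune_trackers(trackers, new_centroids):
    """Keep a tracker only if its new centroid is within MAX_JUMP pixels."""
    kept = {}
    for object_id, (ox, oy) in trackers.items():
        if object_id not in new_centroids:
            continue  # tracker lost its object entirely
        nx, ny = new_centroids[object_id]
        if hypot(nx - ox, ny - oy) <= MAX_JUMP:
            kept[object_id] = (nx, ny)  # plausible motion: keep and update
        # else: the jump is too large, likely a different object -> delete
    return kept

trackers = {1: (100, 100), 2: (300, 120)}
new = {1: (110, 105), 2: (30, 400)}  # ID 2 "teleported" across the frame
print(prune_trackers(trackers, new))  # {1: (110, 105)}
```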

  29. Aditya Johar August 21, 2018 at 3:54 am #

    Hi Adrian
    Again, a great tutorial. Can’t praise it enough. I’ve got my current job because of PyImageSearch and that’s what this site means to me.
    I was going through the code, and trying to understand –

    If you are running the object detector every 30 frames, how are you ensuring that an *already detected* person with an associated objectID does not get re-detected in the next iteration of the object detector after the 30-frame wait time? For example, if we have a person walking really slowly, or two people having a conversation within the bounds of our input frame, how are they not getting re-detected?

    Thanks and Regards,

    • Adrian Rosebrock August 22, 2018 at 9:38 am #

      They actually are getting re-detected, but our centroid tracker is able to determine whether (1) they are the same objects or (2) brand new objects.

  30. Stefan August 21, 2018 at 9:08 am #

    Thank you Adrian for another translation of the language of the gods. The combination of graph theory, mathematics, conversion to code and implementation is like ancient Greek, and you are the demigod who takes the time to explain it to us mere mortals. Most importantly, you take a stepwise approach. When ‘Rosebrock Media Group’ has more employees, someone in it can even spend more time showing how alternative code snippets behave.

    In terms of performance, I am just starting to figure out if a CUDA implementation would be of benefit. Of course, there is no ‘Adrian for CUDA coding’. Getting this to run smoothly on a small box would be another interesting project, but it requires broad knowledge of all the hardware options available: a Xilinx FPGA? An Edison board? A mini-ITX PC? A hacked cell phone? (There’s an idea: it’s a camera, a quad-core CPU and a GPU in a tidy package, but it would obviously need a mounting solution and a power source too.)

    Of course, to run on an iPhone I would have to jailbreak the phone and translate the code to Swift. But then perhaps it would be better to go to Android, as the hardware selection is broader and the OS is ‘open’. Do you frequent any specific message boards where someone might pick up this project and get it to work on a cell phone? There are a lot of performance optimizations that could make it work.

    • Adrian Rosebrock August 22, 2018 at 9:34 am #

      Thank you for the kind words, Stefan! Your comment really made my day 🙂 To answer your question — yes, running the object detector on the GPU would dramatically improve performance. While my deep learning books cover that, the OpenCV bindings themselves do not (yet) support it. I’m also admittedly not much of an embedded device user (outside of the Raspberry Pi), so I wouldn’t be able to comment on the other hardware. Thanks again!

      • Mike Isted October 7, 2018 at 3:35 am #

        Hi Adrian, just spotted this…
        For information, I have successfully implemented this post on a Jetson TX2, replacing the SSD with one that is optimised for TensorRT. I would refer your readers to the blog of JK Jung for guidance.

        Performance-wise, I am finding that all 6 cores are maxed out at 100% and the GPU at around 50%, depending on the balance of SSD/trackers used. The trackers in particular are very CPU intensive and, as you say, the pipeline slows a great deal with multiple objects.

        As always, thanks for your huge contribution to the community and congratulations on just getting married!

        Cheers, Mike

        • Adrian Rosebrock October 8, 2018 at 9:38 am #

          Awesome, thanks so much for sharing, Mike!

  31. senay August 22, 2018 at 1:14 pm #

    I found out the cause of my issue!! It is because I changed skip_frames to 15.
    So how do I set an appropriate number of frames to skip? Too large a skip_frames value will lead to missing an object, and too small a value will lead to inappropriate assignment of object IDs….

    • Adrian Rosebrock August 24, 2018 at 8:56 am #

      As you noticed, it’s a balance. You need to balance (1) skipping frames to help the pipeline to run faster while (2) ensuring objects are not lost or trackings missed. You’ll need to experiment to find the right value for skip frames.

  32. Jaime August 23, 2018 at 6:49 am #

    Hi Adrian,

    I’ve recently found your blog and I really like the way you explain things.

    I’m building a people counter on a Raspberry Pi, using background subtraction and centroid tracking.
    The problem I’m facing is that sometimes object IDs switch, as you said in the “simple object tracking with OpenCV” post. Is there something I can do to minimize these errors?

    If you have any recommendations feel free to share.

    Thanks in advance.

    PS: I’d be really interested if you did a post about a people counter on the Raspberry Pi like you mentioned in the first comment

    • Adrian Rosebrock August 24, 2018 at 8:41 am #

      Hey Jaime — there isn’t a ton you can do about that besides reduce the maximum distance threshold.

  33. Nilesh Garg August 23, 2018 at 11:15 am #

    Thanks Adrian for such a nice tutorial. You have released it on perfect timing, I am working on similar kind of project for tracking the number of people in and out from bus. Some how I am not getting proper result. But this tutorial is very good start and helped me to understand the logic.
    Thanks again. Keep rocking!!!

    • Adrian Rosebrock August 24, 2018 at 8:37 am #

      Best of luck with the project, Nilesh!

  34. Wang August 23, 2018 at 10:02 pm #

    Hi Adrian,

    Approximately how many meters above the floor is the camera mounted?

    Thank you very much!

    • Adrian Rosebrock August 24, 2018 at 8:34 am #

      To be totally honest I’m not sure how many meters above the ground the camera was. I don’t recall.

  35. Nik August 24, 2018 at 12:36 am #

    Thank you Adrian for inspiring me and introducing me to the world of computer vision.

    I started with your 1st edition and followed quite a few of your blog projects, with great success.
    I was excited to read this blog, as people counting is something I have wanted to pursue.

    However,………………..there’s a problem.

    When I execute the script, I get:

    [INFO] loading model…
    [INFO] opening video file…

    the sample video does open up, plays for about 1 second (the lady doesn’t reach the line), and then, boom… my computer crashes and Python quits!
    I have tried increasing --skip-frames; it still crashes. I even tried Python 3 (thinking my version 2.7 was old), but no joy!

    Is it time to say goodbye to my 11 year old Macbook Pro? or could this be something else?

    “It’s important to understand that deep learning object detectors are very computationally expensive, especially if you are running them on your CPU.”

    Out of interest, is there a ballpark guide to minimum-spec machines when delving into this world of OpenCV?

    Best Regards,

    • Nik August 24, 2018 at 12:55 am #


      Reading your /install-dlib-easy-complete-guide/

      I noticed you say to install XCode.
      I had removed XCode for my homebrew installation as instructed, as it was an old version.

      When I installed dlib, I simply did pip install dlib.

      Could this be related?


      • Adrian Rosebrock August 24, 2018 at 8:33 am #

        Hey Nik — it sounds like you’re using a very old system, and if you’ve installed/uninstalled Xcode before then that could very well be an issue. I would advise you to use a newer system if at all possible. Otherwise, it could be any number of problems, and it’s far too challenging to diagnose at this point.

  36. Safaa Diab August 26, 2018 at 4:06 pm #

    Hello, Dr. Adrian, thank you for your great work. I am a beginner in this field and your webpage is really helping me through. I have a question: I’ve tried to run this code and an error popped up, “ error: the following arguments are required: -p/--prototxt, -m/--model”, and I really don’t know what to do. I would be grateful if you helped.
    Thanks in advance.
    Thanks in advance.

  37. senay August 27, 2018 at 10:10 am #

    Hi Adrian !!
    This is the answer you gave to my question!!! Thank you for that….
    August 24, 2018 at 8:56 am

    As you noticed, it’s a balance. You need to balance (1) skipping frames to help the pipeline to run faster while (2) ensuring objects are not lost or tracking missed. You’ll need to experiment to find the right value for skip frames.

    But balancing is possible for a recorded video, because I have it in my hand….
    What do you suggest for a live camera (where I do not know when an object will appear, so as to set the skip-frame number)?

  38. Anand Simmy August 31, 2018 at 12:14 pm #

    Hi Adrian !!,

    How can we evaluate the counting accuracy of this counter? My mentor asked me for the counting accuracy. Do we need to find some videos as a benchmark, or are there libraries for accuracy evaluation?

  39. Andy September 1, 2018 at 2:47 am #

    Another great post! Thanks so much for your contributions to the community.

    One question, I have tried the code provided on a few test videos and it seems like detected people can be counted as moving up or down without having actually crossed the yellow reference line. In the text you mention the fact that people are only counted once they have crossed the line. Is this a behaviour you have seen as well? Is there an approach you would recommend to place a more strict condition that only counts people who have actually crossed from one side of the line to the other? Thanks

    • Adrian Rosebrock September 5, 2018 at 9:16 am #

      Hey Andy — that’s actually a feature, not a bug. For example, say you are tracking someone moving from the bottom to the top of a frame. But, they are not actually detected until they have crossed the actual line. In that instance we still want to track them so we check if they are above the line, and if so, increment the respective counter. If that’s not the behavior you expect/want you will have to update the code.

  40. Frank Yan September 3, 2018 at 11:24 am #

    Hello Adrian,

    Thank you for the great post.

    I modified the code for a horizontal camera as below:

    I noticed the problems below:
    1. No response to fast-moving objects
    2. Irrelevant centroid noise
    3. Repeated counting of the same person

    And I am trying to solve these problems by introducing face recognition and pose estimation.

    Do you have any suggestions/comments on this?


    • Adrian Rosebrock September 5, 2018 at 8:56 am #

      Face recognition would greatly solve this problem but the issue you may have is being unable to identify faces from side/profile views. Pose estimation and gait recognition are actually more accurate than face recognition — they would be worth looking into.

  41. Andres Gomez September 4, 2018 at 11:29 am #

    Hi Adrian. First, I want to say thank you for taking the time to explain each detail of your code. Your blog is incredible (the best of the best!).

    I have a doubt about the CentroidTracker: it creates an object ID when a new person appears in the video but never destroys that ID, so would that cause any trouble with memory in the future if I want to implement it on a Raspberry Pi 3? I followed your person counter code with just some modifications to run it on the Pi.

    My best regards

    • Adrian Rosebrock September 5, 2018 at 8:35 am #

      Hey Andres — the CentroidTracker actually does destroy the object ID once it has been missing for a sufficient number of frames.
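
      In sketch form, that deregistration bookkeeping looks something like this (a simplified stand-in for the real CentroidTracker logic, with illustrative names and a small maxDisappeared value for the demo):

```python
# Sketch of the disappeared-counter idea: an ID is deregistered after it
# has been missing for more than max_disappeared consecutive frames.

class DisappearanceBook:
    def __init__(self, max_disappeared=50):
        self.max_disappeared = max_disappeared
        self.disappeared = {}  # object_id -> consecutive missed frames

    def register(self, object_id):
        self.disappeared[object_id] = 0

    def update(self, visible_ids):
        """Call once per frame with the set of IDs matched in that frame."""
        for object_id in list(self.disappeared):
            if object_id in visible_ids:
                self.disappeared[object_id] = 0   # seen again: reset
            else:
                self.disappeared[object_id] += 1  # missed this frame
                if self.disappeared[object_id] > self.max_disappeared:
                    del self.disappeared[object_id]  # gone for good

book = DisappearanceBook(max_disappeared=2)
book.register(7)
book.update(set())  # missed frame 1
book.update(set())  # missed frame 2
book.update(set())  # missed frame 3 -> deregistered
print(7 in book.disappeared)  # False
```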

  42. Andres Gomez September 6, 2018 at 8:51 am #

    Thank you very much, Adrian. Another question: I have a problem with the centroid tracker update. When a person leaves the frame but another person instantly comes in, the algorithm thinks it is the same person, doesn’t count them, and assigns the old centroid to the person who came in (I changed maxDisappeared but without success). So I checked the code again to understand on which line you use the minimum Euclidean distance to assign the new position to the old centroid, but I couldn’t understand the method you used to achieve that. Can you give me advice on how to solve that problem?

    It doesn’t happen every time, but I would like to raise the success rate.

    My best regards

    • Adrian Rosebrock September 11, 2018 at 8:44 am #

      That is an edge case you will need to decide how to handle. If you reduce the “maxDisappeared” value too much you could easily register false positives. Keep in mind that footfall applications are meant to be approximations; they are never 100% accurate, even with a human doing the counting. If it doesn’t happen very often then I wouldn’t worry about it. You will never get 100% accurate footfall counts.

      • Andres September 11, 2018 at 10:08 am #

        I handled it by modifying the CentroidTracker: I added a condition that if the distance from the old centroid to the new one is more than 200 pixels along the y-axis, skip it (continue). Thanks for the answer

  43. Marc September 6, 2018 at 9:09 am #

    Somehow I can’t run the code….
    I always get the error message:

    Can’t open “mobilenet_ssd/MobilenetSSD_deploy.prototxt” in function ‘ReadProtoFromTextFile’

    Seems like the program is unable to read the prototxt…

    Do you have an idea on how to fix it?

    • Adrian Rosebrock September 11, 2018 at 8:42 am #

      Yes, that does seem to be the problem. Make sure you’ve used the “Downloads” section of the blog post to download the source code + models. From there double-check your paths to the input .prototxt file.

  44. Harsha Jagadish September 10, 2018 at 7:20 am #

    Hi Adrian,

    Thank you for a great tutorial. Would it be possible for you to let me know how I can count people moving from right to left or left to right? I am able to draw the trigger lines but unable to count the objects.

    Harsha J

    • Adrian Rosebrock September 11, 2018 at 8:14 am #

      You’ll need to modify Lines 213 and 220 (the “if” statements) to perform the check based on the width, not the height. You’ll also want to update Line 204 to keep track of the x-coordinates rather than the y-coordinates.
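
      A rough sketch of that left/right variant (simplified per-object logic; the names and the mean-of-history direction test mirror the idea rather than the exact code):

```python
# Sketch of horizontal counting: keep a history of x-centroids per object
# and count a crossing of a vertical line at frame_width // 2. This is
# the up/down logic with x swapped in for y; names are illustrative.

def update_count(x_history, new_x, line_x, counted):
    """Return +1 (went right), -1 (went left), or 0 for one object."""
    direction = new_x - (sum(x_history) / len(x_history)) if x_history else 0
    x_history.append(new_x)
    if counted or direction == 0:
        return 0
    if direction > 0 and new_x > line_x:
        return +1  # moving right and past the vertical line
    if direction < 0 and new_x < line_x:
        return -1  # moving left and past the vertical line
    return 0

history = [100, 120, 140]
print(update_count(history, 260, 250, counted=False))  # 1 (crossed rightwards)
```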

  45. Jaime September 11, 2018 at 9:25 am #

    Hi Adrian,

    I’m wondering what the tracker does when an object doesn’t move (i.e. the object stands in the same position for a few frames). I’m not sure if OpenCV’s trackers are able to handle this situation.

    Thanks in advance.

    • Adrian Rosebrock September 11, 2018 at 9:44 am #

      It will keep tracking the object. If the object is lost the object detector will pick it back up.

  46. Toufik September 13, 2018 at 11:22 am #

    Hello Adrian, first I want to say thank you for this amazing project; it helped me understand quite a bunch of things concerning computer vision. I have a question which you could help me with: I want to adapt this project to monitor the two doors of my store, and I was wondering what changes I might have to make to use two cameras simultaneously.
    PS: I have only worked on simple OpenCV programs, since I’m quite the noob, and I tried to use cap0 = cv2.VideoCapture(0) and
    cap1 = cv2.VideoCapture(1); however, it opens only one camera feed even though the camera indexes are correct!
    Thanks for this project again, and for taking the time to read my comment

    • Adrian Rosebrock September 14, 2018 at 9:31 am #

      Follow this guide and you’ll be able to efficiently access both your webcams 🙂
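      The trick in the linked guide is a threaded frame reader (imutils' VideoStream), one per camera, so reads never block the main loop. A rough sketch of that idea follows; the class name is illustrative, and the frame source can be anything with a read() method (with OpenCV you would pass cv2.VideoCapture(0) and cv2.VideoCapture(1)):

```python
import threading

class ThreadedReader:
    """Minimal threaded frame reader in the spirit of imutils.VideoStream."""

    def __init__(self, source):
        self.source = source     # any object with read() -> (grabbed, frame)
        self.frame = None
        self.stopped = False
        self.thread = threading.Thread(target=self._update, daemon=True)

    def start(self):
        self.thread.start()
        return self

    def _update(self):
        # keep polling the source so read() in the main loop never blocks
        while not self.stopped:
            grabbed, frame = self.source.read()
            if grabbed:
                self.frame = frame

    def read(self):
        return self.frame

    def stop(self):
        self.stopped = True
        self.thread.join()
```

You would then start one reader per camera and pull frames from both inside the same main loop, e.g. `vs0 = ThreadedReader(cv2.VideoCapture(0)).start()` and `vs1 = ThreadedReader(cv2.VideoCapture(1)).start()`.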

  47. smit September 14, 2018 at 5:35 am #

    Hi @Adrian. How can we improve object detection accuracy, given that your method depends entirely on how good the detection is? Is there any other model you would recommend for detection?

    • Adrian Rosebrock September 14, 2018 at 9:20 am #

      That’s a complex question. Exactly how you improve object detection accuracy depends on your dataset, your number of images, and the intended use of the model. I would suggest you read this tutorial on the fundamentals of object detection and then read Deep Learning for Computer Vision with Python to help you get up to speed.

      • smit September 24, 2018 at 7:08 am #

        One of the purposes of object tracking is to track people when object detection may fail, right? But your tracking accuracy, if I understand correctly, depends entirely on whether we detect the object in subsequent frames. What if my object gets detected only once? How should I track him then, and what modification would be required in your solution?

        • Adrian Rosebrock October 8, 2018 at 12:53 pm #

          No, the objects do not need to be detected in subsequent frames — I only apply the object detector every N frames, the object tracker tracks the object in between detections. You only need to detect the object once to perform the tracking.
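          The detect-every-N-frames cadence described above can be sketched as follows; the function and callback names are illustrative, not the post's actual code:

```python
def process_frames(frames, skip_frames, detect, track):
    """Run the detector every `skip_frames` frames; track in between.

    detect(frame) and track(frame) each yield bounding boxes. This mirrors
    the detect/track cadence of the people counter: an object only needs to
    be *detected* once, after which the tracker follows it between detections.
    Returns how many times the (expensive) detector actually ran.
    """
    detector_runs = 0
    for total_frames, frame in enumerate(frames):
        if total_frames % skip_frames == 0:
            detect(frame)        # expensive: full object detection
            detector_runs += 1
        else:
            track(frame)         # cheap: correlation tracker update
    return detector_runs
```

With a skip value of 30 on a 30 fps video, for example, the detector runs roughly once per second while the trackers carry the objects the rest of the time.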

  48. Misbah September 15, 2018 at 11:54 am #

    Hey Adrian, I just downloaded the source code for this “people counter with OpenCV and Python” post, but I’m getting the following error…

    usage: [-h] -p PROTOTXT -m MODEL [-i INPUT] [-o OUTPUT]
           [-c CONFIDENCE] [-s SKIP_FRAMES]
    error: the following arguments are required: -p/--prototxt, -m/--model

  49. Bharath September 18, 2018 at 7:05 am #

    Hello Adrian, I’ve been following your blog for a couple of months now and indeed there is no other blog which serves with this much of content and practices. Thanks a lot man.

    Currently, I’m working on a project with the same application, counting people. I’m using a Raspberry Pi and a Pi camera. Due to some constraints I’ve settled on an overhead view for the camera, and I’m using computationally less expensive techniques: a Haar cascade detector (custom trained to detect heads from an overhead view). The detector is doing a good job. I have also integrated the tracking and counting methods you provided. At first I encountered low fps, so I looked around a bit and used the imutils library to speed up my frame pipeline. Now I have achieved a pretty decent fps throughput, and I have also tested the code with a video feed. It’s all working well.
    When I use the live feed from the Pi camera, however, there is a bit of lag in detection and the whole system. How do I get this working in real-time? Is there a way to do this in real-time,
    or is this just the computational limit of a Raspberry Pi?
    Thanks in advance Adrian!
    Curious and eagerly waiting for your reply!

    • Adrian Rosebrock September 18, 2018 at 7:08 am #

      Hi Bharath — thank you for the kind words, I appreciate it 🙂 And congratulations on getting your Pi People Counter this far along, wonderful job! I’d be curious to know where you found a dataset of overhead views of people? I’d like to play around with such a dataset if you don’t mind sharing.

      As far as the lag goes, could you clarify a bit more? Where exactly is this “lag”? If you can be a bit more specific I can try to help, but my guess is that it’s a limitation of the Pi itself.

      • Bharath September 19, 2018 at 3:35 am #

        Thanks for the reply Adrian!
        The dataset was hand-labeled at my university. Let me know if you need it!

        Hey, and by “lag” I mean…

        With a pre-captured video feed, the Pi was able to achieve about ~160 fps (on a 15s video).

        With the live feed from the Pi camera, it was able to achieve about ~50 fps (while there was no detection), and once there is a detection, the fps drops to around 20 fps. (This was all possible only after using the imutils library.)

        When tested without the imutils library, the fps was around 2 to 6 fps.

        So, the key inference I would draw is this: the system performs with pretty good accuracy when the subject (a head) moves slowly (slower than the pace at which a human normally walks).

        When the head moves at a normal walking pace, the system fails to track it, even after detection and ID assignment.

        Hope, I made myself clear Adrian!
        Please let me know your thought about this!

        • Adrian Rosebrock October 8, 2018 at 1:32 pm #

          Hey Bharath, would you mind sending me the dataset? I’d love to play with it!

          Also, thanks for the clarification on your question. I think you should read my reply to Jay at the top of this post. You won’t be able to get true real-time performance out of this method using the Pi, it’s just not fast enough.

  50. lenghonglin September 18, 2018 at 9:37 am #

    Hi Adrian,

    Thank you for a great tutorial. I have some questions:
    1. Where is the Caffe model from? How can I train my own model?
    2. Did you test the situation where people hold up an umbrella? In my tests the model can’t detect that case.
    Do you have any ideas?
    Thanks very much

  51. Jan September 20, 2018 at 3:47 am #

    Hi Adrian

    Can you please make a tutorial with Kalman filter on top of this 🙂

    dlib is not very good with fast-moving objects.

    Thank you.

  52. Rohit sharma September 21, 2018 at 5:58 am #

    Hey Adrian,
    If I want to capture video using the Pi camera, what should I do and what would the command be?

    • Adrian Rosebrock October 8, 2018 at 1:09 pm #

      I would suggest reading this tutorial to help you get started.

  53. lenghonglin September 22, 2018 at 5:44 am #

    Hi @Adrian. I ran this code on a Raspberry Pi, but the fps is 3, which is very slow. Then I switched from the Raspberry Pi to an RK3399, and the result was not much better: almost 20 fps.

    Do you have any ideas to improve the fps?

    Thanks very much.

    • Adrian Rosebrock October 8, 2018 at 1:04 pm #

      Make sure you refer to my reply to Jay at the top of the comments section of this post.

  54. Federico September 30, 2018 at 6:59 pm #

    Hi Adrian, thanks for this great tutorial! I’m using a RPi 3 B+ with Raspbian Stretch and I am getting very slow frame rates of about 5 fps with the example file. I have tried not writing the output, with the same results. Playing the example file with omxplayer works fine at 30 fps. I have tried using another SD card to no avail (write speed is about 17 MB/s and read is 22 MB/s, which I think is not that bad). Do you know what could be happening?


  55. Guru Vishnu October 6, 2018 at 11:05 am #

    Hi Adrian,

    Thanks for this post!

    Can you please let me know how I could use this code to count vehicles?


    • Adrian Rosebrock October 8, 2018 at 9:42 am #

      Hi Guru — I will try to cover vehicle counting in a separate blog post 🙂

      • Guru Vishnu October 8, 2018 at 2:52 pm #

        Thanks Adrian.

        Since I am trying to build one, can you please enlighten me: could I use time in seconds/milliseconds instead of a centroid to count the object, as time can be a more crucial factor than position? Please let me know your thoughts.


  56. Guru Vishnu October 8, 2018 at 3:01 pm #

    Also, please let me know if I can use CAP_PROP_POS_MSEC (via imutils) to count vehicles in a live CCTV stream based on time.

  57. Eric N October 10, 2018 at 11:10 pm #

    Hi, I’m trying to swap out the dlib tracker for an OpenCV tracker, since the dlib one is pretty inaccurate. However, when I use an OpenCV tracker, e.g. CSRT, the new detections accumulate into a new tracking item instead of updating and replacing the original ID associated with that object. So in the first cycle I have one bounding box with a tracker, and in the next cycle it will detect the person again but just create a new tracker, so I’ll have 2 bounding boxes representing the same person. And it keeps adding more trackers each time for the same person. Any idea what I did wrong? Thanks!

    • Adrian Rosebrock October 12, 2018 at 9:09 am #

      It sounds like there is a logic error in your code. My guess is that you’re creating a new CSRT tracker when you should actually be updating an existing tracker. Keep in mind that we only create the tracker once and from there we only update its position.
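      One way to encode that invariant is to key trackers by object ID and only construct a tracker the first time an ID is seen. This is a sketch, not the post's code: the class name is illustrative, and `factory` stands in for whatever builds your tracker (e.g. cv2.TrackerCSRT_create), which only needs init(frame, box) and update(frame) methods.

```python
class TrackerRegistry:
    """Create one tracker per object ID; later detections update, not re-create."""

    def __init__(self, factory):
        self.factory = factory
        self.trackers = {}

    def register(self, object_id, frame, box):
        if object_id not in self.trackers:
            # first sighting of this ID: create and initialize exactly one tracker
            tracker = self.factory()
            tracker.init(frame, box)
            self.trackers[object_id] = tracker
        # already tracked: do NOT create another tracker for the same ID
        return self.trackers[object_id]
```

The design point is that a fresh detection of an already-known person should re-initialize or correct the existing tracker, never add a second one.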

  58. Daniele October 11, 2018 at 7:06 am #

    Hi Adrian,
    thank you so much for this post, it was very useful for my research project. What’s the best single-board computer (Raspberry Pi, ASUS Tinker, etc.) to implement a good counter, or a machine learning system in general? Thanks.

    • Adrian Rosebrock October 12, 2018 at 9:04 am #

      That really depends on your end goal. The Pi is a nice piece of hardware and is very cheap but if you want more horsepower to run deep learning models I highly recommend NVIDIA’s Jetson TX series.

  59. Steve October 11, 2018 at 8:18 pm #

    Hi Adrian!

    Thank you for this post. I have a quick question that’s confusing me. In an earlier post you mentioned that the size parameter in cv2.dnn.blobFromImage should match the CNN’s input dimensions. According to the dims in the prototxt the size should be 300 x 300, but the W and H supplied to cv2.dnn.blobFromImage in this example are 373 and 500. Does this affect accuracy?

    Thank you for your help.


    • Adrian Rosebrock October 12, 2018 at 8:54 am #

      Object detection networks are typically fully convolutional, implying that any size image dimensions can be used. However, for the face detector specifically I’ve found that 300×300 input blobs tend to obtain the best results, at least for the applications I’ve built here on PyImageSearch.

  60. Atul October 15, 2018 at 8:02 am #

    Hi Adrian, this is an awesome starting point for people like me who are new to algorithms and machine learning! Just curious to ask a few questions:

    1. Is it possible to track objects moving left to right and vice versa?
    2. Is it possible to implement this for live streaming? (It appears you have given the option, but I would like to know more.)

    Just thinking of implementing this in one of the maker fairs in Mumbai , if possible. Just to give an idea to students about OS technologies and usages of OpenCV.

    • Adrian Rosebrock October 16, 2018 at 8:30 am #

      1. Yes, see my note in the blog post. You’ll want to update the call to “cv2.line” which actually draws the line but more importantly you’ll want to update the logic in the code that handles direction tracking (Lines 199-222).

      2. Yes, use the VideoStream class.

  61. Hj October 15, 2018 at 1:36 pm #

    Hi Adrian,

    Would it be possible to see if a person exists within a defined space in the video frame, similar to a rectangle of trigger lines? If yes, do let me know how I can go about it.

    Harsha j

    • Adrian Rosebrock October 16, 2018 at 8:27 am #

      Yes, you would want to define the (x, y)-coordinates of your rectangle. If a person is detected within that rectangle you can take whatever action necessary.

      • HJ October 19, 2018 at 12:25 pm #

        Hi Adrian,

        I am able to draw the rectangle, but recognition still runs on the entire frame. Could you please let me know how to restrict detection to within the rectangle only?

        Harsha J

        • Adrian Rosebrock October 20, 2018 at 7:28 am #

          Please take a look at my other replies to your comments, Harsha. I’ve already addressed your question. You can either (1) monitor the (x, y)-coordinates of a given person/objects bounding box and see if they fall into the range of the rectangle or (2) you can simply crop out the ROI of the rectangle and only perform person detection there.
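          Both options can be sketched in a few lines; the function names here are illustrative, not part of the post's code:

```python
import numpy as np

def in_roi(centroid, roi):
    """Option 1: keep a detection only if its centroid falls inside the ROI.

    roi = (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    """
    (cx, cy), (x1, y1, x2, y2) = centroid, roi
    return x1 <= cx <= x2 and y1 <= cy <= y2

def crop_roi(frame, roi):
    """Option 2: crop the frame so the detector only ever sees the ROI.

    `frame` is a NumPy image array. Note that detections then live in ROI
    coordinates, so add (x1, y1) back before drawing on the full frame.
    """
    x1, y1, x2, y2 = roi
    return frame[y1:y2, x1:x2]
```

Option 1 is the smaller change (one extra check per detection); option 2 also speeds up the detector since it processes a smaller image.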

          • HJ October 21, 2018 at 1:00 pm #

            Thanks Adrian,

            Will try it out.

            Harsha Jagadish

  62. Rakesh October 20, 2018 at 7:39 am #

    Hi, the code is tracking people. How do I make it count only specific kinds of objects, like a boat on water or a car on a road?

    • Adrian Rosebrock October 20, 2018 at 8:10 am #

      You’ll want to take a look at my guide on deep learning-based object detection. Give the guide a read, it will give you a better idea on how to detect and track custom objects.

  63. jorge nunez October 22, 2018 at 9:24 pm #

    I’m working on a project where, rather than tracking movement, I need to know the local coordinates of each person. The images come from multiple cameras, and each camera (the algorithm running on the PC, actually) should be able to compute the coordinates of each person detected in the corresponding image. My first thought was to train a neural network, but my intuition tells me that would be overkill, and killing a fly with a bazooka sounds disastrous in any context where you have limited resources, which is my case.

    • Adrian Rosebrock October 29, 2018 at 2:15 pm #

      By “local coordinates” do you mean just the bounding box coordinates? I’m not sure what you mean here.

  64. Dheeraj October 23, 2018 at 5:51 am #

    I really appreciate for a great work from your end.

    I am facing an issue while importing dlib. The code runs without importing dlib, but then it is unable to track and count people. How do I install and import dlib?

    ImportError: no module named dlib

    Please help me figure out this issue.

  65. JP October 24, 2018 at 4:47 am #

    Hi, I’ve tried to run the code; however, this error pops up: ImportError: No module named ‘pyimagesearch’. How do I solve this?

    • Adrian Rosebrock October 29, 2018 at 2:03 pm #

      Make sure you use the “Downloads” section of the blog post to download the source code, videos, and the “pyimagesearch” module. From there you’ll be able to execute the code.

  66. Dheeraj October 24, 2018 at 7:55 am #


    Thank you for a great tutorial.

    The code is only tracking people, and it’s not counting them when they move at higher speed, so the count is inaccurate. Also, if people come into the region of interest and move back without crossing the line, the counter still increments. How do I avoid false detections, and at what minimum height above the ground should the webcam be installed?

    Any solution on how to fix this out?

    • Adrian Rosebrock October 29, 2018 at 2:02 pm #

      You’re getting those results on the videos supplied with the blog post? Or with your own videos?

      • Dheeraj October 31, 2018 at 5:27 am #

        I am getting those results on my own videos; fast movement is not being detected and counted. I am using a normal C270 webcam, and the count is not accurate.

  67. Rahma October 26, 2018 at 12:00 pm #

    Hello, thank you for the tutorial. Could you help me? It’s not working for me from the beginning, and this is the error message:
    Traceback (most recent call last):

    ModuleNotFoundError: No module named ‘scipy’

    • Adrian Rosebrock October 29, 2018 at 1:39 pm #

      You need to install SciPy on your system:

      $ pip install scipy

  68. Gordon October 29, 2018 at 2:04 am #

    Hello Adrian, I have tried this with a video of passengers entering a bus and the result is not really good. Is there any way I can improve the accuracy? Should I fine-tune the model with my own data? If so, is there any tutorial or material I can refer to for fine-tuning the model? Thanks a lot.

    • Adrian Rosebrock October 29, 2018 at 1:16 pm #

      Hey Gordon — what specifically do you mean by the result not being good? Are people not being detected? Are the trackers themselves failing?

      • Dheeraj October 31, 2018 at 5:32 am #

        How to increase the accuracy of people count and filter out false detections?

        • Adrian Rosebrock November 2, 2018 at 7:39 am #

          I would suggest training your own object detector on overhead views of people. The object detector we are using here was not trained on such images.

      • Gordon November 1, 2018 at 10:40 pm #

        Hello Adrian, people are not being detected.

  69. Sohib October 30, 2018 at 5:59 am #

    Above all, I would like to thank you for these efforts you have been doing ever since you first started sharing these awesome posts.

    Now, to the technical part.

    You used MobileNetSSD_deploy.prototxt and MobileNetSSD_deploy.caffemodel as your deep learning CNN model and weights respectively, right?

    And you only considered the “person” class among the available classes.
    Would it be possible to fine-tune the model you used for objects like glasses (that people wear)?
    It looks like it was trained in Caffe, so could you share your insights on how to train this model for custom objects? In that case we would be able to exclude non-trackable objects while fine-tuning it. Thanks again!

    • Adrian Rosebrock November 2, 2018 at 8:26 am #

      Yes, you could absolutely fine-tune the model to predict classes it was never trained on. I actually discuss how to fine-tune and train your own custom object detectors inside Deep Learning for Computer Vision with Python.

  70. git-scientist November 9, 2018 at 4:24 am #

    I have tried this awesome code with my own video. The motion is quite similar to the one in your video. In my video, however, I encountered some small errors. Let me post them below; if you, Adrian, or any of the other readers could help modify particular parts of this code, I’d appreciate it.

    1. The up-counter is incorrectly increased (UP is incremented by 1) as soon as an object is detected above the horizontal visualization line; then, if that object moves down, the down-counter remains the same (though it should increment by 1).

    2. The tracker ID is sometimes lost (at the upper edge of the frame, when a trackable object moves a bit horizontally above the line) even though the object is still in the frame. This causes one object to be counted twice.

    Thank you in advance to all who try 😃

    • Adrian Rosebrock November 10, 2018 at 10:01 am #

      1. That is a feature, not a bug. In the event that a person is missed when they are walking up they are allowed to be counted as walking up after (1) they are detected and (2) they demonstrate they are moving up in two consecutive frames. You can modify that behavior, of course.

      2. You may want to try a different object tracker to help increase accuracy. I cover various object trackers in this post.
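      For anyone who does want the stricter behavior, i.e. counting only when an object actually crosses the line rather than merely moving in a direction on one side of it, one possible check (illustrative, not the post's code) is:

```python
def crossed_line(prev_y, curr_y, line_y):
    """Count only on an actual crossing of the counting line.

    Returns 'up' or 'down' when the centroid moves from one side of the
    line to the other between frames, else None. This avoids incrementing
    the counter for objects that first appear above the line.
    """
    if prev_y > line_y >= curr_y:
        return "up"
    if prev_y < line_y <= curr_y:
        return "down"
    return None
```

The trade-off is the one described above: a person missed by the detector until they are already past the line would never be counted under this rule.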

  71. Henry November 12, 2018 at 1:52 pm #

    First of all, thanks a lot for this post. Very helpful. I can run your code for no problem.

    I am trying to connect the people detected at the detection step with the ID assigned at the tracking step. Any idea how to do that? A brute-force approach I can think of is to match the centroids of the people detected in the detection step with the centroids of the IDs in the tracking step. Any better solution?



  72. Niko Gamulin November 13, 2018 at 11:50 am #

    Thanks for a great post, Adrian!

    I have tried the people counter on video from an actual store. First, I tried to input the frames as they are, without rotation, and the model performed really poorly. Then I tried rotating the image and it performed a little better.
    Has the model for detecting people been fine-tuned with a dataset that contains images from that perspective, or did you use a pretrained model without additional fine-tuning? I’m asking because I can’t intuitively explain such a difference in accuracy when the frames are rotated.
    Also, if you have fine-tuned the model (this or any other), it would be helpful if you could provide any info about the size of the fine-tuning dataset. I am planning to fine-tune the model to detect people occluded behind the exhibited objects, as they obviously affect the prediction accuracy of the model out of the box.

    • Adrian Rosebrock November 13, 2018 at 4:13 pm #

      The model was not fine-tuned at all; it’s an off-the-shelf pre-trained model. Better accuracy would come from training a model on a top-down view of people. As far as tips, suggestions, and best practices for training/fine-tuning your own models, I would suggest taking a look at Deep Learning for Computer Vision with Python, where I’ve included my suggestions to help you successfully train your models.
