Object detection with deep learning and OpenCV

A couple of weeks ago we learned how to classify images using deep learning and OpenCV 3.3’s deep neural network (dnn) module.

While that original blog post demonstrated how we can categorize an image into one of ImageNet’s 1,000 separate class labels, it could not tell us where an object resides in the image.

In order to obtain the bounding box (x, y)-coordinates for an object in an image we need to instead apply object detection.

Object detection can tell us not only what is in an image but also where each object is.

In the remainder of today’s blog post we’ll discuss how to apply object detection using deep learning and OpenCV.


Object detection with deep learning and OpenCV

In the first part of today’s post on object detection using deep learning we’ll discuss Single Shot Detectors and MobileNets.

When combined, these methods can be used for super fast, real-time object detection on resource constrained devices (including the Raspberry Pi, smartphones, etc.).

From there we’ll discover how to use OpenCV’s dnn  module to load a pre-trained object detection network.

This will enable us to pass input images through the network and obtain the output bounding box (x, y)-coordinates of each object in the image.

Finally we’ll look at the results of applying the MobileNet Single Shot Detector to example input images.

In a future blog post we’ll extend our script to work with real-time video streams as well.

Single Shot Detectors for object detection

Figure 1: Examples of object detection using Single Shot Detectors (SSD) from Liu et al.

When it comes to deep learning-based object detection there are three primary object detection methods that you’ll likely encounter:

  • Faster R-CNNs
  • You Only Look Once (YOLO)
  • Single Shot Detectors (SSDs)

Faster R-CNNs are likely the most “heard of” method for object detection using deep learning; however, the technique can be difficult to understand (especially for beginners in deep learning), hard to implement, and challenging to train.

Furthermore, even with the “faster” implementation of R-CNNs (where the “R” stands for “Region-based”), the algorithm can be quite slow, on the order of 7 FPS.

If we are looking for pure speed then we tend to use YOLO as this algorithm is much faster, capable of processing 40-90 FPS on a Titan X GPU. The super fast variant of YOLO can even get up to 155 FPS.

The problem with YOLO is that it leaves much accuracy to be desired.

SSDs, originally developed by Google, are a balance between the two. The algorithm is more straightforward (and I would argue better explained in the original seminal paper) than Faster R-CNNs.

We can also enjoy a much faster FPS throughput than Girshick et al. at 22-46 FPS depending on which variant of the network we use. SSDs also tend to be more accurate than YOLO. To learn more about SSDs, please refer to Liu et al.

MobileNets: Efficient (deep) neural networks

Figure 2: (Left) Standard convolutional layer with batch normalization and ReLU. (Right) Depthwise separable convolution with depthwise and pointwise layers followed by batch normalization and ReLU (figure and caption from Howard et al.).

When building object detection networks we normally use an existing network architecture, such as VGG or ResNet, and then use it inside the object detection pipeline. The problem is that these network architectures can be very large, on the order of 200-500MB.

Network architectures such as these are unsuitable for resource constrained devices due to their sheer size and resulting number of computations.

Instead, we can use MobileNets (Howard et al., 2017), another paper by Google researchers. We call these networks “MobileNets” because they are designed for resource constrained devices such as your smartphone. MobileNets differ from traditional CNNs through the usage of depthwise separable convolution (Figure 2 above).

The general idea behind depthwise separable convolution is to split convolution into two stages:

  1. A 3×3 depthwise convolution.
  2. Followed by a 1×1 pointwise convolution.

This allows us to actually reduce the number of parameters in our network.
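To make the savings concrete, here is a back-of-the-envelope parameter count for a single layer. It is only a sketch: biases and batch normalization are ignored, and the function names and the 64-to-128 channel example are illustrative assumptions rather than figures from Howard et al.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels
standard = standard_conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
print(standard, separable, round(standard / separable, 1))  # 73728 8768 8.4
```

For this made-up layer the separable version needs roughly 8x fewer weights, which is where the smaller model size comes from.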

The problem is that we sacrifice accuracy: MobileNets are normally not as accurate as their larger counterparts…

…but they are much more resource efficient.

For more details on MobileNets please see Howard et al.

Combining MobileNets and Single Shot Detectors for fast, efficient deep-learning based object detection

If we combine both the MobileNet architecture and the Single Shot Detector (SSD) framework, we arrive at a fast, efficient deep learning-based method for object detection.

The model we’ll be using in this blog post is a Caffe version of the original TensorFlow implementation by Howard et al. and was trained by chuanqi305 (see GitHub).

The MobileNet SSD was first trained on the COCO dataset (Common Objects in Context) and was then fine-tuned on PASCAL VOC reaching 72.7% mAP (mean average precision).

We can therefore detect 20 objects in images (+1 for the background class), including airplanes, bicycles, birds, boats, bottles, buses, cars, cats, chairs, cows, dining tables, dogs, horses, motorbikes, people, potted plants, sheep, sofas, trains, and tv monitors.

Deep learning-based object detection with OpenCV

In this section we will use the MobileNet SSD + deep neural network (dnn) module in OpenCV to build our object detector.

I would suggest using the “Downloads” section at the bottom of this blog post to download the source code + trained network + example images so you can test them on your machine.

Let’s go ahead and get started building our deep learning object detector using OpenCV.

Open up a new file, name it deep_learning_object_detection.py , and insert the following code:

On Lines 2-4 we import the packages required for this script. The dnn module is included in cv2, again assuming that you’re using OpenCV 3.3.

Then, we parse our command line arguments (Lines 7-16):

  • --image : The path to the input image.
  • --prototxt : The path to the Caffe prototxt file.
  • --model : The path to the pre-trained model.
  • --confidence : The minimum probability threshold to filter weak detections. The default is 20%.

Again, example files for the first three arguments are included in the “Downloads” section of this blog post. I urge you to start there while also supplying some query images of your own.
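A minimal sketch of this argument parsing is shown below. It is not the post's original listing, so the line numbers cited in the text will not match it. The long option names and the 20% default come from the list above; the short flags other than -i and the help strings are assumptions, and the paths are placeholders.

```python
import argparse


def build_parser():
    """Build the command line parser for the four arguments described above."""
    ap = argparse.ArgumentParser(description="MobileNet SSD object detection")
    ap.add_argument("-i", "--image", required=True,
                    help="path to the input image")
    ap.add_argument("-p", "--prototxt", required=True,
                    help="path to the Caffe prototxt file")
    ap.add_argument("-m", "--model", required=True,
                    help="path to the pre-trained Caffe model")
    ap.add_argument("-c", "--confidence", type=float, default=0.2,
                    help="minimum probability to filter weak detections")
    return ap


# Parse an explicit list of placeholder values here; in a real script you
# would call build_parser().parse_args() so that sys.argv is used instead.
args = vars(build_parser().parse_args([
    "--image", "images/example.jpg",
    "--prototxt", "deploy.prototxt",
    "--model", "model.caffemodel",
]))
print(args["confidence"])  # 0.2
```

Because the first three arguments are marked as required, leaving one out makes argparse exit with an error that names the missing argument.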

Next, let’s initialize class labels and bounding box colors:

Lines 20-23 build a list called CLASSES containing our labels. This is followed by a list, COLORS, which contains a corresponding random color for each label’s bounding box (Line 24).
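As a rough sketch (again, not the original listing), the two lists could be built as follows. The label spellings follow the usual PASCAL VOC naming and are an assumption here; the order matters because the network reports class indices, not names.

```python
import numpy as np

# 20 PASCAL VOC classes plus "background" at index 0 (spellings assumed)
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
           "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
           "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
           "sofa", "train", "tvmonitor"]

# One random color per class, as three values in the range [0, 255)
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

print(len(CLASSES), CLASSES.index("person"))  # 21 15
```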

Now we need to load our model:

This step is self-explanatory: we simply print a message and load our model (Lines 27 and 28).
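One way to sketch the loading step is shown below. The call to cv2.dnn.readNetFromCaffe is the one used by the script; the load_detector wrapper and its up-front path check are my own additions, meant to give a clearer error when the model files were not downloaded.

```python
import os


def load_detector(prototxt_path, model_path):
    """Load a Caffe network via OpenCV's dnn module, checking paths first.

    Hypothetical helper: only cv2.dnn.readNetFromCaffe comes from the script.
    """
    for path in (prototxt_path, model_path):
        if not os.path.isfile(path):
            raise FileNotFoundError("model file not found: {}".format(path))
    # Imported here so that a missing file is reported before OpenCV is needed
    import cv2
    print("[INFO] loading model...")
    return cv2.dnn.readNetFromCaffe(prototxt_path, model_path)


# Typical use, with `args` holding the parsed command line arguments:
# net = load_detector(args["prototxt"], args["model"])
```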

Next, we will load our query image and prepare our blob , which we will feed-forward through the network:

Taking note of the comment in this block, we load our image  (Line 34), extract the height and width (Line 35), and calculate a 300 by 300 pixel blob  from our image (Line 36).
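In the script the blob is produced by OpenCV's cv2.dnn.blobFromImage, which also handles the resizing. The arithmetic behind this kind of preprocessing can be sketched in plain NumPy for an image that has already been resized to 300 by 300. The scale factor (0.007843, about 1/127.5) and mean (127.5) are assumed values typical for this MobileNet SSD and are not stated in the text; to_blob is a hypothetical helper.

```python
import numpy as np


def to_blob(image_300, scale=0.007843, mean=127.5):
    """Mimic blob arithmetic for an already-resized image.

    image_300: uint8 array of shape (300, 300, 3).
    Returns a float32 array of shape (1, 3, 300, 300).
    """
    blob = (image_300.astype(np.float32) - mean) * scale  # center, then scale
    blob = blob.transpose(2, 0, 1)                         # HWC -> CHW
    return blob[np.newaxis, ...]                           # add the batch axis


# Stand-in for a resized image: random pixel values
fake_image = np.random.randint(0, 256, size=(300, 300, 3), dtype=np.uint8)
blob = to_blob(fake_image)
print(blob.shape)  # (1, 3, 300, 300)
```

With these assumed values the pixel range [0, 255] is mapped to roughly [-1, 1], which is the input range this kind of network expects.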

Now we’re ready to do the heavy lifting — we’ll pass this blob through the neural network:

On Lines 41 and 42 we set the input to the network and compute the forward pass for the input, storing the result as detections. Computing the forward pass and associated detections could take a while depending on your model and input size, but for this example it will be relatively quick on most CPUs.

Let’s loop through our detections  and determine what and where the objects are in the image:

We start by looping over our detections, keeping in mind that multiple objects can be detected in a single image. We also apply a check to the confidence (i.e., probability) associated with each detection. If the confidence is high enough (i.e. above the threshold), then we’ll display the prediction in the terminal as well as draw the prediction on the image with text and a colored bounding box. Let’s break it down line-by-line:

Looping through our detections , first we extract the confidence  value (Line 48).

If the confidence  is above our minimum threshold (Line 52), we extract the class label index (Line 56) and compute the bounding box around the detected object (Line 57).

Then, we extract the (x, y)-coordinates of the box (Line 58), which we will use shortly for drawing a rectangle and displaying text.

Next, we build a text label containing the class name and the confidence (Line 61).

Using the label, we print it to the terminal (Line 62), followed by drawing a colored rectangle around the object using our previously extracted (x, y)-coordinates (Lines 63 and 64).

In general, we want the label to be displayed above the rectangle, but if there isn’t room, we’ll display it just below the top of the rectangle (Line 65).

Finally, we overlay the colored text onto the image  using the y-value that we just calculated (Lines 66 and 67).
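The loop described above can be sketched without a trained model by feeding it a synthetic detections array. This is not the original listing. It assumes the network output has shape (1, 1, N, 7), with each row holding [image_id, class_id, confidence, x1, y1, x2, y2] and box coordinates normalized to [0, 1]. The parse_detections helper and the fake data are hypothetical.

```python
import numpy as np

CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
           "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
           "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
           "sofa", "train", "tvmonitor"]


def parse_detections(detections, width, height, min_confidence=0.2):
    """Return (label, box, text_y) tuples for detections above the threshold."""
    results = []
    for i in range(detections.shape[2]):
        confidence = float(detections[0, 0, i, 2])
        if confidence <= min_confidence:
            continue  # skip weak detections
        idx = int(detections[0, 0, i, 1])
        # Scale the normalized box back to pixel coordinates
        box = detections[0, 0, i, 3:7] * np.array([width, height, width, height])
        start_x, start_y, end_x, end_y = [int(v) for v in box.astype("int")]
        label = "{}: {:.2f}%".format(CLASSES[idx], confidence * 100)
        # Put the text above the box unless there is no room for it
        text_y = start_y - 15 if start_y - 15 > 15 else start_y + 15
        results.append((label, (start_x, start_y, end_x, end_y), text_y))
    return results


# Synthetic output: one confident "person" and one weak "cat" detection
fake = np.array([[[[0, 15, 0.9, 0.125, 0.25, 0.5, 0.75],
                   [0, 8, 0.1, 0.0, 0.0, 0.5, 0.5]]]], dtype=np.float32)
for label, box, text_y in parse_detections(fake, width=400, height=200):
    print(label, box, text_y)  # person: 90.00% (50, 50, 200, 150) 35
```

In the real script each returned box and label would then be drawn on the image with cv2.rectangle and cv2.putText.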

The only remaining step is to display the result:

We display the resulting output image to the screen until a key is pressed (Lines 70 and 71).

OpenCV and deep learning object detection results

To download the code + pre-trained network + example images, be sure to use the “Downloads” section at the bottom of this blog post.

From there, unzip the archive and execute the following command:

Figure 3: Two Toyotas on the highway recognized with near-100% confidence using OpenCV, deep learning, and object detection.

Our first result shows cars recognized and detected with near-100% confidence.

In this example we detect an airplane using deep learning-based object detection:

Figure 4: An airplane successfully detected with high confidence via Python, OpenCV, and deep learning.

The ability for deep learning to detect and localize obscured objects is demonstrated in the following image, where we see a horse (and its rider) jumping a fence flanked by two potted plants:

Figure 5: A person riding a horse and two potted plants are successfully identified despite a lot of objects in the image via deep learning-based object detection.

In this example we can see a beer bottle is detected with an impressive 100% confidence:

Figure 6: Deep learning + OpenCV are able to correctly detect a beer bottle in an input image.

Followed by another horse image which also contains a dog, car, and person:

Figure 7: Several objects in this image including a car, dog, horse, and person are all recognized.

Finally, a picture of me and Jemma, the family beagle:

Figure 8: Me and the family beagle are correctly recognized as a “person” and a “dog” via deep learning, object detection, and OpenCV. The TV monitor is not recognized.

Unfortunately the TV monitor isn’t recognized in this image which is likely due to (1) me blocking it and (2) poor contrast around the TV. That being said, we have demonstrated excellent object detection results using OpenCV’s dnn  module.


Summary

In today’s blog post we learned how to perform object detection using deep learning and OpenCV.

Specifically, we used both MobileNets + Single Shot Detectors along with OpenCV 3.3’s brand new (totally overhauled) dnn  module to detect objects in images.

As a computer vision and deep learning community we owe a lot to the contributions of Aleksandr Rybnikov, the main contributor to the dnn module, for making deep learning so accessible from within the OpenCV library. You can find Aleksandr’s original OpenCV example script here; I have modified it for the purposes of this blog post.

In a future blog post I’ll be demonstrating how we can modify today’s tutorial to work with real-time video streams, thus enabling us to perform deep learning-based object detection on videos. We’ll be sure to leverage efficient frame I/O to increase the FPS throughout our pipeline as well.



If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 11-page Resource Guide on Computer Vision and Image Search Engines, including exclusive techniques that I don’t post on this blog! Sound good? If so, enter your email address and I’ll send you the code immediately!


121 Responses to Object detection with deep learning and OpenCV

  1. tommy September 11, 2017 at 11:41 am #

    how do we train the dnn using opencv or do we have to use tensorflow and the likes?

    plus where can we get some sample caffemodels?

    tensorflow has some models in its own ckpt format.

    • Adrian Rosebrock September 11, 2017 at 2:31 pm #

      I would start by giving the first post in the series a read. You do not train the models with OpenCV’s dnn module. They are instead trained using tools like Caffe, TensorFlow, or PyTorch. This particular example demonstrates how to load a pre-trained Caffe network.

      The dnn module has been totally re-done in OpenCV 3.3. Many Caffe models will work with it out-of-the-box. I would suggest taking a look at the Caffe Model Zoo for more pre-trained networks.

  2. Max September 11, 2017 at 11:46 am #

    Hi Adrian,
    how long does it take to forward walk through the provided network?
    Is it faster than tensorflow based networks of same architecture?
    Is there a tutorial inside of your books that covers fast recognition and detection using CNN at best in realtime with networks like YOLO.

    • Adrian Rosebrock September 11, 2017 at 2:29 pm #

      1. As I’ll be discussing in next week’s tutorial you’ll be able to get 6-8 frames per second using this method.

      2. Once the model is trained you won’t see massive speed increases as it’s (1) just the forward pass and (2) OpenCV is loading the serialized weights from disk.

      3. Yes, I will be covering object detection inside Deep Learning for Computer Vision with Python. You’ll want to go with the ImageNet Bundle where I discuss SSD and Faster R-CNNs.

  3. Vasanth September 11, 2017 at 1:00 pm #

    Hi Adrian , You always inspired me with your Tremendous Innovation and become my Role Model too….

    Now Coming back to the Topic , I’m Getting this error :

    Traceback (most recent call last):
    File “deep_learning_object_detection.py”, line 32, in
    net = cv2.dnn.readNetFromCaffe(args[“prototxt”], args[“model”])
    AttributeError: ‘module’ object has no attribute ‘dnn’

    Eventhough after installing Lasagne , it is giving me the error :
    ImportError: Could not import Theano.

    Please make sure you install a recent enough version of Theano. See
    section ‘Install from PyPI’ in the installation docs for more details:

    • Adrian Rosebrock September 11, 2017 at 2:26 pm #

      Hi Vasanth, you need to install OpenCV 3.3 for this tutorial to work. Lasagne and Theano are not needed and you can safely skip them.

      • David Crawley September 23, 2017 at 2:52 pm #

        Is there any way to make this work with OpenCV 3.2 – I am trying to make this work with ROS (Robot operating system) but this only incorporated OpenCV 3.2. AM I SOL don’t go there territory or is there a way?

        • Adrian Rosebrock September 23, 2017 at 3:02 pm #

          Hey David — I wish I had better news for you. The dnn module was completely and entirely overhauled in OpenCV 3.3. Without OpenCV 3.3 you will not have the new dnn module and therefore you cannot apply object detection with deep learning and OpenCV.

          Again, I hate to be the bearer of bad news.

          • Rodrigo Passos September 26, 2017 at 8:34 pm #

            I upgraded to 3.3.0:
            pip install --upgrade opencv-python
            or python -m pip install --upgrade opencv-python

          • Adrian Rosebrock September 28, 2017 at 9:21 am #

            Be careful when doing this — you’ll be missing out on additional libraries and you may not have GUI support.

  4. andrew September 11, 2017 at 1:19 pm #

    Great post, It makes me even more excited for your deep learning book

    • Adrian Rosebrock September 11, 2017 at 2:24 pm #

      Thanks Andrew — I’ll be sharing how to train your own custom object detector inside Deep Learning for Computer Vision with Python as well.

      • Ebraheem September 25, 2017 at 4:40 pm #

        Hi Adrian,
        this might be very interesting when do you think train custom object tutorial will be shared ?

        Thanks alot for what you doing for us!

        • Adrian Rosebrock September 26, 2017 at 8:15 am #

          As I mentioned in the previous comment, I’ll be covering how to train custom object detectors inside the ImageNet Bundle of Deep Learning for Computer Vision with Python.

          • Ebraheem Saleh September 26, 2017 at 8:48 am #

            i’m interested to buy this bundle,
            when it will be released ?
            if i pre-ordered now , when i should recieve all materials ?


          • Adrian Rosebrock September 28, 2017 at 9:30 am #

            You would want to buy the ImageNet Bundle as that is where I’ll be covering object detection methods in detail. The chapters inside the ImageNet Bundle will be released in October 2017.

  5. aditya September 11, 2017 at 1:32 pm #

    Can you please provide the dataset link and the train.py file
    i want to manually train it and check it…
    So please provide the dataset name or downloading link and the program to train the model…

    • Adrian Rosebrock September 11, 2017 at 2:25 pm #

      Hi Aditya, as I mentioned in the tutorial this object detector is pre-trained via the Caffe framework. I’ll be discussing how to create your own custom object detectors inside Deep Learning for Computer Vision with Python.

  6. Sydney September 11, 2017 at 3:56 pm #

    Nice tutorial. Can i please have the video implementation of the object detection method. The challenge i am facing is of the model using up all my resources for inference and i am sure this method goes a long way in ensuring efficient resource usage during inference.

    • Adrian Rosebrock September 11, 2017 at 4:08 pm #

      I will be sharing the video implementation of the deep learning object detection algorithm on Monday, September 18th. Be sure to keep an eye on your inbox as I’ll be announcing the tutorial via email.

      • Sydney September 12, 2017 at 3:40 am #

        Thanks a lot man

  7. Terry September 11, 2017 at 6:23 pm #

    God send you to save my life. I struggled for months about the performance issue with yolov2. It’s just too heavy for cpu and mobile devices.

  8. Hilman September 11, 2017 at 6:35 pm #

    Adrian, I am glad there is someone like you in this CV/ML community!
    Keep up the high quality contents!

    • Adrian Rosebrock September 12, 2017 at 7:18 am #

      Thanks Hilman!

  9. Chris Albertson September 11, 2017 at 6:36 pm #

    I’m still trying to understand how an image classifier could be incorporated into a larger network to find bounding boxes. I thought about searching a tree of cropped images but that would be iterative and slow.

    It looks like this article took the black-box approach. How to detect objects? Make a call to an object detector. That’s easy but how does the object detector work?

    How can an object classifier like vgg16 be used for detection without iteration

    • Adrian Rosebrock September 12, 2017 at 7:18 am #

      Traditional object detection is accomplished using a sliding window and an image pyramid, like in Histogram of Oriented Gradients. Deep learning-based object detectors do end-to-end object detection. The actual inner workings of how SSD/Faster R-CNN work are outside the context of this post, but the gist is that you can divide an image into a grid, classify each cell of the grid, and then adjust the anchors of the grid to better fit the object. This is a huge simplification but it should help point you in the right direction.

  10. Barbara September 11, 2017 at 7:39 pm #

    Hi Adrian, how can I edit your code to only detect person? The others shapes aren’t necessary for me. And thank you so much for your tutorial, it helps a lot

    • Adrian Rosebrock September 12, 2017 at 7:16 am #

      The “person” class is the 14th index in CLASSES and therefore the returned detections as well. You can remove the for loop that loops over the detections and then just check the probability associated with the person class:

      • Barbara September 12, 2017 at 12:29 pm #

        Thank you so much. You have no idea of how much your tutorials help

      • Barbara September 12, 2017 at 1:01 pm #

        It didn’t work. the detections return only the shapes that were detected. if I had only 2 shapes in my image, the for loop will repeat twice, then integration would be 0 and 1 and not the whole CLASSES. So, your answer is wrong. I’ve tried it. But I can’t find a way of detecting only human shape.

        • Adrian Rosebrock September 12, 2017 at 2:06 pm #

          Try this:

          You’ll want to double-check that the idx is indeed 14.

          • Barbara September 12, 2017 at 2:40 pm #

            That’s is exactly what I tried, but it’s 15 for “person”. You said in other comment that you’d be sharing the video implementation on Monday. I already did that following the instructions here and others about video. But, it takes around 17 s between frames (between processing a frame and another). Do you know what I could do to decrease this time?

          • Adrian Rosebrock September 12, 2017 at 6:05 pm #

            Hi Barbara — unfortunately without knowing more about your setup I’m not sure what the issue is. I would kindly ask you to please wait until the video tutorial is released on Monday, September 18th. There are additional optimizations that you may not be considering such as reducing frame size, using threading to speedup the frames per second rate, etc.

  11. siam September 12, 2017 at 3:12 am #

    after running that code i found that error:argument -i/--image is required
    How can I fix it?
    I am using windows 10, and python 2.7

    • Adrian Rosebrock September 12, 2017 at 7:14 am #

      Hi Siam — you are not providing the --image command line argument. Please (1) see my examples of executing the script in this tutorial and (2) read up on command line arguments.

  12. Alexander September 12, 2017 at 7:12 am #

    Thank you, Adrian. Very useful theme with interest explanation.

    • Adrian Rosebrock September 12, 2017 at 7:19 am #

      I’m happy you found it helpful, Alexander! It’s my pleasure to share.

  13. Jose fernando September 12, 2017 at 1:09 pm #

    hello adrian I am from Colombia you would recommend using linux for a higher performance or no problem if you use windows Thanks

    • Adrian Rosebrock September 12, 2017 at 2:06 pm #

      I would definitely recommend using Linux for deep learning environments. macOS is a good fallback or if you’re just playing around and learning fundamentals. I would not recommend Windows.

  14. Thimira Amaratunga September 13, 2017 at 12:16 pm #

    Hi Adrian,

    Is it possible to use a pre-trained TensorFlow model with OpenCV 3.3 as a custom object detector? Or does it only work with Caffe?


    • Adrian Rosebrock September 13, 2017 at 2:53 pm #

      You can use a pre-trained TensorFlow model. Please see my reply to “Sydney”.

  15. Walid Ahmed September 13, 2017 at 1:41 pm #

    Thanks a lot

    your simple illustration for complex new issues is highly appreciated,

    • Adrian Rosebrock September 13, 2017 at 2:52 pm #

      Thanks Walid, I’m happy that you enjoyed the tutorial! 🙂

  16. Sydney September 13, 2017 at 2:21 pm #

    Hie man. How can i use a tensorflow .pb model file instead of the Caffe model?

    • Adrian Rosebrock September 13, 2017 at 2:52 pm #

      Please see this blog post where I list out the TensorFlow functions for OpenCV.

  17. Flávio Rodrigues September 13, 2017 at 3:25 pm #

    Hi, Adrian. Have you tried the original TensorFlow Model to compare with the Caffe version? Do you plan to do such tests and show on your blog how to use a pre-trained model with different Network architectures? Thanks a lot for your great posts. It encourages me even more to buy your books, and I hope I will!

    • Adrian Rosebrock September 13, 2017 at 3:35 pm #

      I personally haven’t benchmarked the original TensorFlow model compared to the Caffe one; however, the author of the TensorFlow did benchmark them. They share their benchmarks here and note the differences in implementation.

      I’ve already covered how to use GoogLeNet and now MobileNet in this post. I’ll cover more networks in the future. Otherwise, for a detailed review of other state-of-the-art architectures (and how to implement them) I would definitely refer you to Deep Learning for Computer Vision with Python.

      • Flávio Rodrigues September 13, 2017 at 3:54 pm #

        Thanks a lot, Adrian. And I have just watched your new real-time object detection video on YouTube. Oh, man, stop blowing my mind! Hahaha. I can’t wait to see the blog post. And thank you for always answering our questions. You must be a super organized person to do that on such a busy schedule. Cheers.

        • Adrian Rosebrock September 14, 2017 at 6:33 am #

          Thanks Flávio, it’s my pleasure to help 🙂

  18. Alan Federman September 14, 2017 at 12:54 pm #

    Traceback (most recent call last):
    File “deep_learning_object_detection.py”, line 32, in
    net = cv2.dnn.readNetFromCaffe(args[“prototxt”], args[“model”])
    AttributeError: ‘module’ object has no attribute ‘dnn’

    I missed a step somewhere.

    • Adrian Rosebrock September 14, 2017 at 1:13 pm #

      Hi Alan — it looks like you do not have OpenCV 3.3 installed. Please ensure OpenCV 3.3 has been installed on your system.

  19. Gilad September 15, 2017 at 3:48 am #

    Hi Adrian,
    I tried to combine this code with your previous code which uses googlenet, but found out that the forward procedure doesn’t support localization.
    If I don’t care about the computation timing and would like to have much more classes with localization, what should I do?

    • Adrian Rosebrock September 18, 2017 at 2:16 pm #

      Unfortunately in that case you would need to train your own custom object detector on the actual ImageNet dataset so you can localize the 1,000 specific categories rather than the 20 that this network was trained on.

  20. Scott Stoltzman September 15, 2017 at 3:41 pm #

    Is there a list out there of the different “classes” that can be detected? I have searched extensively and can’t find anything. My guess is that there are A LOT of them.

    • Adrian Rosebrock September 18, 2017 at 2:14 pm #

      Hi Scott — please see this blog post, specifically Lines 20-23. The CLASSES list provides the list of classes that can be detected using this pre-trained network.

  21. Zaira Zafar September 17, 2017 at 12:50 pm #

    Hi adrian,

    Ran your code and honestly it’s amazing. Superb! The models are soo well trained, and code so clean and well to read.

    Can I measure distance b/w the detected objects? using your previous blog:


    • Adrian Rosebrock September 18, 2017 at 2:04 pm #

      Yes, just be sure to perform the calibration step via the triangle similarity (as discussed in the “Measuring distance between objects in an image” post you linked to).

  22. computernut September 17, 2017 at 2:54 pm #

    Have you had a chance to look at the Neural network on a stick from Movidius? (developer dot movidius dot com/ ) Do you believe it holds promise for this sort of application, where small and faster computation is more the need than the crunching power of say the Nvidia Tesla machines?

    • Adrian Rosebrock September 18, 2017 at 2:03 pm #

      It really depends on how well Intel documents the Movidius stick (Intel isn’t known for their documentation). The Movidius is really only meant for deploying networks, not training.

      • Flávio Rodrigues September 19, 2017 at 4:34 pm #

        Hi, Adrian. Maybe it’s something worth giving a try. The stick is not that expensive and appears to increase the frame rate substantially on a Pi 2 or 3. I’m waiting for your post about real-time object detection on a Pi, but I’m afraid that it doesn’t work so well. I have seen these two videos (https://www.youtube.com/watch?time_continue=4&v=f39NFuZAj6s ; https://www.youtube.com/watch?v=41E5hni786Y) and i’m wondering how would it be using such pre-trained Caffe models running on Movidius NCS with a Raspberry Pi and OpenCV. It would be awesome! Have you ever thought about exploring it?

        • Adrian Rosebrock September 20, 2017 at 7:03 am #

          I’ve mentioned the Movidius in a handful of comments in other blog posts. The success of the Movidius is going to depend a lot on Intel’s documentation which is not something they are known for. I’ll likely play around with it in the future, but it’s primarily used for deploying pre-trained networks rather than training them. Again, it’s something that I need to give more thought to.

  23. denish September 19, 2017 at 5:02 am #

    How to install OpenCV-3.3
    please help me

  24. denish September 20, 2017 at 3:26 am #

    how to install OpenCV3.3

  25. rmb September 20, 2017 at 5:58 pm #

    Your tutorials are really excellent! You get the impression that everything is so simple.

    On the basis of your code, which works perfectly, I would now like to identify (car / van / small trucks / large trucks).

    As you suggested, I looked into the Caffe Model Zoo. I tried to use GoogLeNet_cars by retrieving directly .model (http://mmlab.ie.cuhk.edu.hk/datasets/comp_cars/googlenet_finetune_web_car_iter_10000.caffemodel)

    And the corresponding prototxt (https://gist.github.com/bogger/b90eb88e31cd745525ae#file-deploy-prototxt)

    But simply changing the model doesn’t seem to be the right way to go. What should I do? … Yes, I am completely new to the subject.

    Thanks in advance.

    • Adrian Rosebrock September 23, 2017 at 10:18 am #

      You can use pre-trained models to detect objects in images; however, these pre-trained models must be object detectors. The GoogLeNet model is not an object detector. It’s an image classifier. The version of GoogLeNet you supplied cannot be used for object detection (just image classification).

      I hope that helps!

  26. Gerardo September 25, 2017 at 12:11 am #

    Interestingly, running your code on my machine gives different object detection results than yours. For instance, on example 3, I can only detect the horse and one potted plant. On example 5 I get the same detection plus the dog is also detected as a cat (with a higher probability) and the model is able to capture the person in the back, left side near the fence.

    Is this variation expected? I would have expected that the dnn model would behave the same on an the same image for all repetitions of the experiment.

    thanks for the great post!

    • Adrian Rosebrock September 26, 2017 at 8:31 am #

      There will be a very tiny bit of variation depending on your version of OpenCV, optimization libraries, system dependencies, etc.; however, I would not expect results to vary as much as you are seeing. What OS and versions of libraries are you running?

      • Peter October 22, 2017 at 8:22 am #

        Hi Adrian, I got the same result as Gerardo and feel confused. There is a higher probability for cat but no box is drawn for it

        … terminal output removed to formatting …

        • Adrian Rosebrock October 22, 2017 at 8:51 am #

          Hi Peter, thanks for the comment. I’m honestly not sure what the problem is here. I have not run into this issue personally and I’m not sure what the problem/solution is. I will continue to look into it.

      • Peter October 22, 2017 at 8:23 am #

        Python 3.5.2, opencv 3.3.0, Ubuntu 16.04

  27. Ravi Teja September 25, 2017 at 2:24 pm #

    Hi Adrian,

    Thanks for writing wonderful tutorials. What is the best place to learn about all the functions inside the OpenCV module and the TensorFlow deep learning modules? I feel I should brush up on these things first so that I can better understand your code.

    • Adrian Rosebrock September 26, 2017 at 8:17 am #

      Can you elaborate on what you mean by “all functions”? If you wanted to learn about “all functions” you would read through the documentation for OpenCV and TensorFlow.

      However, I don’t think this is a very good way to learn. Instead, you should go through Practical Python and OpenCV and Deep Learning for Computer Vision with Python which teaches you how to use these functions to solve actual problems.

      Reading the documentation can be helpful to clarify the parameters to a function, but it’s not a very good way to practically learn the techniques.

  28. Mandeep September 25, 2017 at 6:46 pm #

    How do I run the final command on windows?

  29. Zig September 26, 2017 at 7:05 pm #


    I’m getting the following error when trying to run your code:

    [INFO] loading model….

    Can’t open “MobileNetSSD_deploy.prototxt.txt” in function ReadProtoFromTextFile

    Any idea what this could be? OpenCV 3.3, Python 3.6 (same error on 2.7). Similar error is produced when I change the model or prototxt.


    • Adrian Rosebrock September 28, 2017 at 9:24 am #

      Please see my reply to “zhang xue” and confirm whether you’ve used the “Downloads” sections of this post to download the pre-trained model files.

  30. Aniket September 26, 2017 at 10:50 pm #

    Hi Adrian,

    I have come across some problems when understanding your code:

    In this line,


    what does this line mean when the blob is passed forward through the network in the line “net.forward”?

    In this line,

    confidence = detections[0, 0, i, 2]

    what do these 4 indices (0, 0, i, 2) mean, and how does this extract the confidence of the detected object?

    In this line,

    idx = int(detections[0, 0, i, 1])

    what does this 1 signify in detections[ ]?

    In this line,

    box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])

    what is the purpose of multiplying the detections by a NumPy array? Why is the fourth index of detections[ ] the slice 3:7, and what does it mean? Why do you pass [w, h, w, h] to the NumPy array, with the width and height each appearing twice?

    Please help, thanks in advance.

    • Adrian Rosebrock September 27, 2017 at 6:43 am #

      The detections object is a multi-dimensional NumPy array. The value detections.shape[2] gives us the number of actual detections. We can then extract the confidence for the i-th detection via detections[0, 0, i, 2]. The slice 3:7 gives us the bounding box coordinates of the object that was detected. We need to multiply these coordinates by the image width and height as they were relatively scaled by the SSD.

      Take a look at the detections NumPy array and play around with it. If you’re new to NumPy, take the time to educate yourself on how array slices work and how vector multiplies work. This will help you learn more.
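As a rough illustration of the indexing discussed in this thread, the sketch below builds a small synthetic array in place of the real network output. The numbers, the image size, and the 0.2 threshold are made up for the example, and the per-detection layout (batch id, class index, confidence, then four relative box coordinates) is an assumption consistent with the indices 1, 2, and 3:7 quoted in the question.

```python
import numpy as np

# Synthetic stand-in for the array returned by net.forward(); values are made up.
# Assumed layout of each detection row:
#   [batch_id, class_index, confidence, startX, startY, endX, endY]
# with the four box coordinates given as fractions of the image size.
detections = np.zeros((1, 1, 2, 7), dtype="float32")
detections[0, 0, 0] = [0, 12, 0.98, 0.10, 0.20, 0.50, 0.80]
detections[0, 0, 1] = [0, 15, 0.15, 0.60, 0.10, 0.90, 0.40]

(h, w) = (400, 600)    # hypothetical image height and width in pixels
min_confidence = 0.2   # hypothetical threshold for weak detections

results = []
for i in range(detections.shape[2]):             # shape[2] = number of detections
    confidence = float(detections[0, 0, i, 2])   # index 2 holds the confidence
    if confidence < min_confidence:
        continue
    idx = int(detections[0, 0, i, 1])            # index 1 holds the class index
    # x-coordinates scale by the width, y-coordinates by the height,
    # which is why w and h each appear twice.
    box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
    (startX, startY, endX, endY) = box.astype("int")
    results.append((idx, confidence, (int(startX), int(startY), int(endX), int(endY))))

print(results)
```

Only the first synthetic detection passes the threshold, and its relative box is converted to pixel coordinates for a 600×400 image.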

  31. Zig September 27, 2017 at 3:05 am #

    Hi Adrian,

    Just to make sure I’m understanding what is going on here. SSD is an object detector that sits on top of an image classifier (in this case MobileNet). So, technically, one can switch to a more accurate (but slower) image classifier such as Inception. And this would improve the detection results of SSD. Is this correct? I guess I can look at your other posts about using GoogLeNet and change a few lines in this example to switch MobileNet with GoogLeNet in OpenCV?

    Also, have you come across any implementations or blog posts that discuss playing around with various image classifiers + SSD in Keras to perform object detection?

    Thanks once again for your blog posts. They have saved me hours and hours of time and the hair on my head.


    • Adrian Rosebrock September 28, 2017 at 9:20 am #

      This is a bit incorrect. In the SSD architecture, the bounding boxes and confidences for multiple categories are predicted directly within a single network. We can modify an existing network architecture to fit the SSD framework and then train it to recognize objects, but they are not hot swappable.

      For example, the base of the network could be VGG or ResNet through the final pooling layers. We then convert the FC layers to CONV layers. Additional layers are then used to perform the object detection. The loss function then minimizes over correct classifications and detections. A complete review of the SSD framework is outside the scope of this post, but I will be covering it in detail inside Deep Learning for Computer Vision with Python.

      There are one or two implementations I’ve seen of SSDs in Keras and mxnet, but from what I understand they are a bit buggy.

      • Zig September 28, 2017 at 8:45 pm #

        Will the ImageNet Bundle of “Deep Learning for Computer Vision with Python” cover code (at least to some extent) to play around with object detectors and image classifiers, like I asked in my first post? There’s plenty of stuff on the net to train image classifiers but not much if one wants to couple object detection with everything. Cheers. (Oh, and when will the review of SSD and everything related be available for reading and exploring in your book?)

        • Adrian Rosebrock October 2, 2017 at 10:24 am #

          Yes, you are absolutely correct. The ImageNet Bundle of Deep Learning for Computer Vision with Python will demonstrate how to train your own custom object detectors using deep learning. From there I’ll also demonstrate how to create a custom image processing pipeline that will enable you to take an input image and obtain the output predictions + detections using your classifier.

          Secondly, I will be reviewing SSD inside the ImageNet Bundle. I won’t be demonstrating how to implement it, but I will be discussing how it works and demonstrating how to use it.

  32. Justice September 27, 2017 at 7:34 am #

    Hi, I was wondering if I would be able to only detect fruits and vegetables and differentiate the different types?

    • Adrian Rosebrock September 27, 2017 at 7:43 am #

      Using the pre-trained network, no. You can only detect objects that the network was already trained to recognize.

      If you want to recognize custom objects (such as fruits and vegetables) you’ll need to either (1) train a new network from scratch or (2) apply transfer learning, such as fine-tuning.

  33. Justice September 27, 2017 at 6:02 pm #

    Would you be able to send any helpful tools or links that would help me start training the network from scratch?

  34. zhang xue September 27, 2017 at 11:08 pm #

    Traceback (most recent call last):
    File "deep_learning_with_opencv.py", line 34, in <module>
    net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])
    cv2.error: /home/ubuntu/opencv-3.3.0/modules/dnn/src/caffe/caffe_io.cpp:1113: error: (-2) FAILED: fs.is_open(). Can't open "MobileNetSSD_deploy.prototxt.txt" in function ReadProtoFromTextFile

    How do I solve it? Thanks.

    • Adrian Rosebrock September 28, 2017 at 9:23 am #

      Just to clarify, have you used the “Downloads” section of this blog post to download the source code + pre-trained Caffe model and prototxt files?

      • Jason October 12, 2017 at 9:52 am #

        I have the same problem here. The code, model, and prototxt are from your site!
        Ubuntu 16.04

        • Adrian Rosebrock October 13, 2017 at 8:41 am #

          Hi Jason, thanks for the comment. I’ve seen a handful of readers run into this problem. Unfortunately I have not been able to replicate it. It would be a big help to me and the rest of the PyImageSearch community if you could help to replicate this error.
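The error was not reproduced in this thread, so the following is only a sketch of one thing worth ruling out: that the relative file names do not resolve from the directory the script is launched in. The helper name find_missing_files is hypothetical, and the .caffemodel file name is an assumption; only the prototxt name appears in the error message above.

```python
import os

def find_missing_files(paths):
    """Return the subset of paths that do not point to an existing file."""
    return [p for p in paths if not os.path.isfile(p)]

# The prototxt name comes from the error message; the model name is assumed.
# Adjust both to wherever the downloaded files actually live.
required = ["MobileNetSSD_deploy.prototxt.txt", "MobileNetSSD_deploy.caffemodel"]

missing = find_missing_files(required)
if missing:
    # Relative paths are resolved against the current working directory,
    # not against the location of the script.
    print("[INFO] current directory:", os.getcwd())
    print("[INFO] missing files:", missing)
```

If the list of missing files is not empty, either change into the directory that contains the files before running the script or pass absolute paths on the command line.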

  35. pavi111 September 29, 2017 at 2:12 pm #

    What algorithm did you use to detect objects in the image? Could you please share links to research papers or other object detection code that I can train on my own images? I understand this will be covered in your book, but for now I need a reference for my project.
    I would be glad if you could share a GitHub link for the object detection code with a train.py file.

    Thanks, your tutorials are very good.

    • Adrian Rosebrock October 2, 2017 at 10:09 am #

      I cover various object detection methods inside the PyImageSearch Gurus course, including links to various academic papers. I suggest you start there.

  36. Aniket September 30, 2017 at 12:35 pm #

    Hello Adrian,

    I want to play with this code on my PC, which runs 64-bit Windows 7. I don’t have OpenCV installed on my machine yet, and I don’t know which configuration (working environment) I need in order to run this code. I also don’t know how to install OpenCV on my PC so that this code will run. Please help.

    • Adrian Rosebrock October 2, 2017 at 9:53 am #

      Hi Aniket — if you are interested in studying computer vision and deep learning I would recommend that you use either Linux or macOS. Windows is not recommended for deep learning or computer vision. I demonstrate how to configure Ubuntu for deep learning and macOS for deep learning.

      Otherwise, I offer a pre-configured Ubuntu VirtualBox virtual machine as part of my book, Deep Learning for Computer Vision with Python.

      This VM will run on Windows, macOS, and Linux and is by far the fastest way to get up and running with deep learning and OpenCV.

      I hope that helps!

  37. Alejandro Amar October 1, 2017 at 12:52 pm #

    Hi Adrian, can Caffe2 models be used with OpenCV, or only Caffe models?

  38. JohnZ October 1, 2017 at 1:38 pm #

    Hi Adrian,
    first of all, thanks for this great tutorial!

    I have a short question: I am trying to rebuild your tutorial with the OpenCV C++ API. When I see the call to the function that generates the blob from the input image:

    cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 0.007843, (300, 300), 127.5)

    it is hard for me to match it up with the corresponding C++ API function.

    Could you give me a small hint on how to match it? Especially the scalar values “0.007843” and “127.5” did not really match for me.

    Thanks for your help and again great work!

    • Adrian Rosebrock October 2, 2017 at 9:39 am #

      I’ll actually be doing a tutorial that details every parameter of the cv2.dnn.blobFromImage in the next few weeks. In the meantime, 127.5 is the mean subtraction value and 0.007843 is your normalization factor.

      • JohnZ October 4, 2017 at 1:24 pm #

        Hi Adrian, thanks for your fast reply.

        Ok, is this a special function you are using? I am currently using OpenCV 3.3 from August this year. Actually, I do not understand yet how the normalization factor fits into the current API. There is the mean value which gets subtracted from each color channel and parameters for the target size of the image. And finally a boolean flag to swap the red and blue channels.

        Could you give me a hint please?

        • Adrian Rosebrock October 6, 2017 at 5:15 pm #

          You are correct. The mean value is computed across the training set and then subtracted from each channel of the image. You can also optionally supply a 3-tuple if you have different RGB values (which in most cases you do). Once you perform the mean subtraction you multiply by the scaling value.

          • JohnZ October 7, 2017 at 9:46 am #

            Ok, thanks for the hint. Sorry for bothering you again, but would this call be correct?

            cv::Mat inputBlob = cv::blobFromImage(img, 0.007843, cv::Size(300, 300), cv::Scalar(127.5));

            Again, thank you for your great tutorials!

          • Adrian Rosebrock October 9, 2017 at 12:36 pm #

            I have only used the Python bindings of the “dnn” module, not the C++ ones. It looks like your call is correct, but again, you should compile your code and try it.
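The order of operations described in this thread (subtract the mean, then multiply by the scaling value) can be checked with plain arithmetic. The sketch below does not call OpenCV at all; the normalize helper is hypothetical and simply applies the two constants quoted above to a few pixel intensities.

```python
import numpy as np

MEAN = 127.5       # mean subtraction value quoted in the reply above
SCALE = 0.007843   # scaling factor quoted in the reply above (about 1 / 127.5)

def normalize(pixels):
    """Subtract the mean first, then multiply by the scaling factor."""
    return (np.asarray(pixels, dtype="float32") - MEAN) * SCALE

# Darkest, midpoint, and brightest 8-bit intensities.
pixels = [0, 127.5, 255]
print(normalize(pixels))
```

With these two constants, 8-bit intensities in the range 0 to 255 land in roughly the range -1 to 1, which suggests why the scaling factor is close to 1/127.5.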

  39. Nihit October 6, 2017 at 6:07 am #

    I was trying to replicate your results for example 3. In my case only the horse and potted plants were detected, not the person. I either had to remove the mean (127.5) from blobFromImage or resize to 400×400 to get the person detected. Do you know why?

    • Adrian Rosebrock October 6, 2017 at 4:51 pm #

      Hi Nihit — that is indeed strange; however, I’m not sure why that would be. Did you use the “Downloads” section of the post to use the same code, pre-trained network, and example images that I used?

      • Nihit October 9, 2017 at 12:34 am #

        Yes, I downloaded the code, examples, and model from the “Downloads” section.

        • Adrian Rosebrock October 9, 2017 at 12:17 pm #

          Thank you for sharing the additional details, Nihit! Unfortunately I’m not sure what the exact issue is here. I wish I could help more, but without physical access to your machine to diagnose any library issues, I’m not sure what the problem may be.

  40. Jes October 6, 2017 at 8:06 am #

    Hi! Thanks for the clear tutorial, really makes difference in trying to figure this stuff out!
    This is what I don’t get about how the dnn works (I’m a newbie with the object detection so :D):
    how does the model go through the blob to get the location? I mean, if the object recognition model is (presumably) trained with the object nicely framed in the middle of the image, how does the detection model find a small or partially covered object like the baseball glove? Does it somehow divide the image into segments?

    • Adrian Rosebrock October 6, 2017 at 4:50 pm #

      The model is not trained with images that have the objects nicely framed in the center of the image. Instead, images are provided with plaintext bounding boxes that indicate where in the image the object is. The SSD then learns patterns in the input images that correspond to the class labels while simultaneously adjusting the predicted bounding boxes.

      If you’re new to computer vision and object detection be sure to read this post on the fundamentals of more traditional object detectors.

  41. Mustafa Demir October 13, 2017 at 4:48 am #

    Hi, thanks for this post but I have a problem.
    error: AttributeError: module ‘cv2.cv2’ has no attribute ‘dnn’

    • Adrian Rosebrock October 13, 2017 at 8:34 am #

      So this is either (1) a typo or (2) you haven’t installed OpenCV 3.3.

      The correct call is cv2.dnn., not cv2.cv2.dnn.

      Secondly, please ensure you have installed OpenCV 3.3 on your system.
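A quick way to check the second point is to look at the installed version. The sketch below is an illustration rather than part of the tutorial code: supports_dnn is a hypothetical helper that only parses a version string, so it runs even without OpenCV installed, and the commented lines show how it might be applied to cv2.__version__.

```python
def supports_dnn(version):
    """Return True if an OpenCV version string is 3.3 or newer."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (3, 3)

# Typical use, assuming OpenCV is importable as cv2:
#   import cv2
#   print(cv2.__version__, supports_dnn(cv2.__version__), hasattr(cv2, "dnn"))

print(supports_dnn("3.3.0"))  # True
print(supports_dnn("3.2.0"))  # False
```

If the reported version is older than 3.3, or hasattr(cv2, "dnn") is False, the Python interpreter is most likely picking up a different OpenCV installation than the one you compiled.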

  42. Emy October 15, 2017 at 7:29 am #

    Hi, thanks for this post but I have a problem.
    After running the code I got this error: argument -i/--image is required
    How can I fix it?

    • Adrian Rosebrock October 16, 2017 at 12:29 pm #

      Please see my reply to “siam” above.
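The message in the question is what argparse prints when a required command line argument is not supplied. The sketch below assumes the script defines --image, --prototxt, and --model as required arguments (the latter two match the args["prototxt"] and args["model"] keys seen in the tracebacks in this thread); the build_parser helper, the image path, and the model file name are placeholders.

```python
import argparse

def build_parser():
    """Build a parser with the arguments the script is assumed to require."""
    ap = argparse.ArgumentParser()
    ap.add_argument("-i", "--image", required=True,
                    help="path to input image")
    ap.add_argument("-p", "--prototxt", required=True,
                    help="path to Caffe deploy prototxt file")
    ap.add_argument("-m", "--model", required=True,
                    help="path to Caffe pre-trained model")
    return ap

# Passing a list to parse_args() simulates what would be typed in a terminal.
args = vars(build_parser().parse_args([
    "--image", "images/example_01.jpg",                # placeholder path
    "--prototxt", "MobileNetSSD_deploy.prototxt.txt",
    "--model", "MobileNetSSD_deploy.caffemodel",       # assumed file name
]))
print(args["image"])
```

In a terminal, the same values are appended after the script name, for example `python your_script.py --image images/example_01.jpg --prototxt MobileNetSSD_deploy.prototxt.txt --model MobileNetSSD_deploy.caffemodel`. Running the script with no arguments (for instance from an IDE's run button) produces the error in the question.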

  43. Paul Kuo October 16, 2017 at 12:03 am #

    Hi, Adrian,

    Say that I have a GPU card fitted in my machine. Would the OpenCV dnn module utilize it to speed up the detection, and how would it do so? Thanks!

    • Adrian Rosebrock October 16, 2017 at 12:21 pm #

      As far as I understand, Python cannot access the GPU-bindings for OpenCV. I would suggest taking a look at the C++ API of OpenCV.

      • Paul Kuo October 17, 2017 at 2:22 am #

        Cool, thank you for your suggestion. As my projects are all developed with C++ openCV APIs, this will be easier for me if the opencv C++ APIs could access the GPU-bindings.

        Also I am looking forward to your next post regarding object detection on a video stream~~



        • Adrian Rosebrock October 17, 2017 at 9:32 am #

          Hi Paul — the object detection in video stream post you are referring to was actually published on September 18th. You can find it here.

  44. Prabhat Kumar Prabhakar October 16, 2017 at 7:47 am #

    Traceback (most recent call last):
    File "real_time_object_detection.py", line 33, in <module>
    net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])
    AttributeError: 'module' object has no attribute 'dnn'

    This is the error I get when running the real_time_object_detection.py file.
    Is there anything wrong with my OpenCV installation?

    I installed it from the link you provided and didn’t run into any issues during installation; however, when running the real_time_object_detection.py file I get the above error.

    Please help if anyone has come across this issue.

    • Adrian Rosebrock October 16, 2017 at 12:16 pm #

      Please read the comments before posting. I’ve already addressed this issue multiple times. Take a look at my reply to “Vasanth”. Please ensure you have properly installed OpenCV 3.3.

  45. Satyam October 21, 2017 at 12:34 am #

    Hey Adrian,

    Thank you so much for making such great tutorials. I just wanted to know how to train the model on a huge database (for more objects than those listed in the classes).

    Thank you!


