A gentle guide to deep learning object detection

Today’s blog post is inspired by PyImageSearch reader Ezekiel, who emailed me last week and asked:

Hey Adrian,

I went through your previous blog post on deep learning object detection along
with the followup tutorial for real-time deep learning object detection. Thanks for those.

I’ve been using your source code in my example projects but I’m having two issues:

  1. How do I filter/ignore classes that I am uninterested in?
  2. How can I add new classes to my object detector? Is that even possible?

I would really appreciate it if you could cover this in a blog post.


Ezekiel isn’t the only reader with those questions. In fact, if you go through the comments section of my two most recent posts on deep learning object detection (linked above), you’ll find that one of the most common questions is typically (paraphrased):

How do I modify your source code to include my own object classes?

Since this appears to be such a common question, and ultimately a misunderstanding of how neural networks/deep learning object detectors actually work, I decided to revisit the topic of deep learning object detection in today’s blog post.

Specifically, in this post you will learn:

  • The differences between image classification and object detection
  • The components of a deep learning object detector including the differences between an object detection framework and the base model itself
  • How to perform deep learning object detection with a pre-trained model
  • How you can filter and ignore predicted classes from a deep learning model
  • Common misconceptions and misunderstandings when adding or removing classes from a deep neural network

To learn more about deep learning object detection, and perhaps even debunk a few misconceptions or misunderstandings you may have with deep learning-based object detection, just keep reading.


A gentle guide to deep learning object detection

Today’s blog post is meant to be a gentle introduction to deep learning-based object detection.

I’ve done my best to provide a review of the components of deep learning object detectors, including OpenCV + Python source code to perform object detection using a pre-trained deep learning object detector.

Use this guide to help you get started with deep learning object detection, but also realize that object detection is highly nuanced and detailed. I could not possibly include every detail of deep learning object detection in a single blog post.

That said, we’ll start today’s blog post by discussing the fundamental differences between image classification and object detection, including whether a network trained for image classification can be used for object detection (and under what circumstances).

Once we understand what object detection is, we’ll review the core components of a deep learning object detector, including the object detection framework along with the base model, two key components that readers new to object detection tend to misunderstand.

From there, we’ll implement real-time deep learning object detection using OpenCV.

I’ll also demonstrate how you can ignore and filter object classes you are not interested in without having to modify the network architecture or retrain the model.

Finally, we’ll wrap up today’s blog post by discussing how you can add or remove classes from a deep learning object detector, including my recommended resources to help you get started.

Let’s go ahead and dive into deep learning object detection!

The difference between image classification and object detection

Figure 1: The difference between classification (left) and object detection (right) is intuitive and straightforward. For image classification, the entire image is classified with a single label. In the case of object detection, our neural network localizes (potentially multiple) objects within the image.

When performing standard image classification, given an input image, we present it to our neural network, and we obtain a single class label and perhaps a probability associated with the class label as well.

This class label is meant to characterize the contents of the entire image, or at least the most dominant, visible contents of the image.

For example, given the input image in Figure 1 above (left) our CNN has labeled the image as “beagle”.

We can thus think of image classification as:

  • One image in
  • And one class label out

Object detection, regardless of whether performed via deep learning or other computer vision techniques, builds on image classification and seeks to localize exactly where in the image each object appears.

When performing object detection, given an input image, we wish to obtain:

  • A list of bounding boxes, or the (x, y)-coordinates for each object in an image
  • The class label associated with each bounding box
  • The probability/confidence score associated with each bounding box and class label

Figure 1 (right) demonstrates an example of performing deep learning object detection. Notice how both the person and the dog are localized with their bounding boxes and class labels predicted.

Therefore, object detection allows us to:

  • Present one image to the network
  • And obtain multiple bounding boxes and class labels out

Can a deep learning image classifier be used for object detection?

Figure 2: A non-end-to-end deep learning object detector uses a sliding window (left) + image pyramid (right) approach combined with classification.

Okay, so at this point you understand the fundamental difference between image classification and object detection:

  • When performing image classification, we present one input image to the network and obtain one class label out.
  • But when performing object detection, we can present one input image and obtain multiple bounding boxes and class labels out.

That motivates the question:

Can we take a network already trained for classification and use it for object detection instead?

The answer is a bit tricky as it’s technically “Yes”, but for reasons not so obvious.

The solutions involve:

  1. Applying standard, computer-vision based object detection methods (i.e., non-deep learning methods) such as sliding windows and image pyramids — this method is typically used in your HOG + Linear SVM-based object detectors.
  2. Taking the pre-trained network and using it as a base network in a deep learning object detection framework (e.g., Faster R-CNN, SSD, YOLO).

Method #1: The traditional object detection pipeline

The first method is not a pure end-to-end deep learning object detector.

We instead utilize:

  1. Fixed size sliding windows, which slide from left-to-right and top-to-bottom to localize objects at different locations
  2. An image pyramid to detect objects at varying scales
  3. Classification via a pre-trained (classification) Convolutional Neural Network

At each stop of the sliding window + image pyramid, we extract the ROI, feed it into a CNN, and obtain the output classification for the ROI.
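
To make the sliding window + image pyramid loop concrete, here is a minimal sketch in plain Python that only enumerates the ROIs. The function names (pyramid_sizes, sliding_window, candidate_rois) and the window size, step, and scale values are illustrative assumptions, not taken from the code download for this post. In a real pipeline each ROI would be cropped from the resized image and passed to the classification CNN.

```python
def pyramid_sizes(width, height, scale=1.5, min_size=(64, 64)):
    """Yield the (width, height) of each layer of an image pyramid."""
    while width >= min_size[0] and height >= min_size[1]:
        yield (width, height)
        width = int(width / scale)
        height = int(height / scale)


def sliding_window(width, height, step, window):
    """Yield (x1, y1, x2, y2) for every stop of a fixed-size window,
    moving left-to-right and top-to-bottom."""
    (win_w, win_h) = window
    for y in range(0, height - win_h + 1, step):
        for x in range(0, width - win_w + 1, step):
            yield (x, y, x + win_w, y + win_h)


def candidate_rois(width, height, window=(64, 64), step=32, scale=1.5):
    """Collect every ROI across all pyramid layers, mapped back to the
    coordinate system of the original (full-size) image."""
    rois = []
    for (w, h) in pyramid_sizes(width, height, scale, window):
        # ratio between the original image and this pyramid layer
        factor = width / float(w)
        for box in sliding_window(w, h, step, window):
            rois.append(tuple(int(round(v * factor)) for v in box))
    return rois


# hypothetical 128x128 input image
rois = candidate_rois(128, 128)
print(len(rois), rois[0], rois[-1])
```

Even for this tiny hypothetical image the classifier would have to run once per ROI, which hints at why the approach tends to be slow on real images.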

If the classification probability of label L is higher than some threshold T, we mark the bounding box of the ROI with the label L. Repeating this process for every stop of the sliding window and image pyramid, we obtain the output object detections. Finally, we apply non-maxima suppression to the bounding boxes, yielding our final output detections:

Figure 3: Applying non-maxima suppression will suppress overlapping, less confident bounding boxes.

This method can work in some specific use cases, but in general it’s slow, tedious, and a bit error-prone.

However, it’s worth learning how to apply this method as it can turn an arbitrary image classification network into an object detector, avoiding the need to explicitly train an end-to-end deep learning object detector. This method could save you a ton of time and effort depending on your use case.

If you’re interested in this object detection method and want to learn more about the sliding window + image pyramid + image classification approach to object detection, please refer to my book, Deep Learning for Computer Vision with Python.

Method #2: Base network of an object detection framework

The second method to deep learning object detection allows you to treat your pre-trained classification network as a base network in a deep learning object detection framework (such as Faster R-CNN, SSD, or YOLO).

The benefit here is that you can create a complete end-to-end deep learning-based object detector.

The downside is that it requires a bit of intimate knowledge on how deep learning object detectors work — we’ll discuss this more in the following section.

The components of a deep learning object detector

Figure 4: The VGG16 base network is a component of the SSD deep learning object detection framework.

There are many components, sub-components, and sub-sub-components of a deep learning object detector, but the two we are going to focus on today are the two that most readers new to deep learning object detection often confuse:

  1. The object detection framework (e.g., Faster R-CNN, SSD, YOLO).
  2. The base network which fits into the object detection framework.

The base network you are likely already familiar with (you just haven’t heard it referenced as a “base network” before).

Base networks are your common (classification) CNN architectures, including:

  • VGGNet
  • ResNet
  • MobileNet
  • DenseNet

Typically these networks are pre-trained to perform classification on a large image dataset, such as ImageNet, to learn a rich set of discerning, discriminating filters.

Object detection frameworks consist of many components and sub-components.

For example, the Faster R-CNN framework includes:

  • The Region Proposal Network (RPN)
  • A set of anchors
  • The Region of Interest (ROI) pooling module
  • The final Region-based Convolutional Neural Network

When using Single Shot Detectors (SSDs) you have components and sub-components such as:

  • MultiBox
  • Priors
  • Fixed priors

Keep in mind that the base network is just one of the many components that fit into the overall deep learning object detection framework — Figure 4 at the top of this section depicts the VGG16 base network inside the SSD framework.

Typically, “network surgery” is performed on the base network. This modification:

  • Forms it to be fully-convolutional (i.e., accept arbitrary input dimensions).
  • Eliminates CONV/POOL layers deeper in the base network architecture and replaces them with a series of new layers (SSD), new modules (Faster R-CNN), or some combination of the two.

The term “network surgery” is a colloquial way of saying we remove some of the original layers of the base network architecture and supplant them with new layers.

You’ve likely seen low budget horror movies where the killer, likely carrying an ax or large knife, attacks their victim and unceremoniously hacks at them.

Network surgery is more precise and exacting than the typical B horror film killer.

Network surgery is also very tactical: we remove the parts of the network we do not need and replace them with a new set of components.

Then, when we go to train our framework to perform object detection, both the weights of the (1) new layers/modules and (2) base network are modified.

Again, a complete review of how various deep learning object detection frameworks work (including the role the base network plays) is outside the scope of this blog post.

If you’re interested in a complete review of deep learning object detection, including theory and implementation, please refer to my book, Deep Learning for Computer Vision with Python.

How do I measure the accuracy of a deep learning object detector?

When evaluating object detector performance we use an evaluation metric called mean Average Precision (mAP) which is based on the Intersection over Union (IoU) across all classes in our dataset.

Intersection over Union (IoU)

Figure 5: In this visual example of Intersection over Union (IoU), the ground-truth bounding box (green) can be compared to the predicted bounding box (red). IoU is used with mean Average Precision (mAP) to evaluate the accuracy of a deep learning object detector. The simple equation to calculate IoU is shown on the right.

You’ll typically find IoU and mAP used to evaluate the performance of HOG + Linear SVM detectors, Haar cascades, and deep learning-based methods; however, keep in mind that the actual algorithm used to generate the predicted bounding boxes does not matter.

Any algorithm that provides predicted bounding boxes (and optionally class labels) as output can be evaluated using IoU. More formally, in order to apply IoU to evaluate an arbitrary object detector, we need:

  1. The ground-truth bounding boxes (i.e., the hand-labeled bounding boxes from our testing set that specify where in an image our object is).
  2. The predicted bounding boxes from our model.
  3. If you want to compute recall along with precision, you’ll also need the ground-truth class labels and predicted class labels.

In Figure 5 (left) I have included a visual example of a ground-truth bounding box (green) versus a predicted bounding box (red). IoU can be computed using the equation illustrated in Figure 5 (right).

Examining this equation you can see that IoU is simply a ratio.

In the numerator, we compute the area of overlap between the predicted bounding box and the ground-truth bounding box.

The denominator is the area of the union, or more simply, the area encompassed by both the predicted bounding box and the ground-truth bounding box.

Dividing the area of overlap by the area of union yields a final score — the Intersection over Union.
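
The ratio described above can be sketched in a few lines of plain Python. This is a minimal example that assumes boxes are given as (x1, y1, x2, y2) corner coordinates. The function name and the example boxes are made up for illustration.

```python
def intersection_over_union(box_a, box_b):
    """Compute IoU for two boxes given as (x1, y1, x2, y2)."""
    # corners of the overlapping rectangle (if there is one)
    x_left = max(box_a[0], box_b[0])
    y_top = max(box_a[1], box_b[1])
    x_right = min(box_a[2], box_b[2])
    y_bottom = min(box_a[3], box_b[3])

    # numerator: area of overlap (zero when the boxes do not intersect)
    inter_area = max(0, x_right - x_left) * max(0, y_bottom - y_top)

    # denominator: area of union = both areas minus the shared overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union_area = area_a + area_b - inter_area

    return inter_area / float(union_area) if union_area > 0 else 0.0


ground_truth = (0, 0, 100, 100)
predicted = (50, 50, 150, 150)
iou = intersection_over_union(ground_truth, predicted)
print("IoU: {:.4f}".format(iou))
```

For these two boxes the overlap is 50 x 50 = 2,500 pixels and the union is 10,000 + 10,000 - 2,500 = 17,500 pixels, so the IoU is 2,500 / 17,500, or roughly 0.14.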

mean Average Precision (mAP)

Note: I decided to edit this section from its original form. I wanted to keep the discussion of mAP higher level and avoid some of the more confusing recall calculations but as a couple commenters pointed out this section wasn’t technically correct. Because of that I decided to update the post.

Since this is a gentle introduction to deep learning-based object detection I’m going to keep the explanation of mAP on the simplified side just so you understand the fundamentals.

Readers and practitioners new to object detection can be confused by the mAP calculation. This is partially due to the fact that mAP is a more complicated evaluation metric. It’s also because the definition and calculation of mAP can vary from one object detection challenge to another (when I say “object detection challenge” I’m referring to competitions such as COCO, PASCAL VOC, etc.).

Computing the Average Precision (AP) for a particular object detection pipeline is essentially a three step process:

  1. Compute the precision, which is the proportion of true positives out of all predictions made.
  2. Compute the recall which is the proportion of true positives out of all possible positives.
  3. Average together the maximum precision value across all recall levels in steps of size s.

To compute the precision we first apply our object detection algorithm to an input image. The bounding box scores are then sorted in descending order by their confidence.

We know from a priori knowledge (i.e., it’s a validation/testing example and we therefore know the total number of objects in the image) there are 4 objects in this image. We seek to determine how many “correct” detections our network made. A “correct” prediction here is one where we have a minimum IoU of 0.5 (this value is tunable depending on the challenge but 0.5 is a standard value).

Here is where the calculation starts to become a bit more complicated. We need to compute the precision at different recall values (also called “recall levels” or “recall steps”).

For example, let’s pretend we are computing the precision and recall values for the top-3 predictions. Out of the top-3 predictions from our deep learning object detector, we made 2 correct. Our precision is then the proportion of true positives: 2/3 = 0.667. Our recall is the proportion of the true positives out of all the possible positives in the image: 2 / 4 = 0.5. We repeat this process for (typically) the top-1 to top-10 predictions. This process yields a list of precision values.

The next step is to compute the average for all your top-N values, hence the term Average Precision (AP). We loop over all recall values r, find the maximum precision that we can obtain with a recall of at least r, and then compute the average of those maxima. We now have our average precision for a single evaluation image.
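
Here is a small worked sketch of that calculation in plain Python. It reuses the numbers from the example above (4 ground-truth objects, 2 of the top-3 predictions correct) and assumes 11 evenly spaced recall levels (a step size of 0.1) with the maximum precision taken over recall values of at least r, which is one common interpolation style. The remaining ranked predictions, the function names, and the choice of 11 levels are assumptions made for illustration, so treat the final number as an example rather than a reference result.

```python
def precision_recall_at_each_rank(is_correct, num_ground_truth):
    """Given detections sorted by descending confidence (True = correct,
    i.e. IoU >= 0.5 with a ground-truth box), return the precision and
    recall for the top-1, top-2, ..., top-N predictions."""
    precisions, recalls = [], []
    true_positives = 0
    for rank, correct in enumerate(is_correct, start=1):
        if correct:
            true_positives += 1
        precisions.append(true_positives / float(rank))
        recalls.append(true_positives / float(num_ground_truth))
    return precisions, recalls


def average_precision(precisions, recalls, num_levels=11):
    """Average the maximum precision found at each recall level."""
    total = 0.0
    for k in range(num_levels):
        level = k / float(num_levels - 1)
        # best precision among ranks whose recall reaches this level
        reachable = [p for (p, r) in zip(precisions, recalls) if r >= level]
        total += max(reachable) if reachable else 0.0
    return total / num_levels


# hypothetical ranked detections for one image containing 4 objects
is_correct = [True, False, True, False, True]
precisions, recalls = precision_recall_at_each_rank(is_correct, 4)
ap = average_precision(precisions, recalls)

print("top-3 precision: {:.3f}".format(precisions[2]))
print("top-3 recall: {:.3f}".format(recalls[2]))
print("AP: {:.3f}".format(ap))
```

The top-3 entry reproduces the precision of 2/3 and recall of 2/4 from the text. Recall levels the detector never reaches (here anything above 0.75) contribute a precision of zero, which is what pulls the AP down when objects are missed.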

Once we have computed the average precision for all images in our testing/validation set we perform two more calculations:

  1. Compute the mean of the APs for each class, giving us a mAP for each individual class (for many datasets/challenges you’ll want to examine the mAP class-wise so you can spot if your deep learning object detector is struggling with a specific class)
  2. Take the mAPs for each individual class and then average them together, yielding the final mAP for the dataset
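
The two averaging steps above can be sketched as follows. The AP values and class names below are made up purely to show the arithmetic, and the function name is hypothetical.

```python
def mean_average_precision(aps_by_class):
    """aps_by_class maps a class label to the list of AP values computed
    for that class. Returns (per-class means, overall mean)."""
    # step 1: mean of the APs for each individual class
    per_class = {label: sum(aps) / float(len(aps))
                 for (label, aps) in aps_by_class.items() if aps}
    # step 2: average the class-wise values into a single number
    overall = sum(per_class.values()) / float(len(per_class))
    return per_class, overall


# made-up AP values for two classes
aps_by_class = {"dog": [0.5, 0.75], "person": [1.0, 0.5, 0.75]}
per_class, overall = mean_average_precision(aps_by_class)
print(per_class, overall)
```

Keeping the per-class values around (rather than only the final number) is what lets you spot a class the detector is struggling with.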

Again, mAP is more complicated than traditional accuracy so don’t be frustrated if you don’t understand it on the first pass. This is an evaluation metric you’ll want to study multiple times before you fully understand it. The good news is that deep learning object detection implementations handle computing mAP for you.

Deep learning-based object detection with OpenCV

We’ve discussed deep learning and object detection on this blog in previous posts; however, let’s review actual source code in this post as a matter of completeness.

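Our example includes the Single Shot Detector (framework) with a MobileNet base model. The model was trained by GitHub user chuanqi305: it was pre-trained on the Common Objects in Context (COCO) dataset and then fine-tuned on PASCAL VOC, which is why it predicts the 20 PASCAL VOC object classes plus a background class.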

For additional detail, check out my previous post where I introduced chuanqi305’s model with pertinent background information.

Let’s loop back to Ezekiel’s first question from the top of this post:

  1. How do I filter/ignore classes that I am uninterested in?

I’m going to answer that very question in the following example script.

But first you need to prepare your system:

  • You need a minimum of OpenCV 3.3 installed in your Python virtual environment (provided you are using Python virtual environments). OpenCV 3.3+ includes the DNN module required to run the following code. Be sure to use one of the OpenCV installation tutorials on the following page while paying extra attention to which version of OpenCV you download + install.
  • You should also install my imutils package. To install/update imutils in your Python virtual environment, simply use pip: pip install --upgrade imutils

When you’re ready, go ahead and create a new file named filter_object_detection.py  and let’s begin:

On Lines 2-8 we import our required packages and modules, notably imutils  and OpenCV. We will be using my VideoStream  class to handle capturing frames from a webcam.

We’re armed with the necessary tools, so let’s continue by parsing command line arguments:

Our script requires two command line arguments at runtime:

  • --prototxt : The path to the Caffe prototxt file which contains the model definition.
  • --model : Our CNN model weights file path.

Optionally you may specify --confidence , a threshold to filter weak detections.
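
A minimal sketch of parsing these arguments with argparse might look like the following. The long option names and the 0.2 default for --confidence come from this post. The short flags (-p, -m, -c), the help strings, and the build_parser helper are assumptions added for illustration. The example parses an explicit list so it can run without a real command line.

```python
import argparse


def build_parser():
    """Build the command line parser for the detection script."""
    ap = argparse.ArgumentParser(
        description="Filter classes from a pre-trained object detector")
    ap.add_argument("-p", "--prototxt", required=True,
                    help="path to the Caffe prototxt file (model definition)")
    ap.add_argument("-m", "--model", required=True,
                    help="path to the Caffe model weights file")
    ap.add_argument("-c", "--confidence", type=float, default=0.2,
                    help="minimum probability to filter weak detections")
    return ap


# parse an explicit list here; calling parse_args() with no arguments
# would read the values from the real command line instead
args = vars(build_parser().parse_args([
    "--prototxt", "MobileNetSSD_deploy.prototxt.txt",
    "--model", "MobileNetSSD_deploy.caffemodel",
]))
print(args)
```

Because --prototxt and --model are marked as required, running the script without them makes argparse print a usage message and exit.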

Our model can predict 21 object classes:

The CLASSES list contains all class labels the network was trained on (i.e., the 20 PASCAL VOC object labels plus a background class, 21 labels in total).

A common misconception of the CLASSES  list is that you can:

  1. Add a new class label to the list
  2. Or remove a class label from the list

…and have the network automatically “know” what you are trying to accomplish.

That is not the case.

You cannot simply modify a list of text labels and have the network automatically modify itself to learn, add, or remove patterns on data it was never trained on. That is not how neural networks work.

That said, there is a quick hack you can use to filter and ignore predictions you are uninterested in.

The solution is to:

  1. Define a set of IGNORE  labels (i.e., the list of class labels the network was trained on that you want to filter and ignore).
  2. Make a prediction on an input image/video frame.
  3. Ignore any predictions where the class label exists in the IGNORE  set.

Implemented in Python, the IGNORE  set looks like this:

Here we’ll be ignoring all predicted objects with class label "person"  (the if  statement used for filtering will be covered later in this code review).

You can easily add additional elements (class labels from the CLASSES  list) to ignore to the set.
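
As a sketch, the two pieces could be defined like this. The label order below is the one commonly distributed with the MobileNet SSD Caffe model and should be treated as an assumption: always check it against the label list that ships with the model you are actually using, since the index of each label must match the class index the network outputs.

```python
# class labels in the order the network outputs them (assumed order,
# verify against the label list that accompanies your model)
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
           "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
           "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
           "sofa", "train", "tvmonitor"]

# labels we want to filter out of the final results
IGNORE = set(["person"])

print(len(CLASSES), sorted(IGNORE))
```

Using a set for IGNORE keeps the membership check cheap, and every entry should be a label that also appears in CLASSES.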

Next, we’ll generate random label/box colors, load our model, and start the video stream:

On Line 27 a random array of COLORS  is generated to correspond to each of the 21 CLASSES . We’ll use these colors later for display purposes.

Our Caffe model is loaded on Line 31 using the cv2.dnn.readNetFromCaffe  function and both of our required command line arguments passed as parameters.

Then we instantiate the VideoStream  object as vs , and start our fps  counter (Lines 36-38). The 2-second sleep  allows our camera plenty of time to warm up.

At this point we’re ready to loop over the incoming frames from the camera and send them through our CNN object detector:

On Line 44 we grab a frame  and then resize  while preserving aspect ratio for display (Line 45).

From there, we extract the height and width as we’ll need these values later (Line 48).

Lines 48 and 49 generate a blob  from our frame. To learn more about a blob  and how it’s constructed using the cv2.dnn.blobFromImage  function, refer to this previous post for all the details.

Next, we send that blob through our neural net to detect objects (Lines 54 and 55).

Let’s loop over the detections:

On Line 58 we begin our detections  loop.

For each detection, we extract the confidence  (Line 61) followed by comparing it to our confidence threshold (Line 65).

In the case that our confidence  surpasses the minimum (the default of 0.2 can be changed via the optional command line argument), we’ll consider the detection a positive, valid detection and continue processing it.

First, we extract the index of the class label from detections  (Line 68).

Then, going back to Ezekiel’s first question, we can ignore classes in the IGNORE  set on Lines 72 and 73. If the class is to be ignored, we simply continue  back to the top of the detections loop (and we don’t display labels or boxes for this class). This fulfills our “quick hack” solution.
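
The confidence check and the IGNORE check can be sketched without a camera or a model by running them over a few hand-written detections. The sketch assumes each detection row is laid out as [image id, class index, confidence, x1, y1, x2, y2] with box coordinates normalized to the range 0 to 1, which is how I understand the SSD output from OpenCV's dnn module to be arranged. The filter_detections helper, the shortened label list, and the example rows are all made up for illustration.

```python
def filter_detections(rows, classes, ignore, min_confidence=0.2):
    """Keep detections that are confident enough and not in `ignore`."""
    kept = []
    for row in rows:
        confidence = row[2]
        # skip weak detections
        if confidence <= min_confidence:
            continue
        # look up the label and skip it if we are ignoring that class
        label = classes[int(row[1])]
        if label in ignore:
            continue
        kept.append((label, confidence, tuple(row[3:7])))
    return kept


# shortened, hypothetical label list and detections
labels = ["background", "dog", "person"]
rows = [
    [0, 2, 0.95, 0.10, 0.10, 0.50, 0.90],  # confident "person"
    [0, 1, 0.80, 0.60, 0.40, 0.90, 0.80],  # confident "dog"
    [0, 1, 0.10, 0.00, 0.00, 0.20, 0.20],  # weak "dog"
]
kept = filter_detections(rows, labels, ignore={"person"})
print(kept)
```

Only the confident "dog" survives: the "person" row is dropped by the IGNORE check and the weak row is dropped by the confidence threshold. Note that the network still produced the "person" detection and we simply chose not to use it.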

Otherwise, we’ve detected an object in the whitelist and we need to display the class label and rectangle on the frame:

In this code block, we are extracting bounding box coordinates (Lines 77 and 78) followed by drawing a label and rectangle on the frame (Lines 81-87).
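
Because the predicted box coordinates are relative to the frame size, they have to be scaled by the frame width and height before drawing. Below is a minimal sketch of that arithmetic in plain Python, along with building the label text. The helper names, the 15-pixel label offset, and the example values are assumptions for illustration, and the actual drawing calls are left out.

```python
def scale_box(box, frame_width, frame_height):
    """Convert a normalized (x1, y1, x2, y2) box to pixel coordinates."""
    (x1, y1, x2, y2) = box
    return (int(x1 * frame_width), int(y1 * frame_height),
            int(x2 * frame_width), int(y2 * frame_height))


def label_y(start_y, offset=15):
    """Place the label above the box, or below its top edge when the
    box is too close to the top of the frame."""
    return start_y - offset if start_y - offset > offset else start_y + offset


def format_label(name, confidence):
    """Build the text drawn next to the box, e.g. 'dog: 50.00%'."""
    return "{}: {:.2f}%".format(name, confidence * 100)


# hypothetical 400x300 frame and normalized box
(start_x, start_y, end_x, end_y) = scale_box((0.25, 0.5, 0.75, 1.0), 400, 300)
print((start_x, start_y, end_x, end_y), label_y(start_y),
      format_label("dog", 0.5))
```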

The color of the label + rectangle will be the same for each unique class; objects of the same class will have the same color (i.e. all "boats"  in the video would have the same color label and box).

Finally, still in our while  loop, we’ll display our hard work on our screen:

We display the frame  and capture keypresses on Lines 90 and 91.

If the "q"  key is pressed, we quit by breaking out of the loop (Lines 94 and 95).

Otherwise, we proceed to update our fps  counter (Line 98) and continue grabbing and processing frames.

On the remaining lines, when the loop breaks, we display time + frames per second metrics and clean up.

Running your deep learning object detector

In order to run today’s script, you’ll need to grab the files by scrolling to the “Downloads” section below.

Once you’ve extracted the files, open a terminal and navigate to the downloaded code + model. From there, execute the following command:

Figure 6: A real-time deep learning object detection demonstration of using the same model — in the right video I’ve ignored certain object classes programmatically.

In the GIF above you can see on the left that the “person” class is detected. This is due to me having an empty IGNORE set. On the right you can see that I am not detected. This behavior is due to me adding the “person” class to the IGNORE set.

While our deep learning object detector is still technically detecting the “person” class, our post-processing code is able to filter it out.

Perhaps you encountered an error running the deep learning object detector?

Troubleshooting step one would be to verify that you have a webcam hooked up. If that’s not the problem, maybe you saw the following error message in your terminal:

If you see this message, then you didn’t pass “command line arguments” to the program. This is a common problem PyImageSearch readers have if they aren’t familiar with Python, argparse, and command line arguments. Check out the link if you are having trouble.

Here is the full version of the video with commentary:

How can I add or remove classes to my deep learning object detector?

Figure 7: Fine-tuning and transfer learning for deep learning object detectors.

As I mentioned earlier in this guide, you cannot simply add or remove class labels from the CLASSES  list — the underlying network itself has not changed.

All you have done, at best, is modify a text file that lists out the class labels.

Instead, if you want to explicitly add or remove classes from a neural network you will need to either:

  1. Train from scratch
  2. Perform fine-tuning

Training from scratch tends to be a time consuming, expensive operation so we try to avoid it when we can — but in some cases this is completely unavoidable.

The other option is to perform fine-tuning.

Fine-tuning is a form of transfer learning and is the process of:

  1. Removing the fully-connected layer responsible for classification/labeling
  2. Replacing it with a brand new, freshly and randomly initialized fully-connected layer

We may optionally modify other layers in the network as well (including freezing the weights of some layers and unfreezing them during the training process).

Exactly how to train your own custom deep learning object detector (including both fine-tuning and training from scratch) is an advanced topic outside the scope of this blog post, but see the section below to help you get started.

Where can I learn more about deep learning object detection?

Figure 8: Real-time deep learning object detection for front and rear views of vehicles.

As we’ve discussed in this blog post, object detection is not as simple and straightforward as image classification, the details and intricacies of which are outside the scope of this (already lengthy) blog post.

This tutorial will certainly not be my last guide to deep learning object detection (I will unquestionably be writing more about object detection in the future), but if you’re interested in learning how to:

  1. Prepare your own image datasets for object detection
  2. Fine-tune and train your own custom object detectors, including Faster R-CNNs and SSDs on your own datasets
  3. Uncover my best practices, techniques, and procedures to utilize when training your own deep learning object detectors

…then you’ll want to be sure to take a look at my new deep learning book. Inside Deep Learning for Computer Vision with Python, I will guide you, step-by-step, on building your own deep learning object detectors.

Be sure to take a look — and don’t forget to grab your (free) sample chapters + table of contents PDF while you’re there!


In today’s blog post you were gently introduced to some of the intricacies involved in deep learning object detection. We started by reviewing the fundamental differences between image classification and object detection, including how we can use a network trained for image classification for object detection.

We then reviewed the core components of a deep learning object detector:

  1. The framework
  2. The base model

The base model is typically a pre-trained (classification) network, normally trained on a large image dataset such as ImageNet to learn a robust set of discerning filters.

We can also train the base network from scratch but this usually takes a significantly longer amount of time for the object detector to reach reasonable accuracy.

You should, in most situations, start with a pre-trained base model instead of trying to train from scratch.

Once we acquired a solid understanding of deep learning object detectors, we implemented an object detector capable of running in real-time in OpenCV.

I also demonstrated how you can filter and ignore class labels that you are uninterested in.

Finally, we learned that actually adding or removing a class to a deep learning object detector is not as simple as adding/removing a label from the hardcoded class labels list.

The neural network itself doesn’t care if you modify a list of class labels — instead, you would need to either:

  1. Modify the network architecture itself by removing the fully-connected class prediction layer and fine-tuning
  2. Or train the object detection framework from scratch

For most deep learning object detection projects you will start with a deep learning object detector pre-trained on an object detection task, such as COCO. You then perform fine-tuning on the model to obtain your own detector.

Training an end-to-end custom deep learning object detector is outside the scope of this blog post, so if you’re interested in discovering how to train your own deep learning object detectors, please refer to my book, Deep Learning for Computer Vision with Python.

Inside the book, I have included a number of deep learning object detection examples, including training your own object detectors to:

  1. Detect traffic signs, such as stop signs, pedestrian crossing signs, etc.
  2. Along with the front and rear views of vehicles

To learn more about my deep learning book, just click here!

If you enjoyed today’s blog post, be sure to enter your email address in the form below to be notified when future tutorials are published here on PyImageSearch!


If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!


112 Responses to A gentle guide to deep learning object detection

  1. Anirban May 14, 2018 at 11:26 am #

    Really good blog post and with the youtube video , it is even better . I am really happy that I purchased your deep learning for CV with python , in a few months I have learnt so much about DL for CV that I now feel confident that I can apply for a DL in CV post.
    Disclaimer : I am a banker by profession. Have not coded in last ten years and this my honest review.

    • Adrian Rosebrock May 14, 2018 at 11:42 am #

      Thanks so much for the kind words, Anirban! 😀 I’m so incredibly happy for you and your transition from bank to CV and practitioner. Keep up the great work!

  2. Raym May 14, 2018 at 12:26 pm #

    Thanks for the clarification!!!

    • Adrian Rosebrock May 14, 2018 at 12:27 pm #

      Thanks Raym, I’m glad it helped 🙂

  3. MImranKhan May 14, 2018 at 12:47 pm #

    but how we can use our own model that we train by our self rather than picking Pre-train model

    • Adrian Rosebrock May 14, 2018 at 2:17 pm #

      You would typically take a network pre-trained on ImageNet and then fine-tune it to your own dataset. You could train your own base network first and then fine-tune but whether or not that works better really depends on your dataset and project. I would suggest running experiments for both.

  4. Vijin May 14, 2018 at 2:05 pm #

    I think mAP computation mentioned in this blog is wrong.

    • Adrian Rosebrock May 14, 2018 at 2:16 pm #

      Hey Vijin — what specifically regarding the mAP computation do you think is incorrect?

      UPDATE: I went back and updated the mAP computation. I was trying to keep it simplistic but after reading (1) Ye Hu’s comment and (2) reviewing the post itself a few times I decided to go back and include the full calculation.

  5. Tiri May 14, 2018 at 2:49 pm #

    very interesting article! hope to see soon new posts on object detection 🙂
    in which bundle of your books do you do the object detection topic and examples like traffic signs?

    • Adrian Rosebrock May 14, 2018 at 2:59 pm #

      Hi Tiri, there will certainly be more posts on object detection. The Practitioner Bundle of Deep Learning for Computer Vision with Python discusses the traditional sliding window + image pyramid method for object detection, including how to use a CNN trained for classification as an object detector. The ImageNet Bundle includes all examples on training Faster R-CNNs and SSDs for traffic sign and front/rear view vehicle detection.

  6. camp May 14, 2018 at 9:13 pm #

    nice. thank you

  7. Nikhil May 14, 2018 at 11:03 pm #

    Hi Adrian, Why am I getting this error?
    $ python3 filter_object_detection.py --prototxt MobileNetSSD_deploy.prototxt.txt --model MobileNetSSD_deploy.caffemodel

    AttributeError: module 'cv2' has no attribute 'dnn'

    • Adrian Rosebrock May 15, 2018 at 6:01 am #

      Make sure you have at least OpenCV 3.3 installed (see the blog post for more details as I discuss why and how you can install OpenCV 3.3+).

  8. Ye Hu May 15, 2018 at 2:10 am #

    So do I. The mAP involves the precision-recall curve.

    • Adrian Rosebrock May 15, 2018 at 6:08 am #

      In the context of object detection, precision is the proportion of our detections that are true positives (TP). Recall is the proportion of TPs out of all the possible positives (i.e., the ground-truth objects). The average precision is then the average of the maximum precision values at varying recall steps. I didn’t include the step value for the precision/recall calculation as this is meant to be an introductory blog post to object detection. It’s also not an exhaustive example of how to compute mAP for object detection either (although that could make for a good tutorial).

      If anyone finds the mAP explanation too simplified (or even too complicated) let me know and I will consider rewriting it.

      UPDATE: I decided to go back and update the blog post to describe the full calculation. Trying to explain the entire mAP calculation is too much for this already lengthy blog post. I’ll cover a detailed computation of mAP in a future tutorial.
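
      To make "the average of maximum precision values at varying recall steps" concrete, here is a minimal sketch. It assumes the 11-point interpolation convention (recall steps 0.0, 0.1, ..., 1.0), which the reply above does not actually specify, and the function names are hypothetical rather than taken from the post's code.

```python
def average_precision_11pt(recalls, precisions):
    """11-point interpolated AP (assumed convention, not from the post).

    recalls/precisions are parallel lists: one (recall, precision) pair per
    confidence cut-off for a single class.
    """
    total = 0.0
    for i in range(11):
        step = i / 10.0
        # Maximum precision among all operating points with recall >= step.
        candidates = [p for r, p in zip(recalls, precisions) if r >= step]
        total += max(candidates) if candidates else 0.0
    return total / 11.0


def mean_average_precision(ap_per_class):
    """mAP is the mean of the per-class AP values."""
    values = list(ap_per_class.values())
    return sum(values) / len(values)
```

      The all-points interpolation mentioned further down in the comments follows the same idea but uses every distinct recall value instead of 11 fixed steps.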

  9. Chandramouleeswar May 16, 2018 at 7:48 am #

    Hello Adrian,

    Can you give me a suggestion for image recognition in videos? I am looking forward to implementing Mask-R CNN using Resnet as a base network for recognising persons, vehicles, traffic signals on roads from a video Dataset. What is the better Dataset for my choice?

    • Adrian Rosebrock May 16, 2018 at 5:05 pm #

      Just to clarify, are you looking to perform segmentation on each frame in the dataset which is essentially treating it like working with a set of images? Or are you trying to do activity recognition within the dataset as well where sequences of frames are important?

  10. Elain May 17, 2018 at 2:24 am #

    Can i get a link to the wallpaper?

    • Adrian Rosebrock May 17, 2018 at 6:43 am #

      Which wallpaper are you referring to?

  11. Gilad May 17, 2018 at 8:15 am #

    I would like to understand how we can get 7fps.
    When I trained a CNN for face detection and used Haar-cascade to detect the face itself, on the same computer I got ~7fps.
    If I understand correctly, under the hood, the algorithm is running thousands inference on each box and calculate what it found. How can we reach 7fps?
    Thx for very very interesting post.

    • Adrian Rosebrock May 17, 2018 at 8:51 am #

      The deep learning face detector in this post will already get you over 7 FPS on the CPU. Haar cascades will run many times faster (but will likely be less accurate depending on your project). Are you using your own CNN trained for face detection? If so, consider pushing the computation to the GPU for faster inference.

      • Gilad May 17, 2018 at 3:16 pm #

        I would like to understand what is under the hood of the network in your post. Is it indeed doing inference thousands of times for each picture as your post suggest?

        • Adrian Rosebrock May 22, 2018 at 6:48 am #

          Be careful with the term “inference” here. Typically we use the term inference to refer to a prediction from the model as it’s inferring from the data. In the context of neural networks, an inference is a single forward pass which returns the prediction.

          Perhaps you mean to say the network is performing thousands of computations for each input image? If so, that statement is correct.

  12. Siladittya Manna May 17, 2018 at 12:38 pm #

    This post cleared a lot of confusion I had regarding implementation of object detection and image classification. Thanks a lot!!

    • Adrian Rosebrock May 21, 2018 at 10:39 am #

      Thanks Siladittya, I’m happy to hear you found it helpful 🙂

  13. Gilad May 18, 2018 at 4:25 am #

    Thx Adrian again


    • Adrian Rosebrock May 22, 2018 at 6:49 am #

      Thanks so much for sharing your demo Gilad, great job! 🙂

  14. Zubair Ahmed May 27, 2018 at 11:43 am #

    Nice blog post, and of course I learned this and more from your book. To all the readers: if you like this post, make sure you get Adrian’s book.

    • Adrian Rosebrock May 27, 2018 at 11:57 am #

      Thanks Zubair! 😀

      • Zubair Ahmed May 27, 2018 at 2:40 pm #

        Well to top it off another tutorial to do Object Counting would be an awesome addition to this series 🙂

  15. Suresh Kumar June 19, 2018 at 7:03 am #

    Suresh Kumar:


    You have ignored, Human from this object detection.

    How do I include, Human?


    I would like to add one object like watch or mobile to be detected, How do I add to the Caffe Model File?

    • Adrian Rosebrock June 19, 2018 at 8:22 am #

      1. You could set the IGNORE set to be empty or you could modify the code to use a KEEP class that includes only the specified set of classes.

      2. Please read the blog post as I discuss the answer to your question. You’ll want to apply fine-tuning/transfer learning.
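
      The two options in point 1 can be sketched as follows. This is an illustration only, not the post's actual script: the function name `filter_detections`, the `(label, confidence)` pair format, and the sample labels are assumptions. Only the IGNORE/KEEP idea comes from the reply above.

```python
def filter_detections(detections, ignore=frozenset(), keep=None):
    """Filter (label, confidence) pairs by class label.

    ignore: labels to drop (an empty set drops nothing).
    keep:   if given, only these labels survive; None disables the check.
    """
    kept = []
    for label, confidence in detections:
        if keep is not None and label not in keep:
            continue  # KEEP mode: skip anything not explicitly requested
        if label in ignore:
            continue  # IGNORE mode: skip explicitly unwanted classes
        kept.append((label, confidence))
    return kept


# Hypothetical detections, as they might look after decoding a forward pass.
detections = [("person", 0.9), ("car", 0.8), ("dog", 0.7)]
everything = filter_detections(detections)                   # empty IGNORE set
people_only = filter_detections(detections, keep={"person"})
```

      Either way the network still predicts every class it was trained on; the filtering only changes which predictions are reported.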

      • Ken Rubio April 3, 2019 at 10:21 pm #

        Hi, Adrian.

        I’m just a bit confused on the difference of fine-tuning, training OD framework from scratch, and transfer learning.

        What I understand is that you will freeze the weights when retraining the model. But is it retraining the model with old classes but additional images, or is it retraining the model with new classes? How does this differ from training OD framework from scratch and transfer learning. This will be a great help for me. Thank you!

        • Adrian Rosebrock April 4, 2019 at 1:12 pm #

          It’s retraining the model using the new classes only but the frozen weights were learned from the original set of classes. If you need help regarding training from scratch, transfer learning, and fine-tuning be sure to read Deep Learning for Computer Vision with Python where I cover the topic in detail.

  16. Dave A June 19, 2018 at 8:36 pm #

    Excellent post again. I’m really enjoying these. In a matter of weeks I’ve modified your code to communicate to some Node-Red flows I have sending me snapshots of motion, faces or certain classes of objects when detected on a Raspberry Pi 3b. (And not be ‘that guy’, but you may want to look over your figure numbering and the references within the text.)
    You make it almost too easy. Thank you!

    • Adrian Rosebrock June 21, 2018 at 5:50 am #

      Congrats on the progress Dave, that’s fantastic!

  17. Suresh Kumar June 20, 2018 at 12:42 am #

    Yes I have added the person, by excluding the lines of IGNORE.. Thank Sir..


    I need a log file to created after stopping the program, How may object are detected and what is the percentage of prediction of each object ..

    How can I do that Sir ?

    • Adrian Rosebrock June 21, 2018 at 5:46 am #

      You should read up on basic file I/O operations using the Python programming language. I’m happy to help but please take the time to do your proper research and read online. There are many Python tutorials available that teach you the fundamentals of the language.

  18. Carlos July 19, 2018 at 10:54 am #

    Hello Adrian,

    Do you think SSD is better than YOLO for object detection? I noticed you implement SSD in the ImageNet Bundle, and not YOLO. Why is that?

    Another question, for detecting targets like airplanes and military targets from satellite images, which one would recommend?

    Loving your 2nd book from dl4cv. When finish this, surely will buy the 3rd!


    • Adrian Rosebrock July 20, 2018 at 6:35 am #

      While YOLO is fast it’s not as accurate as SSDs or Faster R-CNNs. A general rule of thumb is that if you want pure speed and can sacrifice accuracy, use YOLO. If you need to detect tiny objects use Faster R-CNN. If you need a balance, use SSD.

      As far as your second question goes, I assume those objects would appear to be pretty tiny. In that case, Faster R-CNN.

      • Carlos July 20, 2018 at 7:56 pm #

        Thanks for the answer!

        I will try to study more about them, as I want to work in this area in the future.

        Have a nice weekend!

      • sophia November 1, 2018 at 12:24 pm #

        quick question to clear up some confusion about comparison between YOLO and SSD. let’s say we need real-time inference, so we rule out any RCNN variant.

        The YOLO-v3 paper points out that YOLO-v3’s accuracy is comparable to that of SSD. they also highlight that you can reduce the fps to improve accuracy such that it is faster than SSD and more accurate. Is there more to this aspect of comparing the accuracy-speed trade-off ?

        Regarding accuracy on detecting small objects. can you give me some indication of what object would be considered small in an image, for which SSD might be more accurate?

        really appreciate all of the work you’ve put in on this blog. looking forward to your reply,

        • Adrian Rosebrock November 2, 2018 at 7:19 am #

          Keep in mind that there is a very real difference between the claims of a publication and what is actually obtained when used by practitioners and engineers. I have no doubt that the YOLO results in the publication are correct and that for their tests it matched SSD. However, the vast majority of times I’ve used YOLO for my own projects and trained it from scratch the results are not as good as SSD. It very rarely warrants the FPS increase.

          Speaking of FPS increase, the YOLO model running inside OpenCV is actually slower than running a SSD. I’ll be doing a blog post on that soon as well.

          All that said — there is no one true “best” object detector. You need to try them on your own projects and let the empirical results guide you.

          • sophia November 2, 2018 at 3:27 pm #

            thanks so much, Adrian. I look forward to more posts from you!

            One last related general question on this topic:

            would it be possible to train an SSD model to distinguish between a person’s different poses in an image? right now, SSD detects a bounding box around a person. what if we trained an SSD model on images of a person sitting and person standing? could we then get SSD to distinguish between a person sitting and a person standing?

            i’m looking to combine object detection and human pose estimation in one model!

            Any guidance on doing this will be greatly appreciated! Thanks again.

          • Adrian Rosebrock November 6, 2018 at 1:33 pm #

            You can technically do that, yes. Each pose would have its own label. However, you might get better performance out of a model dedicated to “human activity recognition”.

  19. Carlos July 23, 2018 at 10:06 am #

    Dear Adrian,

    On IMAGENET BUNDLE (Faster R-CNNs and Single Shot Detectors (SSDs)) you show how to train these architectures for object detection from my own dataset?

    I am trying to identify cars, people and airplanes from aerial images (satellite, drones, UAV).

    I finished the Convolutional Neural Networks course from Coursera (Andrew Ng) and we implemented YOLO using YAD2K package, but I have no idea (yet) about how to train deep learning architectures for detect my own targets.

    In which book (and chapter) I will find these answers?

    Thanks for the attention.

    • Adrian Rosebrock July 25, 2018 at 8:12 am #

      Hey Carlos — you are correct, the ImageNet Bundle of Deep Learning for Computer Vision with Python will show you how to train Faster R-CNNs and SSDs on your own custom datasets. You will find all chapters on how to perform object detection in the ImageNet Bundle of the book.

  20. Lluis August 6, 2018 at 3:42 pm #

    Hi Adrian,

    thanks for your detailed tutorials, they are a big help to start with deep learning. What I want to accomplish is to train a network to detect objects (not only classify). The images are in FITS format, used in astronomy images. I was able to train a model in order to classify the object (I followed one of your tutorials, Santa/not Santa), but with object detection it is not so easy. All the examples or tutorials start with a pretrained network, but I need to start from scratch. Do you have any advice or source that I could follow to accomplish my goal?

    Thanks in advance!

    • Adrian Rosebrock August 7, 2018 at 6:37 am #

      Hi Lluis — I have a number of chapters inside Deep Learning for Computer Vision with Python that demonstrate how to train an object detector model from scratch. That would be my recommended starting point for you to achieve your goal.

      • Lluis August 7, 2018 at 7:16 am #

        Hi Adrian,

        thanks, I will take a look, and let you know with the result.

        Thanks and regards.

  21. Márcio August 12, 2018 at 4:43 pm #

    Hello Adrian, do you have raspberry sdcard .iso with that project?

    • Adrian Rosebrock August 15, 2018 at 8:55 am #

      I do. My Raspbian .img file with OpenCV pre-configured and pre-installed is included in the Quickstart Bundle and Hardcopy Bundle of Practical Python and OpenCV.

  22. Benya Jamiu September 3, 2018 at 6:20 pm #

    Dear Dr.
    In fact, I have yet to buy the book or enroll in any of your courses, but you have made most of my days. I’m just looking for a place to practice it right, and I have applied for an MSc in AI here in Paris to specialize in Computer Vision. Very soon I will buy both of your books, but right now I’m practicing all your examples online.
    You are great. Without leaving my room I’m moving closer to being a ….. GURU specialist in Computer Vision, even with a lot of stress, but I’m still practicing and sometimes sleeping at 12-02:00 am.

    • Adrian Rosebrock September 5, 2018 at 8:51 am #

      Thank you for the kind words, Benya. I’m so happy to hear you are enjoying the blog and will one day pick up a copy of my books. Keep practicing, you’re doing great! 🙂

  23. Dillon Wells October 10, 2018 at 11:50 am #

    I am a big fan of the prevalence of your Beagle in these blogs. Truly a wholesome meme.

    • Adrian Rosebrock October 12, 2018 at 9:13 am #

      Thanks Dillon 🙂

  24. Charles October 16, 2018 at 4:33 pm #

    Hello Adrian,

    I was wondering where to find already provided functions for evaluating an object detector. Is there any package for evaluation of common metrics in an object detection context with train/validation/tests sets? Or should I write them myself?

    Thanks for your answer.

    • Adrian Rosebrock October 20, 2018 at 8:02 am #

      It actually is fully dependent on the dataset you are using. Some datasets, like COCO or VOC, have very strict training, validation, and testing sets, along with prescribed evaluation metrics. Nearly all datasets use some form of Intersection Over Union and mean Average Precision (mAP).

  25. Carlos October 29, 2018 at 2:45 am #

    Dear Adrian,

    I am learning about Faster RCNN with your book, and now I am practicing in Kaggle challenges.

    I have a doubt about the TensorFlow API and DICOM images. In your book, you explain how to initialize the annotation object used to store information regarding the bounding box, and write all information in TFRecords.

    Will the tfAnnot.encoding works with ‘.dcm’ filetype (DICOM images)?

    The examples in your book are for png and jpeg, and now I am wondering if the script works directly with this medical type of image or need some kind of adjustment.

    Thanks for your support!

    • Adrian Rosebrock October 29, 2018 at 1:14 pm #

      Hey Carlos — I’ve never tried using the TensorFlow Object Detection API with DICOM images so I’m honestly not sure.

  26. Charles October 29, 2018 at 12:24 pm #

    So even for determining if a detection is for instance a true positive? At least the basics?

    • Adrian Rosebrock October 29, 2018 at 1:01 pm #

      Sorry, I’m not sure what you are asking Charles. Could you elaborate?

  27. Charles October 29, 2018 at 2:10 pm #

    Sure. Let’s say I have run a detector which gives me detections as confidence scores and bounding boxes. I would like to have a function given the ground truth, can characterize each detection as either a TP or FP (true or false positive for correct or wrong detections), and give me also false negatives (when the ground truth is not detected).

    I am dealing with an example where I have one class but many objects to detect within an image. It would help me somehow evaluate pre-trained detectors.

    • Adrian Rosebrock October 29, 2018 at 2:18 pm #

      You would need Intersection Over Union along with mean Average Precision (mAP).
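
      For the single-class case described in the question, a minimal sketch of how IoU can label detections as TP/FP and count FNs might look like this. The `(x1, y1, x2, y2)` box format, the greedy highest-confidence-first matching, and the 0.5 IoU threshold are assumptions chosen for illustration, and both function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(detections, ground_truth, iou_threshold=0.5):
    """Return (tp, fp, fn) counts for one image and one class.

    detections:   list of (confidence, box)
    ground_truth: list of boxes; each may be matched at most once
    """
    matched = set()
    tp = fp = 0
    # Visit detections from most to least confident.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt_box in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            tp += 1
        else:
            fp += 1  # no sufficiently overlapping, unmatched ground truth
    fn = len(ground_truth) - len(matched)  # ground truth never detected
    return tp, fp, fn
```

      Precision is then `tp / (tp + fp)` and recall is `tp / (tp + fn)`, accumulated over all images before computing the average precision.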

      • Charles November 3, 2018 at 3:53 pm #

        Yes I implemented functions to compute the precision, the recall, and the average precision (11-points and all-points interpolation). But say I am using a pre-trained detector and use its predictions on a new set of images, should I use a train-test set approach to find good values of thresholds for the confidence and Intersection over Union for the non-maximum suppression technique?

        Also, can a detector have a good average precision and recall but a very low precision ?

  28. Andreas November 13, 2018 at 5:38 am #

    Hi Adrian,

    Thank you for this tutorial. How do you test this on images instead of the videostream?

    • Adrian Rosebrock November 13, 2018 at 4:19 pm #

      Take a look at this tutorial where I cover deep learning object detection in images.

      • andreas November 14, 2018 at 2:22 am #

        Thank you for your reply. Please keep the tutorials coming they are both inspirational and highly useful.

        • Adrian Rosebrock November 15, 2018 at 12:08 pm #

          Thanks Andreas, I have no plans to stop writing tutorials.

  29. Chetan Mahajan December 12, 2018 at 8:31 am #

    Hi Sir, I have one question, How to Create a .caffemodel?

    • Adrian Rosebrock December 13, 2018 at 9:05 am #

      The model was trained and created using the Caffe deep learning library.

  30. vipul sonar December 16, 2018 at 8:15 am #

    How to Apply this for offline video??
    can you tell me the changes?

    • Adrian Rosebrock December 18, 2018 at 9:06 am #

      You can use the cv2.VideoCapture function to load a video and loop through the frames. You would then apply the object detector to each individual frame. See this tutorial for an example.
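
      The frame loop described above can be sketched as follows. The helper names `iter_frames` and `process_video` and the `detect` callback are hypothetical; `detect` stands in for whatever per-frame object detection code you already have. Only `cv2.VideoCapture`, `read()`, and `release()` are real OpenCV calls.

```python
def iter_frames(capture):
    """Yield frames from a cv2.VideoCapture-like object until it runs out."""
    try:
        while True:
            grabbed, frame = capture.read()
            if not grabbed:  # end of file (or a read error)
                break
            yield frame
    finally:
        capture.release()  # always free the video file handle


def process_video(path, detect):
    """Apply `detect` (a hypothetical per-frame detector) to every frame."""
    import cv2  # imported here so iter_frames can be used without OpenCV

    capture = cv2.VideoCapture(path)
    return [detect(frame) for frame in iter_frames(capture)]
```

      For a long video you would likely handle each frame's results as they arrive instead of collecting them all in a list.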

  31. mario February 20, 2019 at 9:25 am #

    Hello. I would like to track/rotoscope/cover the movements of an actress located inside a series of pictures (or inside a video) with the same exact movements of a 3D character that I have created with Blender. Also the camera is moving all around the people inside the footages. Can u tell me if this kind of job is doable with your script. Until now you have talked about object classification and detection,but what’s the pratical use of these ? or better,what’s the next step ? For example,in my specific scenario I could use the deep fake approach to swap the faces of the actors with the faces of the 3D characters,but I see that it needs an high level of computational power that i don’t have on my pc. I can’t use it. I’m here because I hope that I can swap the real human figures (and their movements) with the fake / 3D human figures of characters created in Blender in an easy and fast way. Is that possible with your script ? how ? thanks.

    • Adrian Rosebrock February 20, 2019 at 11:58 am #

      Hey Mario — I don’t have any experience with Blender so I unfortunately cannot provide any guidance there. I would suggest you look into “human pose estimation”. That technique will give you a set of keypoints mapping to various parts of the body which you should then be able to ideally transfer to your model.

  32. Gaurav April 8, 2019 at 1:15 am #

    thanks Adrian.

    I need to detect and count the cars coming down the road. How can I do it?
    Please guide me.

  33. Sandip April 8, 2019 at 6:20 am #

    Hi Adrian This is the Great Work.

    I need to track on the cars like the faces which are successfully tracked by the code which was given by you..
    so please tell me how can i track on cars can you guide me please?

    • Adrian Rosebrock April 12, 2019 at 12:37 pm #

      Try using this tutorial but swap out the “person” class for the car/truck/bus classes.

  34. YuhwanPark May 2, 2019 at 2:00 am #

    I saw good posting.
    And the implementation shows high detection rate.
    I have a question.
    Can you answer?
    The current configuration is detection for various objects.
    But I want to detect only people.
    What do i do to detect only people?
    If I only detect people, can I get a higher frame rate than before?
    If not, do you have a posting that can detect only people?
    Are people detection in the posting showing a high detection rate?
    Waiting for an answer.

    • Adrian Rosebrock May 8, 2019 at 1:46 pm #

      Trying to detect only a single object class isn’t going to improve the frame rate of the model (as I demonstrated in this post). Perhaps I’m not understanding your question?

  35. david May 10, 2019 at 1:43 pm #

    Thanks for the tutorial. I have a question that I want the program to run in the background of an existing video, what should I do. Please help me.

    • Adrian Rosebrock May 15, 2019 at 3:10 pm #

      I’m not sure what you mean by running in the background of an existing video — could you clarify?

  36. AP June 5, 2019 at 3:50 am #

    Hi Adrian, I am using MobileNet-SSD Model for detecting vehicles. Although, the model is able to detect object quickly it needs to be more accurate to be feasible for traffic detection. It doesn’t perform that well with small images of cars in different frames. Can you suggest some ways to increase accuracy and enhance the performance of the model?

    • Adrian Rosebrock June 6, 2019 at 6:45 am #

      One of the best methods is to take the model and fine-tune it on your own example images, thereby increasing accuracy. I cover how to fine-tune object detectors, including how to improve accuracy, inside Deep Learning for Computer Vision with Python.

      • AP June 15, 2019 at 2:50 am #

        It would be time-consuming and would take a whole lot of effort for compiling a dataset of cars and I can’t see how it would significantly increase the accuracy of the model as it has been trained on car dataset already. Could you suggest some other pre-trained models for the same which might have better accuracy?

        • Adrian Rosebrock June 19, 2019 at 2:16 pm #

          No ML, DL, or CV algorithm is perfect. Just because a model is trained on a dataset with car examples doesn’t mean it will magically be able to detect all other examples of cars. It could be the images the model was trained on were high quality and perhaps yours are lower quality images.

          The point is this:

          A model is only as good as its training data, operating under the assumption that the data it was trained on mimics where it is to be deployed.

          If that assumption doesn’t hold then you cannot expect good performance from your model.

          • AP June 20, 2019 at 1:31 am #

            Thanks, Adrian! Your tutorials have helped a lot throughout. Will try to fine tune my model as to the specific needs.

  37. Jing Lu June 9, 2019 at 11:41 pm #

    Could you explain how NMS works?

  38. Eric T June 16, 2019 at 3:47 pm #

    Minor typo in paragraph:
    In Figure 4 (left) I have included a visual example of a ground-truth bounding box…

    This refers to Figure 5.

    • Adrian Rosebrock June 19, 2019 at 2:07 pm #

      Thanks Eric!

  39. Umesh S July 1, 2019 at 5:12 am #

    Hi Adrian, Thanks for great tutorial. I am trying to build object detection model on custom dataset. Can you please let me know which tool you used for label annotation of images?

  40. Bart July 12, 2019 at 6:36 pm #

    Great stuff Adrian! Thanks for these very useful guides.

    I’m not a python expert, nor do I intend to become one, but it’s cool to use what you put together and try to add my own logic to it. I have some minor PHP experience so I’m not a total stranger to scripting and I’m learning more about Python while playing with it.

    I’m trying to make something that detects birds, zoom my camera towards them and make a picture. I’m focussing on working with persons and cars first because I live on a very busy street with almost every few seconds something comes by, mostly people and cars, so it’s easier and more fun to play with it for now. But maybe later I can even use these images to further train a bird model.

    So far so good. I’ve tried your guides on detecting object in images. Really cool that it works perfectly on my own images too, but of course this is the point. Then on to videos and then went on to interpreting my webcam stream.

    Now playing around printing variables like detections, idx, label, box and seeing what they are and how they work. My next challenge is to being able to make arrays per frame of only unique persons and their centre coordinates. Then I’ll have enough overview in the code (and I guess my head) to play with them and calculate distance between them for instance.

    • Adrian Rosebrock July 25, 2019 at 10:08 am #

      Thanks for the comment Bart, I’m glad you’re enjoying the guides!

      All the best to you.

  41. Simo July 17, 2019 at 1:09 pm #

    Hi Adrian, Thank you for the great post.
    Wanted to know how the approach with faster R-CNN deals with small objects and also clustered objects ? since you mentioned those as downside in the YOLO blog post.

    Thank you

    • Adrian Rosebrock July 25, 2019 at 9:49 am #

      Faster R-CNN will indeed work better with smaller objects, including clustered objects. You should refer to Deep Learning for Computer Vision with Python if you want to learn how to train your own Faster R-CNNs.

  42. Charlotte July 31, 2019 at 6:10 am #

    Thanks for the great tutorial Adrian, but I have some problems. When I use this code it works, but I want one box around the object like in your video, and instead I get many, many boxes around the object. I don’t understand this problem. How can I fix this? Maybe you can help. Thanks for your help.

  43. rajmeet August 2, 2019 at 3:00 pm #

    Hello Adrian
    I want to detect some object in a frame from a camera mounted on a robot, and then have the robot move towards that object. Is this possible? If yes, then kindly help me. I am using a Raspberry Pi 3 + Raspberry Pi camera.

  44. Andre August 15, 2019 at 1:37 pm #

    Thanks this was a great tutorial… I’m studying CS in college and your blogs and books are better than everything else I’ve seen

    • Adrian Rosebrock August 16, 2019 at 5:28 am #

      Thanks Andre, I really appreciate that 🙂 Good luck with your studies!

  45. Hakim October 27, 2019 at 2:03 am #

    Hi, Adrian.

    First, thank you for this tutorial and the many other tutorials that you have made. It has really helped me with my studies and creating content that I could place within my portfolio. Plus, it was a lot of fun to make these things using the raspberry pi camera.

    I followed what you did for preventing certain kinds of objects from being detected, and it worked once. However, future runs of the program led to the terminal freezing up, or just the application not reaching the stage where the camera would be activated. I know that it is something to do with the statement that checks the list. I’m not sure about how to get around this, or if there really is a way around this, without having to make a whole new training set.

    Also, I was wondering about your personal take on having multiple kinds of detections happening at the same time. Currently, i’m working with a team that is trying to build a robot. We have the edge detection down, but we also need object recognition as well. I tried to get both running on the same camera at the same time, but the result was rather bad ( slow, buggy, inconsistent). How would you handle multiple processes like that on a camera and on the raspberry pi?

  46. Likui November 14, 2019 at 9:17 pm #

    You have good tutorial for deep learning.
    Are you planning to do any tutorial on 3D object detection using Lidar data?
    If you do, it would be very greatly helpful.

    • Adrian Rosebrock November 21, 2019 at 9:25 am #

      I don’t have any plans on that topic but I’ll consider it for the future.

  47. Jasmeet December 5, 2019 at 10:34 pm #

    Hi Adrian, thank you for sharing knowledge with all of us without a price :).
    I would like to ask about the above gif (Figure 8: Real-time deep learning object detection for front and rear views of vehicles.)

    This is exactly what i’m looking for, is this done on this same code by ignoring other classes or this is completely different code for real time car tracking, mind sharing ?

  48. Arvind Chandel December 31, 2019 at 4:29 am #

    Hi Adrian, I have a query related to Tensorflow object detection API below:

    Suppose i train tensorflow faster Rcnn_inception on any custom data having 10 classes like ball, bottle, Coca etc.. and its performing quite well. Now later i got some new data of 10 more classes like Paperboat, Thums up etc and want my model to trained on these too. Is there any method so that i can retrain my generated model for these 10 new classes too to upgrade itself for 20 classes, rather starting training from scratch.

  49. Saurabh January 15, 2020 at 2:32 am #

    Hi Adrian,

    Thanks for sharing the interesting blog!

    I have trained object detection using ssd (mobilenet-v1) on custom dataset. The dataset consist of uno playing card images (skip, reverse, and draw four). On all these cards, model performs pretty well as I have trained model only on these 3 card (around 278 images with 829 bounding boxes collected using mobile phone).

    However, I haven’t trained model on any other card but still it detects other cards (inference using webcam).

    How can I fix this? Should I also collect other class images (anything other than skip, reverse and draw four cards) and ignore this class in operation? So that model sees this class images during training and doesn’t put any label during inference.

    Please share your views and feel free to correct me!

    Thanking you!
