YOLO object detection with OpenCV


In this tutorial, you’ll learn how to use the YOLO object detector to detect objects in both images and video streams using Deep Learning, OpenCV, and Python.

By applying object detection, you’ll not only be able to determine what is in an image, but also where a given object resides!

We’ll start with a brief discussion of the YOLO object detector, including how the object detector works.

From there we’ll use OpenCV, Python, and deep learning to:

  1. Apply the YOLO object detector to images
  2. Apply YOLO to video streams

We’ll wrap up the tutorial by discussing some of the limitations and drawbacks of the YOLO object detector, including some of my personal tips and suggestions.

To learn how to use YOLO for object detection with OpenCV, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

YOLO Object detection with OpenCV

In the rest of this tutorial we’ll:

  • Discuss the YOLO object detector model and architecture
  • Utilize YOLO to detect objects in images
  • Apply YOLO to detect objects in video streams
  • Discuss some of the limitations and drawbacks of the YOLO object detector

Let’s dive in!

What is the YOLO object detector?

Figure 1: A simplified illustration of the YOLO object detector pipeline (source). We’ll use YOLO with OpenCV in this blog post.

When it comes to deep learning-based object detection, there are three primary object detectors you’ll encounter:

  • R-CNNs and their variants, including the original R-CNN, Fast R-CNN, and Faster R-CNN
  • Single Shot Detectors (SSDs)
  • YOLO

R-CNNs are one of the first deep learning-based object detectors and are an example of a two-stage detector.

  1. In the first R-CNN publication, Rich feature hierarchies for accurate object detection and semantic segmentation (2013), Girshick et al. proposed an object detector that required an algorithm such as Selective Search (or equivalent) to propose candidate bounding boxes that could contain objects.
  2. These regions were then passed into a CNN for classification, ultimately leading to one of the first deep learning-based object detectors.

The problem with the standard R-CNN method was that it was painfully slow and not a complete end-to-end object detector.

Girshick et al. published a second paper in 2015, entitled Fast R-CNN. The Fast R-CNN algorithm made considerable improvements to the original R-CNN, namely increasing accuracy and reducing the time it took to perform a forward pass; however, the model still relied on an external region proposal algorithm.

It wasn’t until Girshick et al.’s follow-up 2015 paper, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, that R-CNNs became a true end-to-end deep learning object detector by removing the Selective Search requirement and instead relying on a Region Proposal Network (RPN) that is (1) fully convolutional and (2) can predict the object bounding boxes and “objectness” scores (i.e., a score quantifying how likely it is that a region of an image contains an object). The outputs of the RPNs are then passed into the R-CNN component for final classification and labeling.

While R-CNNs tend to be very accurate, the biggest problem with the R-CNN family of networks is their speed — they are incredibly slow, obtaining only ~5 FPS on a GPU.

To help increase the speed of deep learning-based object detectors, both Single Shot Detectors (SSDs) and YOLO use a one-stage detector strategy.

These algorithms treat object detection as a regression problem, taking a given input image and simultaneously learning bounding box coordinates and corresponding class label probabilities.

In general, single-stage detectors tend to be less accurate than two-stage detectors but are significantly faster.

YOLO is a great example of a single-stage detector.

First introduced in 2015 by Redmon et al. in their paper, You Only Look Once: Unified, Real-Time Object Detection, YOLO is an object detector capable of super real-time performance, obtaining 45 FPS on a GPU.

Note: A smaller variant of their model called “Fast YOLO” claims to achieve 155 FPS on a GPU.

YOLO has gone through a number of different iterations, including YOLO9000: Better, Faster, Stronger (i.e., YOLOv2), capable of detecting over 9,000 object classes.

Redmon and Farhadi are able to achieve such a large number of object detections by performing joint training for both object detection and classification. Using joint training the authors trained YOLO9000 simultaneously on both the ImageNet classification dataset and COCO detection dataset. The result is a YOLO model, called YOLO9000, that can predict detections for object classes that don’t have labeled detection data.

While interesting and novel, YOLOv2’s performance was a bit underwhelming given the title and abstract of the paper.

On the 156 object classes for which it had no labeled detection data, YOLO9000 achieved 16% mean Average Precision (mAP) — and yes, while YOLO can detect 9,000 separate classes, the accuracy is not quite what we would desire.

Redmon and Farhadi recently published a new YOLO paper, YOLOv3: An Incremental Improvement (2018). YOLOv3 is significantly larger than previous models but is, in my opinion, the best one yet out of the YOLO family of object detectors.

We’ll be using YOLOv3 in this blog post, in particular, YOLO trained on the COCO dataset.

The COCO dataset consists of 80 labels, including, but not limited to:

  • People
  • Bicycles
  • Cars and trucks
  • Airplanes
  • Stop signs and fire hydrants
  • Animals, including cats, dogs, birds, horses, cows, and sheep, to name a few
  • Kitchen and dining objects, such as wine glasses, cups, forks, knives, spoons, etc.
  • …and much more!

You can find a full list of what YOLO trained on the COCO dataset can detect using this link.

I’ll wrap up this section by saying that any academic needs to read Redmon’s YOLO papers and tech reports — not only are they novel and insightful, they are incredibly entertaining as well.

But seriously, if you do nothing else today, read the YOLOv3 tech report.

It’s only 6 pages and one of those pages is just references/citations.

Furthermore, the tech report is honest in a way that academic papers rarely, if ever, are.

Project structure

Let’s take a look at today’s project layout. You can use your OS’s GUI (Finder for macOS, Nautilus for Ubuntu), but you may find it easier and faster to use the tree command in your terminal:
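
If you expand the download, the layout should look roughly like this (a sketch; the model filenames assume the standard Darknet release):

$ tree
.
├── images
├── output
├── videos
├── yolo-coco
│   ├── coco.names
│   ├── yolov3.cfg
│   └── yolov3.weights
├── yolo.py
└── yolo_video.py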

Our project today consists of four directories and two Python scripts.

The directories (in order of importance) are:

  • yolo-coco/ : The YOLOv3 object detector model files, pre-trained on the COCO dataset by the Darknet team.
  • images/ : This folder contains four static images which we’ll perform object detection on for testing and evaluation purposes.
  • videos/ : After performing object detection with YOLO on images, we’ll process videos in real time. This directory contains four sample videos for you to test with.
  • output/ : Output videos that have been processed by YOLO and annotated with bounding boxes and class names can go in this folder.

We’re reviewing two Python scripts — yolo.py and yolo_video.py. The first script handles images; we’ll then take what we learn and apply it to video in the second script.

Are you ready?

YOLO object detection in images

Let’s get started applying the YOLO object detector to images!

Open up the yolo.py file in your project and insert the following code:
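
Here is a minimal sketch of this opening block. Note that the line numbers referenced throughout this section correspond to the original downloadable script, not to these sketches:

# import the necessary packages
import numpy as np
import argparse
import time
import cv2
import os

# construct the argument parser and parse the command line arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="path to input image")
ap.add_argument("-y", "--yolo", required=True,
    help="base path to YOLO directory")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
    help="minimum probability to filter weak detections")
ap.add_argument("-t", "--threshold", type=float, default=0.3,
    help="threshold when applying non-maxima suppression")
args = vars(ap.parse_args())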

All you need installed for this script is OpenCV 3.4.2+ with Python bindings. You can find my OpenCV installation tutorials here; just keep in mind that OpenCV 4 is in beta right now — you may run into issues installing or running certain scripts since it’s not an official release. For the time being I recommend going for OpenCV 3.4.2+. You can actually be up and running in less than 5 minutes with pip as well.

First, we import our required packages — as long as OpenCV and NumPy are installed, your interpreter will breeze past these lines.

Now let’s parse four command line arguments. Command line arguments are processed at runtime and allow us to change the inputs to our script from the terminal. If you aren’t familiar with them, I encourage you to read more in my previous tutorial. Our command line arguments include:

  • --image : The path to the input image. We’ll detect objects in this image using YOLO.
  • --yolo : The base path to the YOLO directory. Our script will then load the required YOLO files in order to perform object detection on the image.
  • --confidence : Minimum probability to filter weak detections. I’ve given this a default value of 50% (0.5), but you should feel free to experiment with this value.
  • --threshold : This is our non-maxima suppression threshold with a default value of 0.3. You can read more about non-maxima suppression here.

After parsing, the args variable is now a dictionary containing the key-value pairs for the command line arguments. You’ll see args a number of times in the rest of this script.

Let’s load our class labels and set random colors for each:
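
A sketch of this block, assuming the class labels live in a coco.names file inside the directory supplied via --yolo (as they do in the project download):

# load the COCO class labels our YOLO model was trained on
labelsPath = os.path.sep.join([args["yolo"], "coco.names"])
LABELS = open(labelsPath).read().strip().split("\n")

# seed the RNG and assign a random color to each class label so
# detections are drawn consistently between runs
np.random.seed(42)
COLORS = np.random.randint(0, 255, size=(len(LABELS), 3),
    dtype="uint8")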

Here we load all of our class LABELS (notice the args["yolo"] command line argument being used) on Lines 21 and 22. Random COLORS are then assigned to each label on Lines 25-27.

Let’s derive the paths to the YOLO weights and configuration files followed by loading YOLO from disk:
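
Something along these lines, assuming the standard Darknet filenames (yolov3.weights and yolov3.cfg) inside the YOLO directory:

# derive the paths to the YOLO weights and model configuration
weightsPath = os.path.sep.join([args["yolo"], "yolov3.weights"])
configPath = os.path.sep.join([args["yolo"], "yolov3.cfg"])

# load our YOLO object detector trained on the COCO dataset (80 classes)
print("[INFO] loading YOLO from disk...")
net = cv2.dnn.readNetFromDarknet(configPath, weightsPath)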

To load YOLO from disk on Line 35, we’ll take advantage of OpenCV’s DNN function cv2.dnn.readNetFromDarknet. This function requires both a configPath and weightsPath, which are established via command line arguments on Lines 30 and 31.

I cannot stress this enough: you’ll need at least OpenCV 3.4.2 to run this code as it has the updated dnn module required to load YOLO.

Let’s load the image and send it through the network:
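
A sketch of this block. One caveat: in OpenCV 3.4.2, getUnconnectedOutLayers() returns an N x 1 array, hence the i[0] indexing; some newer OpenCV releases return a flat array, in which case you would use ln[i - 1] instead:

# load our input image and grab its spatial dimensions
image = cv2.imread(args["image"])
(H, W) = image.shape[:2]

# determine only the *output* layer names that we need from YOLO
ln = net.getLayerNames()
ln = [ln[i[0] - 1] for i in net.getUnconnectedOutLayers()]

# construct a blob from the input image, then perform a forward pass
# of the YOLO object detector, giving us our bounding boxes and
# associated probabilities
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416),
    swapRB=True, crop=False)
net.setInput(blob)
start = time.time()
layerOutputs = net.forward(ln)
end = time.time()

# show timing information on YOLO
print("[INFO] YOLO took {:.6f} seconds".format(end - start))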

In this block we:

  • Load the input image and extract its dimensions (Lines 38 and 39).
  • Determine the output layer names from the YOLO model (Lines 42 and 43).
  • Construct a blob from the image (Lines 48 and 49). Are you confused about what a blob is or what cv2.dnn.blobFromImage does? Give this blog post a read.

Now that our blob is prepared, we’ll:

  • Perform a forward pass through our YOLO network (Lines 50 and 52)
  • Show the inference time for YOLO (Line 56)

What good is object detection unless we visualize our results? Let’s take steps now to filter and visualize our results.

But first, let’s initialize some lists we’ll need in the process of doing so:
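
These initializations are as simple as they sound:

# initialize our lists of detected bounding boxes, confidences,
# and class IDs, respectively
boxes = []
confidences = []
classIDs = []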

These lists include:

  • boxes : Our bounding boxes around the objects.
  • confidences : The confidence value that YOLO assigns to an object. Lower confidence values indicate that the object might not be what the network thinks it is. Remember from our command line arguments above that we’ll filter out objects that don’t meet the 0.5 threshold.
  • classIDs : The detected object’s class label.

Let’s begin populating these lists with data from our YOLO layerOutputs:
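
A sketch of the full loop (each YOLO detection vector holds the normalized box center and size in elements 0-3 and the per-class scores starting at element 5; element 4 is the overall objectness score, unused here):

# loop over each of the layer outputs
for output in layerOutputs:
    # loop over each of the detections
    for detection in output:
        # extract the class ID and confidence (i.e., probability)
        # of the current object detection
        scores = detection[5:]
        classID = np.argmax(scores)
        confidence = scores[classID]

        # filter out weak predictions by ensuring the detected
        # probability is greater than the minimum probability
        if confidence > args["confidence"]:
            # scale the bounding box coordinates back relative to
            # the size of the image; YOLO returns the center
            # (x, y)-coordinates of the box followed by its width
            # and height, all normalized to [0, 1]
            box = detection[0:4] * np.array([W, H, W, H])
            (centerX, centerY, width, height) = box.astype("int")

            # use the center coordinates to derive the top-left
            # corner of the bounding box
            x = int(centerX - (width / 2))
            y = int(centerY - (height / 2))

            # update our lists of bounding boxes, confidences,
            # and class IDs
            boxes.append([x, y, int(width), int(height)])
            confidences.append(float(confidence))
            classIDs.append(classID)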

There’s a lot going on in this code block — let’s break it down.

In this block, we:

  • Loop over each of the layerOutputs (beginning on Line 65).
  • Loop over each detection in output (a nested loop beginning on Line 67).
  • Extract the classID and confidence (Lines 70-72).
  • Use the confidence to filter out weak detections (Line 76).

Now that we’ve filtered out unwanted detections, we’re going to:

  • Scale bounding box coordinates so we can display them properly on our original image (Line 81).
  • Extract coordinates and dimensions of the bounding box (Line 82). YOLO returns bounding box coordinates in the form (centerX, centerY, width, height).
  • Use this information to derive the top-left (x, y)-coordinates of the bounding box (Lines 86 and 87).
  • Update the boxes, confidences, and classIDs lists (Lines 91-93).

With this data, we’re now going to apply what is called “non-maxima suppression”:
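
With OpenCV's built-in implementation, this is a single call:

# apply non-maxima suppression to suppress weak, overlapping
# bounding boxes
idxs = cv2.dnn.NMSBoxes(boxes, confidences, args["confidence"],
    args["threshold"])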

YOLO does not apply non-maxima suppression for us, so we need to explicitly apply it.

Applying non-maxima suppression suppresses significantly overlapping bounding boxes, keeping only the most confident ones.

NMS also ensures that we do not have any redundant or extraneous bounding boxes.

Taking advantage of OpenCV’s built-in DNN module implementation of NMS, we perform non-maxima suppression on Lines 97 and 98. All that is required is that we submit our boxes and confidences lists, as well as both our confidence threshold and NMS threshold.

If you’ve been reading this blog, you might be wondering why we didn’t use my imutils implementation of NMS. The primary reason is that OpenCV’s NMSBoxes function now works reliably — previously it failed for some inputs and raised an error — so we can use it directly in our own scripts.

Let’s draw the boxes and class text on the image!
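
A sketch of the drawing and display logic:

# ensure at least one detection exists
if len(idxs) > 0:
    # loop over the indexes we are keeping
    for i in idxs.flatten():
        # extract the bounding box coordinates
        (x, y) = (boxes[i][0], boxes[i][1])
        (w, h) = (boxes[i][2], boxes[i][3])

        # draw a bounding box rectangle and label on the image
        color = [int(c) for c in COLORS[classIDs[i]]]
        cv2.rectangle(image, (x, y), (x + w, y + h), color, 2)
        text = "{}: {:.4f}".format(LABELS[classIDs[i]],
            confidences[i])
        cv2.putText(image, text, (x, y - 5),
            cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

# show the output image until a key is pressed
cv2.imshow("Image", image)
cv2.waitKey(0)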

Assuming at least one detection exists (Line 101), we proceed to loop over the idxs determined by non-maxima suppression.

Then, we simply draw the bounding box and text on image using our random class colors (Lines 105-113).

Finally, we display our resulting image until the user presses any key on their keyboard (ensuring the window opened by OpenCV is selected and focused).


To follow along with this guide, make sure you use the “Downloads” section of this tutorial to download the source code, YOLO model, and example images.

From there, open up a terminal and execute the following command:
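
For example (the image filename here is just one of the test images from my download; swap in any of the others):

$ python yolo.py --image images/baggage_claim.jpg --yolo yolo-coco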

Figure 2: YOLO with OpenCV is used to detect people and baggage in an airport.

Here you can see that YOLO has not only detected each person in the input image, but the suitcases as well!

Furthermore, if you take a look at the right corner of the image you’ll see that YOLO has also detected the handbag on the lady’s shoulder.

Let’s try another example:

Figure 3: YOLO object detection with OpenCV is used to detect a person, dog, TV, and chair. The remote is a false-positive detection but looking at the ROI you could imagine that the area does share resemblances to a remote.

The image above contains a person (myself) and a dog (Jemma, the family beagle).

YOLO also detects the TV monitor and a chair. I’m particularly impressed that YOLO was able to detect the chair given that it’s a handmade, old-fashioned “baby high chair”.

Interestingly, YOLO thinks there is a “remote” in my hand. It’s actually not a remote — it’s the reflection of glass on a VHS tape; however, if you stare at the region it actually does look like it could be a remote.

The following example image demonstrates a limitation and weakness of the YOLO object detector:

Figure 4: YOLO and OpenCV are used for object detection of a dining room table.

While the wine bottle, dining table, and vase are all correctly detected by YOLO, only one of the two wine glasses is properly detected.

We discuss why YOLO struggles with objects close together in the “Limitations and drawbacks of the YOLO object detector” section below.

Let’s try one final image:

Figure 5: Soccer players and a soccer ball are detected with OpenCV using the YOLO object detector.

YOLO is able to correctly detect each of the players on the pitch, as well as the soccer ball itself. Notice the person in the background who is detected despite the area being highly blurred and partially obscured.

YOLO object detection in video streams

Now that we’ve learned how to apply the YOLO object detector to single images, let’s utilize YOLO to perform object detection in video files as well.

Open up the yolo_video.py file and insert the following code:
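
A minimal sketch of the imports and argument parsing for the video script:

# import the necessary packages
import numpy as np
import argparse
import time
import cv2
import os

# construct the argument parser and parse the command line arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--input", required=True,
    help="path to input video")
ap.add_argument("-o", "--output", required=True,
    help="path to output video")
ap.add_argument("-y", "--yolo", required=True,
    help="base path to YOLO directory")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
    help="minimum probability to filter weak detections")
ap.add_argument("-t", "--threshold", type=float, default=0.3,
    help="threshold when applying non-maxima suppression")
args = vars(ap.parse_args())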

We begin with our imports and command line arguments.

Notice that this script doesn’t have the --image argument as before. To take its place, we now have two video-related arguments:

  • --input : The path to the input video file.
  • --output : Our path to the output video file.

Given these arguments, you can now use videos that you record of scenes with your smartphone or videos you find online. You can then process the video file, producing an annotated output video. Of course, if you want to use your webcam to process a live video stream, that is possible too. Just find examples on PyImageSearch where the VideoStream class from imutils.video is utilized and make some minor changes.

Moving on, the next block is identical to the block from the YOLO image processing script:

Here we load labels and generate colors followed by loading our YOLO model and determining output layer names.

Next, we’ll take care of some video-specific tasks:
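
A sketch of those tasks. Note that not every container/codec reports a frame count, hence the try/except:

# initialize the video stream, pointer to the output video writer,
# and the frame dimensions
vs = cv2.VideoCapture(args["input"])
writer = None
(W, H) = (None, None)

# try to determine the total number of frames in the video file
try:
    total = int(vs.get(cv2.CAP_PROP_FRAME_COUNT))
    print("[INFO] {} total frames in video".format(total))

# an error occurred while trying to determine the total number of
# frames, so no completion-time estimate can be provided
except:
    print("[INFO] could not determine # of frames in video")
    print("[INFO] no approx. completion time can be provided")
    total = -1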

In this block, we:

  • Open a file pointer to the video file for reading frames in the upcoming loop (Line 45).
  • Initialize our video writer and frame dimensions (Lines 46 and 47).
  • Try to determine the total number of frames in the video file so we can estimate how long processing the entire video will take (Lines 50-61).

Now we’re ready to start processing frames one by one:
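
The loop skeleton looks like this:

# loop over frames from the video file stream
while True:
    # read the next frame from the file
    (grabbed, frame) = vs.read()

    # if the frame was not grabbed, we have reached the end of
    # the stream
    if not grabbed:
        break

    # if the frame dimensions are empty, grab them
    if W is None or H is None:
        (H, W) = frame.shape[:2]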

We define a while loop (Line 64) and then we grab our first frame (Line 66).

We make a check to see if it is the last frame of the video. If so, we break from the while loop (Lines 70 and 71).

Next, we grab the frame dimensions if they haven’t been grabbed yet (Lines 74 and 75).

Next, let’s perform a forward pass of YOLO, using our current frame  as the input:
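
This block continues inside the while loop (hence the indentation) and mirrors the image script:

    # construct a blob from the input frame and then perform a
    # forward pass of the YOLO object detector
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
        swapRB=True, crop=False)
    net.setInput(blob)
    start = time.time()
    layerOutputs = net.forward(ln)
    end = time.time()

    # initialize our lists of detected bounding boxes, confidences,
    # and class IDs, respectively
    boxes = []
    confidences = []
    classIDs = []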

Here we construct a blob and pass it through the network, obtaining predictions. I’ve surrounded the forward pass operation with timestamps so we can calculate the elapsed time to make predictions on one frame — this will help us estimate the time needed to process the entire video.

We’ll then go ahead and initialize the same three lists we used in our previous script: boxes, confidences, and classIDs.

This next block is, again, identical to our previous script:

In this code block, we:

  • Loop over output layers and detections (Lines 94-96).
  • Extract the classID and filter out weak predictions (Lines 99-105).
  • Compute bounding box coordinates (Lines 111-117).
  • Update our respective lists (Lines 121-123).

Next, we’ll apply non-maxima suppression and proceed to annotate the frame:

You should recognize these lines as well. Here we:

  • Apply NMS using the cv2.dnn.NMSBoxes function (Lines 127 and 128) to suppress weak, overlapping bounding boxes. You can read more about non-maxima suppression here.
  • Loop over the idxs calculated by NMS and draw the corresponding bounding boxes + labels (Lines 131-144).

Let’s finish out the script:
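
A sketch of the final block. The writer is created lazily on the first frame; the MJPG codec and 30 FPS output rate are assumptions, so match them to your input video as needed:

    # check if our video writer is None
    if writer is None:
        # initialize our video writer
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter(args["output"], fourcc, 30,
            (frame.shape[1], frame.shape[0]), True)

        # print some timing estimates for processing the video
        if total > 0:
            elap = (end - start)
            print("[INFO] single frame took {:.4f} seconds".format(elap))
            print("[INFO] estimated total time to finish: {:.4f}".format(
                elap * total))

    # write the output frame to disk
    writer.write(frame)

# release the file pointers (outside the loop)
print("[INFO] cleaning up...")
writer.release()
vs.release()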

To wrap up, we simply:

  • Initialize our video writer if necessary (Lines 147-151). The writer will be initialized on the first iteration of the loop.
  • Print out our estimates of how long it will take to process the video (Lines 154-158).
  • Write the frame to the output video file (Line 161).
  • Cleanup and release pointers (Lines 165 and 166).

To apply YOLO object detection to video streams, make sure you use the “Downloads” section of this blog post to download the source, YOLO object detector, and example videos.

From there, open up a terminal and execute the following command:
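
For example (again, the video filenames are just the ones from my download):

$ python yolo_video.py --input videos/car_chase_01.mp4 \
    --output output/car_chase_01.avi --yolo yolo-coco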

Figure 6: YOLO deep learning object detection applied to a car crash video.

Above you can see a GIF excerpt from a car chase video I found on YouTube.

In the video/GIF, you can see that not only are the vehicles detected, but people and traffic lights are detected too!

The YOLO object detector is performing quite well here. Let’s try a different video clip from the same car chase video:

Figure 7: In this video of a suspect on the run, we have used OpenCV and YOLO object detection to find the person.

The suspect has now fled the car and is running across a parking lot.

YOLO is once again able to detect people.

At one point the suspect is actually able to make it back to their car and continue the chase — let’s see how YOLO performs there as well:

Figure 8: YOLO is a fast deep learning object detector capable of being used in real time video provided a GPU is utilized.

Note: This video was simply too large for me to include in the “Downloads”. You may download the video from YouTube here.

As a final example, let’s see how we might use YOLO as a starting point for building a traffic counter:

Figure 9: A video of traffic going under an overpass demonstrates that YOLO and OpenCV can be used to detect cars accurately and quickly.

I’ve put together a full video of YOLO object detection examples below:

Credits for video and audio:

Limitations and drawbacks of the YOLO object detector

Arguably the largest limitation and drawback of the YOLO object detector is that:

  1. It does not always handle small objects well
  2. It especially does not handle objects grouped close together

The reason for this limitation is the YOLO algorithm itself:

  • The YOLO object detector divides an input image into an S×S grid where each cell in the grid predicts only a single object.
  • If there exist multiple, small objects in a single cell then YOLO will be unable to detect them, ultimately leading to missed object detections.

Therefore, if you know your dataset consists of many small objects grouped close together then you should not use the YOLO object detector.

In terms of small objects, Faster R-CNN tends to work the best; however, it’s also the slowest.

SSDs can also be used here; however, SSDs can also struggle with smaller objects (but not as much as YOLO).

SSDs often give a nice tradeoff in terms of speed and accuracy as well.

It’s also worth noting that YOLO ran slower than SSDs in this tutorial. In my previous tutorial on OpenCV object detection we utilized an SSD — a single forward pass of the SSD took ~0.03 seconds.

However, from this tutorial, we know that a forward pass of the YOLO object detector took ~0.3 seconds, approximately an order of magnitude slower!

If you’re using the pre-trained deep learning object detectors OpenCV supplies, you may want to consider using SSDs over YOLO. From my personal experience, I’ve rarely encountered situations where I needed to use YOLO over SSDs:

  • I have found SSDs much easier to train, and in terms of accuracy they almost always outperform YOLO (at least for the datasets I’ve worked with).
  • YOLO may have excellent results on the COCO dataset; however, I have not found that same level of accuracy for my own tasks.

I, therefore, tend to use the following guidelines when picking an object detector for a given problem:

  1. If I know I need to detect small objects and speed is not a concern, I tend to use Faster R-CNN.
  2. If speed is absolutely paramount, I use YOLO.
  3. If I need a middle ground, I tend to go with SSDs.

In most of my situations I end up using SSDs or RetinaNet — both strike a great balance between YOLO and Faster R-CNN.

Want to train your own deep learning object detectors?

Figure 10: In my book, Deep Learning for Computer Vision with Python, I cover multiple object detection algorithms including Faster R-CNN, SSDs, and RetinaNet. Inside I will teach you how to create your object detection image dataset, train the object detector, and make predictions. Not to mention I also cover deep learning fundamentals, best practices, and my personal set of rules of thumb. Grab your copy now so you can start learning new skills.

The YOLO model we used in this tutorial was pre-trained on the COCO dataset…

…but what if you wanted to train a deep learning object detector on your own dataset?

Inside my book, Deep Learning for Computer Vision with Python, I’ll teach you how to train Faster R-CNNs, Single Shot Detectors (SSDs), and RetinaNet to:

  • Detect logos in images
  • Detect traffic signs (ex. stop sign, yield sign, etc.)
  • Detect the front and rear views of vehicles (useful for building a self-driving car application)
  • Detect weapons in images and video streams

All object detection chapters in the book include a detailed explanation of both the algorithm and code, ensuring you will be able to successfully train your own object detectors.

To learn more about my book (and grab your free set of sample chapters and table of contents), just click here.

Summary

In this tutorial we learned how to perform YOLO object detection using Deep Learning, OpenCV, and Python.

We then briefly discussed the YOLO architecture followed by implementing Python code to:

  1. Apply YOLO object detection to single images
  2. Apply the YOLO object detector to video streams

On my machine with a 3GHz Intel Xeon W processor, a single forward pass of YOLO took ~0.3 seconds; however, using a Single Shot Detector (SSD) from a previous tutorial resulted in ~0.03 second detections, an order of magnitude faster!

For real-time deep learning-based object detection on your CPU with OpenCV and Python, you may want to consider using the SSD.

If you are interested in training your own deep learning object detectors on your own custom datasets, be sure to refer to my book, Deep Learning for Computer Vision with Python, where I provide detailed guides on how to successfully train your own detectors.

I hope you enjoyed today’s YOLO object detection tutorial!

To download the source code to today’s post, and be notified when future PyImageSearch blog posts are published, just enter your email address in the form below.

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!


134 Responses to YOLO object detection with OpenCV

  1. Naser November 12, 2018 at 11:25 am #

    Great tutarial
    Thanks adrian.
    Can you make a tutarial and explain in that how can train yolo on our custom dataset??
    Thank you.

    • Adrian Rosebrock November 12, 2018 at 12:10 pm #

      I actually cover how to train your own custom object detectors inside Deep Learning for Computer Vision with Python. I would suggest starting there.

      • Gary November 12, 2018 at 6:40 pm #

        Hi Adrian,

        did you show in your book training custom objects with different frameworks like Yolo,YoloV3,Tensorflow,Mxnet and Caffe with faster-RNN vs. SSD?

        If not, that would be great to see which framework has the best object multi detector for small and close objects. Hope you will think about this.

        Thanks a lot for all your great tutorials.

        • Adrian Rosebrock November 13, 2018 at 4:26 pm #

          Inside the book I focus on Faster R-CNN, SSDs, and RetinaNet. Per my suggestions in this blog post I don’t tend to use YOLO that often.

          • yan November 14, 2018 at 12:14 pm #

            When I was using a Raspberry Pi 3B+, I encountered the error AttributeError: 'NoneType' object has no attribute 'shape', but I don’t know how to fix it. I hope I can get your guidance

          • Adrian Rosebrock November 15, 2018 at 12:00 pm #

            Your path to the input image is invalid and cv2.imread is returning “None”. Double-check the path to your input image. Also read this tutorial on NoneType errors.

      • Yonten November 29, 2018 at 3:31 am #

        I have trained my dataset on darknet and I am using your code to detect my trained images but I cannot see the bounding box. When I run in darknet, I can cleary see the output with the bounding box. Can you tell me which code I should edit?

        • Adrian Rosebrock November 30, 2018 at 9:00 am #

          I would raise that question with the OpenCV developers. Your architecture may be different or some additional model conversion may need to take place.

  2. aiwen November 12, 2018 at 11:42 am #

    so good!

    • Adrian Rosebrock November 12, 2018 at 12:09 pm #

      Thanks Aiwen, I’m glad you liked it!

  3. ShivaGuntuku November 12, 2018 at 11:54 am #

    Hi Adrian, thank you for the tutorial,

    Although I am getting this error in ubuntu18 , python3.6 and cv2 version ‘3.4.0’


    error: (-212) Unknown layer type: shortcut in function ReadDarknetFromCfgFile,

    Please help me out.

    • Adrian Rosebrock November 12, 2018 at 12:12 pm #

      You need at least OpenCV 3.4.2 for this tutorial. OpenCV 4 would work as well.

      • ShivaGuntuku November 12, 2018 at 1:43 pm #

        Thanks Adrian, it worked. Nice

        • Adrian Rosebrock November 13, 2018 at 4:35 pm #

          Awesome, I’m glad that worked.

          • Richard Wiseman November 14, 2018 at 5:04 am #

            Might be worth updating the article to say 3.4.2 rather than 3.4 as it currently does. This caught me out too.

          • Adrian Rosebrock November 15, 2018 at 12:06 pm #

            Thanks for catching that Richard. I’ve updated the post 😉

      • John Kang November 13, 2018 at 6:15 pm #

        I got same error on Windows. I have OpenCV-Python 3.4.0 installed. How to install opencv-python 3.4.2 on windows?

        thanks in advance

        • Adrian Rosebrock November 15, 2018 at 12:16 pm #

          I’m sorry to hear about the error message. You would indeed need to install OpenCV 3.4.2 or higher. That said, I do not officially support Windows here on the PyImageSearch blog (I haven’t even used a Windows machine in 11+ years now). When it comes to computer vision and deep learning I highly recommend you use a Unix-based machine such as Ubuntu or macOS. I have a number of OpenCV install tutorials for those operating systems. If you need help with Windows I would need to refer you to the official OpenCV website.

  4. Sourav November 12, 2018 at 12:02 pm #

    can it be implemented on a pi 3B?

    • Adrian Rosebrock November 12, 2018 at 12:10 pm #

      Yes, but it would be extremely slow, under 1 FPS (at least for the OpenCV + YOLO version). The Movidius NCS does have a YOLO model that supposedly works but I have never tried it — that would likely get you to a few FPS.

      • wally kulecz November 12, 2018 at 5:34 pm #

        If you are talking about this TinyYolo model for the Movidius from the appzoo:

        https://github.com/movidius/ncappzoo/tree/master/caffe/TinyYolo

        Its not the same model used in this tutorial.

        I’ve played with it and it was really poor at detecting people and really good at finding people in shadows (false positives) so it was useless for my purposes.

        The YOLOv3 model used here has performed admirably on the test images where the TinyYolo model from the NCS appzoo (linked above) failed miserably.

        If there is a Movidius version of this YOLOv3 model, point me to it and I’ll give it a try and report back.

        • Adrian Rosebrock November 13, 2018 at 4:26 pm #

          That was the one I was thinking of, thanks Wally. I’m not aware of a YOLOv3 model for the Movidius though.

          • wally kulecz November 14, 2018 at 4:51 pm #

            Looks like a Movidius NCS2 using the Myriad X is available, the splash pages suggests “up to 8X faster” than the Movidius:
            https://software.intel.com/en-us/neural-compute-stick

            They are available, I just ordered one from Mouser for $99 + tax and shipping.

            No mention of Raspberry Pi support, for now. It looks like the the OpenVINO toolkit will be required to use it. They are supporting Windows 10 for this one. Its a free download:
            https://software.intel.com/en-us/openvino-toolkit/choose-download/free-download-linux

            The Pi is where this improved device could really help, but it looks like it needs USB3 and a specific driver which may explain the lack of Pi support.

            I’m expecting a challenge to get the tool kit install.

          • Adrian Rosebrock November 15, 2018 at 11:56 am #

            Intel sent me a NCS2 but I must admit that I never unboxed it 🙁 I’ve been too busy releasing the 2nd edition of DL4CV. I’ll have to carve out some time and play with it as well 🙂 Thanks for the motivation, Wally.

          • wally kulecz November 19, 2018 at 8:30 pm #

            Perhaps a bit more motivation.

            I installed the openVINO SDK on that old i3 system that I mentioned in another reply (failed with library version errors on Ubuntu 18.04 , so I installed 16.04 to the free space on the drive and dual boot).

            Running their C++ interactive_facedetection_demo sample code with a USB WebCam I get these results:

            NCS: face detection ~16 fps, face analysis ~3 fps
            NCS2: face detection ~42 fps, face analysis ~10 fps
            CPU: face detection ~17 fps, face analysis ~5.4 fps

            Note that the CPU needed an FP32 model where the NCS used FP16. As with my MobileNet-SSD Python code, the CPU on the i3 is about the same as the Movidius NCS, but the NCS2 shows very worthwhile improvement.

            The SDK auto-detects NCS vs NCS2 so it was just a matter of unplugging the NCS and plugging int the NCS2 to get these numbers from the live openCV overlay.

            The SDK compiles openCV v4.0.0-pre. It appears to support Python Virtual Environment, although I didn’t use one.

            The GPU support seems not to work on this old i3-i915 motherboard.

            There is a C++ example for YOLOv3 object detection in the installed sample code.

            But my first task will be to see if I can re-write my Python code to use the openVINO Python support as from my limited test it looks like one NCS2 might be able to exceed the fps I get with three NCS sticks.

      • Devin November 19, 2018 at 8:52 pm #

        Hi, Doctor Adrian, very glad to read ur blog. i have a project that should recognize and detect object in the video based on Raspberry Pi 3B+ , my boss wanna i use deep learning method, such as resnet, ssd, yolov3, etc… but, in your blog, i know it’s difficult to achieve real time…what should i do? could u please give me some advice?
        thanks!

        • Adrian Rosebrock November 20, 2018 at 9:14 am #

          Hey Devin — I cover how to train your own custom object detectors (Faster R-CNN, SSDs, RetinaNet, etc.) inside my book, Deep Learning for Computer Vision with Python. I also discuss and demonstrate how to obtain real-time performance and which model is suitable for various tasks. I would suggest you start there.

    • wally kulecz November 12, 2018 at 5:49 pm #

      The downloaded tutorial code runs fine on my Pi3B+ with python3 and openCV 3.4.2, but it takes 14 seconds to process an image. Can’t imagine how this could be of any use beyond a demo.

  5. sset November 12, 2018 at 12:44 pm #

    Thanks for great article.

    How do we custom train for customized dataset?

  6. Alex November 12, 2018 at 1:05 pm #

    Hello Adrian, which GPU did you use to achieve this performance?

    • Adrian Rosebrock November 13, 2018 at 4:37 pm #

      I did not use a GPU, it was CPU only. OpenCV’s “dnn” module does not yet support many GPUs.

  7. Cenk Camkoy November 12, 2018 at 1:41 pm #

    This is really very cool. Thanks for sharing all these together with your valuable benchmarks. By the way, out of my curiosity, do you know what type of object detector is used in Google’s autonomous cars? SSD or other?

    • Adrian Rosebrock November 13, 2018 at 4:36 pm #

      Hm, no, I don’t know what Google is using in their autonomous cars. SSDs are rooted in Google research though so that would likely be my guess.

  8. JBeale November 12, 2018 at 1:54 pm #

    YOLO may not win on real-world metrics, but it is clearly #1 in readability of the associated papers.

    • Adrian Rosebrock November 13, 2018 at 4:34 pm #

      Agreed 🙂

  9. Max November 12, 2018 at 2:31 pm #

    Hi,
    It is possible to make it up and running on a GPU?

    • Adrian Rosebrock November 13, 2018 at 4:34 pm #

      That depends. OpenCV’s “dnn” module currently does not support NVIDIA GPUs. It does work with some Intel GPUs though.

  10. Bob Estes November 12, 2018 at 2:55 pm #

    To be clear, your performance numbers for YOLO and SSD are for a CPU version, not a GPU version, right? Thanks.

    • Adrian Rosebrock November 13, 2018 at 4:32 pm #

      That is correct. YOLO can run 40+ FPS on a GPU. Tiny-YOLO can reportedly get past 100+ FPS.

  11. julio November 12, 2018 at 3:21 pm #

    If you work OpenCV with CUDA support, can you achieve 30FPS in real time? … I mean …
    1. YoloV3 + module dnn + CPU is very slow
    2. YoloV3 + module dnn + GPU that FPS speed could reach for real-time applications?

    How could I use Yolo in real time on a laptop GPU like Asus’ GeForce 930MX?

    • Adrian Rosebrock November 13, 2018 at 4:32 pm #

      See my replies to the other comments in this post — OpenCV does not yet support NVIDIA GPUs for their “dnn” module (hopefully soon though). That said, YOLO by itself can achieve 40+ FPS when ran on a GPU.

  12. kelemu November 12, 2018 at 3:36 pm #

    Hi Adrian, I am waits like this tutorials but now I am lucky to get from you really tanks a lot. How to train YOLO with our datasets?

    • Adrian Rosebrock November 13, 2018 at 4:31 pm #

      I don’t have any tutorials for training YOLO from scratch. Typically I recommend using SSDs or RetinaNet, both of which (and Faster R-CNNs), are covered inside Deep Learning for Computer Vision with Python.

  13. Sam November 12, 2018 at 3:37 pm #

    Thanks Adrian.. great post.

    Can I use it with Movidius NCS with custom dataset?

  14. Robert November 12, 2018 at 3:56 pm #

    Thanks for suggesting to read the Yolo v3 research paper, that’s easily the most entertaining and honest research paper I’ve ever read, all the way to the last line!

    • Adrian Rosebrock November 13, 2018 at 4:30 pm #

      Awesome, I’m glad you enjoyed it Robert!

  15. Hemant November 12, 2018 at 4:26 pm #

    Hey Adrian, nice article and very useful. I tried it on Pi 3 and as you stated, it is very slow. I am getting object detection rate of 1 frame per 16 seconds. Processing of the airport.mp4 took little less than 4 hours. Looking forward to your second edition of the book.

    • Adrian Rosebrock November 13, 2018 at 4:29 pm #

      Thank you for checking YOLO performance on the Pi, Hemant!

  16. wally kulecz November 12, 2018 at 4:59 pm #

    Nice timing on this, I just finished installing Ubuntu-Mate 18.04 on an i3 system. The installation of the Movidius v.1 SDK pulled in openCV 3.4.3 (presumably from PyPi) so I grabbed this sample code and gave it a try.

    The yolo is taking ~1.47 seconds.

    This is not a powerful machine (1.8 GHz if I remember right), but I’m getting about 10 fps with MobilenetSSD (from a previous tutorial) and one NCS stick handling 4 cameras (round-robin sampling) and near linear speed up with multiple sticks — 19.5 fps with 2 sticks 29 fps with 3 sticks. This is heavily threaded Python code with one main thread and one thread for each NCS stick and one thread for each Onvif network camera. A 4th NCS ( 9 threads) may be too much of a good thing as it drops to 24.6 fps. Although I had to have two sticks on a powered hub when I added the 4th stick for lack of ports, this may be a bit of a bottleneck as re-running the 3 stick test with two of them on hub dropped about 2 fps.

    I hope one of the AI gurus can compile this yolo model for the NCS, although I realize this may not be possible.

    Does your Xeon system use GPU (CUDA) acceleration? If so how many cuda cores?

    My i7 Desktop has a GTX-950 with 2GB ram and 768 cuda cores, so I’m wondering if its worth the trouble to try and enable it. I need to update its openCV from 3.3.0 to 3.4.3 before I can run this tutorial, so this could be a good time for me to try and activate cuda.

    • Adrian Rosebrock November 13, 2018 at 4:28 pm #

      I love your multi-Movidius NCS setup, Wally! I would love to learn more about it and how you are using it.

      As for my Xeon system, no, there is no CUDA acceleration. Although my iMac does have a Vega GPU so I suppose I could look into trying out the Intel + OpenCV + dnn drivers.

      In your case don’t bother with it. OpenCV doesn’t yet support NVIDIA GPUs with their “dnn” module (hopefully soon though!)

      • wally kulecz November 13, 2018 at 11:37 pm #

        Thanks for the most useful info about openCV and CUDA, maybe for openCV 4.x.x it’ll be worth revisiting. I really appreciate shared experience that saves me from a dead end!

        My multi-Movidius Python code uses NCSDK API v.1 and has been tested with Python 3.6 and 2.7 on Ubuntu-Mate 18.04, Raspbian Stretch on a Pi3B+ with Python 2.7 and 3.5, and Ubuntu-Mate 16.04 with Python 3.5 virtual environment (I never setup the virtual environment for python 2.7). If no Movidius are found, it drops down to using your Caffe version of Mobilenet-SSD on the CPU with one thread per camera.

        On my i7 with four cameras and three NCS I’m getting ~30 fps (8 threads) and with no NCS I’m getting about the same ~30 fps (9 threads). In each case there is evidence that the AI spends significant time waiting for images

        On an i3 (same four cameras) its getting ~29 fps with three NCS, but it falls apart with no NCS only getting ~8 fps and its clear the camera threads that are waiting for the AI threads. Just not enough cores for the CPU AI.

        On a Pi3B+ with three cameras its getting ~6.7 fps with one NCS (5 threads), ~11 fps with two NCS (6 threads), and ~13 fps with three NCS (7 threads). Two NCS seems to spend significant time waiting on the AI, while three NCS appears to spend significant time waiting on images, based on summary counts in the threads that the camera thread would block on queue.put() and the NCS thread would block on queue.get().

        Right now its only supported input is Onvif netcameras via their “snapshot” URL. The single stick version used your imutils to optionally use USB cameras or the PiCamera module, but I ripped this support out of the multi-stick version as few USB cameras work with IR illumination and only one PiCamera module can be used on a Pi as far as I know.

        I need four cameras minimum, my use is for a video security system where a commercial “security DVR” provides 24/7 video recording while the AI provides near zero false positive rate high priority “push” notifications when it is armed in “not home mode”, audio alerts (via espeak-ng) if armed in “at home mode”, and nothing when in “idle mode”.

        The Python code does the AI, node-red does the controlling and notifications, and MQTT glues it all together. The basic system has been running since early July and it works extremely well. It continues to evolve, mostly to improve the frame rate and reduce the detection latency — think “bad guys” marshaling on your property for a “home invasion”. But we love it for when the mailman comes or a package is delivered 🙂

        I’d be happy to send you the Python code if you are interested, in fact I’d like to see if it works on a Mac. The CPU only part runs on Windows 10 and 7 (no NCS support without way more effort than I’m willing to apply) in limited testing with the single stick (AI thread) version (I’ve removed the Windows support from the multi-stick code). I’ve totally given up on Windows since I retired, but a couple of Windows only friends were interested early on (hence the Win7 and Win10 tests), and I must say that this was by far the best cross-platform development experience I’ve ever had! Python has really impressed me!

        I plan to put it up on GitHub eventually, the Ubuntu 18.04 and PyPi openCV install was so easy I finally think I could write a README.md (in a reasonable amount of time) that someone could actually use from a fresh install of Raspbian or Ubuntu.

        • Adrian Rosebrock November 15, 2018 at 12:11 pm #

          Thanks for the detailed writeup, Wally! Let me know when you publish it on GitHub and I’ll take a look 🙂

      • faurog November 19, 2018 at 11:20 pm #

        Hi, Dr. Adrian. It would be nice if you tried using an Intel iGPU + OpenCV + dnn module. My laptop has a Nvidia GPU (not well supported yet) and an integrated Intel GPU, but I couldn’t make it work (net.setPreferableTarget(cv2.dnn.DNN_TARGET_OPENCL)). Anyway, if you try something, let us know. I would like to know if it indeed improves the performance. Thank you for another incredible post. Cheers.

        • Adrian Rosebrock November 20, 2018 at 9:12 am #

          I unfortunately do not have an Intel GPU right now. I hope to try it in the future though. Perhaps another reader can share their experience.

  17. blank November 12, 2018 at 8:55 pm #

    always cool tutorial, keep it up, have a great day! 🙂

    • Adrian Rosebrock November 13, 2018 at 4:25 pm #

      Thanks, you too 🙂

  18. Shivam Sahil November 12, 2018 at 9:30 pm #

    I always had this question in mind, even though it should be the fastest detector, whenever I use it in real time video detection, it gets slowest even than normal cnn which works pretty fast in my laptop. Is it because I have amd graphics card instead of NVidea or something else? Was just confused… let me know if you have suggestions regarding the same. I saw it takes about 1.3 sec to detect all the individual objects in one frame. But how is it able to detect objects quickly in your predefined videos, I just changed those to make it real time and it again went super slow, please let me know what is the actual issue.

    • Adrian Rosebrock November 13, 2018 at 4:25 pm #

      Keep in mind that the YOLO model is not accessing your GPU here. The YOLO + OpenCV implementation is running on your CPU which is why it’s taking a long time for inference.

      • Jason November 15, 2018 at 11:06 am #

        Adrian, as always, you have a nice tutorial. Thanks a lot.

        You can speed up the YOLO model on CPU by using OpenMP. Open makefile, and set AVX=1 and OPENMP=1.

        • Adrian Rosebrock November 15, 2018 at 11:49 am #

          Thanks Jason. How much of a speed increase are you seeing with that change?

          • Jason November 15, 2018 at 10:04 pm #

            I have not had the chance to download your codes yet. I am currently using my own data to train YOLOv3. It takes a lot time to prepare the images for training because you have to draw a bounding box for each objects in each images. Once I finish the training, I will let you know the speed difference between turning OpenMP on and off in prediction.

            By the way, you can also set OpenCV on and off in YOLO.

        • git-scientist November 28, 2018 at 10:10 pm #

          Hi Jason, could you give some detailed info about OpenMP? How one should make use of it? And, where does that makefile reside?

  19. Balaji November 12, 2018 at 10:23 pm #

    Hi,

    Nice tutorial for Yolo and valid comparsion with other object detection models.
    I want to detect small objects, so more interested in Faster-Rcnn resnet models, In this blog I can see you have mentioned they will outperform with ~5fps. I am using Faster-Rcnn resnet101 model in GPU 1080, but I am getting only 1.5 fps.
    Can you please suggest how to improve the speed.
    And as a user want to ask, When can we except a blog on Faster Rcnn Models and their advantages with custom training.

    Thank You

    • Adrian Rosebrock November 13, 2018 at 4:23 pm #

      Hey Balaji — I actually show you how to train your own custom Faster R-CNN models on your own datasets inside my book, Deep Learning for Computer Vision with Python. I also provide you with my tips, best practices, and suggestions on how to improve your model performance and speed. Be sure to take a look, I think it will really help you out.

  20. Jacob November 12, 2018 at 10:28 pm #

    What performance do you expect when run with a Tesla V100 GPU with 608×608 images? With darknet, I can process images with yolo between 80-90 fps. Yolo is typically much slower when implemented in python–does this opencv implementation also have a significant reduction in performance compared to darknet?

    • Adrian Rosebrock November 13, 2018 at 4:22 pm #

      OpenCV doesn’t yet support NVIDIA GPUs with their “dnn” module so we cannot yet obtain that benchmark. NVIDIA GPU support is coming soon but it’s not quite there yet.

  21. adam_Viz November 13, 2018 at 1:26 am #

    Oh Adrain!!! Awesome,am implemented successfully without any hasle..thankx for your contribution .

    • Adrian Rosebrock November 13, 2018 at 4:21 pm #

      Thanks Adam — and thank you for being a PyImageSearch reader.

  22. Alexander November 13, 2018 at 2:34 am #

    Hello, Adrian!

    What could you think about problem with real-time video from web-cameras? In our project (on-line detecting cars and peoples) when we used OpenCV3 with real-time video, we got big delay between frames… We solved this problem, but now we don`t using real-time video-streams from OpenCV.

    Could you have sample with real-time stream, not mp4 or avi-files?

    Best wishes, Alexander,
    Russia, Novosibirsk.

    • Adrian Rosebrock November 13, 2018 at 4:21 pm #

      Keep in mind that deep learning models will run significantly faster on a GPU. You might want to refactor your code to use pure Keras, TensorFlow, Caffe, or whatever your model was trained with, enabling you to access your GPU. More GPU support with OpenCV is coming soon but it’s not quite there yet.

  23. TAYFUN ARABACI November 13, 2018 at 2:50 am #

    very very nice Adrian :=)

    • Adrian Rosebrock November 13, 2018 at 4:20 pm #

      Thanks Tayfun!

  24. Riad November 13, 2018 at 6:16 am #

    Great tutorial ! But I notice that the code doesn’t work with grayscale images. Is there some parameters I can tweak to make it work?

    • Adrian Rosebrock November 13, 2018 at 4:16 pm #

      YOLO expects three channel RGB input images. If you have an input grayscale image just stack it to create a “faux” RGB/grayscale image:

      image = np.dstack([gray] * 3)

  25. Anusha November 13, 2018 at 8:55 am #

    Hey Adrian, this is a great post and I really liked the way you put everything in sequential order. I have a question though. I was wondering how can I replace the YOLO model for this object detection with Faster RCNN to suit my purposes as I have fairly small objects in my videos which I need to detect. I mean is there a deploy model and prototxt available for Faster RCNN?

    • Adrian Rosebrock November 13, 2018 at 4:14 pm #

      Yes, you would:

      1. Train your Faster R-CNN on whatever dataset you are using
      2. Then take the prototxt and Caffe model weights and swap them in

      Keep in mind that loading Faster R-CNN models is not yet 100% supported by OpenCV yet. It’s partially supported but it can be a bit of a pain.

  26. Sophia November 13, 2018 at 10:55 am #

    yet another amazingly informative tutorial! how does the speed-accuracy tradeoff of SSD compare with that of RetinaNet? thanks,

    • Adrian Rosebrock November 13, 2018 at 4:14 pm #

      In my experience RetinaNet tends to be slightly slower but also (1) slightly more accurate and (2) a bit easier to train.

      • sophia November 13, 2018 at 5:05 pm #

        thank you for replying, Adrian. that’s helpful information.

  27. joeSIX November 13, 2018 at 2:58 pm #

    This is a great tutorial, can’t thank you enough.
    unfortunately, I was unable to test it on my own (macbook pro, anaconda environment, opencv 3.4.2):
    error: (-215:Assertion failed) ifile.is_open() in function ‘ReadDarknetFromWeightsFile’

    • Adrian Rosebrock November 13, 2018 at 4:08 pm #

      Double-check your path to the input weights and configuration file. It sounds like your paths may be incorrect.

  28. Yurii November 13, 2018 at 9:23 pm #

    Hi Adrian,
    Is there a way to specify particular object to detect? For instance only cars and stop signs. It should speed up process I suppose as resources are not wasted on recognition of other objects.

    • Adrian Rosebrock November 15, 2018 at 12:14 pm #

      You can fine-tune the model to remove classes you’re not wanted in but keep in mind the number of classes isn’t going to dramatically slow down or speedup the network — all the computation is happening earlier in the network.

  29. Ramkumar November 14, 2018 at 4:53 am #

    Hi Adrian,
    The article you explained very interestingly for a beginner. Can we implement a smoke detection from image using yolo? Or only for hard objects?

    • Adrian Rosebrock November 15, 2018 at 12:07 pm #

      Object detectors work best for objects that have some sort of “form”. Smoke, like water, doesn’t have a true rigid form hence YOLO and other object detectors would not work well for smoke detection.

  30. Marcelo Mota November 14, 2018 at 5:45 am #

    Thanks for another great tutorial, Adrian!

    Could you please explain in more details lines 41 to 43? Why do you get layer names, and unconnected layers? And why that ” – 1″? code below:

    # determine only the *output* layer names that we need from YOLO
    ln = net.getLayerNames()
    ln = [ln[i[0] - 1] for i in net.getUnconnectedOutLayers()]

    And also line 52, why do you need to do a forward pass in just the “ln” layers? code below:

    layerOutputs = net.forward(ln)

    thank you!

    Marcelo

    • Adrian Rosebrock November 15, 2018 at 12:05 pm #

      The YOLO model is trained via the Darknet framework. We need to explicitly supply the output layer names into the call to “.forward()”. It’s a requirement when using Darknet models with OpenCV.

  31. Taha November 14, 2018 at 9:24 am #

    I’m getting 0.5 fps on a 1.7Ghz processor which intel core i3 4th gen. Is that okay speed for this model and system.

    • Adrian Rosebrock November 15, 2018 at 12:02 pm #

      Given that the model is running on the CPU, yes, those results seem accurate.

  32. Andrew November 14, 2018 at 12:41 pm #

    Hello Adrian,

    Nice tutorial…have you tried running YOLOv3 in C, given that it was originally written in C?
    I think there are some python wrappers out there for the datatypes

    • Adrian Rosebrock November 15, 2018 at 11:58 am #

      I haven’t tried in C but I know there is the darknetpy wrapper which can be used to run YOLO on a GPU.

  33. Thanks a lot! November 14, 2018 at 2:39 pm #

    Hi, Can you please tell me if I can run this code on Windows or not? I am stuck in Windows and cannot find a comprehensive tutorial of Yolo in Windows. Please help.

    • Adrian Rosebrock November 15, 2018 at 11:58 am #

      This tutorial will work on Windows provided you:

      1. Use the “Downloads” section of the tutorial to download the code + trained YOLO model
      2. Have OpenCV 3.4.2 or higher installed

  34. Sunny November 17, 2018 at 11:33 pm #

    Hi Adrian,

    If I want to only detect the red car inside the car chasing video by using YOLO, any suggestions on fulfilling the goal? Thank you

    • Adrian Rosebrock November 19, 2018 at 12:38 pm #

      You would:

      1. Filter on the “car” class, ignoring all other non-car detections
      2. Determine the object color

  35. moxran November 19, 2018 at 3:16 am #

    Hi,

    Since this is a yolo detector with OpenCV, it is not using the gpu, right? I’m getting 1 fps on a Intel core i7 2.2 Ghz processor, which is really slow. any reason you can see? Thanks!

    • Adrian Rosebrock November 19, 2018 at 12:25 pm #

      Correct, the YOLO detector is running on the CPU, not the GPU. Please see the other comments on this page where I’ve addressed OpenCV’s GPU capabilities.

  36. Dheeraj November 20, 2018 at 4:34 am #

    Can we count people using YOLO based Approach? If so, then what changes should be made in the code or need to use my own data set to train the model?

  37. Jelo November 20, 2018 at 11:51 am #

    Hi Adrian,

    First at all let me thank you for your all posts, really it is very useful for all.
    I would like to ask you can we use the deep learning to estimate the detect object position ? can you share some links ?
    Thank you again

    • Adrian Rosebrock November 21, 2018 at 9:37 am #

      Could you elaborate on what you mean by “object position”? What specifically are you trying to measure?

  38. Oscar Mejia November 20, 2018 at 11:01 pm #

    Hi Adrian, first of all, Thanks for your help and time to do this kind of tutorials. I really appreciate your help.

    Adrian, I would like to know if you recommend these algorithms to apply in a project to identifying, tracking and counting people in real time, if not what technique would you recommend me?

    Thanks in advance.

    • Oscar Mejia November 20, 2018 at 11:03 pm #

      I forgot to mention that the project will run on a Raspberry Pi 3 B+.

      Thanks.

  39. Steve November 20, 2018 at 11:48 pm #

    Hi,
    After running yolo_video.py, it doesn’t display the video window. Why?

    • Adrian Rosebrock November 21, 2018 at 9:25 am #

      The yolo_video.py script does not display the frames on your screen; it just writes them to disk. To display each frame on your screen you can use cv2.imshow.
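
      For example, a minimal addition inside the frame-processing loop, assuming the loop’s “frame” variable:

      # show the annotated frame on screen as well as writing it to disk
      cv2.imshow("Frame", frame)
      key = cv2.waitKey(1) & 0xFF

      # press "q" to quit the loop early
      if key == ord("q"):
          break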

  40. Aiwenj November 22, 2018 at 2:18 am #

    Hello, Adrian. Thank you for your post! I have a question: I want to know the number of people in an image. How can I do that using YOLO?

    • Adrian Rosebrock November 25, 2018 at 9:33 am #

      You would loop over the detected objects, use an “if” statement to check whether each one is a person, and increment a counter if so.
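
      A quick sketch, assuming the “idxs”, “classIDs”, and “LABELS” variables from the tutorial’s code:

      # count the "person" detections that survived non-maxima suppression
      personCount = 0
      if len(idxs) > 0:
          for i in idxs.flatten():
              if LABELS[classIDs[i]] == "person":
                  personCount += 1
      print("[INFO] {} people detected".format(personCount))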

  41. Abhijeet November 22, 2018 at 7:03 am #

    Hi, I am getting this error while running your code: No such file or directory: ‘yolo-coco\\coco.names’. Please reply.

    • Adrian Rosebrock November 25, 2018 at 9:27 am #

      Make sure you are using the “Downloads” section of this blog post to download the source code and example models. It sounds like you don’t have them downloaded on your system yet.

      • Abhijeet November 30, 2018 at 7:47 am #

        Thanks man, the code is really easy to understand. Could you please explain the purpose of the unconnected output layers (“ln”) in the code?

  42. Abhishek November 22, 2018 at 4:12 pm #

    Does this work with GIFs? Also, I’m getting “error: (-215:Assertion failed) !ssize.empty() in function ‘cv::resize’”. Any fixes? Maybe the input image is empty, but I’m not sure.

    • Adrian Rosebrock November 25, 2018 at 9:19 am #

      No, OpenCV does not support loading GIFs. You’ll want to convert the GIF to a series of JPEG or PNG frames, then feed them through the YOLO object detector.
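
      One way to do that conversion, assuming you have Pillow installed (the GIF filename here is just a placeholder):

      from PIL import Image, ImageSequence

      # split an animated GIF into individual PNG frames that
      # cv2.imread can load
      gif = Image.open("animation.gif")
      for i, frame in enumerate(ImageSequence.Iterator(gif)):
          frame.convert("RGB").save("frame_{:04d}.png".format(i))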

  43. frank November 23, 2018 at 9:14 am #

    Hi Adrian, thanks for your great tutorials. I have a question about line 70 of the source code in yolo.py. The length of detection is 85: detection[0:4] represents the box coordinates, width, and height, and detection[5:] represents the probabilities of the 80 classes. I noticed that detection[4] is not used, so I want to know what detection[4] stands for.

  44. Daniele Bagni November 25, 2018 at 9:59 am #

    EXCELLENT TUTORIAL, Adrian, as usual from you. Thank you very much for sharing your knowledge!

    • Adrian Rosebrock November 26, 2018 at 2:33 pm #

      Thanks so much Daniele!

  45. Guille Lopez November 28, 2018 at 7:45 am #

    Hi Adrian. Great tutorial! By extending your code I’ve been able to add the SORT algorithm to create a first approach to a traffic counter. I was thinking of open sourcing the code on GitHub (link removed by spam filter). Is it OK with you if I do that? I will cite your tutorial, pointing back to this page.

    • Adrian Rosebrock November 30, 2018 at 9:12 am #

      Hi Guille — congratulations on building a traffic counter, awesome job! Yes, feel free to open source the project, please just link back to the PyImageSearch blog from the GitHub readme page. Thank you!

  46. Dnyaneshwar December 3, 2018 at 10:50 pm #

    Hi Adrian,

    I am new to deep learning and computer vision, and I have downloaded the source code from your site.

    Please let me know what setup is required and the steps to run the program on Windows.

    • Adrian Rosebrock December 4, 2018 at 9:43 am #

      Please note that I only support Linux and macOS on this blog. I do not officially support Windows nor do I provide Windows install tutorials. I would suggest you follow my OpenCV install guides on either Linux or macOS to get up and running.

  47. Dnyaneshwar December 4, 2018 at 2:27 am #

    Hi Adrian,

    I am new to deep learning and computer vision. Can you help me get started building an application for object detection using the Intel OpenVINO toolkit? Please provide the steps to create and run the application.

    • Adrian Rosebrock December 4, 2018 at 9:39 am #

      At this time I do not have any tutorials on Intel’s OpenVINO toolkit. I will consider it for the future but I cannot guarantee if/when I may write about it.

  48. Nahael December 4, 2018 at 11:47 am #

    Hi Adrian, thank you for the tutorial. I’ve been following your posts for a while now. I would like to know if it’s possible to train my own dataset to detect violent scenes in videos. It would be very kind if you could help us with that. Thank you again.

    • Adrian Rosebrock December 6, 2018 at 9:55 am #

      What you are referring to is called “activity recognition”. I don’t have any tutorials regarding activity recognition (yet) but I do have a chapter inside Deep Learning for Computer Vision with Python which does show you how to detect and recognize weapons in images and video. That may be a good starting point for your project.

  49. Chris December 5, 2018 at 7:20 am #

    Cheers for this Adrian, it’s been exactly what I needed.

    I’ve now adapted the code to work with my home CCTV!

    Before, my CCTV would FTP a short video to my server when motion was detected, and then I would use Python to split the video into frames and email pictures of the frames at 1 sec, 3 sec, and 5 sec to my email address, so wherever I am in the world I get an image of whatever triggered the motion sensor.

    The problem was that I kept getting images of cats, birds, heavy rain, etc.

    What I’ve done now is edit your code so that when I get the images from 1, 3, and 5 secs, I run them through the detector, check if the label is “person”, “car”, “truck”, etc., and if so, attach the images to the email and send it.

    No more false alerts!!

    Thanks again, and I love the Guru course too; I’m having some real fun with that!

    • Adrian Rosebrock December 6, 2018 at 9:40 am #

      Awesome, congratulations on adapting the code to your own project Chris!

  50. Jammula December 8, 2018 at 5:00 am #

    Hello Adrian Rosebrock,

    Thank you for the great tutorial. I am facing a problem while executing yolo.py for object detection in images through the terminal in a Jupyter notebook.

    Issue:
    I am getting a “cannot connect to X SERVER” error.

    My server details:
    I am using an Nvidia GeForce GTX 1080 Ti with 11173 MiB of memory.

    Thank You in advance.

    • Adrian Rosebrock December 11, 2018 at 12:58 pm #

      What line of code is throwing that error?

  51. Kunal Gupta December 8, 2018 at 2:02 pm #

    Thanks for the post, Adrian!
    It’s said that YOLO is faster than SSDs, so technically, on a real-time video feed, it should outperform them, right?
    But when I ran it, I got 15 FPS with MobileNet SSD and only around 3 FPS with YOLO.
    What could be the issue?
    Thanks!

    • Adrian Rosebrock December 11, 2018 at 12:57 pm #

      You are correct that YOLO should be faster than SSD, but as you found out, and as I noted in the “Limitations and drawbacks of the YOLO object detector” section of the guide, YOLO appears to be slower. I’m not sure why that is.

  52. Sanjay Swami December 12, 2018 at 1:25 pm #

    Hello Mr. Adrian Rosebrock

    I am very glad that I found your blog, and I have started working through your tutorials.
    I am using this tutorial in my project, which is to find a ball and track it. Will this program work on a Raspberry Pi?

    Thank you in advance.

    • Adrian Rosebrock December 13, 2018 at 9:01 am #

      No, the object detector will run far, far too slow on the Raspberry Pi. Is your ball a colored one? If so, follow this tutorial on simple color thresholding and you’ll be able to complete your project.
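
      The core of that approach is only a few lines. Here is a minimal sketch, using HSV bounds for a green ball that you would tune to your own ball’s color:

      import cv2

      # a single frame, e.g. read from disk or grabbed from the Pi camera
      frame = cv2.imread("frame.jpg")

      # threshold the frame in HSV space (the bounds are illustrative)
      hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
      mask = cv2.inRange(hsv, (29, 86, 6), (64, 255, 255))

      # find the largest contour in the mask and enclose it in a circle;
      # the [-2] index handles both the OpenCV 3 and 4 return signatures
      cnts = cv2.findContours(mask, cv2.RETR_EXTERNAL,
          cv2.CHAIN_APPROX_SIMPLE)[-2]
      if len(cnts) > 0:
          c = max(cnts, key=cv2.contourArea)
          ((x, y), radius) = cv2.minEnclosingCircle(c)
          if radius > 10:
              cv2.circle(frame, (int(x), int(y)), int(radius),
                  (0, 255, 255), 2)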
