HOG detectMultiScale parameters explained


Last week we discussed how to use OpenCV and Python to perform pedestrian detection.

To accomplish this, we leveraged the built-in HOG + Linear SVM detector that OpenCV ships with, allowing us to detect people in images.

However, one aspect of the HOG person detector we did not discuss in detail is the detectMultiScale  function; specifically, how the parameters of this function can:

  1. Increase the number of false-positive detections (i.e., reporting that a location in an image contains a person when in reality it does not).
  2. Result in missing a detection entirely.
  3. Dramatically affect the speed of the detection process.

In the remainder of this blog post I am going to break down each of the detectMultiScale parameters to the Histogram of Oriented Gradients descriptor and SVM detector.

I’ll also explain the trade-off between speed and accuracy that we must make if we want our pedestrian detector to run in real-time. This tradeoff is especially important if you want to run the pedestrian detector in real-time on resource constrained devices such as the Raspberry Pi.


Accessing the HOG detectMultiScale parameters

To view the parameters to the detectMultiScale  function, just fire up a shell, import OpenCV, and use the help  function:
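In a Python shell, that looks like:

>>> import cv2
>>> help(cv2.HOGDescriptor.detectMultiScale)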

Figure 1: The available parameters to the detectMultiScale function.

You can use the built-in Python help  method on any OpenCV function to get a full listing of parameters and returned values.

HOG detectMultiScale parameters explained

Before we can explore the detectMultiScale  parameters, let’s first create a simple Python script (based on our pedestrian detector from last week) that will allow us to easily experiment:
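What follows is a minimal sketch of that script, reconstructed so that the line references in the walkthrough below land where the prose expects them (your copy from the downloads may differ slightly):

# import the necessary packages
from __future__ import print_function
import argparse
import datetime
import imutils
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="path to the input image")
ap.add_argument("-w", "--win-stride", type=str, default="(8, 8)",
    help="step size of the sliding window in the x and y direction")
ap.add_argument("-p", "--padding", type=str, default="(16, 16)",
    help="amount of padding surrounding the ROI")
ap.add_argument("-s", "--scale", type=float, default=1.05,
    help="image pyramid scale factor")
ap.add_argument("-m", "--mean-shift", type=int, default=-1,
    help="whether or not mean-shift grouping should be used")
args = vars(ap.parse_args())

# evaluate the command line arguments (using eval on command line
# input is not good practice, but it lets us pass tuples easily)
winStride = eval(args["win_stride"])
padding = eval(args["padding"])
meanShift = True if args["mean_shift"] > 0 else False

# initialize the HOG descriptor/person detector
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# load the image and resize it to a maximum width of 400 pixels
image = cv2.imread(args["image"])
image = imutils.resize(image, width=min(400, image.shape[1]))

# detect people in the image, timing how long it takes
start = datetime.datetime.now()
(rects, weights) = hog.detectMultiScale(image, winStride=winStride,
    padding=padding, scale=args["scale"], useMeanshiftGrouping=meanShift)
print("[INFO] detection took: {}s".format(
    (datetime.datetime.now() - start).total_seconds()))

# draw the detected bounding boxes on the image
for (x, y, w, h) in rects:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 0, 255), 2)

# show the output image
cv2.imshow("Detections", image)
cv2.waitKey(0)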

Since most of this script is based on last week’s post, I’ll just give a quick overview of the code.

Lines 9-20 handle parsing our command line arguments. The --image switch is the path to our input image that we want to detect pedestrians in. The --win-stride switch is the step size in the x and y direction of our sliding window. The --padding switch controls the number of pixels the ROI is padded with prior to HOG feature vector extraction and SVM classification. To control the scale of the image pyramid (allowing us to detect people in images at multiple scales), we can use the --scale argument. And finally, --mean-shift can be specified if we want to apply mean-shift grouping to the detected bounding boxes.

Now that we have our command line arguments parsed, we need to extract their tuple and boolean values, respectively, on Lines 24-26. Using the eval function, especially on command line arguments, is not good practice, but let’s tolerate it for the sake of this example (and for the ease of allowing us to play with different --win-stride and --padding values).

Lines 29 and 30 initialize the Histogram of Oriented Gradients detector and set the Support Vector Machine detector to be the default pedestrian detector included with OpenCV.

From there, Lines 33 and 34 load our image and resize it to have a maximum width of 400 pixels — the smaller our image is, the faster it will be to process and detect people in it.

Lines 37-41 detect pedestrians in our image using the detectMultiScale function and the parameters we supplied via command line arguments. We start and stop a timer on Lines 37 and 41, allowing us to determine how long it takes to process a single image for a given set of parameters.

Finally, Lines 44-49 draw the bounding box detections on our image  and display the output to our screen.

To get a default baseline in terms of object detection timing, just execute the following command (here the script is assumed to be saved as detectmultiscale.py):
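$ python detectmultiscale.py --image images/person_010.bmp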

On my MacBook Pro, the detection process takes a total of 0.09s, implying that I can process approximately 10 images per second:

Figure 2: On my system, it takes approximately 0.09s to process a single image using the default parameters.

In the rest of this lesson we’ll explore the parameters to detectMultiScale  in detail, along with the implications these parameters have on detection timing.

img (required)

This parameter is pretty obvious — it’s the image that we want to detect objects (in this case, people) in. This is the only required argument to the detectMultiScale  function. The image we pass in can either be color or grayscale.

hitThreshold (optional)

The hitThreshold  parameter is optional and is not used by default in the detectMultiScale  function.

When I looked at the OpenCV documentation for this function, the only description I found for the parameter was: “Threshold for the distance between features and SVM classifying plane”.

Given the sparse documentation of the parameter (and the strange behavior I observed when playing around with it for pedestrian detection), I believe that this parameter controls the maximum Euclidean distance between the input HOG features and the classifying plane of the SVM. If the Euclidean distance exceeds this threshold, the detection is rejected. However, if the distance is below this threshold, the detection is accepted.

My personal opinion is that you shouldn’t bother playing around with this parameter unless you are seeing an extremely high rate of false-positive detections in your image. In that case, it might be worth trying to set this parameter. Otherwise, just let non-maxima suppression take care of any overlapping bounding boxes, as we did in the previous lesson.

winStride (optional)

The winStride  parameter is a 2-tuple that dictates the “step size” in both the x and y location of the sliding window.

Both winStride and scale are extremely important parameters that need to be set properly. These parameters have tremendous implications on not only the accuracy of your detector, but also the speed at which your detector runs.

In the context of object detection, a sliding window is a rectangular region of fixed width and height that “slides” across an image, just like in the following figure:

Figure 3: An example of applying a sliding window to an image for face detection.

At each stop of the sliding window (and for each level of the image pyramid, discussed in the scale section below), we (1) extract HOG features and (2) pass these features on to our Linear SVM for classification. The process of feature extraction and classifier decision is an expensive one, so we would prefer to evaluate as few windows as possible if our intention is to run our Python script in near real-time.
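To make the cost concrete, here is a minimal, generic sketch of a sliding window loop (a helper for illustration, not OpenCV’s internal implementation); winStride is precisely the step this loop takes:

def sliding_window(image, step, window_size):
    # slide the window from left-to-right and top-to-bottom;
    # step is the (x, y) stride, i.e. what winStride controls
    (winW, winH) = window_size
    for y in range(0, image.shape[0] - winH + 1, step[1]):
        for x in range(0, image.shape[1] - winW + 1, step[0]):
            # each (x, y) yields one window that must be described
            # by HOG and classified by the SVM
            yield (x, y, image[y:y + winH, x:x + winW])

Note that halving the step in both directions roughly quadruples the number of windows per pyramid layer, which is why winStride has such a direct impact on runtime.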

The smaller winStride  is, the more windows need to be evaluated (which can quickly turn into quite the computational burden):
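For example, shrinking the stride to (4, 4):

$ python detectmultiscale.py --image images/person_010.bmp --win-stride="(4, 4)"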

Figure 4: Decreasing the winStride increases the amount of time it takes to process each image.

Here we can see that decreasing the winStride  to (4, 4) has actually increased our detection time substantially to 0.27s.

Similarly, the larger winStride is, the fewer windows need to be evaluated (allowing us to dramatically speed up our detector). However, if winStride gets too large, then we can easily miss out on detections entirely:
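For example (the (16, 16) stride here is an illustrative larger value):

$ python detectmultiscale.py --image images/person_010.bmp --win-stride="(16, 16)"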

Figure 5: Increasing the winStride can reduce our pedestrian detection time (0.09s down to 0.06s, respectively), but as you can see, we miss out on detecting the boy in the background.

I tend to start off using a winStride  value of (4, 4) and increase the value until I obtain a reasonable trade-off between speed and detection accuracy.

padding (optional)

The padding  parameter is a tuple which indicates the number of pixels in both the x and y direction in which the sliding window ROI is “padded” prior to HOG feature extraction.

As suggested by Dalal and Triggs in their 2005 CVPR paper, Histogram of Oriented Gradients for Human Detection, adding a bit of padding surrounding the image ROI prior to HOG feature extraction and classification can actually increase the accuracy of your detector.

Typical values for padding include (8, 8), (16, 16), (24, 24), and (32, 32).
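For example, to experiment with one of these values from the command line:

$ python detectmultiscale.py --image images/person_010.bmp --padding="(32, 32)"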

scale (optional)

An image pyramid is a multi-scale representation of an image:

Figure 6: An example image pyramid.

At each layer of the image pyramid the image is downsized and (optionally) smoothed via a Gaussian filter.

The scale parameter controls the factor by which our image is resized at each layer of the image pyramid, ultimately influencing the number of levels in the pyramid.

A smaller scale  will increase the number of layers in the image pyramid and increase the amount of time it takes to process your image:
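For example, dropping the scale to 1.01:

$ python detectmultiscale.py --image images/person_010.bmp --scale 1.01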

Figure 7: Decreasing the scale to 1.01

The amount of time it takes to process our image has significantly jumped to 0.3s. We also now have an issue of overlapping bounding boxes. However, that issue can be easily remedied using non-maxima suppression.

Meanwhile a larger scale will decrease the number of layers in the pyramid as well as decrease the amount of time it takes to detect objects in an image:
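For example (the 1.5 here is purely an illustrative larger value):

$ python detectmultiscale.py --image images/person_010.bmp --scale 1.5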

Figure 8: Increasing our scale allows us to process nearly 20 images per second — at the expense of missing some detections.

Here we can see that we performed pedestrian detection in only 0.02s, implying that we can process nearly 50 images per second. However, this comes at the expense of missing some detections, as evidenced by the figure above.

Finally, if you decrease both winStride  and scale  at the same time, you’ll dramatically increase the amount of time it takes to perform object detection:
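For example:

$ python detectmultiscale.py --image images/person_010.bmp --win-stride="(4, 4)" --scale 1.01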

Figure 9: Decreasing both the scale and window stride.

We are able to detect both people in the image — but it’s taken almost half a second to perform this detection, which is absolutely not suitable for real-time applications.

Keep in mind that for each layer of the pyramid, a sliding window with winStride steps is moved across the entire layer. While it’s important to evaluate multiple layers of the image pyramid, allowing us to find objects in our image at different scales, it also adds a significant computational burden: each layer implies that a series of sliding windows, HOG feature extractions, and SVM decisions must be performed.

Typical values for scale  are normally in the range [1.01, 1.5]. If you intend on running detectMultiScale  in real-time, this value should be as large as possible without significantly sacrificing detection accuracy.

Again, along with the winStride , the scale  is the most important parameter for you to tune in terms of detection speed.

finalThreshold (optional)

I honestly can’t even find finalThreshold  inside the OpenCV documentation (specifically for the Python bindings) and I have no idea what it does. I assume it has some relation to the hitThreshold , allowing us to apply a “final threshold” to the potential hits, weeding out potential false-positives, but again, that’s simply speculation based on the argument name.

If anyone knows what this parameter controls, please leave a comment at the bottom of this post.

useMeanShiftGrouping (optional)

The useMeanShiftGrouping  parameter is a boolean indicating whether or not mean-shift grouping should be performed to handle potential overlapping bounding boxes. This value defaults to False  and in my opinion, should never be set to True  — use non-maxima suppression instead; you’ll get much better results.
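If you want to see the mean-shift grouping behavior for yourself, it can be toggled from the command line:

$ python detectmultiscale.py --image images/person_010.bmp --mean-shift 1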

When using HOG + Linear SVM object detectors you will undoubtedly run into the issue of multiple, overlapping bounding boxes, where the detector has fired numerous times in regions surrounding the object we are trying to detect:

Figure 10: An example of detecting multiple, overlapping bounding boxes.

To suppress these multiple bounding boxes, Dalal suggested using mean shift (Slide 18). However, in my experience mean shift performs sub-optimally and should not be used as a method of bounding box suppression, as evidenced by the image below:

Figure 11: Applying mean-shift to handle overlapping bounding boxes.

Instead, utilize non-maxima suppression (NMS). Not only is NMS faster, but it obtains much more accurate final detections:

Figure 12: Instead of applying mean-shift, utilize NMS instead. Your results will be much better.
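Continuing from the script above, a sketch of applying NMS to the boxes returned by detectMultiScale, using the non_max_suppression helper from imutils as we did last week (the 0.65 overlap threshold here is an illustrative value):

from imutils.object_detection import non_max_suppression
import numpy as np

# convert the (x, y, w, h) boxes returned by detectMultiScale into
# (x1, y1, x2, y2) corner form, then apply non-maxima suppression
rects = np.array([[x, y, x + w, y + h] for (x, y, w, h) in rects])
pick = non_max_suppression(rects, probs=None, overlapThresh=0.65)

# draw the final, suppressed bounding boxes
for (xA, yA, xB, yB) in pick:
    cv2.rectangle(image, (xA, yA), (xB, yB), (0, 255, 0), 2)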

Tips on speeding up the object detection process

Whether you’re batch processing a dataset of images or looking to get your HOG detector to run in real-time (or as close to real-time as feasible), these three tips should help you milk as much performance out of your detector as possible:

  1. Resize your image or frame to be as small as possible without sacrificing detection accuracy. Prior to calling the detectMultiScale  function, reduce the width and height of your image. The smaller your image is, the less data there is to process, and thus the detector will run faster.
  2. Tune your scale  and winStride  parameters. These two arguments have a tremendous impact on your object detector speed. Both scale  and winStride  should be as large as possible, again, without sacrificing detector accuracy.
  3. If your detector still is not fast enough…you might want to look into re-implementing your program in C/C++. Python is great and you can do a lot with it. But sometimes you need the compiled binary speed of C or C++ — this is especially true for resource constrained environments.

Summary

In this lesson we reviewed the parameters to the detectMultiScale  function of the HOG descriptor and SVM detector. Specifically, we examined these parameter values in context of pedestrian detection. We also discussed the speed and accuracy tradeoffs you must consider when utilizing HOG detectors.

If your goal is to apply HOG + Linear SVM in (near) real-time applications, you’ll first want to start by resizing your image to be as small as possible without sacrificing detection accuracy: the smaller the image is, the less data there is to process. You can always keep track of your resizing factor and multiply the returned bounding boxes by this factor to obtain the bounding box sizes in relation to the original image size.
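A short sketch of that bookkeeping (the width of 400 and the file name are illustrative):

import cv2
import imutils

# load the original image and resize it for faster detection
orig = cv2.imread("pedestrians.jpg")  # illustrative file name
image = imutils.resize(orig, width=min(400, orig.shape[1]))
ratio = orig.shape[1] / float(image.shape[1])

# detect on the smaller image
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
(rects, weights) = hog.detectMultiScale(image, winStride=(8, 8),
    padding=(16, 16), scale=1.05)

# multiply each box by the resize factor to map it back onto
# the original image
rects = [(int(x * ratio), int(y * ratio), int(w * ratio),
    int(h * ratio)) for (x, y, w, h) in rects]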

Secondly, be sure to play with your scale and winStride parameters. These values can dramatically affect the detection accuracy (as well as the false-positive rate) of your detector.

Finally, if you still are not obtaining your desired frames per second (assuming you are working on a real-time application), you might want to consider re-implementing your program in C/C++. While Python is very fast (all things considered), there are times you cannot beat the speed of a binary executable.

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 11-page Resource Guide on Computer Vision and Image Search Engines, including exclusive techniques that I don’t post on this blog! Sound good? If so, enter your email address and I’ll send you the code immediately!


64 Responses to HOG detectMultiScale parameters explained

  1. Anuj Pahuja November 16, 2015 at 1:18 pm #

    Hi Adrian,

    Thanks again for the informative blog post. I had to use HOGDescriptor in OpenCV for one of my projects and it was a pain to use because of no clear documentation. So this was much needed.

    The ‘finalThreshold’ parameter is mainly used to select the clusters that have at least ‘finalThreshold + 1’ rectangles. This parameter is passed as an argument to the groupRectangles() or groupRectangles_meanShift() (when mean shift is enabled) function, which rejects the small clusters containing less than or equal to ‘finalThreshold’ rectangles, computes the average rectangle size for the rest of the accepted clusters, and adds those to the output rectangle list.

    These should help:
    1. http://code.opencv.org/projects/opencv/repository/entry/modules/objdetect/src/hog.cpp?rev=2.4.9#L1057

    2. http://docs.opencv.org/2.4/modules/objdetect/doc/cascade_classification.html#void%20groupRectangles%28vector%3CRect%3E&%20rectList,%20int%20groupThreshold,%20double%20eps%29

    Cheers,
    Anuj

    • Adrian Rosebrock November 16, 2015 at 1:52 pm #

      Thanks so much for sharing the extra details Anuj!

  2. Nrupatunga November 17, 2015 at 1:41 am #

    Dear Adrian,
    Very informative post on HOG detectmultiscale parameters. In fact, I appreciate this post very much. I couldn’t find such a detailed post with examples like this anywhere on the net.

    Recently, I trained HOG features (90×160) manually using SVMlight. I had a hard time making detectmultiscale work with these parameters.

    I would like to share few observations while experimentation:

    1. Doing hard training reduced my false positives.
    2. finalThreshold and useMeanShiftGrouping:
    setting useMeanShiftGrouping to false gave me a good detection bounding box around the person in the image, and increasing the final threshold reduced the number of detections (number of bounding boxes).

    I am still working on the part of improving the detection rate. I have many images where I still couldn’t detect the person in the image.

    I have reduced false positives, and I wanted to increase my detection rate as well.
    Any inputs on this? I would really appreciate your inputs.

    Thanks a lot for this post.

    Correction: if I am not mistaken, I think there should be a modification to the command

    “python detectmultiscale.py --image images/person_010.bmp --scale 1.03”

    after the statement

    “Meanwhile a larger scale will decrease the number of layers in the pyramid as well as decrease the amount of time it takes to detect objects in an image:”

    • Adrian Rosebrock November 17, 2015 at 6:12 am #

      Hey Nrupatunga, thanks for the comment and all the added details, I appreciate it. If you ended up using mean-shift grouping, I would suggest applying non-maxima suppression instead; you’ll likely end up getting even better results.

      In order to improve your detection rate, be sure to check the ‘C’ parameter of your SVM. Normally this value should be very small, such as C=0.01. This will create a “soft classifier” and help with your detection rate.

      Another neat little trick you can do to create more training data is to “mirror” your training images. I’m not sure about your particular case, but for pedestrian detection, the horizontal mirror of an image of a person is still a person, thus you can use that as additional training data as well.

  3. ngapweitham November 18, 2015 at 11:21 pm #

    Thanks for the brilliant explanations. It is much easier to understand than the OpenCV documentation.

    Has anyone tried to use dlib to do the pedestrian detection? There is a video showing the results (https://www.youtube.com/watch?v=wpmY_5gNbEY); I cannot tell whether the result is good or bad with my knowledge.

    • Adrian Rosebrock November 19, 2015 at 6:19 am #

      Thanks for sharing Tham!

  4. Sebastian November 19, 2015 at 5:26 pm #

    Hi Adrian, thanks for the post

    I’m trying to do HOG detection, but in real time from a video camera. I’m working on a Raspberry Pi 2 board and the code works, but the frame rate is too slow.

    How can I make the process faster?
    Do you think it is possible to get good results working with the Raspberry Pi 2?

    Thanks

    • Adrian Rosebrock November 20, 2015 at 6:28 am #

      Hey Sebastian — please see the “Tips on speeding up the object detection process” section. This section of the post details tricks you can use to speed up the detection process, especially related to the Raspberry Pi 2.

    • Cam May 5, 2016 at 2:29 am #

      Hi Sebastian,

      could you please help me with the people detection in real time? I’ve been trying but it doesn’t work. Can you give me some ideas for the code, or send me that part of the code? I’d really appreciate that.

      Thank you.

      • Adrian Rosebrock May 5, 2016 at 6:43 am #

        Hey Camilo — please see this followup blog post on tuning detectMultiScale parameters for real-time detection.

        • Rish February 7, 2017 at 5:38 am #

          Hey Adrian – the link leads to the same post. I’m really trying hard to do real time detection. I’m hoping to achieve 20fps (or 25fps if I can get really lucky). I’ve implemented a tracking algorithm that helps quite a bit. However, any tips to speed up the detectMultiScale function as such would be really helpful.

          As mentioned in the blogpost, changing scale from 1.20 to 1.05 increases time per 640×480 frame from 55ms to 98ms, however accuracy reduces significantly.

          • Adrian Rosebrock February 7, 2017 at 8:59 am #

            Just to clarify, you are trying to obtain 20-25 FPS on the Raspberry Pi?

  5. Vivek December 3, 2015 at 11:35 pm #

    Adrian,

    The scale parameter also works together with another input, nLevels.
    The way it works is this:

    The image size is decreased over nLevels iterations.

    If nLevels = 16 and scale = 1.05, the loop runs 16 times, each time
    multiplying the running scale factor (which starts at 1) by scale.

    So nLevels defines the number of loops, not the scale.

    • Adrian Rosebrock December 4, 2015 at 6:26 am #

      Thanks for sharing Vivek. So just to clarify, nLevel is used to control the maximum number of layers of the image pyramid? Also, does nLevel work for the Python + OpenCV bindings or just for C++?

  6. Vivek December 4, 2015 at 7:57 am #

    Hi Adrian,
    If you look at the hog.cpp file, it will show how nlevels is used.
    Our test in Python shows that it does work the way it is defined.
    If you set nlevels too low, say 4, with a scale of 1.01, you will see that no small figures are detected.
    Experiment by changing nlevels and the scale to see how it works.

    Here is a code snippet from hog.cpp. I did not see any default value, so the loop will continue until the size of the image becomes smaller than the window.

    • Adrian Rosebrock December 4, 2015 at 8:25 am #

      Thanks for the clarification! I’ll be sure to play with this parameter as well. It seems like a nice way to complement the scale.

  7. Ulrich March 14, 2016 at 12:25 pm #

    Hello Adrian,

    Thanks a lot for this blog. I read it carefully and tried out your code with my own pictures and videos. It works great!
    Do you also have a Python HoG implementation which is rotation invariant? I try to detect pedestrians which are not upright (due to a moving/tilted camera).

    • Adrian Rosebrock March 14, 2016 at 3:16 pm #

      By definition, HOG is not meant to be rotation invariant. If you know how the camera is tilted, simply deskew it by rotating it by theta degrees.

  8. Ulrich March 15, 2016 at 9:05 am #

    Hello Adrian,

    Thanks for your answer, this is a good idea and helps in many cases. But in some special cases I do not know what the angle theta is, due to sliding camera motion.
    I guess another possibility could be to train the SVM with tilted example images. But for that I think a quadratic window would be better than the normally used upright 128:64 window.

    Is it possible to change the window size in the OpenCV SVM database?
    Is it possible to add HoG descriptors which are analyzed from tilted examples (pedestrians) into the existing SVM database? Or is it necessary to create your own SVM?

    Do you have a blog post on how to expand an SVM or how to create your own SVM?
    Is this possibly described in one of your books?

    It would be great when you can give me some answers and hints regarding my problem.

    Best regards Ulrich

    • Adrian Rosebrock March 15, 2016 at 4:31 pm #

      If you decide to create additional data samples by tilting your images, then you’ll need to train a HOG detector for each set of rotations. Keep in mind that HOG, by definition, is sensitive to rotation.

      The pedestrian detector that OpenCV ships with is pre-trained, so you can’t adjust it. In your case, you would need to train your own SVM.

      I detail the steps involved in how to train a custom object detector in this post. You can find the source code implementation of the HOG + Linear SVM detector inside the PyImageSearch Gurus course.

  9. Anthony April 20, 2016 at 5:07 am #

    Hi Adrian, I really need to thank you for all those amazing posts, it really is a great job!

    Because I liked your posts, I tried to reproduce it at home, using a Raspberry Pi 2 with Python 2.7.9 and OpenCV 3.1.0.

    I’m doing real-time person detection with this RPi and it’s working well. My problem is that I can’t find a way to count the number of people when performing hog.detectMultiScale. The return values of this function give us the locations and weights of detected people, but not the exact number of these people.

    Do you have any idea of implementing it?

    • Adrian Rosebrock April 20, 2016 at 6:01 pm #

      The value returned by hog.detectMultiScale is just a list, where each entry represents a detected person. Therefore, to get the total number of people in the image, just take len(rects).

      • Anthony April 22, 2016 at 10:11 am #

        Thank you very much, this helped me a lot!

        Keep updating this blog, it’s wonderful!

  10. Arpit Solanki May 25, 2016 at 5:54 am #

    Thank you for this great post. With your approach I observed that it has a lot of false positives. One example of a false positive is that I tested it on a photo with a man and a dog (front view) and it detected both of them as a person. Can you please help me solve this kind of issue?

    • Adrian Rosebrock May 25, 2016 at 3:19 pm #

      Since the classifier is pre-trained, you unfortunately cannot apply hard-negative mining as in the HOG + Linear SVM pipeline. Instead, you’ll need to try tuning the parameters of detectMultiScale. To start, I would work with the scale factor and try to get that as large as possible without hurting true-positive detection.

  11. Imran September 19, 2016 at 3:45 am #

    Just wondering how to incorporate detection with different postures (sitting, crawling etc) in the framework of HoG descriptors? Any work done in this regard?

    • Adrian Rosebrock September 19, 2016 at 1:01 pm #

      You would essentially need to train a separate HOG + Linear SVM detector for each of the postures you wanted to detect.

  12. Paulo September 27, 2016 at 5:34 pm #

    Hi Adrian,

    Based on your experience, what technique would you suggest to detect the pattern of heads and shoulders from a top view?

    Best regards Paulo

    • Adrian Rosebrock September 28, 2016 at 10:41 am #

      It really depends on your dataset. I would first examine the images in the dataset and determine how much variance in appearance the heads and shoulders have. Depending on how much variance there is, I’d make a decision. For similar images with low variance I’d likely use HOG + Linear SVM. For lots of variance I’d start to consider more advanced approaches like CNNs.

      • Paulo September 28, 2016 at 3:17 pm #

        Hi Adrian, Thanks for answering.

        In my dataset there are between 2 and 6 people walking together to pass through a door with a width of 60 in (1.5m). I tested the Hough transform to detect heads, but the result was not satisfactory. If I use a CNN (Convolutional Neural Network), which of your posts do you recommend starting with?

        Best regards Paulo

        • Adrian Rosebrock September 30, 2016 at 6:49 am #

          Is your camera fixed and non-moving? And I assume all images are coming from the same camera sensor? If so, I think HOG + Linear SVM would likely be enough here.

          • Paulo October 4, 2016 at 11:01 pm #

            Thanks Adrian!

            My camera is fixed.

            From this code, how do I adapt it for offline training? What should I change?

            Thanks…

          • Adrian Rosebrock October 6, 2016 at 6:59 am #

            I demonstrate how to train HOG + Linear SVM detectors from scratch inside the PyImageSearch Gurus course. I would suggest starting there.

  13. Wanderson September 27, 2016 at 11:53 pm #

    Dear Adrian,

    I would like to ask a newbie question. Whenever I see talk about the HOG detector, the SVM classifier is involved. Should the HOG descriptor always be linked to a classifier? Or can I detect foreground objects with blob analysis?

    Thanks,

    Wanderson

    • Adrian Rosebrock September 28, 2016 at 10:38 am #

      You typically see SVMs, in particular Linear SVMs, because they are very fast. You could certainly use a different classifier if you wished.

  14. James Brown October 24, 2016 at 2:29 am #

    Hello, Adrian.
    Your post is really amazing.
    I have a question about HOG.
    Is it possible to extract the human features by removing the background in a selected rectangular area?
    I am researching the human body recognition project and I really hope you guide me.
    Thank you very much.
    You are super!

  15. John Beale October 24, 2016 at 2:34 pm #

    Thank you for this great blog. In your examples showing the foreground woman and background child, the green bounding box cuts through the woman’s forehead, so I’m assuming the HOG detector found her legs and torso, but missed her head(?) In other cases, the box is well centered and completely contains all the pixels showing the human figure, but includes considerable extra background also. Is this algorithm intrinsically that “fuzzy” about the precise outline, or can it be tuned to more closely match the actual boundaries of the person? Thanks again!

    • Adrian Rosebrock November 1, 2016 at 9:47 am #

      The algorithm itself isn’t “fuzzy”, it’s simply the step size of the sliding window and the size of the image pyramid. I would suggest reading this post on the HOG + Linear SVM detector to better understand how the detector works.

  16. Yonatan January 3, 2017 at 6:23 am #

    A comment regarding the hitThreshold parameter:
    it should represent the minimum Euclidean distance between the input HOG features and the classifying plane of the SVM, meaning that the detection is positive only if the SVM result exceeds this threshold (and if you set this threshold to small negative values, you get a lot of false positive windows).

  17. Saeed February 16, 2017 at 3:05 pm #

    Hi Adrian,
    I read “perform pedestrian detection” and current posts and there is one point that I cannot understand.
    Your sliding window’s size is fixed to be 128×64 and all the features are obtained from this window at any scale. However, when the targets are detected, the boxes have different sizes. I believe all the boxes should be 128×64 but they are not. Could you please describe what causes this?

    Thank you in advance for your comment.

    • Adrian Rosebrock February 20, 2017 at 8:05 am #

      You can have different sized bounding boxes in scale space due to image pyramids. Image pyramids allow you to detect objects at varying scales of the image, but as you’ll notice, each bounding box has the same aspect ratio.

  18. leo February 20, 2017 at 8:39 pm #

    Hi, can you please explain [ -i ] and [ --images ] to me? I’m new in this area.

    please help me

    # construct the argument parse and parse the arguments
    ap = argparse.ArgumentParser()
    ap.add_argument("-i", "--images", required=True, help="path to images directory")
    args = vars(ap.parse_args())

    • Adrian Rosebrock February 22, 2017 at 1:42 pm #

      Hi Leo — I would highly suggest that you spend some time reading up on command line arguments and how they work.

  19. Ashutosh February 24, 2017 at 12:24 am #

    Dear Adrian,

    Very useful blog, Thank you for drafting contents precisely.

    I was checking the GPU version of the detectMultiScale at

    http://docs.opencv.org/2.4/modules/gpu/doc/object_detection.html#gpu-hogdescriptor-detectmultiscale

    But could not understand as to why padding is (0,0).

    “padding – Mock parameter to keep the CPU interface compatibility. It must be (0,0).”

    In that case, how do we detect pedestrians at the edge of the frame?

    Thanks in advance.

    • Adrian Rosebrock February 24, 2017 at 11:25 am #

      The GPU functionality of OpenCV is (unfortunately) only for C/C++. There are no Python bindings for it, so I unfortunately haven’t had a chance to play around with the GPU functions and do not have any insight there.

  20. Rob March 19, 2017 at 12:34 pm #

    Hi Adrian,

    If I want to use HOG + SVM as a traffic sign detector, how should I do it?
    Should I train a detector for each sign, or should I build a general detector for all signs and then distinguish the signs with another method? I want to do it in real time; do more detectors increase the calculation effort proportionally?

    Thanks in advance.

    • Adrian Rosebrock March 21, 2017 at 7:31 am #

      You would want to train a detector for each sign. More detectors will increase the amount of time it takes to classify a given image, but the benefit is that your detections will be more accurate. Please see the PyImageSearch Gurus course for an example of building a traffic sign detector.

      • Rob March 28, 2017 at 8:02 am #

        Is there a way to use detectmultiscale to distinguish between several object classes? I also tried to implement my own scaling and sliding windows and then use predict() to recognize the object. For this purpose I trained a linear SVM and labeled the data. It’s working fine, but it’s so much slower than detectmultiscale.

        • Adrian Rosebrock March 28, 2017 at 12:49 pm #

          HOG + Linear SVM detectors work best when they are binary (i.e., detecting one class label for each detector). The detectMultiScale function in OpenCV only works with one class. You can implement your own method (as you’ve done), but it will be much slower in Python.

          • Rob March 29, 2017 at 12:04 pm #

            Okay, thank you, I’m trying that. When I run detectmultiscale or predict, the code uses only 25% of my processor (on an RPi 3). Is multiprocessing possible with these methods? How can I achieve this?

          • Adrian Rosebrock March 31, 2017 at 1:58 pm #

            As far as detectMultiScale goes, unfortunately there aren’t many optimizations on the Python side of things. If you wanted to code in C++, you could access the GPU via detectMultiScale for added speed.

  21. Nashwan March 28, 2017 at 2:01 pm #

    Hi Adrian;
    when I run this code it tells me this error:

    detectmultiscale.py: error: argument -i/--image is required

    I use OpenCV 3.0 and Python 2.7 on Windows 10.
    I’m waiting for your help…

  22. mukesh April 4, 2017 at 4:47 pm #

    Hi Adrian,
    I tried

    (rects, weights) = hog.detectMultiScale(image, winStride=winStride,
        padding=padding, scale=args["scale"], useMeanshiftGrouping=meanShift)

    and when I printed rects and weights I got empty tuples.
    I’m a beginner and need some help.
    Waiting for your help.

    • Adrian Rosebrock April 5, 2017 at 11:54 am #

      If you did not obtain any bounding boxes then the parameters to .detectMultiScale need some tuning. Your image might also contain poses that are not suitable for the pre-trained pedestrian detector provided with OpenCV.

  23. ramdan May 8, 2017 at 4:13 am #

    Hi Adrian

    How do you train the HOG descriptor?

  24. Sunil June 12, 2017 at 7:50 am #

    Nice effort put into the article, Adrian. Is there any relation between the minimum and maximum size possible to detect with the parameters of the HOG/SVM detector of OpenCV?

    • Adrian Rosebrock June 13, 2017 at 11:01 am #

      I’m not sure what you mean by minimum/maximum size. Are you referring to the object you’re trying to detect? The HOG window? Keep in mind that we use image pyramids to find objects at varying scales in an image. You might need to upscale your image before applying the image pyramid + sliding window to detect very small objects in the background.

      • Sunil June 14, 2017 at 3:03 am #

        Yeah, sorry for not being clear. I was wondering if there is a relation between the max/min object size which can be detected in a given image and the size of the HOG window used. Actually I am trying to see whether increasing the resolution by a factor of two in each dimension has some positive effect on object detection.

        • Adrian Rosebrock June 16, 2017 at 11:33 am #

          Increasing the resolution will enable you to detect objects that would otherwise be too small for the sliding window to capture. The downside is that the HOG + Linear SVM detector now has more data to process, thus making it substantially slower.

  25. alberto June 12, 2017 at 11:59 am #

    Hello,

    I’ve trained my own HOG detector using the command “opencv_traincascade” with the “-featureType HOG” flag on and it successfully generated a .xml file as a HOG detector.

    How can I use my own XML file with the functions cv2.HOGDescriptor() and hog.setSVMDetector(), so I can test my HOG detector in action?
    I have only found working examples of the default people detector, cv2.HOGDescriptor_getDefaultPeopleDetector().

    Thanks,
    Alberto

    • Claude June 25, 2017 at 2:58 pm #

      Hi Alberto,

      I have my SVM in yml, and then use

      hog = cv2.HOGDescriptor(IMAGE_SIZE, BLOCK_SIZE, BLOCK_STRIDE,
          CELL_SIZE, NR_BINS)
      svm = cv2.ml.SVM_load("trained_svm2.yml")
      hog.setSVMDetector(svm.getSupportVectors())

      Maybe it will “just work” with an XML file as well.
