Sliding Windows for Object Detection with Python and OpenCV


In last week’s blog post we discovered how to construct an image pyramid.

In today’s article, we are going to extend that example and introduce the concept of a sliding window. Sliding windows play an integral role in object classification, as they allow us to localize exactly “where” in an image an object resides.

Utilizing both a sliding window and an image pyramid we are able to detect objects in images at various scales and locations.

In fact, both sliding windows and image pyramids are used in my 6-step HOG + Linear SVM object classification framework!

To learn more about the role sliding windows play in object classification and image classification, read on. By the time you are done reading this blog post, you’ll have an excellent understanding of how image pyramids and sliding windows are used for classification.


What is a sliding window?

In the context of computer vision (and as the name suggests), a sliding window is a rectangular region of fixed width and height that “slides” across an image, such as in the following figure:

Figure 1: Example of the sliding window approach, where we slide a window from left-to-right and top-to-bottom.

For each of these windows, we would normally take the window region and apply an image classifier to determine if the window has an object that interests us — in this case, a face.

Combined with image pyramids we can create image classifiers that can recognize objects at varying scales and locations in the image.

These techniques, while simple, play an absolutely critical role in object detection and image classification.

Sliding Windows for Object Detection with Python and OpenCV

Let’s go ahead and build on our image pyramid example from last week.

But first ensure that you have OpenCV and imutils installed:

Remember the  file? Open it back up and insert the sliding_window  function:

The sliding_window function requires three arguments. The first is the image that we are going to loop over. The second argument is the stepSize.

The stepSize indicates how many pixels we are going to “skip” in both the x and y directions. Normally, we would not want to loop over each and every pixel of the image (i.e. stepSize=1) as this would be computationally prohibitive if we were applying an image classifier at each window.

Instead, the stepSize is determined on a per-dataset basis and is tuned to give optimal performance based on your dataset of images. In practice, it’s common to use a stepSize of 4 to 8 pixels. Remember, the smaller your step size is, the more windows you’ll need to examine.
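To make the cost of a small step size concrete, here is a back-of-the-envelope sketch. It assumes a window is started at every stepSize-th pixel in both directions, including partial windows near the border. The helper name count_windows and the 400×400 image size are purely illustrative.

```python
def count_windows(height, width, step_size):
    # number of starting rows times number of starting columns
    rows = len(range(0, height, step_size))
    cols = len(range(0, width, step_size))
    return rows * cols


# compare a per-pixel scan against the 4 to 8 pixel range mentioned above
for step in (1, 4, 8):
    print(step, count_windows(400, 400, step))
# 1 160000
# 4 10000
# 8 2500
```

Going from a step of 1 to a step of 8 cuts the number of classifier calls by a factor of 64 on this image, since the count shrinks in both directions.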

The last argument, windowSize, defines the width and height (in pixels) of the window we are going to extract from our image.

Lines 24-27 are fairly straightforward and handle the actual “sliding” of the window.

Lines 24-26 define two for loops that loop over the (x, y) coordinates of the image, incrementing their respective x and y counters by the provided step size.

Then, Line 27 returns a tuple containing the x and y coordinates of the sliding window, along with the window itself.
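The description above can be sketched roughly as follows. This is a minimal sketch rather than the exact listing: it assumes the image is a NumPy array (as OpenCV returns), that windowSize is given as (width, height), and that the function is written as a generator, which is one natural way to hand back one window at a time. The line numbers referenced in the text are not reproduced.

```python
import numpy as np


def sliding_window(image, stepSize, windowSize):
    # windowSize is assumed to be (width, height) in pixels
    for y in range(0, image.shape[0], stepSize):
        for x in range(0, image.shape[1], stepSize):
            # hand back the top-left corner and the cropped region; windows
            # that run past the border are clipped by NumPy slicing
            yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])


# illustrative 100x200 blank image, not one of the images from the post
image = np.zeros((100, 200, 3), dtype="uint8")
windows = list(sliding_window(image, stepSize=50, windowSize=(64, 64)))
print(len(windows))          # 8
print(windows[0][2].shape)   # (64, 64, 3)
print(windows[-1][2].shape)  # (50, 50, 3), clipped at the border
```

Note that the last window is smaller than requested, which is why a size check is needed before passing windows to a classifier.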

To see the sliding window in action, we’ll have to write a driver script for it. Create a new file, name it , and we’ll finish up this example:

On Lines 2-6 we import our necessary packages. We’ll use our pyramid function from last week to construct our image pyramid. We’ll also use the sliding_window function we just defined. Finally we import argparse for parsing command line arguments and cv2 for our OpenCV bindings.

Lines 9-12 handle parsing our command line arguments. We only need a single switch here, the --image that we want to process.

From there, Line 14 loads our image off disk and Line 15 defines our window width and height to both be 128 pixels.
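As a rough sketch of the argument parsing and window setup described above (not the exact listing): the help text and the example.jpg path are placeholders, and the list passed to parse_args stands in for a real command line so the snippet runs on its own. The image load is left as a comment because it needs OpenCV and a real file on disk.

```python
import argparse

# a single required switch, the path to the image we want to process
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")

# in a real script this would be ap.parse_args() with no arguments;
# the explicit list (hypothetical path) stands in for the command line
args = vars(ap.parse_args(["--image", "example.jpg"]))

# window width and height, both 128 pixels
(winW, winH) = (128, 128)

# image = cv2.imread(args["image"])  # requires OpenCV and an existing file
print(args["image"], winW, winH)
```

Because the switch is marked as required, running the script without --image makes argparse exit with “the following arguments are required: -i/--image”.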

Now, let’s go ahead and combine our image pyramid and sliding window:

We start by looping over each layer of the image pyramid on Line 18.

For each layer of the image pyramid, we’ll also loop over each window in the sliding_window on Line 20. We also make a check on Lines 22-23 to ensure that our sliding window has met the minimum size requirements.

If we were applying an image classifier to detect objects, we would do this on Lines 25-27 by extracting features from the window and passing them on to our classifier (which is done in our 6-step HOG + Linear SVM object detection framework).

But since we do not have an image classifier, we’ll just visualize the sliding window results instead by drawing a rectangle on the image indicating where the sliding window is on Lines 30-34.
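The overall loop structure can be sketched as follows. To keep the sketch free of OpenCV and imutils, it swaps in a hypothetical simple_pyramid that halves the image by subsampling (this is not last week’s pyramid function), and it counts the windows that pass the size check instead of drawing rectangles. The 512×512 blank image and the step size of 32 are illustrative choices.

```python
import numpy as np


def sliding_window(image, stepSize, windowSize):
    # same sketch as before: windowSize is assumed to be (width, height)
    for y in range(0, image.shape[0], stepSize):
        for x in range(0, image.shape[1], stepSize):
            yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])


def simple_pyramid(image, minSize=(128, 128)):
    # hypothetical stand-in for pyramid(): halve the image by subsampling
    yield image
    while True:
        image = image[::2, ::2]
        if image.shape[0] < minSize[1] or image.shape[1] < minSize[0]:
            break
        yield image


(winW, winH) = (128, 128)
image = np.zeros((512, 512, 3), dtype="uint8")
counts = []

for layer in simple_pyramid(image):
    kept = 0
    for (x, y, window) in sliding_window(layer, stepSize=32, windowSize=(winW, winH)):
        # skip windows that were clipped by the image border
        if window.shape[0] != winH or window.shape[1] != winW:
            continue
        # a real detector would extract features from `window` and classify it here
        kept += 1
    counts.append((layer.shape[:2], kept))

print(counts)
# [((512, 512), 169), ((256, 256), 25), ((128, 128), 1)]
```

The window stays fixed at 128×128 while the layers shrink, so each successive layer contributes far fewer windows.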


To see our image pyramid and sliding window in action, open up a terminal and execute the following command:

If all goes well you should see the following results:

Figure 2: An example of applying a sliding window to each layer of the image pyramid.

Here you can see that for each of the layers in the pyramid a window is “slid” across it. And again, if we had an image classifier ready to go, we could take each of these windows and classify the contents of the window. An example could be “does this window contain a face or not?”

Here’s another example with a different image:

Figure 3: A second example of applying a sliding window to each layer of the image pyramid.

Once again, we can see that the sliding window is slid across the image at each level of the pyramid. Higher levels of the pyramid (and thus smaller layers) have fewer windows that need to be examined.


In this blog post, we learned all about sliding windows and their application to object detection and image classification.

By combining a sliding window with an image pyramid we are able to localize and detect objects in images at multiple scales and locations.

While both sliding windows and image pyramids are very simple techniques, they are absolutely critical in object detection.

You can learn more about the more global role they play in this blog post, where I detail my framework on how to use the Histogram of Oriented Gradients image descriptor and a Linear SVM classifier to build a custom object detector.




105 Responses to Sliding Windows for Object Detection with Python and OpenCV

  1. joe May 11, 2015 at 3:16 pm #

    hey Adrian, wonderful article. Just wondering about when you say “Remember, the larger your step size is, the more windows you’ll need to examine.” . Shouldn’t this be “the smaller the stepsize, the more windows”?
    Maybe i misunderstood something, but it looks to me as if each sliding window would move of pixels, so – as you say a few lines above that comment – having a stepSize=1 makes it prohibitive.
    Thanks for the article

    • Adrian Rosebrock May 11, 2015 at 4:53 pm #

      Hey Joe, you’re absolutely right. Thanks for catching that typo. I have updated it now. Thanks again!

      • Aman October 25, 2018 at 3:13 pm #

        hey adrian you have provided step by step guide to install opencv in linux and mac os can you provide it for windows

        • Adrian Rosebrock October 29, 2018 at 1:50 pm #

          Sorry, no. I only officially support Linux and macOS here on the PyImageSearch blog. If you intend on following along with my tutorials I highly suggest you use Linux or macOS.

  2. Rish May 14, 2015 at 12:38 am #

    Hi Adrian,

    I have had some discussions with you in other topic threads. Your tutorials has helped me create a object detector though in C++ with ease.

    I am new to this object recognition field. I was wondering other than sliding window for object search in the image space, what other methods are there. One of the biggest issue for me in Sliding Window is that incrementing the sliding window by small pixel margin gives the best results (say about 50 – 75% overlap to the previous window). In a normal image frame this is quite exhaustive search.

    I am just curious if there are other better or faster method for object search?

    • Adrian Rosebrock May 14, 2015 at 6:41 am #

      There are indeed alternatives to sliding windows, but the sliding window is pretty much the “default”. Take a look at the comments of this post to see a discussion of some faster variants of the standard sliding window.

      However, I will say that the exhaustive image search is actually a good thing. If our classifier is working correctly, then it will provide positive classifications for regions surrounding our object. We can then apply non-maxima suppression to select only the most probable bounding box.

  3. abbas June 4, 2015 at 2:06 am #

    hi Adrian
    i am working on HOG descriptor i train svm on 64*128 positive negative images output is good but i have a problem in large image human detection so u can help me because i start research in computer vission

    • Adrian Rosebrock June 4, 2015 at 6:24 am #

      If the human you are trying to detect is substantially larger than your 64×128 window, then you should apply an image pyramid. This way the image becomes smaller at each layer of the pyramid, while your 64×128 window remains fixed, allowing you to detect larger objects (in this case, humans).

  4. Hoon October 26, 2015 at 12:29 am #

    Thanks for the wonderful article!
    I am wondering that I should change each of the step size when the resolution of the image changes because of image pyramid.
    Thanks in advance.

    • Adrian Rosebrock October 26, 2015 at 6:13 am #

      No, the step size of the sliding window normally stays constant across levels of the image pyramid.

      • Hoon October 26, 2015 at 11:11 pm #

        Thank you!
        Can I ask one more?
        Should I calculate the entire hog features for each image of different resolution?
        I am assuming the following steps.

        1) calculate HOG features of the original image
        2) collect regions that have high similarities (ROI) into a list or something
        3) resize the original image (down-size)
        4) calculate HOG features again

        n) Draw rectangles by referring to the list.

        And plus, how do I extract original location of ROI in down-sized images?

        Thank you very much!

        • Adrian Rosebrock October 27, 2015 at 4:48 am #

          I think reading this post on using HOG and Linear SVM for object detection should really help you out and answer all your questions 🙂

  5. bob January 6, 2016 at 4:43 pm #

    Wow, what great examples. Thanks. I have a question. Let’s say you have a classifier with K classes and you call the classifier for each of the N sliding windows on the current image. You essentially have a matrix with N rows and K columns. How do you process that matrix in some sensible way to report which windows have a meaningful object in them?

    • Adrian Rosebrock January 6, 2016 at 6:35 pm #

      You would simply maintain a list of bounding boxes for each of the unique classes reported by the SVM. From there, you would apply non-maxima suppression for each set of bounding boxes.

  6. Bob Zigon March 9, 2016 at 3:51 pm #

    Adrian, I have a question about your NMS logic. I applied a classifier to each of N sliding windows. I then extracted the subset of windows associated with class = 1 and passed them through the NMS. There was only 2 instances of object 1 in the FOV. Their dimension is approximately 280×200. The sliding window was 140×100. This is also the size of patches that I trained with. I was expecting the NMS to “merge” the 140×100 windows into a bounding box that more closely approximated the 280×200 of the actual objects. The NMS reported 5 objects and not 2.

    Am I using the NMS wrong? I can’t train on images that are 280×200 because I want to be able to identify the object when it is sliding out of the FOV. That is why I extracted a bunch of random 140×100 patches from the 280×200 object and trained that way.

    • Adrian Rosebrock March 9, 2016 at 4:36 pm #

      NMS is meant to merge overlapping bounding boxes, either based on their spatial dimensions, or the probability returned by your SVM (where higher probabilities are preferred over the lower ones). If your bounding boxes are not overlapping, then NMS will not suppress them. From your comment, it’s not clear if bounding boxes were overlapping?

      • Bob Zigon March 14, 2016 at 12:38 am #

        Yes, the boxes were overlapping. (I wish there was a way to embed a graphic in these comments, it would be easier to describe the situation.)

        Let me ask the question a different way. If you train your classifier with images that are 140×100 (these are random subsets of the 280×200 target image), how do you get a bounding box around the target image with the NMS?

        • Adrian Rosebrock March 14, 2016 at 3:23 pm #

          If you would like to include an image, I would suggest uploading the image to Imgur and then posting the link in the comment.

          As for the bounding boxes, please see my previous comment. You would take the entire set of bounding boxes and apply NMS based on either (1) the bounding box coordinates (such as the bottom-right corner) or (2) the probability associated with the bounding box.

          Again, NMS isn’t used to actually generate the bounding box surrounding an object, it’s used to suppress bounding boxes that have heavy overlap.
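          To illustrate the point that NMS suppresses rather than merges, here is a minimal greedy sketch. It assumes boxes are given as (x1, y1, x2, y2) corners with one score per box; the function name, the 0.3 overlap threshold, and the sample boxes are illustrative assumptions, not the implementation used on this blog.

```python
import numpy as np


def non_max_suppression(boxes, scores, overlap_thresh=0.3):
    # boxes: rows of (x1, y1, x2, y2); returns indices of the boxes to keep
    boxes = np.asarray(boxes, dtype=float)
    if boxes.size == 0:
        return []
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of the kept box with every remaining box
        iw = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        ih = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter)
        # drop the boxes that overlap the kept box too heavily
        order = rest[iou <= overlap_thresh]
    return keep


boxes = [(0, 0, 100, 100), (10, 10, 110, 110), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

          The surviving boxes are always members of the input set. Nothing is grown or unioned, so NMS on 140×100 windows cannot produce a 280×200 box.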

  7. Bob Zigon March 14, 2016 at 10:43 pm #

    Hmmm .. ok. The distinction seems subtle. Is it fair to say that the bounding box (with a target size of 280×200) is just the union of the 140×100 boxes in physical proximity to each other that overlap some small amount?

    • Adrian Rosebrock March 15, 2016 at 4:36 pm #

      I’m not sure I understand your question. If you can provide visual examples, I can try to answer further.

  8. Vinit March 16, 2016 at 2:34 pm #

    Hey Adrian,

    I have been reading your blogs recently and they are very helpful for my work. However I am still not able to figure out, how I am going to train the SVM for the classification.

    I got to detect humans in image so I am using INRIA dataset for training but i can’t figure out one issue that in one image I can see many persons. Right now I am just taking the hog features of the whole image once its resized to certain dimensions and then send it to train svm. But the data contains multiple human images not only single one. So can you please help me out here. Also it would be great if you can make a small post on training svm too for this object detection part.

    Thanks in advance

    • Adrian Rosebrock March 17, 2016 at 10:42 am #

      You mentioned resizing your image to a fixed size, extracting HOG features, and then passing it to your SVM — this is partly correct, but you’re missing a few critical steps. To start, I would suggest reading through a description of the entire HOG + Linear SVM pipeline.

      Instead, you need to utilize a sliding window (detailed in this post). This window is a fixed size that “slides” across your input image. At each stop along the window, you extract HOG features, and then pass them to your SVM for classification. In this way, you can detect not only a single person but multiple people at various locations in the image. Combined with an image pyramid, you can recognize objects at both multiple scales AND multiple locations.

      As for a source code implementation of such an object detector, please see the PyImageSearch Gurus course, where I cover how to code an object detector in detail.

  9. Mohamed Ben Arbia April 5, 2016 at 5:59 am #

    Hey Adrian,

    Excellent post. This is really helpful and straightforward. Thanks!

    • Adrian Rosebrock April 6, 2016 at 9:14 am #

      I’m glad you found it helpful Mohamed! 🙂

      • Mohamed Ben Arbia June 20, 2016 at 6:41 am #

        Hi Adrian,

        I have encountered one issue during my project concerning the object detection. What if there are rotated versions of the object we would like to detect ?

        What would be the best approach to tackle this ? Would you use rotated versions of the sliding windows ? Or would you define rotated versions of the image containing the object (And probably rotated version of the object) as the image pyramids for scaling ?

        Thanks !

        • Adrian Rosebrock June 20, 2016 at 5:23 pm #

          Rotated objects can be a real pain in the ass to detect, depending on your problem. I would suggest training a detector for each rotated version of your image. Or better yet, try to utilize algorithms that are more invariant to changes in rotation. Keypoint detection and local invariant descriptors tend to work well here as well.

          • Mohamed June 21, 2016 at 4:09 am #

            Thanks for your response Adrian 🙂
            Yes, I think using algorithms that are invariant to changes in rotation is a good approach.
            Concerning my problem, here is a link to a screen shot to the image where I have my rotated objects: The goal is to detect the footprints in the image.


          • Adrian Rosebrock June 23, 2016 at 1:31 pm #

            Why not just apply a dilation or closing morphological operation to close the gaps in between the footprints? From there, thresholding and contour detection will give you the footprint regions.

  10. farah May 6, 2016 at 4:38 am #

    When we run our classifier on sliding windows then it will fetch many bounding boxes.I want to show these bounding boxes on the original image. How to change the coordinates of the bounding boxes from the different sized windows to the original scale to be shown on the original window.

    • Adrian Rosebrock May 6, 2016 at 4:32 pm #

      Hey Farah — I assume you’re also talking about using image pyramids as well? As the image pyramid code demonstrates, you can keep track of the current scale of the pyramid and use that to give you the location of the

  11. Farah May 10, 2016 at 1:28 am #

    Sir to get to the original scale should I multiply the coordinates by the respective scaling factor used in resizing the window i.e if I am downscaling by 1.5 in both x and y direction then I just multiply the bounding boxes coordinates at this layer by 1.5.

    • Adrian Rosebrock May 10, 2016 at 8:06 am #

      Hey Farah, please see my previous comment. If you’re using sliding windows in conjunction with image pyramids, you need to keep track of the ratio of the original image height to the current pyramid height. You can use this scale to multiply the bounding box coordinates and obtain them for the original image size. I cover this in more detail inside PyImageSearch Gurus.

      In this case, if you resize your image to be 1.5x smaller than the original, then yes, you would multiply your bounding boxes (obtained by the new, resized image) by this 1.5 factor to obtain the coordinates relative to the original image.
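      The rescaling described here can be sketched as follows. The helper name to_original_coords and the (x, y, w, h) box layout are hypothetical, chosen only for illustration. The scale is the ratio of the original image size to the current pyramid layer size.

```python
def to_original_coords(box, scale):
    # box is (x, y, w, h) measured on the resized pyramid layer;
    # scale is original_size / layer_size, e.g. 1.5 for a layer 1.5x smaller
    (x, y, w, h) = box
    return (int(x * scale), int(y * scale), int(w * scale), int(h * scale))


# a 64x128 detection at (40, 20) on a layer that is 1.5x smaller than the original
print(to_original_coords((40, 20, 64, 128), 1.5))  # (60, 30, 96, 192)
```

      Once every detection has been mapped back this way, boxes from all pyramid layers share one coordinate system.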

  12. Farah May 10, 2016 at 11:57 pm #

    Thanks Adrian for resolving my query

  13. Aka July 23, 2016 at 5:28 am #

    Hi Adrian,

    Nice post !

    I was wondering if the sliding window could be parallelised ? With a classifier which has a really low false positive rate and if the search need to be exhaustive, I feel sliding window is the best option. But say for a very large image it will be very slow. So if the sliding can be parallelised so that a list will have all the detections ( the order in which they get appended does not matter for NMS) , won’t it help speed up the detection process ?

    What do you think ? Do you know of such an implementation ?

    • Adrian Rosebrock July 27, 2016 at 2:47 pm #

      Yes, you can absolutely make the sliding window run in parallel. However, I instead recommend making the image pyramid run in parallel such that you have one process running for each of the layers of the pyramid. If you are only processing a small set of pyramid layers (or just one layer), then yes, absolutely make the sliding window run in parallel.

      I don’t have any implementations of this, but I do review how to build your own custom object detector inside the PyImageSearch Gurus course.

  14. Walid Ahmed September 28, 2016 at 4:05 pm #


    The code executed without errors for 2 images
    but nothing was shown
    any advice?

    • Adrian Rosebrock September 30, 2016 at 6:48 am #

      Can you elaborate on what you mean by “executed without error but nothing was shown”? I’m not sure I understand what you mean.

  15. Wei October 5, 2016 at 8:49 am #

    Hi, Adrian.

    I am wondering why the sliding window function does not give an “out of bound” error when “(x + winW) > image.shape[1]”?

    Thanks for the sharing, your website is very inspiring and helpful.

    • Adrian Rosebrock October 6, 2016 at 6:54 am #

      NumPy automatically prevents the out of bound error by treating the index as an array slice. If you try to slice an array past the actual bounds of the array, it simply returns the elements up to the end of that dimension.
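      A small sketch of this slicing behaviour, using an illustrative 100×200 blank array and a window that starts near the bottom-right corner:

```python
import numpy as np

image = np.zeros((100, 200), dtype="uint8")  # 100 rows, 200 columns
(winW, winH) = (64, 64)
(x, y) = (180, 90)  # the window would extend past both borders

# slicing past the bounds is clipped instead of raising an error
window = image[y:y + winH, x:x + winW]
print(window.shape)  # (10, 20)
```

      The slice is silently clipped to the 10 rows and 20 columns that exist, which is why the driver script checks the window shape before using it. Indexing a single out-of-range element (for example image[150, 0]) would raise an IndexError instead.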

  16. Daryl November 2, 2016 at 4:59 pm #

    Hi Adrian,
    One doubt when i have an image pyramid i get the same image in different scales. Now from each of these images i get using the sliding window classifier say 3 images. Now how to choose between these images that i get in different levels of the pyramid.
    Example: Pyramid i have 400X400(original size);200X200;100X100
    From each i run a sliding window of 40X40
    I get 40X40 from first one
    80X80 from the second one(after scaling back to original size)
    160X160 from the third one

    • Adrian Rosebrock November 3, 2016 at 9:39 am #

      Your sliding window should always be the same fixed size — the sliding window size does not change. It’s the image pyramid itself that allows you to detect objects at different scales of the image. The sliding window simply allows you to detect objects at different locations.

      • Daryl November 3, 2016 at 2:11 pm #

        But how do i select between images of different scales was my question. If my sliding window gives 1 image in every level of the pyramid. How do i choose between these images?

        • Adrian Rosebrock November 4, 2016 at 9:56 am #

          I’m not sure what you mean by “select”. At each pyramid scale, and at each position of the sliding window you would extract your features and pass them on to your model for classification. You then apply non-maxima suppression across all levels to obtain your final detection. I detail the entire HOG + Linear SVM pipeline here. I then review the code in detail inside the PyImageSearch Gurus course.

          • Daryl November 4, 2016 at 10:13 am #

            Dont you apply non maximal suppression on each level separately?
            Because if you apply across all levels then you are comparing between bounding boxes of different sizes.

          • Adrian Rosebrock November 4, 2016 at 10:42 am #

            No, NMS is only applied after all bounding boxes are collected across all layers of the image pyramid. You resize each of your detected bounding boxes based on the ratio of the original image size to the current image size. This ensures that all bounding boxes are recorded at the same scale even though you are working with multiple scales of the image.

  17. Sumedha Agarwal November 8, 2016 at 2:30 am #

    Have been working on object detection, I was wondering why can’t we vary the window size instead of varying the image size(image pyramid).
    Any drawbacks with that?
    Thanks in advance!!!

    • Adrian Rosebrock November 10, 2016 at 7:05 am #

      Consider the HOG image descriptor which is commonly used with sliding windows and image pyramids. The dimensionality of the feature vector HOG produces depends on the size of the image/ROI passed into it. If you change the sliding window size, you change the output dimensionality of the descriptor. If all descriptors do not have the same dimensionality then you can’t apply a machine learning model to them.

      Because of this, the sliding window tends to be a fixed parameter in the model.
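      To put numbers on this, here is a small sketch of how the HOG feature vector length changes with window size. It assumes a common parameter choice (8×8 pixel cells, 2×2 cell blocks moved one cell at a time, 9 orientation bins); the helper name hog_length is hypothetical, and other HOG settings would give different lengths.

```python
def hog_length(win_w, win_h, cell=8, block=2, bins=9):
    # number of cells along each axis of the window
    cells_x, cells_y = win_w // cell, win_h // cell
    # overlapping blocks, shifted by one cell at a time
    blocks_x = cells_x - block + 1
    blocks_y = cells_y - block + 1
    # each block contributes block*block cells with `bins` values each
    return blocks_x * blocks_y * block * block * bins


print(hog_length(64, 128))   # 3780
print(hog_length(128, 128))  # 8100
```

      A model trained on 3780-dimensional vectors cannot score an 8100-dimensional one, which is why the window size is held fixed and the image is rescaled instead.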

  18. Saloni Mittal November 30, 2016 at 3:42 am #

    My object detector (based on hog+svm) takes around 40-50 seconds to give the final result for a 1360×800 input image, with a 40×40 window size and step size 3×3-this is when i’ve done the computation for different scales parallely,creating threads. Is there any other way to speed up the process? Can we run this code on a GPU instead of using the CPU?

    • Adrian Rosebrock December 1, 2016 at 7:36 am #

      You can push the computation to the GPU, but you would need to recode using C++. The Python + OpenCV bindings do not have access to the GPU.

  19. Levy Anselmo December 1, 2016 at 1:54 pm #

    Nice article! But, can you tell me how to parse my camera? i want to try it with my camera frame by frame. Thanks!

    • Adrian Rosebrock December 5, 2016 at 1:46 pm #

      Hey Levy — can you elaborate more on what you mean by “parse” your camera?

  20. Kinley February 23, 2017 at 1:21 am #

    Hey, Can you suggest me some packages to implement the same using R.

  21. Ioannis March 6, 2017 at 3:29 pm #

    Hello Adrian, This article was very useful to me, good job.
    I applied texture analysis (GLCM) on satellite image using a sliding window (wnize=32) with a step size (step=32). I set a window size in order to make my scrip bit faster. The dimensions of the image is 250 x 200. After running the script, The final image was 8 x 6 (250/32, 200/32) due to the step size. I do not want to have a subset of the image though,
    By applying a step size, Is it possible to get the initial image back instead of a subset of it?

    I would appreciate any help
    thank you

    • Adrian Rosebrock March 6, 2017 at 3:33 pm #

      Hi Ioannis — thanks for the comment, although I’m not sure I understand your question. Can you elaborate on what you mean by the “initial image back”?

  22. Kumar Vishal March 8, 2017 at 5:44 am #

    HI Adrian, I am trying to build HOG based detector small confusion I have regarding scale factor say if I have scale factor = 1.03 that means at every step i have to reduce it by 3% percent . So If I have 648 * 460 image and min size I am putting 32 * 32 so I have to reduce the image by 3 percent every time until width (480) reduces to 32 or less. but it is creating a pyramid of approx 25 images. or more and each image from the bottom has more than 40000 patches is stride = 2 ; and over all time is coming to extract all the features is approx 1 second.


  23. zelong April 7, 2017 at 6:09 am #

    you are such amazing!!! you tutorial helped me a lot and I now I come across a problem…..I trained a ‘XML’ classfier by myself and wanted to load it by setSVMDetector( )’, It failed, I searched the internet and was told this function, setSVMDetector( ), only accepts a np.array as input…May I ask How can I transfer my XML file to a Numpy.array?

    Thanks so much..SO SO SO SO MUCH!

    • Adrian Rosebrock April 8, 2017 at 12:50 pm #

      It’s been a long time since I’ve used OpenCV to train a custom detector by scratch so I’m not sure what the solution is. Normally I recommend using a combination of OpenCV + scikit-learn to build your own detector, as detailed in the PyImageSearch Gurus course.

  24. Soumya Sourav June 7, 2017 at 5:12 pm #

    can sliding window be used for detecting multiple objects in a single image?

    • Adrian Rosebrock June 9, 2017 at 1:46 pm #

      Yes, that is the intended usage of sliding windows and image pyramids. At each step you extract features and pass them through your classifier. The classifier will report if there is an object there with a certain probability. This works with multiple objects as well. I detail the HOG + Linear SVM object detection framework in more detail inside the PyImageSearch Gurus course.

  25. Stephen EL August 10, 2017 at 1:27 am #

    Hi Adrian.

    I’m a new user in python.
    I run the code but it doesnt work.
    How to solve that? Thanks

    usage: [-h] -i IMAGE error: the following arguments are required: -i/--image

  26. D R August 29, 2017 at 4:58 am #

    What should be the modification in the code if I want to detect multiple shapes like circle and rectangle ?

    • Adrian Rosebrock August 31, 2017 at 8:42 am #

      That really depends on your input image. I actually have an entire blog post on shape detection here.

  27. Robert Cvjetković September 11, 2017 at 8:11 am #

    Thank you for the article, it helped me a lot to understand and visualize sliding windows!

    • Adrian Rosebrock September 11, 2017 at 8:56 am #

      I’m glad it helped you, Robert! 🙂

  28. tania September 18, 2017 at 3:31 am #

    Thank you for this article!
    I would like to save the window images as a tiff image. So i would be abble to insert it in a PPT presentation for my Master thesis. Could somebody explain me how to do this
    thanks in advance for your help!

  29. tania September 18, 2017 at 3:33 am #

    gif image (sorry)

  30. Stephen EL November 21, 2017 at 10:14 pm #

    Hi, Adrian.

    I have an issue that using this sliding window, I detect 1 object as 3 objects.

    For example:
    I want to detect back view of a car.
    But what I detect is the side of car (left light), the middle of car (car license plate), and the other side of car (right light).
    So It detects a car as 3 objects.

    How to merge that 3 objects into 1 object?


    • Adrian Rosebrock November 22, 2017 at 9:57 am #

      You need to apply non-maxima suppression. It’s entirely normal for an object detector to report multiple bounding boxes around a single object. This is why we apply NMS.

  31. wasif November 22, 2017 at 5:09 am #

    Hello Adrian! Thanks for your lovely post. when I run “args = vars(ap.parse_args())” . I got the following error.
    usage: [-h] -i IMAGE
    : error: the following arguments are required: -i/--image
    Any idea why am i getting this? I am using Anaconda 3, python 3.6

  32. Vigna January 17, 2018 at 6:08 am #

    Is this code applicable on .tif raster image? I need to read pixel values in moving slide of 5 by 5 window and analyze abnormal distribution of pixel value within neighbourhood pixels.

    • Adrian Rosebrock January 17, 2018 at 10:08 am #

      OpenCV can load .tif files provided you have the TIFF library installed when you compiled + installed OpenCV.

      • Vigna January 18, 2018 at 3:39 am #

        Thank you for replying. I don’t understand “from pyimagesearch.helpers import pyramid”. what is pyimagesarch.helpers? what should i use on that place if i want to use the same code? i tried it with “import pyramid” but it is giving an error “module object is not callable” when i called it in a loop.

        • Adrian Rosebrock January 18, 2018 at 8:48 am #

          pyimagesearch.helpers is a module. Make sure you use the “Downloads” section of this blog post to download the code associated with the tutorial.

          Getting up and running with this code requires a bit of Python and programming knowledge. If you’re new to Python, no worries, I put together some resources to help learn Python, but you’ll definitely want to get up to speed with Python before trying to run this code.

  33. Pranith Srujan Roy March 4, 2018 at 4:50 am #

    Hi Adrian,
    Suppose I have an image with many people in it. I would like to detect the people, not to identify who they are but only to detect them. So I was wondering how to draw windows of variable sizes on the picture just to identify humans.

  34. Sam March 13, 2018 at 2:54 am #

    Hi Adrian,
    Thanks for the great post! I am just confused about why you chose to vary the image scale to build an image pyramid rather than vary the window size to scale the original-sized image?

    • Sam March 13, 2018 at 2:54 am #

      Sorry, I meant vary the window size to scan the original-sized image :)

    • Adrian Rosebrock March 14, 2018 at 12:49 pm #

      Typically sliding windows and image pyramids are used with the HOG + Linear SVM detector. The output dimensionality of a HOG feature vector is a function of its parameters and the size of the input ROI. If you change the size of the ROI, you get a different size feature vector. For this reason you typically keep the same size window. If you’re interested in how object detectors work, be sure to take a look at the PyImageSearch Gurus course where I discuss them in more detail.
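
      To make the dimensionality point in the reply above concrete, here is a small arithmetic sketch. It assumes a common HOG layout (8x8 pixel cells, 2x2 cell blocks, 9 orientation bins, and a block stride of one cell); the helper name `hog_feature_length` is hypothetical and exists only for this example.

```python
def hog_feature_length(window_w, window_h, cell=8, block=2, orientations=9):
    """Length of a HOG vector for one window, assuming a block stride of one cell."""
    cells_x = window_w // cell
    cells_y = window_h // cell

    # Blocks slide one cell at a time across the grid of cells.
    blocks_x = cells_x - block + 1
    blocks_y = cells_y - block + 1

    # Each block contributes block * block cells, each with `orientations` bins.
    return blocks_x * blocks_y * block * block * orientations


print(hog_feature_length(64, 128))   # 3780
print(hog_feature_length(128, 128))  # 8100
```

      Under these assumptions, a linear SVM trained on 3780-dimensional vectors cannot score an 8100-dimensional one, which is why the window stays fixed and the image is rescaled instead.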

  35. Mohamed April 7, 2018 at 1:54 am #

    Hi Adrian,
    I am just an interested newcomer :)
    But I can’t run this code in PyCharm because of these two lines:
    from pyimagesearch.helpers import pyramid
    from pyimagesearch.helpers import sliding_windows

    • Adrian Rosebrock April 10, 2018 at 12:44 pm #

      Make sure you use the “Downloads” section of this blog post to download the code. Then add the files to your PyCharm project and ensure PyCharm can find them.

  36. vignesh May 30, 2018 at 4:20 am #

    Please provide a tutorial on the object detection part as well. I am one of your readers who goes through most of your tutorials over and over again and tries to implement most of them. Currently I am at the sliding window part and don’t have enough knowledge to implement the detection part on my own. I am not able to afford your book or course either, so please cover the detection part too.

  37. Jaiden John July 14, 2018 at 1:04 am #

    Hi. Thanks for the article. It is very helpful.
    Can you please tell me how to apply a machine learning classifier to it?
    Thank you

  38. ami August 14, 2018 at 12:52 am #

    How can I use this sliding window on an ECG image to recognize one period of a heartbeat?

    • Adrian Rosebrock August 15, 2018 at 8:39 am #

      Sorry, I don’t have much experience with ECG images — I would need to see an example.

  39. ishani March 20, 2019 at 12:58 pm #

    Hi Adrian, thank you for the nice tutorial. How can I apply this to a video? Instead of giving the path to an image, I might have to point it at a frame, but then how can I make sure the window slides over the whole frame before the next frame is taken? Wouldn’t the video move on faster, so the window would miss covering the whole frame?
    Thank you!

    • Adrian Rosebrock March 22, 2019 at 8:48 am #

      I would suggest you take a look at the PyImageSearch Gurus course where I show you how to properly use sliding windows for object detection in both images and video.

  40. Felipe April 14, 2019 at 7:30 pm #

    That’s great, thanks!

    • Adrian Rosebrock April 18, 2019 at 7:20 am #

      Thanks Felipe, I’m glad you liked it!

  41. zik May 6, 2019 at 10:50 am #

    Hello Adrian, I’ve been thinking, can this be applied to trading charts through a broker’s website?

    • zik May 6, 2019 at 10:51 am #

      Thank you so much for the article

      • Adrian Rosebrock May 8, 2019 at 1:04 pm #

        That depends, what exactly are you trying to detect? Are you trying to detect the actual charts on the broker’s website? Wouldn’t it be easier to just scrape the values of the stocks instead?

  42. Ellie June 6, 2019 at 1:28 pm #

    Hi! Your article was super helpful. I am new to Python though, and I am wondering if you could create a sliding window across two images of the same size at once? I need to compare the two images through a window at the same location, and I am not sure how to iterate through both of them using this sliding window tutorial. Thanks!

    • Ellie June 6, 2019 at 1:29 pm #

      If it’s any help at all for understanding’s sake, I am trying to implement my own version of scikit-image’s compare_ssim. Any advice would be appreciated!

      • Adrian Rosebrock June 12, 2019 at 2:05 pm #

        Yes, but you would need to modify the code a bit. Have the sliding window function accept two images, then have your “for” loops loop over both images. At each iteration, extract the ROI from each image and then compare them.
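
        One way the modification described in the reply above might look is sketched here. The generator name `sliding_window_pair` is hypothetical, and unlike a plain sliding window this version assumes you want to skip partial windows at the right and bottom edges; the mean absolute difference is only a stand-in for whatever comparison you actually need.

```python
import numpy as np


def sliding_window_pair(image_a, image_b, step_size, window_size):
    """Yield (x, y, roi_a, roi_b) for the same window location in both images."""
    if image_a.shape[:2] != image_b.shape[:2]:
        raise ValueError("both images must have the same width and height")

    win_w, win_h = window_size
    height, width = image_a.shape[:2]

    # Only full-size windows are produced (an assumption of this sketch).
    for y in range(0, height - win_h + 1, step_size):
        for x in range(0, width - win_w + 1, step_size):
            roi_a = image_a[y:y + win_h, x:x + win_w]
            roi_b = image_b[y:y + win_h, x:x + win_w]
            yield x, y, roi_a, roi_b


# Synthetic 8x8 grayscale images standing in for real data.
image_a = np.zeros((8, 8), dtype="uint8")
image_b = np.ones((8, 8), dtype="uint8")

for x, y, roi_a, roi_b in sliding_window_pair(image_a, image_b, 4, (4, 4)):
    # Cast to float before subtracting to avoid uint8 wrap-around.
    diff = np.abs(roi_a.astype("float") - roi_b.astype("float")).mean()
    print(x, y, diff)
```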

  43. ken June 6, 2019 at 8:38 pm #

    Hello Adrian ,
    Can a sliding window be used with a convolutional neural network? I want to detect objects in an image, but I have trained the network with images of size (128, 128), and if I want to use the sliding window on a larger image, it gives me dimension errors. Do you think it is a good idea to do object detection using a CNN + sliding window?

    • Adrian Rosebrock June 12, 2019 at 2:02 pm #

      Hi Ken — I teach you how to use CNNs originally trained for classification and instead use them for object detection inside Deep Learning for Computer Vision with Python. The short answer is yes, you can, but again, refer to the book for more details.

    • Shreya September 18, 2019 at 7:25 am #

      Hello Ken,

      Have you tried implementing Sliding Window with CNN?
      Please help me implement the same.

  44. Freddyc June 27, 2019 at 11:31 pm #

    This is one amazing post. Thanks so much. PYIMAGESEARCH, THE BEST IN THE WORLD.

    • Adrian Rosebrock July 4, 2019 at 10:51 am #

      Thanks, I’m glad you enjoyed it! 🙂

  45. CJ September 11, 2019 at 2:47 pm #

    Hi Adrian, great post!
    I need to know which mathematical expression describes the sliding window used in this tutorial?

    • Adrian Rosebrock September 11, 2019 at 3:14 pm #

      I’m not sure what you mean by “mathematical expression” — could you elaborate on your question here?
