Image Pyramids with Python and OpenCV


It’s too damn cold up in Connecticut — so cold that I had to throw in the towel and escape for a bit.

Last week I took a weekend trip down to Orlando, FL just to escape. And while the weather wasn’t perfect (mid-60s Fahrenheit, cloudy, and spotty rain, as you can see from the photo above), it was exactly 60 degrees warmer than it is in Connecticut — and that’s all that mattered to me.

While I didn’t make it to Animal Kingdom or partake in any Disney adventure rides, I did enjoy walking Downtown Disney and having drinks at each of the countries in Epcot.

Sidebar: Perhaps I’m biased since I’m German, but German red wines are perhaps some of the most under-appreciated wines there are. Imagine having the full-bodied taste of a Chianti, but slightly less acidic. Perfection. If you’re ever in Epcot, be sure to check out the German wine tasting.

Anyway, as I boarded the plane to fly back from the warm Florida paradise to the Connecticut tundra, I started thinking about what the next blog post on PyImageSearch was going to be.

Really, it should not have been that long (or hard) of an exercise, but it was a 5:27am flight, I was still half asleep, and I’m pretty sure I still had a bit of German red wine in my system.

After a quick cup of (terrible) airplane coffee, I decided on a 2-part blog post:

  • Part #1: Image Pyramids with Python and OpenCV.

  • Part #2: Sliding Windows for Image Classification with Python and OpenCV.

You see, a few months ago I wrote a blog post on utilizing the Histogram of Oriented Gradients image descriptor and a Linear SVM to detect objects in images. This 6-step framework can be used to easily train object classification models.

A critical aspect of this 6-step framework involves image pyramids and sliding windows.

Today we are going to review two ways to create image pyramids using Python, OpenCV, and scikit-image. And next week we’ll discover the simple trick to create highly efficient sliding windows.

Utilizing these two posts we can start to glue together the pieces of our HOG + Linear SVM framework so you can build object classifiers of your own!

Read on to learn more…

Looking for the source code to this post?
Jump right to the downloads section.

What are image pyramids?

Figure 1: An example of an image pyramid. At each layer of the pyramid the image is downsized and (optionally) smoothed (image source).

An “image pyramid” is a multi-scale representation of an image.

Utilizing an image pyramid allows us to find objects in images at different scales. And when combined with a sliding window, we can find objects in an image at various locations as well.

At the bottom of the pyramid we have the original image at its original size (in terms of width and height). And at each subsequent layer, the image is resized (subsampled) and optionally smoothed (usually via Gaussian blurring).

The image is progressively subsampled until some stopping criterion is met, which is normally when a minimum size has been reached and no further subsampling needs to take place.

Method #1: Image Pyramids with Python and OpenCV

The first method we’ll explore to construct image pyramids will utilize Python + OpenCV.

In fact, this is the exact same image pyramid implementation that I utilize in my own projects!

Let’s go ahead and get this example started. Create a new file, name it helpers.py, and insert the following code:
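(What follows is a sketch reconstructed from the walkthrough below rather than the post’s original listing, so the Line numbers referenced in the prose refer to that original listing, not necessarily to this sketch.)

# import the necessary packages
import imutils

def pyramid(image, scale=1.5, minSize=(30, 30)):
    # yield the original image (the bottom layer of the pyramid)
    yield image

    # keep looping over the layers of the pyramid
    while True:
        # compute the width of the next layer (preserving the aspect
        # ratio) and resize the image
        w = int(image.shape[1] / scale)
        image = imutils.resize(image, width=w)

        # if the resized image falls below the minimum size, stop
        # constructing the pyramid
        if image.shape[0] < minSize[1] or image.shape[1] < minSize[0]:
            break

        # otherwise, yield the next layer of the pyramid
        yield image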

We start by importing the imutils package, which contains a handful of image processing convenience functions that are commonly used, such as resizing, rotating, translating, etc. You can read more about the imutils package here. You can also grab it off my GitHub. The package is also pip-installable:
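If you don’t have it installed yet, the usual route is (assuming pip points at the Python environment you are working in):

$ pip install imutils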

Next up, we define our pyramid function on Line 4. This function takes two arguments. The first argument is the scale, which controls by how much the image is resized at each layer. A smaller scale yields more layers in the pyramid, and a larger scale yields fewer layers.

Secondly, we define the minSize, which is the minimum required width and height of a layer. If an image in the pyramid falls below this minSize, we stop constructing the image pyramid.

Line 6 yields the original image in the pyramid (the bottom layer).

From there, we start looping over the image pyramid on Line 9.

Lines 11 and 12 handle computing the size of the image in the next layer of the pyramid (while preserving the aspect ratio). This scale is controlled by the scale  factor.

On Lines 16 and 17 we make a check to ensure that the image meets the minSize  requirements. If it does not, we break from the loop.

Finally, Line 20 yields our resized image.

But before we get into examples of using our image pyramid, let’s quickly review the second method.

Method #2: Image pyramids with Python + scikit-image

The second method of image pyramid construction utilizes Python and scikit-image. The scikit-image library already has a built-in function for constructing image pyramids called pyramid_gaussian, which you can read more about here.

Here’s an example of how to use the pyramid_gaussian function in scikit-image:
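(A sketch rather than the post’s exact listing; it assumes image has already been loaded with cv2.imread, and on older scikit-image versions you may need multichannel=True in place of channel_axis=-1 for color images.)

# loop over the layers of the Gaussian pyramid, halving the image size
# at each layer
from skimage.transform import pyramid_gaussian
import cv2

for (i, resized) in enumerate(pyramid_gaussian(image, downscale=2, channel_axis=-1)):
    # if the layer falls below a minimum size, stop constructing the pyramid
    if resized.shape[0] < 30 or resized.shape[1] < 30:
        break

    # pyramid_gaussian yields float images scaled to [0, 1], which
    # cv2.imshow can display directly
    cv2.imshow("Layer {}".format(i + 1), resized)
    cv2.waitKey(0)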

Similar to the example above, we simply loop over the image pyramid and make a check to ensure that the image has a sufficient minimum size. Here we specify downscale=2  to indicate that we are halving the size of the image at each layer of the pyramid.

Image pyramids in action

Now that we have our two methods defined, let’s create a driver script to execute our code. Create a new file, name it pyramid.py, and let’s get to work:
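(Again, a sketch pieced together from the walkthrough below rather than the original listing; the Line numbers in the prose refer to that listing, and channel_axis=-1 assumes a recent scikit-image, with multichannel=True on older versions.)

# import the necessary packages
from pyimagesearch.helpers import pyramid
from skimage.transform import pyramid_gaussian
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="path to the input image")
ap.add_argument("-s", "--scale", type=float, default=1.5, help="scale factor size")
args = vars(ap.parse_args())

# load the input image from disk
image = cv2.imread(args["image"])

# METHOD #1: no smoothing, just scaling
for (i, resized) in enumerate(pyramid(image, scale=args["scale"])):
    # show the current layer of the pyramid
    cv2.imshow("Layer {}".format(i + 1), resized)
    cv2.waitKey(0)

# close all windows before moving on to the second method
cv2.destroyAllWindows()

# METHOD #2: resizing + Gaussian smoothing via scikit-image
for (i, resized) in enumerate(pyramid_gaussian(image, downscale=2, channel_axis=-1)):
    # if the layer is too small, stop constructing the pyramid
    if resized.shape[0] < 30 or resized.shape[1] < 30:
        break

    # show the current layer of the pyramid
    cv2.imshow("Layer {}".format(i + 1), resized)
    cv2.waitKey(0)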

We’ll start by importing our required packages. I put my personal pyramid  function in a helpers  sub-module of pyimagesearch  for organizational purposes.

You can download the code at the bottom of this blog post for my project files and directory structure.

We then import the scikit-image pyramid_gaussian function, argparse  for parsing command line arguments, and cv2  for our OpenCV bindings.

Next up, we need to parse some command line arguments on Lines 9-11. Our script requires only two switches, --image , which is the path to the image we are going to construct an image pyramid for, and --scale , which is the scale factor that controls how the image will be resized in the pyramid.

Line 14 then loads our image from disk.

We then utilize Method #1 (my personal method) for our image pyramid on Lines 18-21, where we simply loop over each layer of the pyramid and display it on screen.

Then from Lines 27-34 we utilize the scikit-image method (Method #2) for image pyramid construction.

To see our script in action, open up a terminal, change directory to where your code lives, and execute the following command:
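Something along these lines (the image path here is just a placeholder for whichever image you are using):

$ python pyramid.py --image images/your_image.jpg --scale 1.5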

If all goes well, you should see results similar to this:

Figure 2: Constructing an image pyramid with 7 layers and no smoothing (Method #1).

Here we can see that 7 layers have been generated for the image.

And similarly for the scikit-image method:

Figure 3: Generating 4 layers of the image pyramid with scikit-image (Method #2).

The scikit-image pyramid generated 4 layers since it reduced the image by 50% at each layer.

Now, let’s change the scale factor to 3.0  and see how the results change:
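Again, the image path is a placeholder for whichever image you are using:

$ python pyramid.py --image images/your_image.jpg --scale 3.0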

And the resulting pyramid now looks like:

Figure 4: Increasing the scale factor from 1.5 to 3.0 has reduced the number of layers generated.

Using a scale factor of 3.0 , only 3 layers have been generated.

In general, there is a tradeoff between performance and the number of layers that you generate. The smaller your scale factor is, the more layers you need to create and process — but this also gives your image classifier a better chance at localizing the object you want to detect in the image.

A larger scale factor will yield fewer layers and may hurt your object classification accuracy; however, you will also obtain a substantial speed-up since there will be fewer layers to process.

Summary

In this blog post we discovered how to construct image pyramids using two methods.

The first method of image pyramid construction used Python and OpenCV and is the method I use in my own personal projects. Unlike the traditional image pyramid, this method does not smooth the image with a Gaussian at each layer of the pyramid, thus making it more suitable for use with the HOG descriptor.

The second method to pyramid construction utilized Python + scikit-image and did apply Gaussian smoothing at each layer of the pyramid.

So which method should you use?

In reality, it depends on your application. If you are using the HOG descriptor for object classification you’ll want to use the first method since smoothing tends to hurt classification performance.

If you are trying to implement something like SIFT or the Difference of Gaussian keypoint detector, then you’ll likely want to utilize the second method (or at least incorporate smoothing into the first).

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!


38 Responses to Image Pyramids with Python and OpenCV

  1. Oliver March 17, 2015 at 2:40 am #

    Actually, in option one you should smooth the image with a Gaussian filter to remove high frequencies before downscaling and to prevent aliasing effects.

    • Adrian Rosebrock March 17, 2015 at 6:39 am #

      Hi Oliver, thanks for the comment. However, as mentioned in the Summary section of this article, Method #1 is intended to be used for object classification using HOG + Linear SVM. As demonstrated by Dalal and Triggs in their Histogram of Oriented Gradients for Human Detection paper, applying Gaussian smoothing (even to remove high frequency noise) prior to extracting HOG features actually hurts classification performance. So when using HOG, it’s actually best to just subsample and avoid the Gaussian smoothing.

      • Tomasz Malisiewicz March 21, 2015 at 10:20 am #

        I’m going to agree with Adrian. Gaussian smooth during downscaling seems like a good idea when you read the signal processing literature, but for features like HOG it doesn’t really matter.

        With HOG it’s all about speed. The fewer operations you perform before the classifier hits the pixels, the better off you’ll be.

        If you implement your own HOG (only loosely following Navneet Dalal’s recipe), you should build a test suite which measures your computation time and descriptor/classifier performance, and iterate. I can envision a variant of HOG that benefits from smoothing, but experiments should have the final word.

        • Adrian Rosebrock March 21, 2015 at 11:11 am #

          Hey, thanks for the comment Tomasz. I could not agree with you more — experiments should always have the final word.

      • Gerardo November 10, 2016 at 4:59 pm #

        Just to clarify, the resizing operation used in the imutils library does perform some level of smoothing to avoid aliasing effects (Moire patterns in the case of images). They call it interpolation, which only means it uses a different type of smoothing, not as strong as when using a Gaussian filter. Too much smoothing will destroy edge information which has a big impact on gradient computation. Depending on the type of interpolation scheme, you will tend to use digital filters with sharper transition bands to preserve edge information as much as possible. Downsampling without filtering would be as bad as oversmoothing for HOGs.

  2. jenn T June 29, 2015 at 1:23 pm #

    Hi!, that’s exactly what I am looking for, thank you so much for doing this post. I was reading about the use of Gaussian smooth during downscaling and now i am wondering if for haar-like features it should be used or not. thank you again.

    • Adrian Rosebrock June 29, 2015 at 2:46 pm #

      Haar features are normally scaled in the feature space rather than the image space to avoid (unnecessarily) recomputing the integral images. I would refer to the original Viola-Jones paper on Haar cascades to read more about their sampling scheme.

  3. Mau August 6, 2015 at 2:54 pm #

    Why not simply use pyrUp and pyrDown from OpenCV?

    • Adrian Rosebrock August 7, 2015 at 7:07 am #

      OpenCV’s implementation of pyramids doesn’t give enough control. For example, one of my primary use cases for image pyramids is for object detection using Histogram of Oriented Gradients. It’s well known that applying Gaussian blurring prior to extracting HOG feature vectors can actually hurt performance — which is something OpenCV’s implementation of pyramids does, hence the scikit-image implementation is a better choice (at least in my opinion).

  4. BlackDragon December 6, 2015 at 2:44 am #

    Hi Adrian, I have a question: the image will be resized by the scale parameter at each subsequent layer, so when will it stop?

    • Adrian Rosebrock December 6, 2015 at 7:11 am #

      Take a look at Line 4 where we define the minSize argument to the pyramid function. Once the image falls below minSize, we break from the loop.

      • BlackDragon December 6, 2015 at 11:07 am #

        Awww, sorry, my bad. By the way, can you explain to me how the scale parameter works?
        I read https://www.pyimagesearch.com/2015/11/16/hog-detectmultiscale-parameters-explained/ but I’m still unsure. Please!

        • Adrian Rosebrock December 6, 2015 at 11:17 am #

          The scale simply controls the number of levels that are ultimately generated by the image pyramid. The smaller the scale, the more layers in the image pyramid are generated. The larger the scale, the fewer layers are generated. I would suggest downloading the source code to this post and running the examples with varying scale parameters to convince yourself of this.

  5. Abhishek Tiwari November 15, 2016 at 2:24 am #

    Hi, I have been using your tutorial to build a gesture classifier and localiser. I was successful in doing so. However, I do not understand what to do with the image pyramid. I understand why it is used, but I do not get what we have to do once we obtain the pyramid. Suppose I obtained the pyramid, ran my localiser on each image of the pyramid, and then obtained a list of sliding windows which my localising classifier suggests are windows of interest. What do I do now? How do I combine the results? How do I get the best bounding box from all the scaled images?
    Thanks in advance! Your tutorial is extremely helpful!

    • Adrian Rosebrock November 16, 2016 at 1:49 pm #

      Hey Abhishek. My first suggestion would be to go through the PyImageSearch Gurus course where I explain how to build your own custom object detectors in lots of detail.

      Secondly, you need to run your sliding window on each layer of the image pyramid and obtain your predictions. Once you have them, apply non-maxima suppression.

      Again, I cover this entire process (with lots of code) inside the course.

      • Abhishek Tiwari November 17, 2016 at 11:29 am #

        Hi Adrian thanks for your reply! Okay so I did run sliding window on each layer of the pyramid, that gave me a list of windows most likely to be my object. I took the one of highest probability from each layer and then took the maximum probability one from that final set. I am guessing this is not the correct way to use Image Pyramid.

        You said : run NMS after running sliding window on all layers. NMS algo takes as input a set of top left and bottom right coordinates, correct? Now, when i run sliding window on all layers, i will obtain coordinates of boxes in different scales of images. Should I give these coordinates as input directly to the NMS algo or do I have to rescale them somehow?

        And I will definitely check out the course! Thanks!

        • Adrian Rosebrock November 18, 2016 at 8:54 am #

          Your intuition is correct. When you append a bounding box to your set of possible predictions, scale the bounding box by the ratio of the current pyramid level to the original scale of the image. This will give you the coordinates of the bounding boxes in the same scale as the original image. You then apply NMS to these bounding boxes.
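(For illustration only, a minimal sketch of the rescaling described in the reply above; scale_box and its variable names are hypothetical.)

def scale_box(box, orig, layer):
    # map a bounding box found on a pyramid layer back to the original
    # image's coordinate space: orig is the original image, layer is the
    # pyramid layer the detection came from, box is (x1, y1, x2, y2)
    ratio = orig.shape[1] / float(layer.shape[1])
    (x1, y1, x2, y2) = box
    return (int(x1 * ratio), int(y1 * ratio), int(x2 * ratio), int(y2 * ratio))

# collect one such rescaled box per detection, then run non-maxima
# suppression on boxes that are now all in the original image's scale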

  6. aNuo January 20, 2017 at 7:59 am #

    Many thanks for your tut.

    In HOG, before I fuse the overlapping windows I have to know the coordinates of the window in the original image. Should I construct a mapping between the original image and the resized image? Or does this kind of mapping already exist somewhere?

    • aNuo January 20, 2017 at 8:26 am #

      OK, it seems a very stupid question…

    • Adrian Rosebrock January 20, 2017 at 10:52 am #

      You are correct, you would want to map the coordinates from the resized image to the original image. This can be accomplished by computing the ratio of the new image width to the old image width. I cover this in more detail inside the PyImageSearch Gurus course.

  7. aNuo January 20, 2017 at 8:11 am #

    Hi again,

    Also, when I apply the imutils.resize function, I always get the left half of my image. Do you know why? I just use ‘img_resize=imutils.resize(img, halfsize_of_my_image)’

    Many thanks

    • Adrian Rosebrock January 20, 2017 at 10:53 am #

      I’m not sure what you mean by the “left half of my image”, can you elaborate?

  8. Walid January 27, 2017 at 2:25 pm #

    Hi Adrian
    Thanks a lot
    If I got (x1, y1, x2, y2) at a certain layer, how can I get the values on the original image that point to the same box?

    • Adrian Rosebrock January 28, 2017 at 6:46 am #

      Simply compute the ratio of the original image dimensions to the current dimensions of the layer. Multiply the coordinates by this ratio and you’ll obtain the coordinates in relation to the original image. I cover this, and the rest of the object detection framework, inside the PyImageSearch Gurus course.

  9. Yash Baley August 4, 2017 at 6:37 am #

    How to install Pyimagesearch module??

    • Adrian Rosebrock August 4, 2017 at 6:46 am #

      It is not pip-installable. Just use the “Downloads” section at the bottom of this blog post to download the source code to this post. It will include the “pyimagesearch” module.

      • Yash Baley August 4, 2017 at 7:17 am #

        Thanks for the quick reply Adrian. But still, cv2 module is not available in the “Downloads” section

  10. Fav October 14, 2017 at 8:03 pm #

    Hey Adrian, thanks for this amazing tutorial. I’m following it, but I don’t understand something.

    In the first method, when you resize the image by dividing the column number by the scale factor, why don’t you do the same with the height (rows) of the image? I’m not following that. Is it something related to the reshape function, perhaps?

    Also, why do you compare the height of the new image with the min width at the end? Shouldn’t we compare the image height with the min height and the image width with the min width?

    • Adrian Rosebrock October 16, 2017 at 12:32 pm #

      The first method (using the .resize function of imutils) automatically preserves the aspect ratio of the input image. As long as we compute the ratio with respect to one dimension the image will be properly resized.

  11. Phawit Wongsarawit July 20, 2018 at 2:09 am #

    How do I save a picture after applying the pyramid layer?

    • Adrian Rosebrock July 20, 2018 at 6:22 am #

      You can use the “cv2.imwrite” function to write each layer visualization to disk.

  12. satyam sareen August 17, 2018 at 1:00 pm #

    I am not able to understand how decreasing the size of an image will help in detecting an object using pyrDown.
    I am not able to understand its usage.

    • Adrian Rosebrock August 17, 2018 at 3:47 pm #

      The size of your sliding window ROI stays the same. If you keep the size of the sliding window the same and then go up and down layers of a pyramid you’ll be able to detect objects that are larger and smaller.

  13. Maksym Ganenko November 7, 2018 at 3:42 am #

    Hi, Adrian,

    I think you should have mentioned cv2.pyrDown() for completeness because OpenCV has implementation of Gaussian pyramid without dependence on scikit-image package.

    Also, when you build the image pyramid with cv2.resize() you use the default bilinear interpolation. Is it really better than, say, Lanczos8, which (in theory) better preserves edges?

    • Adrian Rosebrock November 10, 2018 at 10:19 am #

      1. The pyrDown function will introduce Gaussian blurring which can actually hurt the performance of HOG-based object detectors (which is the context in which this post was originally published).

      That said…

      2. The type of interpolation and whether or not Gaussian blurring is performed are really just hyperparameters. Run experiments and let your empirical results drive your decisions.

Trackbacks/Pingbacks

  1. Sliding Windows for Object Detection with Python and OpenCV - PyImageSearch - March 23, 2015

    […] in last week’s blog post we discovered how to construct an image […]

  2. Detecting cats in images with OpenCV - PyImageSearch - June 20, 2016

    […] scaleFactor  of our image pyramid used when detecting cat faces. A larger scale factor will increase the speed of the detector, but […]

Leave a Reply