So in last week’s blog post we discovered how to construct an image pyramid.
And in today’s article we are going to extend that example and introduce the concept of a sliding window. Sliding windows play an integral role in object classification, as they allow us to localize exactly “where” in an image an object resides.
Utilizing both a sliding window and an image pyramid we are able to detect objects in images at various scales and locations.
In fact, both sliding windows and image pyramids are both used in my 6-step HOG + Linear SVM object classification framework!
To learn more about the role sliding windows play in object classification and image classification, read on. By the time you are done reading this blog post, you’ll have an excellent understanding on how image pyramids and sliding windows are used for classification.
Looking for the source code to this post?
Jump right to the downloads section.
OpenCV and Python versions:
This example will run on Python 2.7/Python 3.4+ and OpenCV 2.4.X/OpenCV 3.0+.
What is a sliding window?
In the context of computer vision (and as the name suggests), a sliding window is rectangular region of fixed width and height that “slides” across an image, such as in the following figure:
For each of these windows, we would normally take the window region and apply an image classifier to determine if the window has an object that interests us — in this case, a face.
Combined with image pyramids we can create image classifiers that can recognize objects at varying scales and locations in the image.
These techniques, while simple, play an absolutely critical role in object detection and image classification.
Sliding Windows for Object Detection with Python and OpenCV
Let’s go ahead and build on your image pyramid example from last week.
Remember the helpers.py file? Open it back up and insert the sliding_window function:
def sliding_window(image, stepSize, windowSize):
# slide a window across the image
for y in xrange(0, image.shape, stepSize):
for x in xrange(0, image.shape, stepSize):
# yield the current window
yield (x, y, image[y:y + windowSize, x:x + windowSize])
The sliding_window function requires three arguments. The first is the image that we are going to loop over. The second argument is the stepSize .
The stepSize indicates how many pixels we are going to “skip” in both the (x, y) direction. Normally, we would not want to loop over each and every pixel of the image (i.e. stepSize=1 ) as this would be computationally prohibitive if we were applying an image classifier at each window.
Instead, the stepSize is determined on a per-dataset basis and is tuned to give optimal performance based on your dataset of images. In practice, it’s common to use a stepSize of 4 to 8 pixels. Remember, the smaller your step size is, the more windows you’ll need to examine.
The last argument windowSize defines the width and height (in terms of pixels) of the window we are going to extract from our image .
Lines 24-27 are fairly straightforward and handle the actual “sliding” of the window.
Lines 24-26 define two for loops that loop over the (x, y) coordinates of the image, incrementing their respective x and y counters by the provided step size.
Then, Line 27 returns a tuple containing the x and y coordinates of the sliding window, along with the window itself.
To see the sliding window in action, we’ll have to write a driver script for it. Create a new file, name it sliding_window.py , and we’ll finish up this example:
# import the necessary packages
from pyimagesearch.helpers import pyramid
from pyimagesearch.helpers import sliding_window
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())
# load the image and define the window width and height
image = cv2.imread(args["image"])
(winW, winH) = (128, 128)
On Lines 2-6 we import our necessary packages. We’ll use our pyramid function from last week to construct our image pyramid. We’ll also use the sliding_window function we just defined. Finally we import argparse for parsing command line arguments and cv2 for our OpenCV bindings.
Lines 9-12 handle parsing our command line arguments. We only need a single switch here, the --image that we want to process.
From there, Line 14 loads our image off disk and Line 15 defines our window width and height to be 128 pixels, respectfully.
Now, let’s go ahead and combine our image pyramid and sliding window:
# loop over the image pyramid
for resized in pyramid(image, scale=1.5):
# loop over the sliding window for each layer of the pyramid
for (x, y, window) in sliding_window(resized, stepSize=32, windowSize=(winW, winH)):
# if the window does not meet our desired window size, ignore it
if window.shape != winH or window.shape != winW:
# THIS IS WHERE YOU WOULD PROCESS YOUR WINDOW, SUCH AS APPLYING A
# MACHINE LEARNING CLASSIFIER TO CLASSIFY THE CONTENTS OF THE
# since we do not have a classifier, we'll just draw the window
clone = resized.copy()
cv2.rectangle(clone, (x, y), (x + winW, y + winH), (0, 255, 0), 2)
We start by looping over each layer of the image pyramid on Line 18.
For each layer of the image pyramid, we’ll also loop over each window in the sliding_window on Line 20. We also make a check on Lines 22-23 to ensure that our sliding window has met the minimum size requirements.
If we were applying an image classifier to detect objects, we would do this on Lines 25-27 by extracting features from the window and passing them on to our classifier (which is done in our 6-step HOG + Linear SVM object detection framework).
But since we do not have an image classifier, we’ll just visualize the sliding window results instead by drawing a rectangle on the image indicating where the sliding window is on Lines 30-34.
To see our image pyramid and sliding window in action, open up a terminal and execute the following command:
$ python sliding_window.py --image images/adrian_florida.jpg
If all goes well you should see the following results:
Here you can see that for each of the layers in the pyramid a window is “slid” across it. And again, if we had an image classifier ready to go, we could take each of these windows and classify the contents of the window. An example could be “does this window contain a face or not?”
Here’s another example with a different image:
$ python sliding_window.py --image images/stick_of_truth.jpg.jpg
Once again, we can see that the sliding window is slid across the image at each level of the pyramid. High levels of the pyramid (and thus smaller layers) have less windows that need to be examined.
In this blog post we learned all about sliding windows and their application to object detection and image classification.
By combining a sliding window with an image pyramid we are able to localize and detect objects in images at multiple scales and locations.
While both sliding windows and image pyramids are very simple techniques, they are absolutely critical in object detection.
You can learn more about the more global role they play in this blog post, where I detail my framework on how to use the Histogram of Oriented Gradients image descriptor and a Linear SVM classifier to build a custom object detector.