4:18am. Alarm blaring. Still dark outside. The bed is warm. And the floor will feel so cold on my bare feet.
But I got out of bed. I braved the morning, and I took the ice cold floor on my feet like a champ.
Because I’m excited.
Excited to share something very special with you today…
You see, over the past few weeks I’ve gotten some really great emails from fellow PyImageSearch readers. These emails were short, sweet, and to the point. They were simple “thank you’s” for posting actual, honest-to-goodness Python and OpenCV code that you could take and use to solve your own computer vision and image processing problems.
And upon reflection last night, I realized that I’m not doing a good enough job sharing the libraries, packages, and code that I have developed for myself for everyday use — so that’s exactly what I’m going to do today.
In this blog post I’m going to show you the functions in my transform.py module. I use these functions whenever I need to perform a 4 point perspective transform with OpenCV’s cv2.getPerspectiveTransform.
And I think you’ll find the code in here quite interesting…and you’ll even be able to utilize it in your own projects.
So read on. And check out my 4 point OpenCV cv2.getPerspectiveTransform example.
Looking for the source code to this post?
Jump right to the downloads section.
OpenCV and Python versions:
This example will run on Python 2.7/Python 3.4+ and OpenCV 2.4.X/OpenCV 3.0+.
4 Point OpenCV getPerspectiveTransform Example
You may remember back to my posts on building a real-life Pokedex, specifically, my post on OpenCV and Perspective Warping.
In that post I mentioned how you could use a perspective transform to obtain a top-down, “birds eye view” of an image — provided that you could find reference points, of course.
This post will continue the discussion on the top-down, “birds eye view” of an image. But this time I’m going to share with you personal code that I use every single time I need to do a 4 point perspective transform.
So let’s not waste any more time. Open up a new file, name it transform.py, and let’s get started.
# import the necessary packages
import numpy as np
import cv2

def order_points(pts):
    # initialize a list of coordinates that will be ordered
    # such that the first entry in the list is the top-left,
    # the second entry is the top-right, the third is the
    # bottom-right, and the fourth is the bottom-left
    rect = np.zeros((4, 2), dtype = "float32")

    # the top-left point will have the smallest sum, whereas
    # the bottom-right point will have the largest sum
    s = pts.sum(axis = 1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]

    # now, compute the difference between the points, the
    # top-right point will have the smallest difference,
    # whereas the bottom-left will have the largest difference
    diff = np.diff(pts, axis = 1)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]

    # return the ordered coordinates
    return rect
We’ll start off by importing the packages we’ll need: NumPy for numerical processing and cv2 for our OpenCV bindings.
Next up, let’s define the order_points function on Line 5. This function takes a single argument, pts, which is a list of four points specifying the (x, y) coordinates of each corner of the rectangle.
It is absolutely crucial that we have a consistent ordering of the points in the rectangle. The actual ordering itself can be arbitrary, as long as it is consistent throughout the implementation.
Personally, I like to specify my points in top-left, top-right, bottom-right, and bottom-left order.
We’ll start by allocating memory for the four ordered points on Line 10.
Then, we’ll find the top-left point, which will have the smallest x + y sum, and the bottom-right point, which will have the largest x + y sum. This is handled on Lines 14-16.
Of course, now we’ll have to find the top-right and bottom-left points. Here we’ll take the difference (i.e. y - x, since np.diff subtracts along each row) between the coordinates using the np.diff function on Line 21.
The coordinates associated with the smallest difference will be the top-right points, whereas the coordinates with the largest difference will be the bottom-left points (Lines 22 and 23).
Finally, we return our ordered coordinates to the calling function on Line 26.
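As a quick sanity check, here’s the ordering logic applied to four scrambled corner points (the function body is repeated inline so the snippet stands alone, and the sample coordinates are just for illustration):

```python
import numpy as np

def order_points(pts):
    # order as top-left, top-right, bottom-right, bottom-left
    rect = np.zeros((4, 2), dtype = "float32")
    s = pts.sum(axis = 1)
    rect[0] = pts[np.argmin(s)]     # top-left: smallest x + y
    rect[2] = pts[np.argmax(s)]     # bottom-right: largest x + y
    diff = np.diff(pts, axis = 1)
    rect[1] = pts[np.argmin(diff)]  # top-right: smallest y - x
    rect[3] = pts[np.argmax(diff)]  # bottom-left: largest y - x
    return rect

# four corners supplied in a scrambled order
pts = np.array([(475, 265), (73, 239), (356, 117), (187, 443)], dtype = "float32")
print(order_points(pts).tolist())
# [[73.0, 239.0], [356.0, 117.0], [475.0, 265.0], [187.0, 443.0]]
```

No matter how the four points are shuffled on input, the rows always come back in top-left, top-right, bottom-right, bottom-left order.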
Again, I can’t stress enough how important it is to maintain a consistent ordering of points.
And you’ll see exactly why in this next function:
def four_point_transform(image, pts):
    # obtain a consistent order of the points and unpack them
    # individually
    rect = order_points(pts)
    (tl, tr, br, bl) = rect

    # compute the width of the new image, which will be the
    # maximum distance between bottom-right and bottom-left
    # x-coordinates or the top-right and top-left x-coordinates
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxWidth = max(int(widthA), int(widthB))

    # compute the height of the new image, which will be the
    # maximum distance between the top-right and bottom-right
    # y-coordinates or the top-left and bottom-left y-coordinates
    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxHeight = max(int(heightA), int(heightB))

    # now that we have the dimensions of the new image, construct
    # the set of destination points to obtain a "birds eye view",
    # (i.e. top-down view) of the image, again specifying points
    # in the top-left, top-right, bottom-right, and bottom-left
    # order
    dst = np.array([
        [0, 0],
        [maxWidth - 1, 0],
        [maxWidth - 1, maxHeight - 1],
        [0, maxHeight - 1]], dtype = "float32")

    # compute the perspective transform matrix and then apply it
    M = cv2.getPerspectiveTransform(rect, dst)
    warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))

    # return the warped image
    return warped
We start off by defining the four_point_transform function on Line 28, which requires two arguments: image and pts.
The image variable is the image we want to apply the perspective transform to. And the pts list is the list of four points that contain the ROI of the image we want to transform.
We make a call to our order_points function on Line 31, which places our pts variable in a consistent order. We then unpack these coordinates on Line 32 for convenience.
Now we need to determine the dimensions of our new warped image.
We determine the width of the new image on Lines 37-39, where the width is the largest distance between the bottom-right and bottom-left x-coordinates or the top-right and top-left x-coordinates.
In a similar fashion, we determine the height of the new image on Lines 44-46, where the height is the maximum distance between the top-right and bottom-right y-coordinates or the top-left and bottom-left y-coordinates.
Note: Big thanks to Tom Lowell who emailed in and made sure I fixed the width and height calculation!
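To make the width and height math concrete, here’s the same calculation carried out with plain Euclidean distances on the four (already ordered) corner points used in the first example command below:

```python
from math import hypot

# ordered corners: top-left, top-right, bottom-right, bottom-left
(tl, tr, br, bl) = [(73, 239), (356, 117), (475, 265), (187, 443)]

# width: the larger of the two horizontal edge lengths
widthA = hypot(br[0] - bl[0], br[1] - bl[1])   # bottom edge
widthB = hypot(tr[0] - tl[0], tr[1] - tl[1])   # top edge
maxWidth = max(int(widthA), int(widthB))

# height: the larger of the two vertical edge lengths
heightA = hypot(tr[0] - br[0], tr[1] - br[1])  # right edge
heightB = hypot(tl[0] - bl[0], tl[1] - bl[1])  # left edge
maxHeight = max(int(heightA), int(heightB))

print(maxWidth, maxHeight)    # 338 233
```

Taking the maximum of each pair of opposite edges means no part of the ROI gets squashed into a smaller output than it occupied in the original image.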
So here’s the part where you really need to pay attention.
Remember how I said that we are trying to obtain a top-down, “birds eye view” of the ROI in the original image? And remember how I said that a consistent ordering of the four points representing the ROI is crucial?
On Lines 53-57 you can see why. Here, we define 4 points representing our “top-down” view of the image. The first entry in the list is (0, 0) indicating the top-left corner. The second entry is (maxWidth - 1, 0) which corresponds to the top-right corner. Then we have (maxWidth - 1, maxHeight - 1) which is the bottom-right corner. Finally, we have (0, maxHeight - 1) which is the bottom-left corner.
The takeaway here is that these points are defined in a consistent ordering representation — and will allow us to obtain the top-down view of the image.
To actually obtain the top-down, “birds eye view” of the image we’ll utilize the cv2.getPerspectiveTransform function on Line 60. This function requires two arguments, rect, which is the list of 4 ROI points in the original image, and dst, which is our list of transformed points. The cv2.getPerspectiveTransform function returns M, which is the actual transformation matrix.
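If you’re curious what that matrix actually does: a 3x3 perspective transform maps each point through homogeneous coordinates, then divides by the third component. Here’s a NumPy-only sketch using a made-up translation matrix (not one computed from real points):

```python
import numpy as np

def apply_homography(M, point):
    # lift (x, y) to homogeneous coordinates, multiply by the
    # 3x3 matrix, then divide by the third component
    x, y = point
    vec = M.dot(np.array([x, y, 1.0]))
    return (float(vec[0] / vec[2]), float(vec[1] / vec[2]))

# a toy homography that simply shifts points by (10, 20)
M = np.array([[1, 0, 10],
              [0, 1, 20],
              [0, 0,  1]], dtype = "float32")
print(apply_homography(M, (5, 5)))    # (15.0, 25.0)
```

cv2.warpPerspective effectively performs this mapping for every pixel (sampling in the inverse direction so there are no holes in the output).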
We apply the transformation matrix on Line 61 using the cv2.warpPerspective function. We pass in the image, our transform matrix M, along with the width and height of our output image.
The output of cv2.warpPerspective is our warped image, which is our top-down view.
We return this top-down view on Line 64 to the calling function.
Now that we have code to perform the transformation, we need some code to drive it and actually apply it to images.
Open up a new file, call it transform_example.py, and let’s finish this up:
# import the necessary packages
from pyimagesearch.transform import four_point_transform
import numpy as np
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", help = "path to the image file")
ap.add_argument("-c", "--coords",
    help = "comma separated list of source points")
args = vars(ap.parse_args())

# load the image and grab the source coordinates (i.e. the list
# of (x, y) points)
# NOTE: using the 'eval' function is bad form, but for this example
# let's just roll with it -- in future posts I'll show you how to
# automatically determine the coordinates without pre-supplying them
image = cv2.imread(args["image"])
pts = np.array(eval(args["coords"]), dtype = "float32")

# apply the four point transform to obtain a "birds eye view" of
# the image
warped = four_point_transform(image, pts)

# show the original and warped images
cv2.imshow("Original", image)
cv2.imshow("Warped", warped)
cv2.waitKey(0)
The first thing we’ll do is import our four_point_transform function on Line 2. I decided to put it in the pyimagesearch sub-module for organizational purposes.
We’ll then use NumPy for the array functionality, argparse for parsing command line arguments, and cv2 for OpenCV bindings.
We parse our command line arguments on Lines 8-12. We’ll use two switches, --image, which is the image that we want to apply the transform to, and --coords, which is the list of 4 points representing the region of the image we want to obtain a top-down, “birds eye view” of.
We then load the image on Line 19 and convert the points to a NumPy array on Line 20.
Now before you get all upset at me for using the eval function, please remember, this is just an example. I don’t condone performing a perspective transform this way.
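That said, if you want to avoid eval even in a throwaway script, the standard library’s ast.literal_eval is a drop-in replacement here: it parses only Python literals (numbers, tuples, lists, and the like) and refuses to execute arbitrary code:

```python
from ast import literal_eval

# the same format of string we pass via --coords
coords = "[(73, 239), (356, 117), (475, 265), (187, 443)]"
pts = literal_eval(coords)
print(pts[0])    # (73, 239)

# literal_eval raises ValueError on anything that isn't a plain literal
try:
    literal_eval("__import__('os').remove('important_file')")
except ValueError:
    print("rejected")
```

Swapping `eval(args["coords"])` for `literal_eval(args["coords"])` in the script above would behave identically for well-formed input.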
And, as you’ll see in next week’s post, I’ll show you how to automatically determine the four points needed for the perspective transform — no manual work on your part!
Next, we can apply our perspective transform on Line 24.
Finally, let’s display the original image and the warped, top-down view of the image on Lines 27-29.
Obtaining a Top-Down View of the Image
Alright, let’s see this code in action.
Open up a shell and execute the following command:
$ python transform_example.py --image images/example_01.png --coords "[(73, 239), (356, 117), (475, 265), (187, 443)]"
You should see a top-down view of the notecard, similar to below:
Let’s try another image:
$ python transform_example.py --image images/example_02.png --coords "[(101, 185), (393, 151), (479, 323), (187, 441)]"
And a third for good measure:
$ python transform_example.py --image images/example_03.png --coords "[(63, 242), (291, 110), (361, 252), (78, 386)]"
As you can see, we have successfully obtained a top-down, “birds eye view” of the notecard!
In some cases the notecard looks a little warped — this is because the angle the photo was taken at is quite severe. The closer we come to the 90-degree angle of “looking down” on the notecard, the better the results will be.
In this blog post I provided an OpenCV cv2.getPerspectiveTransform example using Python.
I even shared code from my personal library on how to do it!
But the fun doesn’t stop here.
You know those iPhone and Android “scanner” apps that let you snap a photo of a document and then have it “scanned” into your phone?
That’s right — I’ll show you how to use the 4 point OpenCV getPerspectiveTransform example code to build one of those document scanner apps!
I’m definitely excited about it, I hope you are too.
Anyway, be sure to sign up for the PyImageSearch Newsletter to hear when the post goes live!