Building a Pokedex in Python: OpenCV and Perspective Warping (Step 5 of 6)

Figure 1: Performing a perspective transformation using Python and OpenCV on the Game Boy screen and cropping out the Pokemon.

We’re getting closer to finishing up our real-life Pokedex!

In my previous blog post, I showed you how to find a Game Boy screen in an image using Python and OpenCV.

This post will show you how to apply warping transformations to obtain a “birds-eye-view” of the Game Boy screen. From there, we will be able to crop out the actual Pokemon and feed it into our Pokemon identification algorithm.


Previous Posts

This post is part of an on-going series of blog posts on how to build a real-life Pokedex using Python, OpenCV, and computer vision and image processing techniques. If this is the first post in the series you are reading, definitely check it out! But after you give it a read, be sure to go back and review the previous posts — there is a TON of awesome computer vision and image processing content in there.

Finally, if you have any questions, feel free to shoot me an email. I would be happy to chat.

Building a Pokedex in Python: OpenCV Perspective Transform Example

When we wrapped up the previous post on building a Pokedex in Python, we were able to find our Game Boy screen by applying edge detection, finding contours, and then approximating the contours, like this:

Figure 2: Finding a Game Boy screen in an image using Python and OpenCV.

However, you may notice that the Game Boy screen is slightly skewed — the screen is definitely leaning to the right.

The perspective of the screen is also wrong. Ideally, we would want to have a top-down, birds-eye-view of the Game Boy screen, as in Figure 1.

How are we going to accomplish this?

Let’s jump into some code.

We’ll be building off the code in the previous post, so if it looks like we are jumping into the middle of a file, it’s because we are.

On Line 53 we reshape the contour that corresponds to the outline of the screen. The contour has four points: the four corners of the rectangular screen region. We are simply reshaping the NumPy array of points to make them easier to work with.

In order to apply a perspective transformation, we need to know the top-left, top-right, bottom-right, and bottom-left corners of the contour. However, even though we have the contour that corresponds to the Game Boy screen, we have no guarantee of the order of its points. There is no guarantee that the top-left point is the first point in the contour list. It might be the second point. Or the fourth point.

To handle this problem we’ll have to impose a strict order on the points. We start on Line 54 by initializing our rectangle of shape (4, 2) to store the ordered points.

Lines 58-60 handle grabbing the top-left and bottom-right points. Line 58 sums the (x, y) coordinates of each point together by specifying axis=1. The top-left point will have the smallest sum (Line 59), whereas the bottom-right point will have the largest sum (Line 60).

Now we need to grab the top-right and bottom-left points on Lines 65-67 by taking the difference between the (x, y) coordinates. The top-right point will have the smallest difference (Line 66), whereas the bottom-left point will have the largest difference (Line 67).

Notice how our points are now stored in an imposed order: top-left, top-right, bottom-right, and bottom-left. Keeping a consistent order is important when we apply our perspective transformation.
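The ordering logic described above can be sketched as a small helper (the function name `order_points` is my own; the original code performs these steps inline on the lines referenced):

```python
import numpy as np

def order_points(pts):
    # pts is a (4, 2) array of (x, y) corner coordinates in arbitrary order
    rect = np.zeros((4, 2), dtype="float32")

    # the top-left point has the smallest x + y sum,
    # the bottom-right point has the largest sum
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]

    # the top-right point has the smallest y - x difference,
    # the bottom-left point has the largest difference
    d = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(d)]
    rect[3] = pts[np.argmax(d)]

    # ordered: top-left, top-right, bottom-right, bottom-left
    return rect
```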

If you remember back to the previous post, we resized our image to make image processing and edge detection faster and more accurate. We kept track of this resizing ratio for a good reason — when we crop out the Game Boy screen, we want to crop out the original Game Boy screen, not the smaller, resized one.

In order to extract the original, large Game Boy screen, we multiply our rect by the ratio, thus transforming the points to the original image size.
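As a tiny sketch of that step (the concrete corner coordinates and ratio below are made-up example values, not the ones from the post):

```python
import numpy as np

# ordered corners found in the smaller, resized image (example values)
rect = np.array([[40, 30], [200, 35], [195, 160], [38, 155]], dtype="float32")

# ratio = original_width / resized_width, tracked when we resized the image
ratio = 2.5

# scale the corners so they index into the original, full-size image
rect *= ratio
```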

Next, we need to calculate the size of the Game Boy screen so that we can allocate memory to store it:

Let’s take this code apart and see what’s going on:

  • Line 74: Here we are unpacking our rect and grabbing the top-left, top-right, bottom-right, and bottom-left points, respectively.
  • Line 75: To determine the width of the new image, we compute the distance between the bottom-right and bottom-left points.
  • Line 76: Similarly, we compute the distance between the top-right and top-left points.
  • Lines 79 and 80: Just as we computed the two candidate widths, we compute the two candidate heights from the distances between the vertically adjacent corner points.
  • Lines 84 and 85: Now that we have our distances, we take the maximum of widthA and widthB to determine the width of our transformed image. We then repeat the process for heightA and heightB to determine the dimensions of the new image.
  • Lines 89-93: Remember how I said the order of the points is important? In order to compute the birds-eye-view of the Game Boy screen we need to construct a matrix dst to handle the mapping. The first entry in dst is the origin of the image — the top-left corner. We then specify the top-right, bottom-right, and bottom-left points based on our calculated width and height.
  • Line 97: To compute the perspective transformation, we need the actual transformation matrix. This matrix is calculated by making a call to cv2.getPerspectiveTransform and passing in the coordinates of the Game Boy screen in the original image, followed by the four points we specified for our output image. In return, we are given our transformation matrix M.
  • Line 98: Finally, we can apply our transformation by calling the cv2.warpPerspective function. The first parameter is our original image that we want to warp, the second is our transformation matrix M obtained from cv2.getPerspectiveTransform, and the final parameter is a tuple, used to indicate the width and height of the output image.

If all goes well, we should now have a top-down/birds-eye-view of our Game Boy screen:

Figure 3: Obtaining a top-down/birds-eye-view of an image using Python, OpenCV, and perspective warping and transformations.

But we aren’t done yet!

We still need to crop out the actual Pokemon from the top-right portion of the screen.

Furthermore, you’ll notice that our Marowak seems to be a bit “shadowy” and the screen of the Game Boy itself is darker than we would like it to be. We need to see if we can re-scale the intensity of our image to help mitigate this shadow and make it easier to extract the contour of the Marowak, later allowing us to compute shape features over the Pokemon outline.

The first thing we’ll do is convert our warped image to grayscale on Line 103. Then, we make use of the skimage Python library. We make a call to the rescale_intensity method in the exposure sub-package. This method takes our warped image and then re-scales the gray pixel intensities by finding the minimum and maximum values. The minimum value then becomes black (a value of 0) and the maximum value then becomes white (a value of 255). All pixels that fall into that range are scaled accordingly.
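Here is a minimal, self-contained sketch of that step (I substitute a synthetic grayscale gradient for the warped screen so the snippet runs on its own; on the real image you would first convert with cv2.cvtColor):

```python
import numpy as np
from skimage import exposure

# stand-in for the warped, grayscale Game Boy screen: a dim gradient
# whose pixel values only span [50, 149] instead of the full [0, 255]
warped = np.tile(np.arange(50, 150, dtype="uint8"), (100, 1))

# stretch the intensities so the darkest pixel becomes 0 (black)
# and the brightest becomes 255 (white)
rescaled = exposure.rescale_intensity(warped, out_range=(0, 255)).astype("uint8")
```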

The output of this re-scaling can be seen below:

Figure 4: Re-scaling the intensity of pixels using scikit-image.

Notice how that shadow region is much less apparent.

From here, all we need is some simple cropping.

We grab the height and width of the warped Game Boy screen on Line 108 and then determine a region that is 40% of the width and 45% of the height on Line 109 — the Pokemon that we want to identify will lie within this region of the image:
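That cropping step can be sketched like so (the percentages come straight from the post, but the 10-pixel margins and variable names are my assumptions):

```python
import numpy as np

# stand-in for the rescaled, birds-eye view of the Game Boy screen
warped = np.zeros((200, 300), dtype="uint8")

# the Pokemon sits in the top-right corner of the screen: roughly the
# last 40% of the width and the first 45% of the height
(h, w) = warped.shape
(dX, dY) = (int(w * 0.4), int(h * 0.45))

# crop that region, leaving a small 10-pixel margin around the border
crop = warped[10:dY, w - dX:w - 10]
```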

Figure 5: Cropping the Pokemon from our Game Boy screen using Python and OpenCV.

Note: I determined these percentages empirically by trial and error. There is no fancy computer vision magic going on. Just your standard testing and debugging to find the correct percentages.

We crop the Pokemon from the Game Boy screen on Line 110  and write it to file on Line 113. In the next (and final) blog post in this series we’ll use this cropped image to perform the actual identification of the Pokemon.

Finally, Lines 116-120 just show us the results of our labor:

To execute our script and crop the Pokemon from the Game Boy screen, simply execute the following command:


In this blog post we applied perspective and warping transformations using Python and OpenCV, utilizing the cv2.getPerspectiveTransform and cv2.warpPerspective functions to accomplish them.

We applied these techniques to obtain a top-down/birds-eye-view of our Game Boy screen, allowing us to crop out the Pokemon we want to identify.

Finally, we used scikit-image to rescale the pixel intensity of the grayscale cropped image.

My next post will wrap up this series of posts and tie everything together. We will take our cropped Pokemon and then run it through our identification algorithm.

From there, we’ll have a real-life working Pokedex!


If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!


16 Responses to Building a Pokedex in Python: OpenCV and Perspective Warping (Step 5 of 6)

  1. jonatslim January 21, 2015 at 11:59 pm #

    Where is guide 6-of-6? This feels like watching a movie when the climax of the story comes, the power goes out ! :d

  2. Niall M ODowd March 31, 2016 at 6:18 pm #

    Hi Adrian,

    Your sample code, awesome explanation, and annotation have helped me create a live transforming script that basically finds 4 corners on a piece of paper in the outside world and remaps the points to a perfect square using a webcam.

    The transform matrix is used to transform the whole webcam image and display the image as if the webcam was normal to the surface of the square.

    My current dilemma is accuracy. It seems that with all of the subpix and goodFeaturesToTrack parameter fiddling, I simply cannot get a corner list that does not bounce around. Though the shifting of the corners is slight, the transformation matrices vary a lot.

    Is there a way to improve accuracy? (Maybe use the sidelines of the square to boost orientation accuracy?) I have spent a ton of time trying to improve the shifting, but I just need more information from the webcam frame.


    • Adrian Rosebrock April 1, 2016 at 3:19 pm #

      Your project sounds super awesome. Do you mind sending me an email containing the types of images you’re working with? That might help me point you in the right direction. I’m not entirely sure I understand what you mean by the corner list “bouncing around”.

    • Simon O'Brien April 4, 2018 at 11:44 am #

      Hey man,

      Can you give me some pointers on how you achieved this? Thanks

  3. Atmadeep arya February 4, 2018 at 11:10 am #

    Hi Adrian,
    I have followed your amazing work for quite a long time. Thanks for doing it.
    Can you help me with one doubt?

    pts = screen_cnt.reshape(4,2)

    This line throws an error on python 2.7 and OpenCV 3.1.x. The error is :
    pts = screen.reshape(4,2)
    ValueError: cannot reshape array of size 328 into shape (4,2)

    The screen contour has 328 points, I have regenerated this error using other examples.
    Q1. How do I only get 4 points, Is there any other way?
    Q2. I’m trying to use a minimum area rectangle, but how do I determine points?

    I need help in generating a cloud point using stereo vision. Can you help me on that?
    Hoping you keep doing this amazing work,
    Atmadeep Arya.

    • Adrian Rosebrock February 6, 2018 at 10:28 am #

      You have two choices here:

      1. Apply the contour approximation and then assume that the contour has 4 points.

      2. A better option may be to compute the bounding box of the contour before you can reshape the array. Take a look at this blog post for more information.

  4. Fadwa March 6, 2018 at 6:54 am #

    What would happen if I applied the M transformation on the whole image, not the cropped screen?

    • Adrian Rosebrock March 7, 2018 at 9:13 am #

      Try it 🙂

  5. Prasad February 8, 2019 at 3:01 am #

    I have read a lot of tutorials. I *strongly* believe it will be helpful if you could post vanilla OpenCV code as opposed to your own modules etc. If I have to use code in production, it will be difficult for me. Your modules break quite frequently too (imutil for example, is failing for some reason) and I cannot be sure if those can be used in commercial applications.

    But, thanks for all the efforts in publishing articles like these.

    • Adrian Rosebrock February 14, 2019 at 2:05 pm #

      The “imutils” library is actually “vanilla OpenCV”. It’s just OpenCV code under the hood. You can see for yourself on the GitHub repo. As far as your errors go I’d be happy to help but without knowing the exact error or what you are running into I cannot provide any help or suggestions.

  6. Alex P. August 28, 2019 at 11:13 am #

    Dear Adrian,

    I am trying to rotate a circular, already cropped (by a rectangular box) image of a circle. The caveat is that this circle is seen from an angle (think of your example with the medical pills, where instead of having a ‘front-face’ image of a pill, you have a photo of it taken slightly from an angle to the left, say).

    I am struggling on where to start to explore how to deal with my problem, I am not sure how I should choose my reference points. I tried running your scripts (four_point_transform) and it ran, but it did not give me the desired result.

    Would be great if you could just give me a nudge to the right direction.

    Thank you very much

    Alex P

    • Alex P. August 28, 2019 at 11:24 am #

      p.s I am trying to “rotate it” or “warp it” (not sure what the correct term is) in such a way that the image looks as if I would have taken it standing right in-front of the circle

      Hope that clarification makes some sense!

  7. Jason Y. December 8, 2019 at 8:45 am #

    Hi Adrian,
    I have a question about the cv2.warpPerspective function. The first parameter we pass in is the image, but why is the output of this function only a region of the image rather than the whole thing? Is it because of M?


  1. Comparing Shape Descriptors for Similarity using Python and OpenCV - May 30, 2014

    […] We explored what it takes to build a Pokedex using computer vision. Then we scraped the web and built up a database of Pokemon. We’ve indexed our database of Pokemon sprites using Zernike moments. We’ve analyzed query images and found our Game Boy screen using edge detection and contour finding techniques. And we’ve performed perspective warping and transformations using the cv2.warpPerspective function. […]

  2. 4 Point OpenCV getPerspective Transform Example - PyImageSearch - August 25, 2014

    […] You may remember back to my posts on building a real-life Pokedex, specifically,  my post on OpenCV and Perspective Warping. […]
