Building a Pokedex in Python: Comparing Shape Descriptors with OpenCV (Step 6 of 6)

Here we are, the final step of building a real-life Pokedex in Python and OpenCV.

This is where it all comes together.

We’ll glue all our pieces together and build an image search engine based on shape features.

We explored what it takes to build a Pokedex using computer vision. Then we scraped the web and built up a database of Pokemon. We’ve indexed our database of Pokemon sprites using Zernike moments. We’ve analyzed query images and found our Game Boy screen using edge detection and contour finding techniques. And we’ve performed perspective warping and transformations using the cv2.warpPerspective function.

So here it is. The final step. It’s time to combine all these steps together into a working Pokedex.

Are you ready?

I am. This has been a great series of posts. And I’m ready to bring it all home.

OpenCV and Python versions:
This example will run on Python 2.7 and OpenCV 2.4.X.

Previous Posts

This post is part of a series of blog posts on how to build a real-life Pokedex using Python, OpenCV, and computer vision and image processing techniques. If this is the first post in the series that you are reading, definitely take the time to digest it and understand what we are doing. But after you give it a read, be sure to go back to the previous posts. There is a ton of awesome content related to computer vision, image processing, and image search engines that you won’t want to miss!

Finally, if you have any questions, please send me an email. I love chatting with readers. And I’d be glad to answer any questions about computer vision that you may have.

Building a Pokedex in Python: Comparing Shape Descriptors

When we wrapped up our previous post, we had applied perspective warping and transformations to our Game Boy screen to obtain a top-down/birds-eye-view:

Figure 1: Performing a perspective transformation using Python and OpenCV on the Game Boy screen and cropping out the Pokemon.

Then, we extracted the ROI (Region of Interest) that corresponded to where the Pokemon was in the screen.

Figure 2: Cropping the Pokemon from our Game Boy screen using Python and OpenCV.

From here, there are two things that need to happen.

The first is that we need to extract features from our cropped Pokemon (our “query image”) using Zernike moments.

Zernike moments are used to characterize the shape of an object in an image. You can read more about them here.

Secondly, once we have our shape features, we need to compare them to our database of shape features. In Step 2 of building a Pokedex, we extracted Zernike moments from our Pokemon sprite database. We’ll use the Euclidean distance between Zernike feature vectors to determine how “similar” two Pokemon sprites are.

Now that we have a plan, let’s define a Searcher class that will be used to compare a query image to our index of Pokemon sprites:

The first thing we do on Line 2 is import the SciPy distance package. This package contains a number of distance functions, but specifically, we’ll be using the Euclidean distance to compare feature vectors.

Line 4 defines our Searcher class, and Lines 5-7 define the constructor. We’ll accept a single parameter, the index of our features. We’ll assume that our index is a standard Python dictionary with the name of the Pokemon as the key and the shape features (i.e. a list of numbers used to quantify the shape and outline of the Pokemon) as the value.

Line 9 then defines our search method. This method takes a single parameter — our query features. We’ll compare our query features to every value (feature vector) in the index.

We initialize our dictionary of results on Line 11. The name of the Pokemon will be the key and the distance between the feature vectors will serve as the value.

Now we can perform our comparison on Lines 14-18. We start by looping over our index, then we compute the Euclidean distance between the query features and the features in the index on Line 17. Finally, we update our results dictionary using the current Pokemon name as the key and the distance as the value.

We conclude our search method by sorting our results on Line 22, where smaller distances between feature vectors indicate that the images are more “similar”. We then return our results on Line 25.

In terms of actually comparing feature vectors, all the heavy lifting is done by our Searcher class. It takes an index of pre-computed features from a database and then compares the index to the query features. These results are then sorted by similarity and returned to the calling function.
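To make the description above concrete, here is a minimal sketch of what such a Searcher could look like. Treat it as an illustration under a few assumptions rather than the exact listing the line numbers refer to: the euclidean helper is a dependency-free stand-in for SciPy’s scipy.spatial.distance.euclidean, returning a sorted list of (distance, name) tuples is my assumption about the result layout, and the toy index uses made-up numbers.

```python
import math


def euclidean(a, b):
    # Stand-in for scipy.spatial.distance.euclidean (assumption: plain
    # sequences of numbers of equal length).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Searcher:
    def __init__(self, index):
        # index: dict mapping Pokemon name -> shape feature vector
        self.index = index

    def search(self, query_features):
        # Map each Pokemon name to its distance from the query features.
        results = {}
        for name, features in self.index.items():
            results[name] = euclidean(query_features, features)

        # Smaller distance means more similar, so sort ascending.
        # (distance, name) tuples are an assumed layout.
        return sorted((dist, name) for name, dist in results.items())


# Toy index with made-up feature vectors (real Zernike vectors are longer).
index = {"marowak": [0.10, 0.20], "pidgey": [0.90, 0.80]}
results = Searcher(index).search([0.10, 0.25])
print(results[0][1])  # marowak
```

Because the list is sorted by distance, the first tuple always holds the best match, which is all the calling script needs.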

Now that we have the Searcher class defined, let’s create the driver script, which will glue everything together:

Lines 2-8 handle importing all the packages we need. I placed our Searcher class in the pyimagesearch package for organization purposes. The same goes for our ZernikeMoments shape descriptor on Line 3 and imutils on Line 5. The imutils file simply contains convenience methods that make it easy to resize images. We then import NumPy to manipulate our arrays (since OpenCV treats images as multi-dimensional NumPy arrays), argparse to parse our command line arguments, cPickle to load our pre-computed index of features, and cv2 to have our bindings into the OpenCV library.

Lines 11-16 parse our command line arguments. The --index switch is the path to our pre-computed index and --query is the path to our cropped query image, which is the output of Step 5.

Lines 19 and 20 simply use cPickle to load our pre-computed index of Zernike moments off of disk.
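As a rough sketch of the argument parsing and index loading just described, something like the following would do the job. This is not the original listing: it is written for Python 3, so pickle stands in for Python 2’s cPickle, and the parse_args and load_index helper names are my own. Only the --index and --query switches come from the post.

```python
import argparse
import pickle


def parse_args(argv=None):
    # --index: path to the pre-computed index of Zernike moments
    # --query: path to the cropped query image (the output of Step 5)
    ap = argparse.ArgumentParser()
    ap.add_argument("--index", required=True,
                    help="path to the pre-computed index")
    ap.add_argument("--query", required=True,
                    help="path to the cropped query image")
    # vars() turns the Namespace into a plain dict: args["index"], args["query"]
    return vars(ap.parse_args(argv))


def load_index(path):
    # Read the pickled {name: features} dictionary off disk.
    with open(path, "rb") as f:
        return pickle.load(f)
```

One caveat, stated as a possibility rather than a guarantee: an index that was pickled under Python 2 may need pickle.load(f, encoding="latin1") when loaded under Python 3.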

Now let’s load our query image off of disk and pre-process it:

This code is pretty self-explanatory, but let’s go over it nonetheless.

Line 24 loads our query image off disk using the cv2.imread function. We then convert our query image to grayscale on Line 25. Finally, we resize our image to have a width of 64 pixels on Line 26.
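Resizing to a fixed width only works well if the height is scaled by the same ratio, otherwise the Pokemon gets squashed. The arithmetic behind a width-based resize can be sketched as below. The width_resize_dims name is hypothetical and I am assuming the helper in imutils does something along these lines. The returned (width, height) pair is what you would hand to cv2.resize.

```python
def width_resize_dims(shape, width=64):
    # shape is (height, width[, channels]), as in a NumPy image array.
    h, w = shape[:2]

    # Scale the height by the same ratio as the width to keep the
    # aspect ratio intact.
    ratio = width / float(w)
    return (width, int(h * ratio))


# A 128-row by 256-column image shrinks to 64 wide by 32 tall.
print(width_resize_dims((128, 256)))  # (64, 32)
```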

We now need to prepare our query image for our shape descriptor by thresholding it and finding contours:

The first step is to threshold our query image on Line 29. We’ll apply adaptive thresholding using the cv2.adaptiveThreshold function and set all pixels below the threshold to black (0) and all pixels above the threshold to white (255).

The output of our thresholding looks like this:

Figure 3: Applying local thresholding to our query image using cv2.adaptiveThreshold.
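If you are curious what “local” thresholding means in practice, here is a tiny pure-Python sketch of the mean-based variant: each pixel is compared against the average of its own neighborhood (minus a constant) rather than against one global value. This is only an illustration of the idea. The adaptive_mean_threshold name, the block size, and the constant are my own choices, and edge handling and rounding may differ from what cv2.adaptiveThreshold does internally.

```python
def adaptive_mean_threshold(gray, block_size=3, c=0, max_value=255):
    # gray: list of rows of grayscale values (a stand-in for a NumPy image).
    h, w = len(gray), len(gray[0])
    r = block_size // 2
    out = [[0] * w for _ in range(h)]

    for y in range(h):
        for x in range(w):
            # Collect the neighborhood, clamping coordinates at the borders.
            vals = [gray[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                    for dy in range(-r, r + 1)
                    for dx in range(-r, r + 1)]

            # The threshold is local: the neighborhood mean minus c.
            t = sum(vals) / float(len(vals)) - c

            # Above the local threshold -> white, otherwise black.
            out[y][x] = max_value if gray[y][x] > t else 0

    return out


# One bright pixel on a dark background.
toy = [[10, 10, 10],
       [10, 200, 10],
       [10, 10, 10]]
binary = adaptive_mean_threshold(toy)
```

In this toy image only the bright center pixel ends up white, because it is the only pixel brighter than its own neighborhood average.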

Next, we initialize a “blank” array of zeros on Line 35 with the same dimensions as our query image. This image will hold the outline/silhouette of our Pokemon.

A call to cv2.findContours on Line 36 finds all contours in our thresholded image. The cv2.findContours function is destructive to the image that you pass in, so be sure to make a copy of it using the NumPy copy() method.

Then, we make an important assumption on Line 38. We’ll assume that the contour with the largest area (calculated using the cv2.contourArea function) corresponds to the outline of our Pokemon.

This assumption is a reasonable one to make. Given that we have successfully cropped the Pokemon from our original image, it is certainly reasonable that the contour with the largest area corresponds to our Pokemon.
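The “largest contour wins” assumption boils down to picking the maximum by area. Here is a dependency-free sketch of that selection, assuming each contour is simply a list of (x, y) points. The polygon_area function uses the shoelace formula as a stand-in for cv2.contourArea, and both function names are hypothetical. With OpenCV contours the same selection can be written as max(cnts, key=cv2.contourArea).

```python
def polygon_area(points):
    # Shoelace formula: area of a simple polygon from its vertices.
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def largest_contour(contours):
    # Assume the contour enclosing the most area is the Pokemon outline.
    return max(contours, key=polygon_area)


# Two made-up rectangular contours: a small speck and a larger blob.
speck = [(0, 0), (2, 0), (2, 2), (0, 2)]
blob = [(0, 0), (10, 0), (10, 5), (0, 5)]
outline = largest_contour([speck, blob])
```

Small specks of noise left over from thresholding have tiny areas, so they lose out to the Pokemon itself.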

From there, we draw our largest contour using the cv2.drawContours function on Line 39.

You can see the output of drawing our contours below:

Figure 4: Drawing the largest contours using cv2.contourArea and cv2.drawContours.

The rest of our code is pretty simple:

We initialize our ZernikeMoments shape descriptor on Line 43 with a radius of 21 pixels. This is the exact same descriptor with the exact same radius that we used when indexing our database of Pokemon sprites. Since our intention is to compare our Pokemon images for similarity, it wouldn’t make sense to use one descriptor for indexing and then another descriptor for comparison. It is important to obtain consistent feature representations of your images if your intent is to compare them for similarity.

Line 44 then extracts our Zernike moments from the outline/silhouette image seen in Figure 4 above.

To perform our actual search, we first initialize our Searcher on Line 47 and perform the search on Line 48.

Since our results are sorted in terms of similarity (with smaller Euclidean distances first), the first tuple in our list will contain our identification. We print out the name of our identified Pokemon on Line 49.

Finally, we display our query image and outline and wait for a keypress on Lines 52-54.

To execute our script, issue the following command:

When our script finishes executing you’ll see something similar to below:

Figure 5: The results of our identification. Sure enough, our Pokemon is a Marowak.

Sure enough, our Pokemon is a Marowak.

Here are the results when using a Pidgey as a query image:

Figure 6: Identifying Pidgey with our Pokedex.

And a Kadabra:

Figure 7: Identifying Kadabra with our Pokedex.

You’ll notice that the Kadabra outline is not completely “filled in”. Luckily our Zernike moments shape features are robust enough to handle this. But this is likely a sign that we should take more care in pre-processing our images. I’ll leave that as future work for the reader.

Regardless, in all cases, our Pokemon image search engine is able to identify the Pokemon without an issue.

Who said a Pokedex was fiction?

Clearly by utilizing computer vision and image processing techniques we are able to build one in real-life!


In this post we wrapped up our series on building a Pokedex in Python and OpenCV.

We made use of a lot of important computer vision and image processing techniques such as grayscale conversion, thresholding, and finding contours.

We then used Zernike moments to describe the shape of our Pokemon.

In order to build an actual image search engine, we required a query image. We captured raw photos of our Game Boy screen and then applied perspective warping and transformations to obtain a top-down/birds-eye-view of our screen.

Finally, this post compared our shape descriptors using OpenCV and Python.

The end result is a real-life working Pokedex!

Simply point your smartphone at a Game Boy screen, snap a photo, and the Python scripts I have given you will take care of the rest!

Wrapping Up

I hope you enjoyed this series of blog posts as much as I have!

17 Responses to Building a Pokedex in Python: Comparing Shape Descriptors with OpenCV (Step 6 of 6)

  1. auraham October 8, 2014 at 9:30 pm #

    nice tutorial! but the title doesn’t match with the pokedex post!

    • Adrian Rosebrock October 9, 2014 at 7:26 am #

      Hi Auraham, I actually had to change the title simply because the original was too long!

  2. Vincent April 12, 2016 at 10:02 am #

    Wow, your posts are by far the best applied material on computer vision I’ve seen

    • Adrian Rosebrock April 13, 2016 at 6:59 pm #

      Thanks Vincent!

  3. CJ April 13, 2016 at 1:33 am #

    Hi Adrian

    Thanks so much for your posts. I ran the code on some sample images of pokemon blue/red version battles (e.g. google “pokemon battle red” images), managed to get the pokemon outlines, but got really poor accuracy (about 20%). I noticed that the code wasn’t great at detecting larger Pokemon ‘blobs’ e.g. Arcanine, Blastoise, Exeggutor, Gyarados. What are some ways that I can make the code a lot more robust?

    • Adrian Rosebrock April 13, 2016 at 6:54 pm #

      The best way to improve this code is to adjust the radius of the Zernike moments. Right now the radius is hardcoded. To improve accuracy I would make this radius dynamic by computing the radius using the cv2.minEnclosingCircle function. This will ensure a consistent description of each Pokemon.

      • CJ April 17, 2016 at 4:09 am #

        Thanks Adrian! Going to look into that. Btw, might be a good idea to trigger an automated email whenever someone replies to a post on this blog. My 2cents

        • Adrian Rosebrock April 17, 2016 at 3:28 pm #

          Interesting, I thought the automated email reply was working! I’ll have to look into this.

  4. Vivek November 25, 2016 at 5:30 pm #

    Hi Adrian,
    I am trying to write a program to compare two images using perspective transformation. I am using two images – one with a straight arrow pointing upwards, and other a distorted version of this arrow on a red background. When I apply the perspective transform to the second image, the resultant image is a little shifted on the x axis. I am trying to find out why is this happening.. any ideas will be helpful.

    • Adrian Rosebrock November 28, 2016 at 10:37 am #

      Hey Vivek — can you share an example image of what you’re working with? I’m not sure I understand based on your explanation.

  5. siyer December 8, 2016 at 12:10 am #

    Hi Adrian,

    I tried this with the small and simplest set of images to check the results, however the searcher seems to match the wrong image each time. Admittedly, the difference in orientation and angle of the indexed images are not too high , but should have been ‘good enough’. Although it does fill the image properly , so the pre-processing looks ok. Something with the searcher distance calculation?

    Can i share with you the images and the index file for you to have a look?

    Also since new i am struggling to turn on the debugger on pyCharm editor while executing the script from command line.


    • Adrian Rosebrock December 10, 2016 at 7:31 am #

      What do you mean by “small and simplest set of images”? Are you referring to your own dataset? It’s hard to know what you mean without more details.

  6. siyer December 8, 2016 at 10:06 pm #

    Hi Adrian

    Posted couple of questions regarding the Zernlike moments and Searcher class not matching up the approp. image…Did you get those?


    • Adrian Rosebrock December 10, 2016 at 7:30 am #

      Yes, but please keep in mind that I offer my help, suggestions, and advice on the PyImageSearch Comments section for free. I am happy to offer my help and suggestions, but again, please keep in mind that I’m publishing these posts and tutorials free of charge for you to use.

  7. Rafael Ruiz Muñoz January 24, 2018 at 9:42 am #

    Dear Adrian,
    thank you very much for this tutorial.

    I was hoping it could be useful for me as I’m not looking exactly for the same feature.
    I have N **different** shapes, but I want to know which one are similar to others (so the query is not an trained shape), but I didn’t have much success. What’s the best approach you would follow?

    Thank you in advance.

    • Adrian Rosebrock January 24, 2018 at 4:50 pm #

      You would want to utilize image descriptors to quantify each of your shapes. I would suggest using Hu Moments and Zernike Moments. I cover both, including how to match shapes and pick out outliers, inside the PyImageSearch Gurus course.

      • Rafael Ruiz Muñoz January 25, 2018 at 5:49 am #

        Will give a try! thank you!
