Building a Pokedex in Python: Indexing our Sprites using Shape Descriptors (Step 3 of 6)

Using shape descriptors to quantify an object is a lot like playing Who’s that Pokemon as a kid.

So, how is our Pokedex going to “know” what Pokemon is in an image? How are we going to describe each Pokemon? Are we going to characterize the color of the Pokemon? The texture? Or the shape?

Well, do you remember playing Who’s that Pokemon as a kid?

You were able to identify the Pokemon based only on its outline and silhouette.

We are going to apply the same principles in this post and quantify the outline of Pokemon using shape descriptors.

You might already be familiar with some shape descriptors, such as Hu moments. Today I am going to introduce you to a more powerful shape descriptor — Zernike moments, which are based on the Zernike polynomials, a set of functions orthogonal on the unit disk.

Sound complicated?

Trust me, it’s really not. With just a few lines of code I’ll show you how to compute Zernike moments with ease.

Previous Posts

This post is part of an on-going series of blog posts on how to build a real-life Pokedex using Python, OpenCV, and computer vision and image processing techniques. If this is the first post in the series that you are reading, go ahead and read through it (there is a lot of awesome content in here on how to utilize shape descriptors), but then go back to the previous posts for some added context.

Building a Pokedex in Python: Indexing our Sprites using Shape Descriptors

Figure 1: Our database of Pokemon Red, Blue, and Green sprites.

At this point, we already have our database of Pokemon sprite images. We gathered, scraped, and downloaded our sprites, but now we need to quantify them in terms of their outline (i.e. their shape).

Remember playing “Who’s that Pokemon?” as a kid? That’s essentially what our shape descriptors will be doing for us.

For those who didn’t watch Pokemon (or maybe need their memory jogged), the image at the top of this post is a screenshot from the Pokemon TV show. Before going to commercial break, a screen such as this one would pop up with the outline of the Pokemon. The goal was to guess the name of the Pokemon based on the outline alone.

This is essentially what our Pokedex will be doing — playing Who’s that Pokemon, but in an automated fashion, using computer vision and image processing techniques.

Zernike Moments

Before diving into a lot of code, let’s first have a quick review of Zernike moments.

Image moments are used to describe objects in an image. Using image moments you can calculate values such as the area of the object, the centroid (the center of the object, in terms of x, y coordinates), and information regarding how the object is rotated. Normally, we calculate image moments based on the contour or outline of an object in an image, but this is not a requirement.

OpenCV provides the HuMoments function which can be used to characterize the structure and shape of an object. However, a more powerful shape descriptor can be found in the mahotas package — zernike_moments. Similar to Hu moments, Zernike moments are used to describe the shape of an object; however, since the Zernike polynomials are orthogonal to each other, there is no redundancy of information between the moments.

One caveat to look out for when utilizing Zernike moments for shape description is the scaling and translation of the object in the image. Depending on where the object is translated in the image, your Zernike moments will be drastically different. Similarly, depending on how large or small the object appears (i.e. how it is scaled) in the image, your Zernike moments will not be identical. However, the magnitudes of the Zernike moments are independent of the rotation of the object, which is an extremely nice property when working with shape descriptors.

In order to avoid descriptors with different values based on the translation and scaling of the image, we normally first perform segmentation. That is, we segment the foreground (the object in the image we are interested in) from the background (the “noise”, or the part of the image we do not want to describe). Once we have the segmentation, we can form a tight bounding box around the object and crop it out, obtaining translation invariance.

Finally, we can resize the object to a constant NxM pixels, obtaining scale invariance.

From there, it is straightforward to apply Zernike moments to characterize the shape of the object.

As we will see later in this series of blog posts, I will be obtaining scale and translation invariance prior to applying Zernike moments.

The Zernike Descriptor

Alright, enough overview. Let’s get our hands dirty and write some code.

As you may know from the Hobbits and Histograms post, I tend to like to define my image descriptors as classes rather than functions. The reason for this is that you rarely ever extract features from a single image alone. Instead, you extract features from a dataset of images. And you are likely utilizing the exact same parameters for the descriptors from image to image.

For example, it wouldn’t make sense to extract a grayscale histogram with 32 bins from image #1 and then a grayscale histogram with 16 bins from image #2, if your intent is to compare them. Instead, you utilize identical parameters to ensure you have a consistent representation across your entire dataset.

That said, let’s take this code apart:
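
The code listing for the descriptor isn’t reproduced above, so here is a minimal sketch of the ZernikeMoments class as described in the bullets below. The file name zernikemoments.py is an assumption, and the line numbers referenced in the bullets refer to the original listing rather than this sketch:

    # zernikemoments.py -- a minimal sketch of the descriptor class
    import mahotas

    class ZernikeMoments:
        def __init__(self, radius):
            # store the size of the radius (in pixels) that will be
            # used when computing the moments
            self.radius = radius

        def describe(self, image):
            # return the Zernike moments for the image using the
            # mahotas implementation and the supplied radius
            return mahotas.features.zernike_moments(image, self.radius)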

  • Line 2: Here we are importing the mahotas package which contains many useful image processing functions. This package also contains the implementation of our Zernike moments.
  • Line 4: Let’s define a class for our descriptor. We’ll call it ZernikeMoments.
  • Lines 5-8: We need a constructor for our ZernikeMoments class. It will take only a single parameter — the radius of the polynomial in pixels. The larger the radius, the more pixels will be included in the computation. This is an important parameter and you’ll likely have to tune it and play around with it to obtain adequately performing results if you use Zernike moments outside this series of blog posts.
  • Lines 10-12: Here we define the describe method, which quantifies our image. This method requires an image to be described, and then calls the mahotas implementation of zernike_moments to compute the moments with the specified radius, supplied in Line 5.

Overall, this isn’t much code. It’s mostly just a wrapper around the mahotas implementation of zernike_moments. But as I said, I like to define my descriptors as classes rather than functions to ensure the consistent use of parameters.

Next up, we’ll index our dataset by quantifying each and every Pokemon sprite in terms of shape.

Indexing Our Pokemon Sprites

Now that we have our shape descriptor defined, we need to apply it to every Pokemon sprite in our database. This is a fairly straightforward process so I’ll let the code do most of the explaining. Let’s open up our favorite editor, create a file named index.py, and get to work:
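
The top of index.py isn’t shown above either; a sketch consistent with the walk-through below looks something like this. The flag abbreviations and help strings are assumptions, the cPickle import matches the Python 2 code discussed in the comments below (on Python 3, use import pickle instead), and the line numbers referenced below refer to the original listing, not this sketch:

    # index.py -- sketch of the imports, argument parsing, and initialization
    from pyimagesearch.zernikemoments import ZernikeMoments
    import numpy as np
    import argparse
    import cPickle
    import glob
    import cv2

    # construct the argument parser and parse the arguments
    ap = argparse.ArgumentParser()
    ap.add_argument("-s", "--sprites", required = True,
        help = "Path to the directory of sprite images")
    ap.add_argument("-i", "--index", required = True,
        help = "Path to where the index file will be stored")
    args = vars(ap.parse_args())

    # initialize our Zernike moments descriptor with a radius of 21 pixels
    # and the index dictionary mapping sprite filename -> features
    desc = ZernikeMoments(21)
    index = {}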

Lines 2-8 handle importing the packages we will need. I put our ZernikeMoments class in the pyimagesearch sub-module for the sake of organization. We will make use of numpy when constructing multi-dimensional arrays, argparse for parsing command line arguments, pickle for writing our index to file, glob for grabbing the paths to our sprite images, and cv2 for our OpenCV functions.

Then, Lines 11-16 parse our command line arguments. The --sprites switch is the path to our directory of scraped Pokemon sprites and --index points to where our index file will be stored.

Line 21 handles initializing our ZernikeMoments descriptor. We will be using a radius of 21 pixels. I arrived at the value of 21 pixels after a bit of experimentation, determining which radius obtained the best performing results.

Finally, we initialize our index on Line 22. Our index is a built-in Python dictionary, where the key is the filename of the Pokemon sprite and the value is the calculated Zernike moments. All filenames are unique in this case so a dictionary is a good choice due to its simplicity.

Time to quantify our Pokemon sprites:
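
The loop itself isn’t reproduced above, so here is a sketch of the preprocessing steps described in the bullets below (again, the line numbers in the bullets refer to the original listing):

    # grab the paths to the sprite images and loop over them
    spritePaths = glob.glob(args["sprites"] + "/*.png")

    for spritePath in spritePaths:
        # extract the Pokemon name from the filename -- this will be
        # our unique key into the index dictionary
        pokemon = spritePath[spritePath.rfind("/") + 1:].replace(".png", "")

        # load the sprite, convert it to grayscale, and pad it with
        # 15 white pixels on all four sides
        image = cv2.imread(spritePath)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        image = cv2.copyMakeBorder(image, 15, 15, 15, 15,
            cv2.BORDER_CONSTANT, value = 255)

        # invert the image and threshold it so the Pokemon becomes the
        # white foreground on a black background
        thresh = cv2.bitwise_not(image)
        thresh[thresh > 0] = 255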

Now we are ready to extract Zernike moments from our dataset. Let’s take this code apart and make sure we understand what is going on:

  • Line 25: We use glob to grab the paths to all of our Pokemon sprite images. All our sprites have a file extension of .png. If you’ve never used glob before, it’s an extremely easy way to grab the paths to a set of images with common filenames or extensions. Now that we have the paths to the images, we loop over them one-by-one.
  • Line 28: The first thing we need to do is extract the name of the Pokemon from the filename. This will serve as our unique key into the index dictionary.
  • Lines 29 and 30: This code is pretty self-explanatory. We load the current image off of disk and convert it to grayscale.
  • Lines 35 and 36: Personally, I find the name of the copyMakeBorder function to be quite confusing. The name itself doesn’t really describe what it does. Essentially, copyMakeBorder “pads” the image along the north, south, east, and west directions. The first parameter we pass in is the Pokemon sprite. Then, we pad this image in all directions by 15 white (255) pixels. This step isn’t necessarily required, but it gives you a better sense of the thresholding on Line 39.
  • Lines 39 and 40: As I’ve mentioned, we need the outline (or mask) of the Pokemon image prior to applying Zernike moments. In order to find the outline, we need to apply segmentation, discarding the background (white) pixels of the image and focusing only on the Pokemon itself. This is actually quite simple — all we need to do is flip the values of the pixels (black pixels are turned to white, and white pixels to black). Then, any pixel with a value greater than zero is set to 255 (white).

Take a look at our thresholded image below:

Figure 2: Our Abra sprite is pictured at the top and the thresholded image on the bottom.

This process has given us the mask of our Pokemon. Now we need the outermost contours of the mask — the actual outline of the Pokemon.
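
Continuing inside the loop over the sprite paths, a sketch of this step looks like the following. The check on the cv2.findContours return value is one way to handle the version differences described on Line 48:

        # allocate a blank image to store the outline, then find the
        # outermost contours of the thresholded Pokemon
        outline = np.zeros(image.shape, dtype = "uint8")
        cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL,
            cv2.CHAIN_APPROX_SIMPLE)

        # cv2.findContours returns 2 values in OpenCV 2.4/4 and 3 values
        # in OpenCV 3, so grab the list of contours accordingly
        cnts = cnts[0] if len(cnts) == 2 else cnts[1]

        # sort the contours by area (largest first), keep only the largest
        # one, and draw it as a filled white mask on the outline image
        cnts = sorted(cnts, key = cv2.contourArea, reverse = True)[0]
        cv2.drawContours(outline, [cnts], -1, 255, -1)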

First, we need a blank image to store our outline — so on Line 45 we initialize a variable appropriately named outline and fill it with zeros, with the same width and height as our sprite image.

Then, we make a call to cv2.findContours on Lines 46 and 47. The first argument we pass in is our thresholded image, followed by a flag cv2.RETR_EXTERNAL telling OpenCV to find only the outermost contours. Finally, we tell OpenCV to compress and approximate the contours to save memory using the cv2.CHAIN_APPROX_SIMPLE flag.

Line 48 handles parsing the contours for various versions of OpenCV.

As I mentioned, we are only interested in the largest contour, which corresponds to the outline of the Pokemon. So, on Line 49 we sort the contours based on their area, in descending order. We keep only the largest contour and discard the others.

Finally, we draw the contour on our outline image using the cv2.drawContours function. The outline is drawn as a filled in mask with white pixels:

Figure 3: Outline of our Abra. We will be using this image to compute our Zernike moments.

We will be using this outline image to compute our Zernike moments.

Computing Zernike moments for the outline is actually quite easy:
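
Still inside the loop, this step is just a call to our describe method followed by the index update (Lines 54 and 55 in the original listing):

        # compute the Zernike moments of the outline, then update the
        # index with the Pokemon name as the key and the features as the value
        moments = desc.describe(outline)
        index[pokemon] = moments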

On Line 54 we make a call to our describe method in the ZernikeMoments class. All we need to do is pass in the outline of the image and the describe method takes care of the rest. In return, we are given the Zernike moments used to characterize and quantify the shape of the Pokemon.

So how are we quantifying and representing the shape of the Pokemon?

Let’s investigate:
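
For example, a quick check of the feature vector’s shape from a Python shell (using the moments variable from the sketch above; with mahotas’ default degree of 8, zernike_moments returns 25 values):

    >>> print(moments.shape)
    (25,)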

Here we can see that our feature vector is 25-dimensional (meaning that there are 25 values in our list). These 25 values represent the contour of the Pokemon.

We can view the values of the Zernike moments feature vector like this:
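
A simple print does the trick (the actual 25 floating point values are omitted from this sketch):

    >>> print(moments)
    [ ... 25 floating point values ... ]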

So there you have it! The Pokemon outline is now quantified using only 25 floating point values! Using these 25 numbers we will be able to disambiguate between all of the original 151 Pokemon.

Finally, on Line 55, we update our index with the name of the Pokemon as the key and our computed features as the value.

The last thing we need to do is dump our index to file so we can use it when we perform a search:
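
A sketch of this step (the original Python 2 code used cPickle; opening the file in binary mode, as suggested in the comments below, keeps it working on Python 3 with pickle):

    # write the index to file
    f = open(args["index"], "wb")
    f.write(cPickle.dumps(index))
    f.close()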

To execute our script to index all our Pokemon sprites, issue the following command:
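
Assuming your sprite directory is named sprites and you want the index written to index.cpickle (the same names that appear in the comments below), the command looks like this:

    $ python index.py --sprites sprites --index index.cpickle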

Once the script finishes executing, all of our Pokemon will be quantified in terms of shape.

Later in this series of blog posts, I’ll show you how to automatically extract a Pokemon from a Game Boy screen and then compare it to our index.

Summary

In this blog post, we explored Zernike moments and how they can be used to describe and quantify the shape of an object.

In this case, we used Zernike moments to quantify the outline of the original 151 Pokemon. The easiest way to think of this is playing “Who’s that Pokemon?” as a kid. You are given the outline of the Pokemon and then you have to guess what the Pokemon is, using the outline alone. We are doing the same thing — only we are doing it automatically.

This process of describing and quantifying a set of images is called “indexing”.

Now that we have our Pokemon quantified, I’ll show you how to search and identify Pokemon later in this series of posts.

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!


29 Responses to Building a Pokedex in Python: Indexing our Sprites using Shape Descriptors (Step 3 of 6)

  1. Hamid August 31, 2015 at 3:14 am #

    Dear Adrian,

    I need to appreciate your kind sharing and wise approaches here. Actually, I am doing my PhD and on the way about one of my mini projects need to reconrtuct the images from Zernike moments extracted from Mahotas toolbox that you introduced. There is one code I found on the web but relate to matlab and am not very sure of that. I wonder if you could kindly give me an advice . Thank you very much.

    Best regards.

    • Adrian Rosebrock August 31, 2015 at 7:01 am #

      Congrats on doing your PhD Hamid, that’s very exciting! As for Zernike Moments, I would suggest looking at the source code directly of the Mahotas implementation. You’ll probably need to modify the code to do the reconstruction. I would also suggest sending a message on GitHub to Luis, the developer and maintainer of Mahotas — he is an awesome guy and knows a lot about CV.

  2. Rishit Bansal February 5, 2016 at 1:27 pm #

    So even if the image of the Pokemon to be determined is laterally inverted, will the zernike moments search still find it as the normal image in the index?

    • Adrian Rosebrock February 6, 2016 at 9:57 am #

      Correct, Zernike moments are invariant under rotation.

  3. siyer December 8, 2016 at 2:52 am #

    Hi Adrian

    While calculating the zernlike moments how to determine or end up using the the right radius value for the specific set of images ?

    Could you elaborate on this item of your post pls

    Lines 5-8: We need a constructor for our ZernikeMoments class. It will take only a single parameter — the radius of the polynomial in pixels. The larger the radius, the more pixels will be included in the computation. This is an important parameter and you’ll likely have to tune it and play around with it to obtain adequately performing results if you use Zernike moments outside this series of blog posts.

    • Adrian Rosebrock December 10, 2016 at 7:27 am #

      The easiest way to do this is to compute the cv2.minEnclosingCircle of the contour. This will give you a radius that encapsulates the entire object. Or, if you have a priori knowledge about the problem you can hardcode it. I discuss this more inside the PyImageSearch Gurus course.

  4. Aven Kidur February 8, 2017 at 5:02 pm #

    I just wanted to say your posts are so well written. Even though I learned something I didn’t know, I did not feel lost or confused for a moment. Instead of aversion masquerading as boredom, it is *exciting* and *fun* to read the next line.

    You may be naturally gifted, but I suspect you’ve put great energy and care into crafting these lessons.

    Thank you

    • Adrian Rosebrock February 10, 2017 at 2:06 pm #

      Thank you for the kind words Aven, I really appreciate that 🙂 Comments like these are a real pleasure to read and make my day.

  5. Oscar February 21, 2017 at 11:23 am #

    Very nice tutorial Adrian!

    One question, can you index multiple images for 1 pokemon?
    If so, I can index the images of every generation (gold/silver,black/white,…) and my pokedex will become smarter. This way it might also be able to process a random image of a Pokemon and still be accurate.

    Or am I wrong?

    Thanks in advance!
    Oscar

    • Adrian Rosebrock February 22, 2017 at 1:34 pm #

      Hi Oscar — you can certainly generate as many indexes as you want.

  6. Babak Abad May 19, 2017 at 5:55 pm #

    Thanks a a lot. I was searching for Zernik moments. I found no thing except one web site providing MATLAB codes which was unclear to me. You explain all things practically, specially where results of codes are provided. I appreciate you dear Adrian.

    • Adrian Rosebrock May 21, 2017 at 5:18 am #

      Thank you, I’m really happy to hear that I could help 🙂

  7. kaias June 8, 2017 at 4:36 pm #

    Dear Adrian,

    I am getting the following problem when I run the code for this lesson:

    $ python index.py --sprites sprites --index index.cpickle
    Traceback (most recent call last):
    File "index.py", line 48, in
    (cnts, _) = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    ValueError: too many values to unpack

    Please suggest solution.

    Thanks

    • kaias June 8, 2017 at 4:51 pm #

      Dear Adrian,

      Changing the 45th line to the following did it for me.

      _,cnts,_ = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

      Could you explain what the content of cnts might be? Why is the sorted function required if cv2.RETR_EXTERNAL flag is used? For our case, the image contains only one pokemon sprite. Should it give more than one external contour?
      Thanks for the tutorial.

      • Adrian Rosebrock June 9, 2017 at 1:38 pm #

        This blog post assumed you are using OpenCV 2.4; however, you are using OpenCV 3, where the cv2.findContours return signature changed. You can read more about this change here. As for sorting the contours, sometimes there is “noise” in our input image. Sorting just ensures we grab the largest region.

  8. mohamed amin houidi December 12, 2017 at 7:24 am #

    can i use zernike moments to identify hand signs ( like stop, go , follow etc) or is there a more convenient way to do it??

    • Adrian Rosebrock December 12, 2017 at 8:58 am #

      You could, but accuracy wouldn’t be as good as object detection. For objects that do not rotate (or have little rotation) take a look at HOG + Linear SVM.

  9. Daniel December 30, 2017 at 1:45 pm #

    Hi, Adrian.

    All your post are amazing. Thank you very much for your work. I am really learning a lot here. I really love AI, speech recognition and CV projects, due to your blog I am seriously thinking in starting my graduate studies in these areas.

    I would like to know, is there any way to obtain an image from its Zernike Moments… I mean, with this piece of code

    moments = desc.describe(outline)

    we are able to get the Zernike Moments, is there any way to make the inverse process??? Are Zernike moments enough to get back the outline of the image, or we should use some any other kind of feature vectors.

    Thanks, again

    • Adrian Rosebrock December 31, 2017 at 9:36 am #

      When you say “obtain an image from its Zernike Moments” are you referring to reconstructing the image based on Zernike Moments? Is there a particular reason you are doing the reconstruction?

  10. Prakash February 9, 2018 at 9:00 pm #

    if you are python 3,

    pls import cPickle this way,

    import _pickle as cPickle

  11. Claudia April 23, 2018 at 6:54 am #

    Hi Adrian!

    I’m really enjoying your posts. Everything is very well explained!

    Regarding this post, I just wanted to mention that apparently in Python 3 the dumps() function has varied a little bit:

    Python 2 : Return the pickled representation of the object as a string (https://docs.python.org/2/library/pickle.html)

    Python 3: Return the pickled representation of the object as a bytes object (https://docs.python.org/3.2/library/pickle.html)

    Thus, we only need to do this modification, in order to obtain a binary file, with the binary representation:

    f = open(args["index"], "wb")
    cPickle.dump(index, f)
    f.close()

    • Adrian Rosebrock April 25, 2018 at 6:00 am #

      Thanks Claudia. I’ll also mention that if you’re using Python 3 you should just be using the “pickle” library rather than “cPickle”.

  12. Anh June 16, 2018 at 7:22 pm #

    And this is the error showing on my terminal


    f.write(cPickle.dump(index, "wb"))
    TypeError: file must have a 'write' attribute

    • Anh June 16, 2018 at 7:58 pm #

      I found a fix to write the index into a file

      with open(args["index"], 'wb') as pickle_file:
      cPickle.dump(index, pickle_file)

  13. Chengyuan Yang December 23, 2018 at 9:39 am #

    Dear Adrian! Thanks for your post! I noticed that instead of creating a tight bounding box and resizing images to a fixed size, you just created binary masks with the same shape to original images. My question is that will Zernike Moment of a pokemon image be better or worse if I tightly crop and resize it to a fixed size?

  14. Kranthi February 21, 2019 at 1:33 am #

    Great post. Thanks for sharing.

Trackbacks/Pingbacks

  1. Building a Pokedex in Python: Finding the Game Boy Screen (Step 4 of 6) - PyImageSearch - April 23, 2014

    […] Step 3: Building a Pokedex in Python: Indexing our Sprites using Shape Descriptors (Step 3 of 6) […]

  2. Python and OpenCV Example: Warp Perspective and Transform - May 5, 2014

    […] Step 3: Building a Pokedex in Python: Indexing our Sprites using Shape Descriptors (Step 3 of 6) […]

  3. Comparing Shape Descriptors for Similarity using Python and OpenCV - May 19, 2014

    […] the web and built up a database of Pokemon. We’ve indexed our database of Pokemon sprites using Zernike moments. We’ve analyzed query images and found our Game Boy screen using edge detection and contour […]
