Building a Pokedex in Python: Scraping the Pokemon Sprites (Step 2 of 6)

Figure 1: Our database of Pokemon Red, Blue, and Green sprites.

What if we could build a real-life Pokedex?

You know, just like Ash Ketchum — point your Pokedex at a Pokemon (or in this case, snap a photo of a Pokemon), identify it, and get its stats.

While this idea has its roots in the Pokemon TV show, I’m going to show you how to make it a reality.

Looking for the source code to this post?
Jump right to the downloads section.

Previous Posts:

Before we get too far, here are some previous posts you can look over for context and more detail on building our Pokedex:

Step 2: Scraping our Pokemon Database

Prior to even starting to build our Pokemon search engine, we first need to gather the data. And this post is dedicated to exactly that — scraping and building our Pokemon database. I’ve structured this post to be a Python web scraping tutorial; by the time you have finished reading this post, you’ll be scraping the web with Python like a pro.

Our Data Source

I ended up deciding to scrape Pokemon DB because they have some of the highest quality sprites that are easily accessible. Their HTML is also nicely formatted, which made it easy to download the Pokemon sprite images.

However, I cheated a little bit and copied and pasted the relevant portion of the webpage into a plaintext file. Here is a sample of some of the HTML:

You can download the full HTML file using the form at the bottom of this post.

Scraping and Downloading

Now that we have our raw HTML, we need to parse it and download the sprite for each Pokemon.

I’m a big fan of lots of examples and lots of code, so let’s jump right in and figure out how we are going to do this:

Lines 2-4 handle importing the packages we will be using. We’ll use BeautifulSoup to parse our HTML and requests to download the Pokemon images. Finally, argparse is used to parse our command line arguments.

To install Beautiful Soup, simply use pip:
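Something along these lines should do it (I’ve tacked on requests as well, since we’ll need it to download the images; if it’s already installed, pip will simply report that the requirement is satisfied):

    $ pip install beautifulsoup4 requests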

Then, on Lines 7-12 we parse our command line arguments. The switch --pokemon-list is the path to our HTML file that we are going to parse, while --sprites is the path to the directory where our Pokemon sprites will be downloaded and stored.
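If you’re following along without grabbing the downloads, here is a rough sketch of what that setup looks like. Keep in mind the line numbers referenced throughout this post refer to the original script, so they won’t match this snippet exactly, and the short -p/-s flags are just my own choice here:

    # parse_and_download.py
    from bs4 import BeautifulSoup
    import argparse
    import requests

    # construct the argument parser and parse the command line arguments
    # (argparse turns "--pokemon-list" into the dictionary key "pokemon_list")
    ap = argparse.ArgumentParser()
    ap.add_argument("-p", "--pokemon-list", required=True,
        help="path to the HTML file containing the Pokemon list")
    ap.add_argument("-s", "--sprites", required=True,
        help="path to the directory where the sprites will be stored")
    args = vars(ap.parse_args())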

Now, let’s extract the Pokemon names from the HTML file:

On Line 16 we use BeautifulSoup to parse our HTML — we simply load our HTML file off disk and then pass it into the constructor. BeautifulSoup takes care of the rest. Line 17 then initializes the list to store our Pokemon names.

Then, we start to loop over all link elements on Line 20. The href attributes of these links point to a specific Pokemon. However, we do not need to follow each link. Instead, we just grab the inner text of the element. This text contains the name of our Pokemon.
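Putting those two pieces together, a sketch of the parsing step might look something like the following. I’m assuming the copied-and-pasted HTML is essentially a series of <a> elements whose inner text is the Pokemon name, which matches the description above (and I’m sticking with the bs4-style import from the snippet earlier):

    # load the HTML file off disk and hand it to BeautifulSoup, then
    # initialize the list of Pokemon names
    soup = BeautifulSoup(open(args["pokemon_list"]).read(), "html.parser")
    names = []

    # loop over all link elements -- there is no need to follow each
    # href, since the inner text of the link is the Pokemon name itself
    for link in soup.find_all("a"):
        names.append(link.text)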

Now that we have a list of Pokemon names, we need to loop over them (Line 25) and format the name correctly so we can download the file. Ultimately, the formatted and sanitized name will be used in a URL to download the sprite.

Let’s examine each of these steps (a short sketch of the full loop follows the list below):

  • Line 28: The first step to sanitizing the Pokemon name is to convert it to lowercase.
  • Line 32: The first special case we need to handle is removing the apostrophe character. The apostrophe occurs in the name “Farfetch’d”.
  • Line 37: Then, we need to handle the period and space that occur in the name “Mr. Mime”. Notice the “. ” in the middle of the name; it needs to be removed before we can build the filename.
  • Lines 40-45: Now, we need to handle unicode characters that occur in the Nidoran family. The symbols for “male” and “female” are used in the actual game, but in order to download the sprite for the Nidorans, we need to manually construct the filename.
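Here is a sketch of that sanitization loop. The exact substitutions in the original script may differ slightly; in particular, I’m assuming Pokemon DB names the Mr. Mime sprite with a hyphen (mr-mime.png) and the Nidorans as nidoran-f.png and nidoran-m.png:

    # loop over the Pokemon names and sanitize each one so it can be
    # used to build the sprite URL
    for name in names:
        # convert the name to lowercase
        parsedName = name.lower()

        # handle "Farfetch'd" by stripping out the apostrophe (both the
        # straight and curly variants, just in case)
        parsedName = parsedName.replace("'", "").replace(u"\u2019", "")

        # handle "Mr. Mime" -- here I swap the ". " for a hyphen, which
        # is my assumption about how Pokemon DB names that sprite
        parsedName = parsedName.replace(". ", "-")

        # handle the Nidoran family: the male/female symbols cannot be
        # used in the URL, so construct the filename manually
        if parsedName.find(u"\u2640") != -1:
            parsedName = "nidoran-f"
        elif parsedName.find(u"\u2642") != -1:
            parsedName = "nidoran-m"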

Now, we can finally download the Pokemon sprite:

Line 49 constructs the URL of the Pokemon sprite. The base of the URL is http://img.pokemondb.net/sprites/red-blue/normal/ — we finish building the URL by appending the name of the Pokemon plus the “.png” file extension.

Downloading the actual image is handled on a single line (Line 50) using the requests package.

Lines 53-55 check the status code of the request. If the status code is not 200, indicating that the download was not successful, then we handle the error and continue looping over the Pokemon names.

Finally, Lines 58-60 save the sprite to file.
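Continuing the sketch from above, the download step would look something like this (the error message is just my own reporting, and I’m saving the file under the sanitized name here, which the original script may or may not do):

        # (continuing inside the loop over the Pokemon names)
        # construct the URL of the sprite and download it
        url = "http://img.pokemondb.net/sprites/red-blue/normal/" + parsedName + ".png"
        r = requests.get(url)

        # if the status code is not 200, the download did not succeed,
        # so report the error and move on to the next Pokemon
        if r.status_code != 200:
            print("error downloading %s" % (url,))
            continue

        # write the sprite to disk
        f = open("%s/%s.png" % (args["sprites"], parsedName), "wb")
        f.write(r.content)
        f.close()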

Running Our Scrape

Now that our code is complete, we can execute our scrape by issuing the following command:
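Assuming you’ve saved the script as parse_and_download.py (the filename referenced in the figure below), the command would look something like this:

    $ python parse_and_download.py --pokemon-list pokemon_list.html --sprites sprites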

This command assumes that the file containing the Pokemon HTML is stored in pokemon_list.html and that the downloaded Pokemon sprites will be stored in the sprites directory.

After the script has finished running, you should have a directory full of Pokemon sprites:

Figure 2: After parse_and_download.py has finished running, you should have a directory filled with Pokemon sprites, like this.

It’s that simple! With just a little bit of code and some knowledge of how to scrape images, we can build a Python script to scrape Pokemon sprites in under 75 lines of code.

Note: After I wrote this blog post, thegatekeeper07 suggested using the Veekun Pokemon Database. Using this database lets you skip the scraping step entirely and simply download a tarball of the Pokemon sprites. This is a great option if you decide to take that approach; however, you might have to modify my source code a little bit to use the Veekun database. Just something to keep in mind!

Summary

This post served as a Python web scraping tutorial: we downloaded sprite images for the original 151 Pokemon from the Red, Blue, and Green versions.

We made use of the BeautifulSoup and requests packages to download our Pokemon sprites. These packages make scraping simple and keep headaches to a minimum.

Now that we have our database of Pokemon, we can index them and characterize their shape using shape descriptors. We’ll cover that in the next blog post.

If you would like to receive an email update when posts in this series are released, please enter your email address in the form below:

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!


17 Responses to Building a Pokedex in Python: Scraping the Pokemon Sprites (Step 2 of 6)

  1. Matt Gathu March 26, 2014 at 12:25 pm #

    Great post!! I did the scraping like a boss!! 🙂

    Looking forward to the next post.

    • Adrian Rosebrock March 26, 2014 at 1:58 pm #

      Glad you liked it!

  2. Divyansh May 24, 2016 at 9:00 am #

    Brilliant.

  3. Jeni February 12, 2017 at 7:41 am #

    Thanks a lot!! Very informative article.

    • Adrian Rosebrock February 13, 2017 at 1:43 pm #

      Thanks Jeni!

  4. Rohit May 9, 2017 at 5:38 am #

    Thanks Adrian

    Finally learnt a bit of BeautifulSoup after reading this blog. The website structure has changed I guess. Hence took me a while to figure out how to scrape the website

    Thanks again

    • Adrian Rosebrock May 11, 2017 at 8:55 am #

      BeautifulSoup is a great package, I definitely encourage readers to play with and use it. Great job re-scraping the website!

  5. Todor Arnaudov June 30, 2017 at 12:50 pm #

    Hi Adrian,

    I skipped the scraping and used the downloaded sprites in the archive. However while updating the code to Py3, I encountered a strange error during the indexing: at one point a sprite has failed to load.

    It happened to be due to special Unicode characters in the end of a couple of sprites, Venus and Mars:

    nidoran♀.png
    nidoran♂.png

    After renaming them to ordinary symbols, it ran fine: 🙂

    nidoran1.png
    nidoran2.png

    • Adrian Rosebrock July 5, 2017 at 6:37 am #

      Thanks for sharing, Todor!

  6. sudip July 12, 2017 at 7:56 am #

    how is parsing done if I am a PyCharm user

    • Adrian Rosebrock July 12, 2017 at 2:41 pm #

      You should read up on how to set command line arguments with PyCharm. You could also:

      1. Hardcode the values and ignore the command line argument code.
      2. Code in PyCharm and execute via command line (which I think is the best way).

  7. Srividhya Prakash February 9, 2018 at 3:20 am #

    I think its worth mentioning that if someone is using BeautifulSoup4,

    the import code should be

    from bs4 import BeautifulSoup

  8. han March 11, 2019 at 12:47 pm #

    thanks for post. but how do you know link.text have the name of pokemon

Trackbacks/Pingbacks

  1. Building a Pokedex in Python: Indexing our Sprites using Shape Descriptors (Step 3 of 6) - PyImageSearch - April 7, 2014

    […] Step 2: Building a Pokedex in Python: Scraping the Pokemon Sprites (Step 2 of 6) […]

  2. Building a Pokedex in Python: Finding the Game Boy Screen (Step 4 of 6) - PyImageSearch - April 22, 2014

    […] Step 2: Building a Pokedex in Python: Scraping the Pokemon Sprites (Step 2 of 6) […]

  3. Python and OpenCV Example: Warp Perspective and Transform - May 5, 2014

    […] Step 2: Building a Pokedex in Python: Scraping the Pokemon Sprites (Step 2 of 6) […]

  4. Comparing Shape Descriptors for Similarity using Python and OpenCV - May 19, 2014

    […] explored what it takes to build a Pokedex using computer vision. Then we scraped the web and built up a database of Pokemon. We’ve indexed our database of Pokemon sprites using Zernike moments. We’ve analyzed […]
