Keras and Convolutional Neural Networks (CNNs)

Creating a Convolutional Neural Network using Keras to recognize a Bulbasaur stuffed Pokemon [image source]

Today’s blog post is part two in a three-part series on building a complete end-to-end image classification + deep learning application:

By the end of today’s blog post, you will understand how to implement, train, and evaluate a Convolutional Neural Network on your own custom dataset.

And in next week’s post, I’ll be demonstrating how you can take your trained Keras model and deploy it to a smartphone app with just a few lines of code!

To keep the series lighthearted and fun, I am fulfilling a childhood dream of mine and building a Pokedex. A Pokedex is a device that exists in the world of Pokemon, a popular TV show, video game, and trading card series (I was/still am a huge Pokemon fan).

If you are unfamiliar with Pokemon, you should think of a Pokedex as a smartphone app that can recognize Pokemon, the animal-like creatures that exist in the world of Pokemon.

You can swap in your own dataset, of course; I’m just having fun and enjoying a bit of childhood nostalgia.

To learn how to train a Convolutional Neural Network with Keras and deep learning on your own custom dataset, just keep reading.

Looking for the source code to this post?
Jump right to the downloads section.

Keras and Convolutional Neural Networks

In last week’s blog post we learned how we can quickly build a deep learning image dataset — we used the procedure and code covered in the post to gather, download, and organize our images on disk.

Now that we have our images downloaded and organized, the next step is to train a Convolutional Neural Network (CNN) on top of the data.

I’ll be showing you how to train your CNN in today’s post using Keras and deep learning. The final part of this series, releasing next week, will demonstrate how you can take your trained Keras model and deploy it to a smartphone (in particular, iPhone) with only a few lines of code.

The end goal of this series is to help you build a fully functional deep learning app — use this series as an inspiration and starting point to help you build your own deep learning applications.

Let’s go ahead and get started training a CNN with Keras and deep learning.

Our deep learning dataset

Figure 1: A montage of samples from our Pokemon deep learning dataset depicting each of the classes (i.e., Pokemon species). As we can see, the dataset is diverse, including illustrations, movie/TV show stills, action figures, toys, etc.

Our deep learning dataset consists of 1,191 images of Pokemon (animal-like creatures that exist in the world of Pokemon, the popular TV show, video game, and trading card series).

Our goal is to train a Convolutional Neural Network using Keras and deep learning to recognize and classify each of these Pokemon.

The Pokemon we will be recognizing include:

A montage of the training images for each class can be seen in Figure 1 above.

As you can see, our training images include a mix of:

  • Still frames from the TV show and movies
  • Trading cards
  • Action figures
  • Toys and plushes
  • Drawings and artistic renderings from fans

This diverse mix of training images will allow our CNN to recognize our five Pokemon classes across a range of images — and as we’ll see, we’ll be able to obtain 97%+ classification accuracy!

The Convolutional Neural Network and Keras project structure

Today’s project has several moving parts — to help us wrap our head around the project, let’s start by reviewing our directory structure for the project:

There are 3 directories:

  1. dataset : Contains the five classes; each class has its own subdirectory to make parsing class labels easy.
  2. examples : Contains images we’ll be using to test our CNN.
  3. The pyimagesearch  module: Contains our SmallerVGGNet  model class (which we’ll be implementing later in this post).

And 5 files in the root:

  1. plot.png : Our training/testing accuracy and loss plot, which is generated after the training script is run.
  2. lb.pickle : Our serialized LabelBinarizer  object file — this contains a class index to class name lookup mechanism.
  3. pokedex.model : This is our serialized Keras Convolutional Neural Network model file (i.e., the “weights file”).
  4. : We will use this script to train our Keras CNN, plot the accuracy/loss, and then serialize the CNN and label binarizer to disk.
  5. : Our testing script.

Our Keras and CNN architecture

Figure 2: A VGGNet-like network that I’ve dubbed “SmallerVGGNet” will be used for training a deep learning classifier with Keras. You can find the full resolution version of this network architecture diagram here.

The CNN architecture we will be utilizing today is a smaller, more compact variant of the VGGNet network, introduced by Simonyan and Zisserman in their 2014 paper, Very Deep Convolutional Networks for Large Scale Image Recognition.

VGGNet-like architectures are characterized by:

  1. Using only 3×3 convolutional layers stacked on top of each other in increasing depth
  2. Reducing volume size by max pooling
  3. Fully-connected layers at the end of the network prior to a softmax classifier
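These three characteristics can be sketched as a small, reusable Keras block. This is a hedged illustration only — the filter count and pool size below are example values, not the exact SmallerVGGNet configuration:

```python
# Sketch of a VGG-style block: stacked 3x3 CONVs, then max pooling.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Conv2D, MaxPooling2D

def vgg_style_block(filters, input_shape=None):
    block = Sequential()
    kwargs = {"input_shape": input_shape} if input_shape else {}
    # 1. only 3x3 convolutions, stacked on top of each other
    block.add(Conv2D(filters, (3, 3), padding="same", **kwargs))
    block.add(Activation("relu"))
    block.add(Conv2D(filters, (3, 3), padding="same"))
    block.add(Activation("relu"))
    # 2. reduce volume size via max pooling
    block.add(MaxPooling2D(pool_size=(2, 2)))
    return block
```

Stacking blocks like this one, with the filter count doubling each time, is the essence of the VGG family of architectures.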

I assume you already have Keras installed and configured on your system. If not, here are a few links to deep learning development environment configuration tutorials I have put together:

If you want to skip configuring your deep learning environment, I would recommend using one of the following pre-configured instances in the cloud:

Let’s go ahead and implement SmallerVGGNet , our smaller version of VGGNet. Create a new file named  inside the pyimagesearch  module and insert the following code:

First we import our modules — notice that they all come from Keras. Each of these is covered extensively in Deep Learning for Computer Vision with Python.

Note: You’ll also want to create an  file inside pyimagesearch  so Python knows the directory is a module. If you’re unfamiliar with  files or how they are used to create modules, no worries, just use the “Downloads” section at the end of this blog post to download my directory structure, source code, and dataset + example images.

From there, we define our SmallerVGGNet  class:

Our build method requires four parameters:

  • width : The image width dimension.
  • height : The image height dimension.
  • depth : The depth of the image — also known as the number of channels.
  • classes : The number of classes in our dataset (which will affect the last layer of our model). We’re utilizing 5 Pokemon classes in this post, but don’t forget that you could work with the 807 Pokemon species if you downloaded enough example images for each species!

Note: We’ll be working with input images that are  96 x 96 with a depth of 3  (as we’ll see later in this post). Keep this in mind as we explain the spatial dimensions of the input volume as it passes through the network.

Since we’re using the TensorFlow backend, we arrange the input shape with “channels last” data ordering, but if you want to use “channels first” (Theano, etc.) then it is handled automagically on Lines 23-25.
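A short sketch of that backend check (assuming TensorFlow’s default “channels last” ordering; the variable names mirror common Keras idioms):

```python
# Arrange the input shape for the active Keras image data format.
from tensorflow.keras import backend as K

inputShape = (96, 96, 3)  # height, width, depth ("channels last")
chanDim = -1              # batch normalization axis for channels last
if K.image_data_format() == "channels_first":  # e.g., Theano-style ordering
    inputShape = (3, 96, 96)
    chanDim = 1
```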

Now, let’s start adding layers to our model:

Above is our first  CONV => RELU => POOL  block.

The convolution layer has 32  filters with a 3 x 3  kernel. We’re using the RELU  activation function followed by batch normalization.

Our POOL  layer uses a 3 x 3  POOL  size to reduce spatial dimensions quickly from 96 x 96  to 32 x 32 (we’ll be using 96 x 96 x 3 input images to train our network, as we’ll see in the next section).

As you can see from the code block, we’ll also be utilizing dropout in our network architecture. Dropout works by randomly disconnecting nodes from the current layer to the next layer. This process of random disconnects during training batches helps naturally introduce redundancy into the model — no one single node in the layer is responsible for predicting a certain class, object, edge, or corner.
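Putting that first block together, a sketch (assuming channels-last 96 x 96 x 3 inputs) looks roughly like:

```python
# First CONV => RELU => POOL block: 32 filters, 3x3 kernel, 3x3 pooling, dropout.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D,
    Dropout, MaxPooling2D)

model = Sequential()
model.add(Conv2D(32, (3, 3), padding="same", input_shape=(96, 96, 3)))
model.add(Activation("relu"))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(3, 3)))  # 96x96 -> 32x32
model.add(Dropout(0.25))  # randomly disconnect 25% of nodes during training
```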

From there we’ll add  (CONV => RELU) * 2  layers before applying another POOL  layer:

Stacking multiple CONV  and RELU  layers together (prior to reducing the spatial dimensions of the volume) allows us to learn a richer set of features.

Notice how:

  • We’re increasing our filter size from 32  to 64 . The deeper we go in the network, the smaller the spatial dimensions of our volume, and the more filters we learn.
  • We decreased our max pooling size from 3 x 3  to 2 x 2  to ensure we do not reduce our spatial dimensions too quickly.

Dropout is again performed at this stage.

Let’s add another set of   (CONV => RELU) * 2 => POOL :

Notice that we’ve increased our filter size to 128  here. Dropout of 25% of the nodes is performed to reduce overfitting again.

And finally, we have a set of FC => RELU  layers and a softmax classifier:

The fully connected layer is specified by Dense(1024) with a rectified linear unit activation and batch normalization.

Dropout is performed a final time — this time notice that we’re dropping out 50% of the nodes during training. Typically you’ll use a dropout rate of 40-50% in the fully-connected layers and a much lower rate, normally 10-25%, in earlier layers (if any dropout is applied at all).

We round out the model with a softmax classifier that will return the predicted probabilities for each class label.
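A hedged sketch of this head follows; the input volume shape below is a placeholder purely so the snippet is self-contained, since in the full network the head sits on top of the final CONV block:

```python
# FC => RELU head plus softmax classifier over our 5 Pokemon classes.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Activation, BatchNormalization, Dense,
    Dropout, Flatten)

classes = 5
head = Sequential()
# placeholder volume shape standing in for the final CONV output
head.add(Flatten(input_shape=(12, 12, 128)))
head.add(Dense(1024))
head.add(Activation("relu"))
head.add(BatchNormalization())
head.add(Dropout(0.5))  # drop 50% of the FC nodes during training
head.add(Dense(classes))
head.add(Activation("softmax"))  # per-class predicted probabilities
```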

A visualization of the network architecture of the first few layers of SmallerVGGNet  can be seen in Figure 2 at the top of this section. To see the full resolution of our Keras CNN implementation of SmallerVGGNet , refer to the following link.

Implementing our CNN + Keras training script

Now that SmallerVGGNet  is implemented, we can train our Convolutional Neural Network using Keras.

Open up a new file, name it , and insert the following code where we’ll import our required packages and libraries:

We are going to use the "Agg"  matplotlib backend so that figures can be saved in the background (Line 3).

The ImageDataGenerator  class will be used for data augmentation, a technique used to take existing images in our dataset and apply random transformations (rotations, shearing, etc.) to generate additional training data. Data augmentation helps prevent overfitting.

Line 7 imports the Adam  optimizer, the optimizer method used to train our network.

The LabelBinarizer  (Line 9) is an important class to note — this class will enable us to:

  1. Input a set of class labels (i.e., strings representing the human-readable class labels in our dataset).
  2. Transform our class labels into one-hot encoded vectors.
  3. Allow us to take an integer class label prediction from our Keras CNN and transform it back into a human-readable label.

I often get asked here on the PyImageSearch blog how we can transform a class label string to an integer and vice versa. Now you know the solution: use the LabelBinarizer  class.
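A quick round-trip example with scikit-learn’s LabelBinarizer (the label strings here are our Pokemon classes):

```python
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
labels = ["bulbasaur", "charmander", "pikachu", "charmander"]
one_hot = lb.fit_transform(labels)       # strings -> one-hot vectors
# classes_ is sorted alphabetically, so an argmax index maps back to a name
name = lb.classes_[one_hot[1].argmax()]  # integer index -> readable label
```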

The train_test_split  function (Line 10) will be used to create our training and testing splits. Also take note of our SmallerVGGNet  import on Line 11 — this is the Keras CNN we just implemented in the previous section.

Readers of this blog are familiar with my very own imutils package. If you don’t have it installed/updated, you can install it via:

If you are using a Python virtual environment (as we typically do here on the PyImageSearch blog), make sure you use the workon  command to access your particular virtual environment before installing/upgrading imutils .

From there, let’s parse our command line arguments:

For our training script, we need to supply three required command line arguments:

  • --dataset : The path to the input dataset. Our dataset is organized in a dataset  directory with subdirectories representing each class. Inside each subdirectory is ~250 Pokemon images. See the project directory structure at the top of this post for more details.
  • --model : The path to the output model — this training script will train the model and output it to disk.
  • --labelbin : The path to the output label binarizer — as you’ll see shortly, we’ll extract the class labels from the dataset directory names and build the label binarizer.

We also have one optional argument, --plot . If you don’t specify a path/filename, then a plot.png  file will be placed in the current working directory.

You do not need to modify Lines 22-31 to supply new file paths. The command line arguments are handled at runtime. If this doesn’t make sense to you, be sure to review my command line arguments blog post.

Now that we’ve taken care of our command line arguments, let’s initialize some important variables:

Lines 35-38 initialize important variables used when training our Keras CNN:

  • EPOCHS:  The total number of epochs we will be training our network for (i.e., how many times our network “sees” each training example and learns patterns from it).
  • INIT_LR:  The initial learning rate — a value of 1e-3 is the default value for the Adam optimizer, the optimizer we will be using to train the network.
  • BS:  We will be passing batches of images into our network for training. There are multiple batches per epoch. The BS  value controls the batch size.
  • IMAGE_DIMS:  Here we supply the spatial dimensions of our input images. We’ll require our input images to be 96 x 96  pixels with 3  channels (i.e., RGB). I’ll also note that we specifically designed SmallerVGGNet with 96 x 96  images in mind.

We also initialize two lists — data  and labels — which will hold the preprocessed images and their labels, respectively.
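For reference, those initializations look roughly like the following (EPOCHS, INIT_LR, and IMAGE_DIMS match the values discussed in this post; the batch size of 32 is an assumption on my part):

```python
# Training hyperparameters and the lists holding images + labels.
EPOCHS = 100              # passes over the training data
INIT_LR = 1e-3            # Adam's default initial learning rate
BS = 32                   # batch size (assumed value)
IMAGE_DIMS = (96, 96, 3)  # spatial dims + channels expected by SmallerVGGNet

data = []    # preprocessed images
labels = []  # corresponding class labels
```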

Lines 46-48 grab all of the image paths and randomly shuffle them.

And from there, we’ll loop over each of those imagePaths :

We loop over the imagePaths  on Line 51 and then proceed to load the image (Line 53) and resize it to accommodate our model (Line 54).

Now it’s time to update our data  and labels  lists.

We call the Keras img_to_array  function to convert the image to a Keras-compatible array (Line 55) followed by appending the image to our list called data (Line 56).

For our labels  list, we extract the label  from the file path on Line 60 and append it (the label) on Line 61.

So, why does this class label parsing process work?

Consider the fact that we purposely created our dataset directory structure to have the following format:

Using the path separator on Line 60 we can split the path into an array and then grab the second-to-last entry in the list — the class label.

If this process seems confusing to you, I would encourage you to open up a Python shell and explore an example imagePath  by splitting the path on your operating system’s respective path separator.
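For example (the filename here is hypothetical):

```python
import os

# dataset/{CLASS_LABEL}/{FILENAME}.jpg
imagePath = os.path.sep.join(["dataset", "charmander", "00000001.jpg"])
# split on the path separator; the second-to-last entry is the class label
label = imagePath.split(os.path.sep)[-2]
```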

Let’s keep moving. A few things are happening in this next code block — additional preprocessing, binarizing labels, and partitioning the data:

Here we first convert the data  array to a NumPy array and then scale the pixel intensities to the range  [0, 1]  (Line 64). We also convert the labels  from a list to a NumPy array on Line 65. An info message is printed which shows the size (in MB) of the data  matrix.

Then, we binarize the labels utilizing scikit-learn’s LabelBinarizer  (Lines 70 and 71).

With deep learning, or any machine learning for that matter, a common practice is to make a training and testing split. This is handled on Lines 75 and 76 where we create an 80/20 random split of the data.
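A sketch of that 80/20 split with random placeholder data standing in for our images:

```python
import numpy as np
from sklearn.model_selection import train_test_split

data = np.random.rand(100, 96, 96, 3).astype("float32")  # stand-in images
labels = np.random.randint(0, 5, size=(100,))            # stand-in labels
(trainX, testX, trainY, testY) = train_test_split(
    data, labels, test_size=0.2, random_state=42)  # 80% train, 20% test
```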

Next, let’s create our image data augmentation object:

Since we’re working with a limited amount of data points (< 250 images per class), we can make use of data augmentation during the training process to give our model more images (based on existing images) to train with.

Data Augmentation is a tool that should be in every deep learning practitioner’s toolbox. I cover data augmentation in the Practitioner Bundle of Deep Learning for Computer Vision with Python.

We initialize aug, our ImageDataGenerator , on Lines 79-81.
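The initialization looks along these lines (the transformation ranges below are plausible examples, not necessarily the exact values used in the downloadable code):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# randomly rotate, shift, shear, zoom, and flip images at training time
aug = ImageDataGenerator(rotation_range=25, width_shift_range=0.1,
    height_shift_range=0.1, shear_range=0.2, zoom_range=0.2,
    horizontal_flip=True, fill_mode="nearest")
```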

From there, let’s compile the model and kick off the training:

On Lines 85 and 86, we initialize our Keras CNN model with 96 x 96 x 3  input spatial dimensions. I’ll state this again as I receive this question often — SmallerVGGNet was designed to accept 96 x 96 x 3  input images. If you want to use different spatial dimensions you may need to either:

  1. Reduce the depth of the network for smaller images
  2. Increase the depth of the network for larger images

Do not go blindly editing the code. Consider the implications larger or smaller images will have first!

We’re going to use the Adam  optimizer with learning rate decay (Line 87) and then compile  our model  with categorical cross-entropy since we have > 2 classes (Lines 88 and 89).

Note: For only two classes you should use binary cross-entropy as the loss.

From there, we make a call to the Keras fit_generator  method to train the network (Lines 93-97). Be patient — this can take some time depending on whether you are training using a CPU or a GPU.

Once our Keras CNN has finished training, we’ll want to save both the (1) model and (2) label binarizer as we’ll need to load them from disk when we test the network on images outside of our training/testing set:

We serialize the model (Line 101) and the label binarizer (Lines 105-107) so we can easily use them later in our  script.

The label binarizer file contains the class index to human-readable class label dictionary. This object ensures we don’t have to hardcode our class labels in scripts that wish to use our Keras CNN.
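A minimal sketch of that round trip (the lb.pickle filename matches the project structure above; the class list here is illustrative):

```python
import pickle
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer().fit(["bulbasaur", "charmander", "pikachu"])
# serialize so the testing script can map indices back to names
with open("lb.pickle", "wb") as f:
    f.write(pickle.dumps(lb))
# later, in the testing script:
with open("lb.pickle", "rb") as f:
    lb2 = pickle.loads(f.read())
name = lb2.classes_[1]  # index -> human-readable label
```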

Finally, we can plot our training loss and accuracy:

I elected to save my plot to disk (Line 121) rather than displaying it for two reasons: (1) I’m on a headless server in the cloud and (2) I wanted to make sure I don’t forget to save the plot.

Training our CNN with Keras

Now we’re ready to train our Pokedex CNN.

Be sure to visit the “Downloads” section of this blog post to download code + data.

Then execute the following command to train the model, making sure to provide the command line arguments properly:

Looking at the output of our training script we see that our Keras CNN obtained:

  • 96.84% classification accuracy on the training set
  • And 97.07% accuracy on the testing set

The training loss/accuracy plot follows:

Figure 3: Training and validation loss/accuracy plot for a Pokedex deep learning classifier trained with Keras.

As you can see in Figure 3, I trained the model for 100 epochs and achieved low loss with limited overfitting. With additional training data we could obtain higher accuracy as well.

Creating our CNN and Keras testing script

Now that our CNN is trained, we need to implement a script to classify images that are not part of our training or validation/testing set. Open up a new file, name it , and insert the following code:

First we import the necessary packages (Lines 2-9).

From there, let’s parse command line arguments:

We have three required command line arguments to parse:

  • --model : The path to the model that we just trained.
  • --labelbin : The path to the label binarizer file.
  • --image : Our input image file path.

Each of these arguments is established and parsed on Lines 12-19. Remember, you don’t need to modify these lines — I’ll show you how to run the program in the next section using the command line arguments provided at runtime.

Next, we’ll load and preprocess the image:

Here we load the input  image  (Line 22) and make a copy called output  for display purposes (Line 23).

Then we preprocess the image  in the exact same manner that we did for training (Lines 26-29).
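That preprocessing mirrors training: resize, scale to [0, 1], and add a batch dimension. A sketch with a placeholder array standing in for the loaded image (the real script uses cv2.imread and cv2.resize):

```python
import numpy as np

# placeholder standing in for cv2.imread(args["image"])
image = np.random.randint(0, 256, size=(400, 300, 3), dtype="uint8")
output = image.copy()            # keep an unmodified copy for display
image = image[:96, :96]          # stand-in for cv2.resize(image, (96, 96))
image = image.astype("float") / 255.0   # same scaling as training
image = np.expand_dims(image, axis=0)   # add the batch dimension
```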

From there, let’s load the model + label binarizer and then classify the image:

In order to classify the image, we need the model  and label binarizer in memory. We load both on Lines 34 and 35.

Subsequently, we classify the image  and create the label  (Lines 39-41).

The remaining code block is for display purposes:

On Lines 46 and 47, we’re extracting the name of the Pokemon from the filename  and comparing it to the label . The correct  variable will be either "correct"  or "incorrect"  based on this. Obviously these two lines make the assumption that your input image has a filename that contains the true label.

From there we take the following steps:

  1. Append the probability percentage and "correct" / "incorrect"  text to the class  label  (Line 50).
  2. Resize the output  image so it fits our screen (Line 51).
  3. Draw the label  text on the output  image (Lines 52 and 53).
  4. Display the output  image and wait for a keypress to exit (Lines 57 and 58).

Classifying images with our CNN and Keras

We’re now ready to run the  script!

Ensure that you’ve grabbed the code + images from the “Downloads” section at the bottom of this post.

Once you’ve downloaded and unzipped the archive, change into the root directory of this project and follow along, starting with an image of Charmander. Notice that we’ve provided three command line arguments in order to run the script:

Figure 4: Correctly classifying an input image using Keras and Convolutional Neural Networks.

And now let’s query our model with the loyal and fierce Bulbasaur stuffed Pokemon:

Figure 5: Again, our Keras deep learning image classifier is able to correctly classify the input image [image source]

Let’s try a toy action figure of Mewtwo (a genetically engineered Pokemon):

Figure 6: Using Keras, deep learning, and Python we are able to correctly classify the input image using our CNN. [image source]

What would an example Pokedex be if it couldn’t recognize the infamous Pikachu:

Figure 7: Using our Keras model we can recognize the iconic Pikachu Pokemon. [image source]

Let’s try the cute Squirtle Pokemon:

Figure 8: Correctly classifying image data using Keras and a CNN. [image source]

And last but not least, let’s classify my fire-tailed Charmander again. This time he is being shy and is partially occluded by my monitor.

Figure 9: One final example of correctly classifying an input image using Keras and Convolutional Neural Networks (CNNs).

Each of these Pokemon was no match for my new Pokedex.

Currently, there are around 807 different species of Pokemon. Our classifier was trained on only five different Pokemon (for the sake of simplicity).

If you’re looking to train a classifier to recognize more Pokemon for a bigger Pokedex, you’ll need additional training images for each class. Ideally, your goal should be to have 500-1,000 images per class you wish to recognize.

To acquire training images, I suggest that you look no further than Microsoft Bing’s Image Search API. This API is hands down easier to use than the previous hack of Google Image Search that I shared (but that would work too).

Limitations of this model

One of the primary limitations of this model is the small amount of training data. I tested on various images and at times the classifications were incorrect. When this happened, I examined the input image + network more closely and found that the color(s) most dominant in the image influence the classification dramatically.

For example, lots of red and oranges in an image will likely return “Charmander” as the label. Similarly, lots of yellows in an image will normally result in a “Pikachu” label.

This is partially due to our input data. Pokemon are obviously fictitious, so there are no actual “real-world” images of them (other than the action figures and toy plushes).

Most of our images came from either fan illustrations or stills from the movie/TV show. And furthermore, we only had a limited amount of data for each class (~225-250 images).

Ideally, we should have at least 500-1,000 images per class when training a Convolutional Neural Network. Keep this in mind when working with your own data.

Can we use this Keras deep learning model as a REST API?

If you would like to run this model (or any other deep learning model) as a REST API, I wrote three blog posts to help you get started:

  1. Building a simple Keras + deep learning REST API (guest post)
  2. A scalable Keras + deep learning REST API
  3. Deep learning in production with Keras, Redis, Flask, and Apache

Summary
In today’s blog post you learned how to train a Convolutional Neural Network (CNN) using the Keras deep learning library.

Our dataset was gathered using the procedure discussed in last week’s blog post.

In particular, our dataset consists of 1,191 images of five separate Pokemon (animal-like creatures that exist in the world of Pokemon, the popular TV show, video game, and trading card series).

Using our Convolutional Neural Network and Keras, we were able to obtain 97.07% accuracy, which is quite respectable given (1) the limited size of our dataset and (2) the number of parameters in our network.

In next week’s blog post I’ll be demonstrating how we can:

  1. Take our trained Keras + Convolutional Neural Network model…
  2. …and deploy it to a smartphone with only a few lines of code!

It’s going to be a great post, don’t miss it!

To download the source code to this post (and be notified when next week’s can’t miss post goes live), just enter your email address in the form below!


If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 11-page Resource Guide on Computer Vision and Image Search Engines, including exclusive techniques that I don’t post on this blog! Sound good? If so, enter your email address and I’ll send you the code immediately!


60 Responses to Keras and Convolutional Neural Networks (CNNs)

  1. Anirban April 16, 2018 at 11:38 am #

    Brilliant Post as usual.Thanks for sharing your knowledge.

    • Adrian Rosebrock April 16, 2018 at 2:00 pm #

      Thanks Anirban!

  2. Baterdene April 16, 2018 at 11:53 am #


  3. Mohamed Emad April 16, 2018 at 12:56 pm #

    Hello Adrian. You are as distinct as usual.
    You have touched on something very important that stops too many people:
    they wonder how to train a neural network of their own
    and how to use CNN/ResNet models.
    Thank you very much for your efforts in pushing people seriously forward.
    I have a question about something that is blocking me, and excuse me for asking:
    how do I implement gradual (incremental) training for my model?
    For example, say I have an image database of about 100 objects,
    and each object has 10,000 pictures.
    A model was built on this data.
    When I collect more pictures, I want to add them to my model.
    Do I have to add the new pictures to the collection and then train again on all the old and new photos?
    As everyone knows, this needs too much time.
    I have read about incremental training but I do not know how to use it in practice,
    using any framework (Caffe, Keras, etc.).
    I hope you can point me toward a solution.
    Thank you, Adrian

    • Adrian Rosebrock April 16, 2018 at 1:59 pm #

      Hi Mohamed — you could technically train from scratch but this would likely be a waste of resources each and every time you add new images. I would suggest a hybrid approach where you:

      1. Apply fine-tuning to the network, perhaps on a weekly or monthly basis
      2. Only re-train from scratch once every 3-6 months

      The timeframes should be changed based on how often new images are added of course so you would need to change them to whatever is appropriate for your project. I also cover how to fine-tune a network inside Deep Learning for Computer Vision with Python.

      • Mohamed Emad April 16, 2018 at 3:00 pm #

        Thank you very much Adrian for your response
        I really benefited a lot from you
        Always forward
        Thank you

  4. Akbar Hidayatuloh April 16, 2018 at 10:01 pm #

    If I want to split my dataset into train, test, and validation sets, what is a good method to do that (not just splitting the dataset into train and test)?

    Thank you very much

    • Adrian Rosebrock April 17, 2018 at 9:27 am #

      You would use scikit-learn’s train_test_split function twice. The first time you split the data into two splits: training and testing.

      You then split a second time on the training data, creating another two splits: training and validation.

      This process will leave you with three splits: training, testing, and validation.
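      A sketch with placeholder data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(100, 1)
y = np.arange(100)
# first split: training vs. testing (80/20)
trainX, testX, trainY, testY = train_test_split(
    X, y, test_size=0.2, random_state=42)
# second split: carve validation out of training (0.25 * 80 = 20)
trainX, valX, trainY, valY = train_test_split(
    trainX, trainY, test_size=0.25, random_state=42)
# leaves a 60/20/20 train/validation/test split
```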

      • AKBAR HIDAYATULOH April 19, 2018 at 9:03 am #

        Thank you, that is really helpful.

        now i want to try top-5 accuracy, do you know how to do that?

        • Adrian Rosebrock April 20, 2018 at 10:08 am #

          I discuss rank-5 accuracy, including how to compute it, inside Deep Learning for Computer Vision with Python.

          The gist is that you need to:

          1. Loop over each of your test data points
          2. Predict the class labels for it
          3. Sort labels by their probability in descending order
          4. Check to see if ground-truth label exists in the top 5 predicted labels
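          Those steps can be sketched as follows (a hedged illustration, not the book's exact implementation):

```python
import numpy as np

def rank5_accuracy(preds, labels):
    # preds: (N, num_classes) probabilities; labels: (N,) ground-truth indices
    hits = 0
    for p, gt in zip(preds, labels):
        top5 = np.argsort(p)[::-1][:5]  # class indices, highest prob first
        if gt in top5:
            hits += 1
    return hits / float(len(labels))
```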

          Refer to Deep Learning for Computer Vision with Python for more details, including implementation.

  5. Gilad April 17, 2018 at 3:03 am #

    I tried to do the same on 5 actresses. I got 44% accuracy on the validation and above 80% on the main group.
    I have ~280 pictures for each actress.
    How to increase the accuracy?
    1. increase the number of pictures
    2. try to find the face and work on it as ROI
    Do you have other ideas? maybe play with the training parameters (alpha)?

    • Adrian Rosebrock April 17, 2018 at 9:22 am #

      When performing face recognition you need to:

      1. Detect the face and extract the face ROI
      2. Classify the face

      Training a network to recognize faces on an entire image is not going to work well at all.

  6. Sagar Patil April 17, 2018 at 6:48 am #

    This dataset looks smaller than MNIST! I think you should rather teach us how to work with real-world data, where there are a lot of classes and the data is much more imbalanced.

    • Adrian Rosebrock April 17, 2018 at 9:19 am #

      I discuss how to gather your own training data in a previous post. The post you are commenting on is meant to be an introduction to Keras and CNNs. If you want an advanced treatment of the material with real-world data I would kindly refer you to my book, Deep Learning for Computer Vision with Python, where I have over 900+ pages worth of content on training deep neural networks on real-world data.

  7. Jesper April 17, 2018 at 7:00 am #

    As always a really great post!

    I was wondering if it’s possible to classify several objects in a picture (an image with several pokemons in it?) kinda like in one of your other great posts, using the models I train using Keras?

    Thank you so much for an awesome post

    • Adrian Rosebrock April 17, 2018 at 9:17 am #

      Hey Jesper — I’ll be writing a blog post on how and when you can use a CNN trained for image classification for object detection. The answer is too long to include in a comment as there is a lot to explain including when/where it’s possible. The post will be publishing on/around May 14th so keep an eye out for it.

      • Jesper April 18, 2018 at 4:27 am #

        You are the superman of so many things – thanks also for the distinction between image classification and object detection. These blogs are so good!

        Thanks again

        • Adrian Rosebrock April 18, 2018 at 2:52 pm #

          Thank you Jesper, I really appreciate that 🙂

  8. Sean April 17, 2018 at 4:31 pm #

    Hi Adrian, thank you for the great explanation in detail. During my computer vision course we were given 2 projects and I have used a lot of algorithms from your website. In the last project it is not required to use Deep-learning but I went for it anyways as a bonus, and i’m using your pokedex code.

    • Adrian Rosebrock April 18, 2018 at 3:03 pm #

      Nice! Best of luck with the project Sean. I hope it goes well.

  9. michael alex April 18, 2018 at 2:06 am #

    Good job as usual Adrian. I learned so much from this blog series!

    • Adrian Rosebrock April 18, 2018 at 3:00 pm #

      Thank you, Michael! Believe it or not, the series only gets better from here 🙂

  10. Idhant April 18, 2018 at 3:03 am #

    Hi, I loved this post and found it really useful as a beginner learning about CNNs.

    Although I was getting a “memory error” at this step:

    data = np.array(data, dtype="float") / 255.0

    Actually, I added around 5k images to "data" and have around 13 classes… but clearly it is not working in this case… could you suggest anything to tackle this issue…

    • Adrian Rosebrock April 18, 2018 at 2:59 pm #

      Your system does not have enough memory to store all images in RAM. You can either:

      1. Update the code to use a data generator and augmentor that loads images from disk in small batches
      2. Build a serialized dataset, such as HDF5 format, and loop over the images in batches

      If you’re working with an image dataset too large to fit into main memory I would suggest reading through Deep Learning for Computer Vision with Python where I discuss my best practices and techniques to efficiently train your networks (code is included, of course).
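      As a minimal sketch of option 1 (the `load_image` helper below is hypothetical — it stands in for the `cv2.imread` + `cv2.resize` calls from the post so the sketch stays runnable on its own):

```python
import numpy as np

def load_image(path):
    # hypothetical loader standing in for cv2.imread + cv2.resize;
    # it returns a random 96x96x3 uint8 array so the sketch is runnable
    return np.random.randint(0, 256, (96, 96, 3), dtype="uint8")

def batch_generator(image_paths, labels, batch_size=32):
    """Yield (data, labels) batches, loading images lazily so the
    full dataset never has to sit in RAM at once."""
    while True:  # Keras expects generators to loop; steps_per_epoch bounds an epoch
        for i in range(0, len(image_paths), batch_size):
            batch = [load_image(p) for p in image_paths[i:i + batch_size]]
            # scale raw pixel intensities to [0, 1], as in the original script
            yield (np.array(batch, dtype="float") / 255.0,
                   np.array(labels[i:i + batch_size]))
```

      You would then pass the generator to training with a bounded `steps_per_epoch`, e.g. `model.fit_generator(batch_generator(paths, labels, 32), steps_per_epoch=len(paths) // 32, ...)`. Keras’s own `ImageDataGenerator.flow_from_directory` combines this idea with the data augmentation already used in the post.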

  11. Alex April 18, 2018 at 11:50 am #

    Hi Adrian. How can I use this network to select an object in the image, such as a face?

    • Adrian Rosebrock April 18, 2018 at 2:44 pm #

      Hi Alex — what do you mean by “select”? Can you clarify? Perhaps you are referring to object detection or face detection?

      • Alex April 19, 2018 at 2:24 am #

        how do I use my trained model for object detection

      • Alex April 19, 2018 at 1:11 pm #

        Object detection

        • Adrian Rosebrock April 20, 2018 at 10:04 am #

          You cannot use this exact model for object detection. Deep learning object detectors fall into various frameworks such as Faster R-CNN, Single Shot Detectors (SSDs), YOLO, and others. I cover them in detail inside Deep Learning for Computer Vision with Python where I also demonstrate how to train your own custom deep learning object detectors. Be sure to take a look.

          I’ll also have a blog post coming out in early May that will help discuss the differences between object detection and image classification. This has become a common question on the PyImageSearch blog.

          Finally, if you are specifically interested in face detection, refer to this blog post.

  12. Bostjan April 18, 2018 at 12:14 pm #

    Hi Adrian,
    did you try to use CNN for iris recognition?
    Thanks for great post.

    • Adrian Rosebrock April 18, 2018 at 2:43 pm #

      Hi Bostjan — the iris of the eye? I have not used CNNs for iris recognition.

  13. Abdullah April 19, 2018 at 12:29 pm #

    Hi Adrian

    I got this error before starting training

    Using TensorFlow backend.
    [INFO] loading images…
    libpng warning: Incorrect bKGD chunk length
    [INFO] data matrix: 252.07MB
    [INFO] compiling model.

    can you clarify this for me?

    Moreover, for the val_loss: after about 10 epochs it hits a high loss value and then returns to normal.


    • Adrian Rosebrock April 20, 2018 at 10:05 am #

      This is not an error, it’s just a warning from the libpng library, emitted when it tried to load a specific image from disk. It can be safely ignored.

      • abdullah April 20, 2018 at 10:28 am #

        Thanks a lot, Adrian, for sharing the informative knowledge!

  14. abdullah April 20, 2018 at 10:29 am #

    By the way, can I use this model for one classification only?

    • Adrian Rosebrock April 20, 2018 at 12:21 pm #

      I’m not sure what you mean by “one classification only” — could you clarify?

      • abdullah April 20, 2018 at 1:51 pm #

        For example, I want to detect only cats, so inside the dataset folder I will have only a cats folder.

        • Adrian Rosebrock April 23, 2018 at 4:57 pm #

          To train a model you need at least two classes. If you want to detect only cats you should create a separate “background” or “ignore” class that consists of random (typically “natural scene”) images that do not contain cats. You can then train your model to predict “cat” or “background”.

  15. Gilad April 20, 2018 at 10:56 am #

    Hi Adrian,
    I would like to know how to set class weights for imbalanced classes in Keras.
    I remember I read it in DL4CV but I can’t find it.
    Can you point me to the chapter?

    • Adrian Rosebrock April 20, 2018 at 12:20 pm #

      Hi Gilad — the chapter you are referring to is the “Smile Detection” chapter of the Starter Bundle.
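      As a quick sketch of the idea (this uses the common “balanced” heuristic, weight = n_samples / (n_classes * count), written from scratch rather than taken from the book):

```python
import numpy as np

def balanced_class_weights(labels):
    """'Balanced' heuristic: weight_c = n_samples / (n_classes * count_c),
    so under-represented classes contribute more to the loss."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# e.g. 90 negatives vs 10 positives:
weights = balanced_class_weights(np.array([0] * 90 + [1] * 10))
print(weights)  # the rare class gets weight 5.0, the common class ~0.56
```

      The resulting dict can be passed straight to training via `model.fit(trainX, trainY, class_weight=weights, ...)`; scikit-learn’s `compute_class_weight` implements the same heuristic.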

  16. Tyler April 20, 2018 at 6:07 pm #

    Very neat article, though I think there is still something to be said about Pokemon (and children’s media in general) being pre-engineered to be easily identifiable.

    Musing about a real-life equivalent, many esteemed researchers argue over which animals belong in which categories.

    It would be interesting to see a neural net which classifies animals among, say, the order of ungulates.

    Really cool and great work! About to start on some hobby work involving Keras and OpenCV installed in a Blender environment.

    Wish me luck!

  17. Mustafa April 21, 2018 at 1:16 am #

    Hi Adrian,

    Thanks for your great post. I want to detect more than one object and draw rectangles around them. How can I modify the code?

    • Adrian Rosebrock April 23, 2018 at 12:00 pm #

      Classification models cannot be directly used for object detection. You would need a deep learning object detection framework such as Faster R-CNN, SSD, or YOLO. I cover them inside Deep Learning for Computer Vision with Python.

  18. Akshay Mathur April 21, 2018 at 1:55 pm #

    Amazing post. Really helpful for my project. Eagerly awaiting your next post.

  19. SHASHANK April 22, 2018 at 7:11 am #

    Hey can you also make a tutorial for object detection using keras..

  20. Navendu Sinha April 22, 2018 at 1:30 pm #

    Adrian, a great post, something I have been looking forward to. How would you save the Keras model in an h5 format?

    • Adrian Rosebrock April 23, 2018 at 11:58 am #

      If you call the save method of a model it will write it to disk in a serialized HDF5 format.
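      A minimal sketch (the tiny Dense model below is just a stand-in so the snippet is self-contained; any compiled Keras model works the same way, and the `.h5` filename is arbitrary):

```python
import numpy as np
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Input

# a throwaway two-layer model standing in for the post's SmallerVGGNet
model = Sequential([Input(shape=(8,)),
                    Dense(4, activation="relu"),
                    Dense(2, activation="softmax")])
model.compile(optimizer="adam", loss="categorical_crossentropy")

# the .h5 extension tells Keras to serialize in HDF5 format
model.save("model.h5")

# load_model restores architecture + weights in one call
restored = load_model("model.h5")
```

      Loading the label binarizer that accompanies the model is a separate step (it is pickled, not HDF5), as shown in the post’s classify script.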

  21. AKBAR HIDAYATULOH April 24, 2018 at 4:41 am #

    # scale the raw pixel intensities to the range [0, 1]
    data = np.array(data, dtype="float") / 255.0
    labels = np.array(labels)

    When I’m scaling my own dataset at size 224 x 224 I get a memory error, but the error does not occur if I use size 128 x 128.
    How do I solve that error? I need to use the dataset at size 224 x 224.

    thank you very much,

    • Adrian Rosebrock April 24, 2018 at 5:38 pm #

      Your system is running out of RAM — your entire dataset cannot fit into memory at once. You can either (1) install more RAM on your system or (2) use a combination of lazy-loading data generators from disk or a serialized dataset, such as an HDF5 file. I demonstrate how to do both inside Deep Learning for Computer Vision with Python.
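      A rough back-of-envelope shows why 224 x 224 fails where 128 x 128 fits (the 10,000-image count here is hypothetical; the arithmetic assumes 3 channels stored as 4-byte float32, as the `/ 255.0` scaling produces):

```python
def dataset_bytes(n_images, size, channels=3, bytes_per_value=4):
    """RAM needed to hold the whole dataset as one float32 NumPy array."""
    return n_images * size * size * channels * bytes_per_value

MB = 1024 ** 2
n = 10_000  # hypothetical image count
print(dataset_bytes(n, 128) / MB)  # 1875.0 MB at 128x128
print(dataset_bytes(n, 224) / MB)  # ~5742.2 MB at 224x224
```

      Doubling the spatial resolution roughly quadruples the footprint ((224/128)^2 ≈ 3.06x here), which is why the larger size tips the system over its RAM limit.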

  22. Bog Flap April 24, 2018 at 7:07 am #

    Ran this on your deep-learning-for-computer-vision AMI on AWS using a c4.2xlarge instance (the c4.xlarge instance type gave ALLOC errors, out of memory?) and got the following:

    [INFO] serializing label binarizer…
    Exception ignored in: <bound method BaseSession.__del__ of >
    Traceback (most recent call last):
    File "/home/ubuntu/.virtualenvs/dl4cv/lib/python3.5/site-packages/tensorflow/python/client/", line 701, in __del__
    TypeError: ‘NoneType’ object is not callable

    • Adrian Rosebrock April 24, 2018 at 5:36 pm #

      This is a problem with the TensorFlow engine shutting down properly. It will only happen sporadically and since it only happens during termination of the script it can be safely ignored.

  23. Shubham Kumar April 24, 2018 at 10:51 am #

    Hi Adrian,

    Thanks a lot for such a wonderful post. I am doing my project somewhat similar to this. But in my dataset, I have only two Labels.

    One is background only, and the other is a person with the background. I want to detect the presence of these people, i.e. I want to classify images into presence or absence (based on the presence of a person). But images in my dataset are of size 1092 X 1048 pixels. I have resized them to 512 X 512 using the cv2.resize() function.

    My question is: can I use this same model for the training? If not, how can I decide on a model suitable for this case? I believe I have to use a deeper network because the size of the images used is much larger.


    • Adrian Rosebrock April 24, 2018 at 5:40 pm #

      Instead of training your model from scratch is there a reason you wouldn’t use existing deep learning networks that are trained to perform person detection? Secondly, if you apply face detection using Haar cascades or HOG + Linear SVM you may be able to skip using deep learning entirely.

      Depending on your input images, in particular how large, in pixels, the person is in the image, you may need to play around with larger input image dimensions — it’s hard to say which one will work best without seeing your data.

  24. scott April 24, 2018 at 2:46 pm #

    Great post! I went through this exercise with 250 images of water bottles, 250 of tennis balls, and 60 of dog poop. Yes dog poop. There’s a story in there for later. Anyway, it classifies anything that looks like any of the three classes as dog poop and one image of a tree as a tennis ball with 50% confidence. Most of the images are fairly well cropped. The failures on water bottles and tennis balls really surprise me. Is it likely that I just don’t have enough samples of the dog poop class?

    • Adrian Rosebrock April 24, 2018 at 5:35 pm #

      You may not have enough examples of the dog poop class but you may also want to compute the class weights to handle the imbalance.

  25. Bog Flap April 25, 2018 at 8:09 am #

    Ran this code on AWS running a c4.2xlarge instance. No problems. Messed up the first time using the wrong AMI image — Version 1.2 is required. I am running this again now using bee images obtained via the Bing image search as outlined by you, Adrian: about 11,000+ images with 35 classes. I suspect I may need to run this on a GPU instance; only time will tell.

    • Adrian Rosebrock April 25, 2018 at 10:17 am #

      Congrats on getting up and running with your dataset and network! For 11,000 images I would likely suggest a GPU instance, but that really depends on which model architecture you are using.

  26. Bog Flap April 25, 2018 at 8:10 am #

    That is bees, as in honey bees.

Leave a Reply