ImageNet classification with Python and Keras


Normally, I only publish blog posts on Monday, but I’m so excited about this one that it couldn’t wait and I decided to hit the publish button early.

You see, just a few days ago, François Chollet pushed three Keras models (VGG16, VGG19, and ResNet50) online — these networks are pre-trained on the ImageNet dataset, meaning that they can recognize 1,000 common object classes out-of-the-box.

To utilize these models in your own applications, all you need to do is:

  1. Install Keras.
  2. Clone the deep-learning-models repository.
  3. Download the weights files for the pre-trained network(s) (which will be done automatically for you when you import and instantiate the respective network architecture).
  4. Apply the pre-trained ImageNet networks to your own images.

It’s really that simple.

So, why is this so exciting? I mean, we’ve had the weights to popular pre-trained ImageNet classification networks for a while, right?

The problem is that these weight files are in Caffe format. While the Caffe library may be the current standard that many researchers use to construct new network architectures, train them, and evaluate them, Caffe isn’t the most Python-friendly library in the world, at least in terms of constructing the network architecture itself.

Note: You can do some pretty cool stuff with the Caffe-Python bindings, but I’m mainly focusing on how Caffe architectures and the training process itself are defined via .prototxt configuration files rather than code that logic can be inserted into.

There is also the fact that there isn’t an easy or streamlined method to convert Caffe weights to a Keras-compatible model.

That’s all starting to change now — we can now easily apply VGG16, VGG19, and ResNet50 using Keras and Python to our own applications without having to worry about the Caffe => Keras weight conversion process.

In fact, it’s now as simple as these three lines of code to classify an image using a Convolutional Neural Network pre-trained on the ImageNet dataset with Python and Keras:

Of course, there are a few other imports and helper functions that need to be utilized — but I think you get the point:

It’s now dead simple to apply ImageNet-level pre-trained networks using Python and Keras.

To find out how, keep reading.

ImageNet classification with Python and Keras

In the remainder of this tutorial, I’ll explain what the ImageNet dataset is, and then provide Python and Keras code to classify images into 1,000 different categories using state-of-the-art network architectures.

What is ImageNet?

Within computer vision and deep learning communities, you might run into a bit of contextual confusion surrounding what ImageNet is and what it isn’t.

You see, ImageNet is actually a project aimed at labeling and categorizing images into almost 22,000 categories based on a defined set of words and phrases. At the time of this writing, there are over 14 million images in the ImageNet project.

So, how is ImageNet organized?

To order such a massive amount of data, ImageNet actually follows the WordNet hierarchy. Each meaningful word/phrase inside WordNet is called a “synonym set” or “synset” for short. Within the ImageNet project, images are organized according to these synsets, with the goal being to have 1,000+ images per synset.
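To make the synset idea a bit more concrete, here is a minimal sketch of walking up a WordNet-style hierarchy. The `HYPERNYMS` table is a tiny hand-written toy (an assumption for illustration, not real WordNet data), and `path_to_root` and `is_a` are hypothetical helper names.

```python
# Toy parent links between synsets. Hand-written for illustration only;
# this is NOT taken from the real WordNet database.
HYPERNYMS = {
    "beagle": "hound",
    "hound": "hunting dog",
    "hunting dog": "dog",
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "mammal",
    "mammal": "animal",
}


def path_to_root(synset, hypernyms=HYPERNYMS):
    """Return the chain of synsets from `synset` up to the top of the toy tree."""
    path = [synset]
    while path[-1] in hypernyms:
        path.append(hypernyms[path[-1]])
    return path


def is_a(synset, ancestor, hypernyms=HYPERNYMS):
    """True if `ancestor` appears at or above `synset` in the hierarchy."""
    return ancestor in path_to_root(synset, hypernyms)


print(path_to_root("beagle"))
print(is_a("beagle", "dog"))
```

The same upward walk is how a fine-grained label such as “beagle” could be collapsed to a coarser one such as “dog”, provided you have the real hierarchy on hand.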

ImageNet Large Scale Visual Recognition Challenge (ILSVRC)

In the context of computer vision and deep learning, whenever you hear people talking about ImageNet, they are very likely referring to the ImageNet Large Scale Visual Recognition Challenge, or simply ILSVRC for short.

The goal of the image classification track in this challenge is to train a model that can classify an image into 1,000 separate categories. The training dataset consists of approximately 1.2 million images, and the trained model is evaluated on over 100,000 test images.

Be sure to keep the context of ImageNet in mind when you’re reading the remainder of this blog post or other tutorials and papers related to ImageNet. While in the context of image classification, object detection, and scene understanding, we often refer to ImageNet as the classification challenge and the dataset associated with the challenge, remember that there is also a broader project called ImageNet where these images are collected, annotated, and organized.

Configuring your system for Keras and ImageNet

To configure your system to use the state-of-the-art VGG16, VGG19, and ResNet50 networks, make sure you follow my previous tutorial on installing Keras.

The Keras library will use PIL/Pillow for some helper functions (such as loading an image from disk). You can install Pillow, the more Python friendly fork of PIL, by using this command:

To run the networks pre-trained on the ImageNet dataset with Python, you’ll need to make sure you have the latest version of Keras installed. At the time of this writing, the latest version of Keras is 1.0.6, which is also the minimum requirement for utilizing the pre-trained models.

You can check your version of Keras by executing the following commands:
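As a rough sketch of the version check, the snippet below looks up the installed Keras version and compares it against the 1.0.6 minimum mentioned above. The helper names (`parse_version`, `meets_minimum`, `installed_keras_version`) are my own, and the lookup assumes Python 3.8 or newer, where `importlib.metadata` is available.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile

MINIMUM_KERAS = "1.0.6"  # minimum version quoted in this post


def parse_version(text):
    """Turn a dotted version string such as '1.0.6' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        # Keep only the leading digits so '6rc1' is read as 6.
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(installed, minimum=MINIMUM_KERAS):
    """True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def installed_keras_version():
    """Return the installed Keras version string, or None if it is missing."""
    try:
        return version("keras")
    except PackageNotFoundError:
        return None


found = installed_keras_version()
if found is None:
    print("Keras is not installed in this environment")
else:
    print(found, "OK" if meets_minimum(found) else "too old")
```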

Alternatively, you can use pip freeze to list out the packages installed in your environment:

Figure 1: Listing the set of Python packages installed in your environment.

If you are using a version of Keras prior to 1.0.6, uninstall it, and then use my previous tutorial to install the latest version.

Next, to gain access to VGG16, VGG19, and the ResNet50 architectures and pre-trained weights, you need to clone the deep-learning-models repository from GitHub:

From there, change into the deep-learning-models directory and ls the contents:

Notice how we have four Python files. Three of them correspond to the VGG16, VGG19, and ResNet50 network architecture definitions.

The imagenet_utils file, as the name suggests, contains a couple of helper functions that allow us to prepare images for classification as well as obtain the final class label predictions from the network.

Keras and Python code for ImageNet CNNs

We are now ready to write some Python code to classify image contents utilizing Convolutional Neural Networks (CNNs) pre-trained on the ImageNet dataset.

To start, open up a new file and insert the following code:

We start on Lines 2-8 by importing our required Python packages. Line 2 imports the image pre-processing module directly from the Keras library. However, Lines 3-5 import functions and network architectures from within the deep-learning-models directory. Because of this, you’ll want to make sure your script is inside the deep-learning-models directory (or your PYTHONPATH is updated accordingly), otherwise your script will fail to import these functions.
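If you would rather keep your script somewhere else, one option is to extend the import path at runtime instead of editing PYTHONPATH. This is only a sketch: `REPO_DIR` is a hypothetical location for the cloned repository, and `add_to_import_path` is a helper name I made up.

```python
import os
import sys

# Hypothetical location of the cloned repository; adjust to your own checkout.
REPO_DIR = os.path.expanduser("~/deep-learning-models")


def add_to_import_path(directory):
    """Put `directory` at the front of sys.path if it is not already there."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory


add_to_import_path(REPO_DIR)
# After this, modules that live in REPO_DIR (such as imagenet_utils) can be
# imported from a script stored anywhere, provided the directory exists.
```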

Alternatively, you can use the “Downloads” section at the bottom of this tutorial to download the source code + example images. This download ensures the code is configured correctly and that your directory structure is set up properly.

Lines 11-14 parse our command line arguments. We only need a single switch here, --image, which is the path to our input image.
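A minimal sketch of that argument parsing with the standard library might look like the following. The `-i` short flag, the help text, and the example path are assumptions on my part; only the `--image` switch comes from the description above.

```python
import argparse


def build_parser():
    """Create a parser with a single required --image switch."""
    parser = argparse.ArgumentParser(
        description="Classify an image with a pre-trained ImageNet network")
    parser.add_argument("-i", "--image", required=True,
                        help="path to the input image")
    return parser


# Passing an explicit list stands in for typing the arguments in a terminal.
args = vars(build_parser().parse_args(["--image", "images/example.jpg"]))
print(args["image"])
```

In a real script you would call `parse_args()` with no arguments so the values are read from the command line.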

We then load our image in OpenCV format on Line 18. This step isn’t strictly required since Keras provides helper functions to load images (which I’ll demonstrate in the next code block), but the two approaches work differently, so if you intend on applying any type of OpenCV functions to your images, I suggest loading your image via cv2.imread and then again via the Keras helpers. Once you get a bit more experience manipulating NumPy arrays and swapping channels, you can avoid the extra I/O overhead, but for the time being, let’s keep things simple.

Line 25 applies the .load_img Keras helper function to load our image from disk. We supply a target_size of 224 x 224 pixels, the required spatial input image dimensions for the VGG16, VGG19, and ResNet50 network architectures.

After calling .load_img, our image is actually in PIL/Pillow format, so we need to apply the .img_to_array function to convert the image to a NumPy format.

Next, let’s preprocess our image:

If we inspect the .shape of our image at this stage, we’ll see that the NumPy array has shape (3, 224, 224): 3 channels (one each for Red, Green, and Blue), 224 pixels tall, and 224 pixels wide. This channels-first layout is what the Theano (“th”) image dimension ordering produces; with the TensorFlow (“tf”) ordering the shape would be (224, 224, 3) instead.

However, before we can pass our image through our CNN for classification, we need to expand the dimensions to be (1, 3, 224, 224).

Why do we do this?

When classifying images using Deep Learning and Convolutional Neural Networks, we often send images through the network in “batches” for efficiency. Thus, it’s actually quite rare to pass only one image at a time through the network — unless of course, you only have one image to classify (like we do).
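Here is a small NumPy sketch of that dimension expansion. The zero-filled arrays are stand-ins for real images, and the final transpose is only needed if your array happens to be in channels-last order (an assumption about your setup, not something the code in this post requires).

```python
import numpy as np

# Stand-in for a decoded image in channels-first ("th") ordering.
image = np.zeros((3, 224, 224), dtype="float32")

# Add a leading batch axis: one image becomes a batch of size one.
batch = np.expand_dims(image, axis=0)
print(batch.shape)  # (1, 3, 224, 224)

# Several images of the same size are stacked along that same axis.
batch_of_four = np.stack([image] * 4, axis=0)
print(batch_of_four.shape)  # (4, 3, 224, 224)

# If an array is channels-last ("tf" ordering), move the channel axis first.
channels_last = np.zeros((224, 224, 3), dtype="float32")
channels_first = np.transpose(channels_last, (2, 0, 1))
print(channels_first.shape)  # (3, 224, 224)
```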

We then preprocess the image on Line 33 by subtracting the mean RGB pixel intensity computed from the ImageNet dataset.
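To give a feel for what mean subtraction does, here is a hedged NumPy sketch. The per-channel means and the RGB-to-BGR flip are assumptions based on the values commonly quoted for Caffe-converted VGG weights; they are not copied from the helper used in this post, so check the preprocessing function that ships with your copy of the models. `subtract_channel_means` is a made-up name.

```python
import numpy as np

# Per-channel means in B, G, R order. Assumed values, commonly quoted for the
# Caffe-converted VGG weights; verify against your own preprocessing helper.
BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype="float32")


def subtract_channel_means(batch_rgb):
    """Zero-center a (N, 3, H, W) RGB batch and return it in BGR channel order."""
    # Reverse the channel axis (RGB -> BGR); astype makes a copy, so the
    # caller's array is left untouched.
    batch_bgr = batch_rgb[:, ::-1, :, :].astype("float32")
    # Broadcast one mean per channel across every pixel and every image.
    batch_bgr -= BGR_MEANS.reshape(1, 3, 1, 1)
    return batch_bgr


batch = np.full((1, 3, 224, 224), 128.0, dtype="float32")
centered = subtract_channel_means(batch)
print(centered[0, :, 0, 0])
```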

Finally, we can load our Keras network and classify the image:

On Line 37 we initialize our VGG16 class. We could also substitute in VGG19 or ResNet50 here, but for the sake of this tutorial, we’ll use VGG16.

Supplying weights="imagenet" indicates that we want to use the pre-trained ImageNet weights for the respective model.

Once the network has been loaded and initialized, we can predict class labels by making a call to the .predict method of the model. These predictions are actually a NumPy array with 1,000 entries: the predicted probabilities associated with each class in the ImageNet dataset.

Calling decode_predictions on these predictions gives us the ImageNet Unique ID of the label, along with a human-readable text version of the label.

Note: When using newer versions of Keras, you might get an error on Lines 44 and 45 when computing the output class labels of the image. In that case, change the lines to:
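A reader reported in the comments that unpacking `decode_predictions(preds)[0][0]` into `(inID, label, probability)` resolved this error on Keras 1.1.2. The sketch below mimics that nested return structure with a hypothetical `decode_top` helper so it runs without Keras; the three class IDs and labels are made up for illustration, whereas the real network emits 1,000 probabilities per image.

```python
# Made-up (id, label) pairs standing in for the 1,000 ImageNet classes.
CLASSES = [
    ("n0000001", "beagle"),
    ("n0000002", "keyboard"),
    ("n0000003", "space_shuttle"),
]


def decode_top(preds, top=2):
    """Return, for each image, its `top` (id, label, probability) tuples."""
    results = []
    for row in preds:
        # Indices of the classes, most probable first.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        results.append([CLASSES[i] + (row[i],) for i in ranked[:top]])
    return results


# One row of probabilities per image in the batch (a batch of one here).
preds = [[0.85, 0.05, 0.10]]
decoded = decode_top(preds)

# One list per image, then one tuple per candidate label, hence two indexes.
(inID, label, probability) = decoded[0][0]
print(inID, label, probability)
```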

Finally, Lines 45-49 print the predicted label to our terminal and display the output image to our screen.

ImageNet + Keras image classification results

To apply the Keras models pre-trained on the ImageNet dataset to your own images, make sure you use the “Downloads” form at the bottom of this blog post to download the source code and example images. This will ensure your code is properly formatted (without errors) and your directory structure is correct.

But before we can apply our pre-trained Keras models to our own images, let’s first discuss how the model weights are (automatically) downloaded.

Downloading the model weights

The first time you execute the script, Keras will automatically download and cache the architecture weights to your disk in the ~/.keras/models directory.

Subsequent runs will be substantially faster (since the network weights will already be downloaded), but that first run will be quite slow by comparison because of the download.
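If you want to see which weight files are already cached, a short standard-library sketch like this one will list them. The `cached_weight_files` helper is a name I made up; the directory is the one mentioned above.

```python
import os

CACHE_DIR = os.path.expanduser("~/.keras/models")  # location named in this post


def cached_weight_files(cache_dir=CACHE_DIR):
    """Return (filename, size in MB) pairs for files already in the cache."""
    if not os.path.isdir(cache_dir):
        return []  # nothing has been downloaded yet
    entries = []
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path):
            entries.append((name, os.path.getsize(path) / 1e6))
    return entries


for name, size_mb in cached_weight_files():
    print("{}: {:.0f}MB".format(name, size_mb))
```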

That said, keep in mind that these weights are fairly large HDF5 files and might take a while to download if you do not have a fast internet connection. For convenience, I have listed out the size of the weights files for each respective network architecture:

  • ResNet50: 102MB
  • VGG16: 553MB
  • VGG19: 574MB

ImageNet and Keras results

We are now ready to classify images using the pre-trained Keras models! To test out the models, I downloaded a couple images from Wikipedia (“brown bear” and “space shuttle”) — the rest are from my personal library.

To start, execute the following command:

Notice that since this is my first run of the script, the weights associated with the VGG16 ImageNet model need to be downloaded:

Figure 2: Downloading the pre-trained ImageNet weights for VGG16.

Once our weights are downloaded, the VGG16 network is initialized, the ImageNet weights loaded, and the final classification is obtained:

Figure 3: Utilizing the VGG16 network trained on ImageNet to recognize a beagle (dog) in an image.

Let’s give another image a try, this one of a beer glass:

Figure 4: Recognizing a beer glass using a Convolutional Neural Network trained on ImageNet.

The following image is of a brown bear:

Figure 5: Utilizing VGG16, Keras, and Python to recognize the brown bear in an image.

I took the following photo of my keyboard to test out the ImageNet network using Python and Keras:

Figure 6: Utilizing Python, Keras, and a Convolutional Neural Network trained on ImageNet to recognize image contents.

I then took a photo of my monitor as I was writing the code for this blog post. Interestingly, the network classified this image as “desktop computer”, which makes sense given that the monitor is the primary subject of the image:

Figure 7: Image classification via Python, Keras, and CNNs.

This next image is of a space shuttle:

Figure 8: Recognizing image contents using a Convolutional Neural Network trained on ImageNet via Keras + Python.

The final image is of a steamed crab, a blue crab, to be specific:

Figure 9: Convolutional Neural Networks and ImageNet for image classification with Python and Keras.

What I find interesting about this particular example is that VGG16 classified this image as “Dungeness crab”, which may be technically incorrect. However, keep in mind that blue crabs are called blue crabs for a reason: their outer shell is blue. It is not until you steam them for eating that their shells turn red. The Dungeness crab, on the other hand, has a slightly dark orange tint to it, even before steaming. The fact that the network was even able to label this image as “crab” is very impressive.

A note on model timing

From start to finish (not including the downloading of the network weights files), classifying an image using VGG16 took approximately 11 seconds on my Titan X GPU. This includes the process of actually loading both the image and network from disk, performing any initializations, passing the image through the network, and obtaining the final predictions.

However, once the network is actually loaded into memory, classification takes only 1.8 seconds, which goes to show you how much overhead is involved in actually loading and initializing a large Convolutional Neural Network. Furthermore, since images can be presented to the network in batches, this same time for classification will hold for multiple images.
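To put those two numbers side by side, here is a back-of-the-envelope sketch of the average cost per image when the network is loaded only once. The 11 and 1.8 second figures come from my run above; treating a whole batch as a single 1.8 second forward pass is an optimistic assumption, and `seconds_per_image` is a made-up helper name.

```python
TOTAL_SECONDS = 11.0      # load + classify, as reported above
CLASSIFY_SECONDS = 1.8    # classify only, once the network is in memory

# One-time cost of loading and initializing the network (about 9.2 seconds).
LOAD_SECONDS = TOTAL_SECONDS - CLASSIFY_SECONDS


def seconds_per_image(num_images, batched=True):
    """Average cost per image when the network is loaded only once.

    With batched=True the whole batch is assumed to take a single forward
    pass of CLASSIFY_SECONDS; with batched=False every image pays
    CLASSIFY_SECONDS on its own.
    """
    passes = 1 if batched else num_images
    return (LOAD_SECONDS + passes * CLASSIFY_SECONDS) / num_images


for n in (1, 10, 100):
    print(n, round(seconds_per_image(n), 2), round(seconds_per_image(n, batched=False), 2))
```

Either way, the loading overhead dominates when you classify a single image and fades quickly as the number of images grows.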

If you’re classifying images on your CPU, then you should obtain a similar classification time. This is mainly because there is substantial overhead in copying the image from memory over to the GPU. When you pass multiple images via batches, it makes the I/O overhead for using the GPU more acceptable.


In this blog post, I demonstrated how to use the newly released deep-learning-models repository to classify image contents using state-of-the-art Convolutional Neural Networks trained on the ImageNet dataset.

To accomplish this, we leveraged the Keras library, which is maintained by François Chollet — be sure to reach out to him and say thanks for maintaining such an incredible library. Without Keras, deep learning with Python wouldn’t be half as easy (or as fun).

Of course, you might be wondering how to train your own Convolutional Neural Network from scratch using ImageNet. Don’t worry, we’re getting there — we just need to understand the basics of neural networks, machine learning, and deep learning first. Walk before you run, so to speak.

I’ll be back next week with a tutorial on hyperparameter tuning, a key step to maximizing your model’s accuracy.

To be notified when future blog posts are published on the PyImageSearch blog, be sure to enter your email address in the form below. See you next week!


If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 11-page Resource Guide on Computer Vision and Image Search Engines, including exclusive techniques that I don’t post on this blog! Sound good? If so, enter your email address and I’ll send you the code immediately!

75 Responses to ImageNet classification with Python and Keras

  1. RyanE August 10, 2016 at 4:18 pm #

    Is there a way to only learn part of the large imagenet dataset, if your classification needs are more along the lines of “Is there a chicken in this picture, and if so, where?”


    • Adrian Rosebrock August 11, 2016 at 7:00 am #

      Absolutely. You would normally take a pre-trained network, “freeze” the lower-level layers of the network so that their weights don’t change, and then apply fine-tuning using your partial (or custom) dataset.

  2. Alexander August 10, 2016 at 7:04 pm #

    Hi! There are a bunch of articles on the internet about how to classify using pre-trained weights and ready-made models. But I can’t understand why there are almost no articles about how to train those models on your own. Currently I’m trying to train Inception V1 using Keras, following this code. Everything is clear when we just use pre-trained weights and some images to classify them, but real problems appear when I’m trying to train the model with my image set. Here are some questions that I can’t solve (maybe I’m too stupid…):
    1. how to get images by category from the ImageNet (e.g. I need all plants)
    2. how to preprocess images and keep them ready for training: should we vectorise them? where to store labels for images?
    3. what shape to use for model input on train
    4. what shape to use for model output on train
    5. can we train model for two classes?

    It feels like it’s a big secret on how to train such models and those guys just brag about how they trained them

    • Adrian Rosebrock August 11, 2016 at 6:59 am #

      These are all great questions, Alexander. Training CNNs doesn’t have to be a “black art”. And honestly, I learned the answers to these via trial and error. I can’t answer all of these questions in the comments, mainly because my comment reply would be longer than the blog post itself, but I have a roadmap planned to address these types of questions related to training your own custom CNNs. Keep following the PyImageSearch blog!

      • Alexander August 13, 2016 at 2:38 am #

        Hi! Thanks for the reply. I hope the blog post will come to world soon)))

    • garret August 18, 2016 at 10:27 am #

      I understand your frustration. Take a look at
      He goes over the actual training of Deep Neural Networks for tensorflow in that tutorial. Also I was checking out the git repository of François Chollet and hes got a list of tutorials that sound promising.

  3. Marko Plahuta August 11, 2016 at 1:17 am #

    Thanks for your articles, they are well written and easy to understand. Could you write one about efficiently and quickly detecting multiple objects in an image? I already implemented a pyramid method once with Keras and a preloaded VGG16 model, but as you can imagine, it’s very slow for the exact reasons you pointed out. Is there an architecture that would allow one to do just one pass through the net with the entire image, and get bounding boxes of detected objects, along with labels? Thanks!

    • Adrian Rosebrock August 11, 2016 at 7:01 am #

      Sure, I’ll absolutely consider a blog post on object localization using Keras. If you’re super concerned about speed, I like using the You Only Look Once method since it’s (1) fast and (2) straightforward.

      • Tushar Soni November 18, 2016 at 3:19 pm #

        I’ve same problem, I want YOLO in Keras, when can we have that blog post.

        • Adrian Rosebrock November 21, 2016 at 12:40 pm #

          I’ll likely be covering YOLO inside my upcoming deep learning book.

  4. Geoffrey Anderson August 11, 2016 at 1:59 pm #

    Thanks for the write-up, Adrian.

    I’d actually more like to use ImageNet to beg, borrow, or steal (or train my own) an unsupervised set of weights trained as an autoencoder to behave as the lower layer to help me detect edges better in a medical imaging application, where just 250 images (examples) exist all-told. Or perhaps I’d, alternatively, directly use the lower layer of this supervised-learning model. Maybe either one will work as the lower layer for generic edge detection? Not sure. But anyway if I can get a generic vision layer at the bottom from someone else’s ImageNet training efforts, then the higher layers would be trained by me on my computer hardware, to be completely application-specific.

    I’ve actually seen some academic papers where ImageNet proved actually helpful on medical images, even though there were no frogs and cats in the medical images, strange as this may seem.

    • Adrian Rosebrock August 12, 2016 at 10:53 am #

      Hey Geoffrey — the lower level layers of ImageNet consist of filters that can be used to detect blobs and edge-like regions. You could essentially “freeze” these layer weights and then apply fine-tuning to the higher-level layers of the network to recognize particular structures in your images.

      However, depending on what you’re trying to do, I would take one of the higher level layers of the network and treat it as my feature extraction layer. Take the output of one of these layers (normally a CONV or POOL layer) and treat the output as a feature vector. This feature vector can then be used for classification, clustering, etc.

    • Simon Burfield October 21, 2016 at 1:00 pm #

      My results were wrong, returning crazy stuff. It turned out I had changed the keras.json file to use Theano but had not changed the ordering from “tf” to “th”.

  5. Geoffrey Anderson August 11, 2016 at 2:42 pm #

    Can anyone with a 8GB memory (or less?) GPU confirm that our Keras-supplied pretrained VGG16 model actually worked to completion on that hardware? Maybe I (we) can save money on a cheaper card than a 12 GB Titan! Thanks if you found that this pretrained model actually worked on your GPU that has less than 12 GB! Please report your GB too. (Actual results only please. not looking for speculation. hope you all understand.)

    • Grant October 2, 2016 at 8:52 pm #

      I just finished this program on my platform and it runs well.
      System configuration:
      Skylake i7-6700, 8G RAM, 500G HD
      ASUS 950GTX (2G Memory)
      Ubuntu 14.04 x64
      Just for your reference.

  6. Zhang Han August 14, 2016 at 10:22 pm #

    Hi! I have a question to ask you. I have an image dataset, but the image sizes aren’t the same. How do I deal with it? I want to train on it. Thanks!

    • Adrian Rosebrock August 16, 2016 at 1:08 pm #

      Simply resize your images prior to passing them to the network. You can resize by ignoring the aspect ratio or resize along the smallest dimension and then taking the center crop.
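The second option (resize along the smallest dimension, then center crop) can be sketched as plain arithmetic. The 224 pixel target matches the input size used in this post; `resize_then_center_crop_box` is a hypothetical helper that only computes the sizes and crop box, leaving the actual pixel work to your imaging library.

```python
def resize_then_center_crop_box(width, height, target=224):
    """Plan a resize along the smallest side followed by a center crop.

    Returns ((new_width, new_height), (left, top, right, bottom)).
    """
    # Scale so that the smallest side becomes exactly `target` pixels.
    scale = target / float(min(width, height))
    new_w = max(target, int(round(width * scale)))
    new_h = max(target, int(round(height * scale)))
    # Center a target x target window inside the resized image.
    left = (new_w - target) // 2
    top = (new_h - target) // 2
    return (new_w, new_h), (left, top, left + target, top + target)


print(resize_then_center_crop_box(640, 480))
```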

  7. Terry Simons August 19, 2016 at 7:49 pm #

    I tried running the code on a random image from the internet (224×224) but I get messages like this:

    Error allocating 411041792 bytes of device memory (out of memory). Driver report 34959360 bytes free and 1073414144 bytes total

    With a Python traceback that says:

    MemoryError: (‘Error allocating 411041792 bytes of device memory (out of memory).’, “you might consider using ‘theano.shared(…, borrow=True)'”)

    Any ideas?

    I’m still waiting on the official download link, so I don’t have the demo images.

    • Adrian Rosebrock August 22, 2016 at 1:35 pm #

      Please see my replies to Garret and Vineet above.

  8. garret August 19, 2016 at 9:38 pm #

    Hi Adrian,

    I’m getting a Memory Error which seems to trigger on line 40 of the code. Do you have any insight as to why this might happen? FYI, I’m not using any GPU features since I’m running this on a DigitalOcean droplet. Does that have anything to do with it?


    • Adrian Rosebrock August 22, 2016 at 1:34 pm #

      Deep Learning networks, especially Convolutional Neural Networks, require a lot of RAM. How much memory does your Digital Ocean droplet have?

  9. VINEEt August 21, 2016 at 6:53 pm #

    Hi .. It’s giving me a memory error . I m using windows7 laptop 32 bit. Cud it be due to my laptop configuration or something else.. Kindly guide … And thanks a ton in advance .. Yr tutorial is really very helpful ..

    • Adrian Rosebrock August 22, 2016 at 1:28 pm #

      If you are getting a memory error, then you likely don’t have enough RAM on your machine to load and run the network.

  10. narayan August 24, 2016 at 8:15 am #

    I want to load ImageNet weights and train my 100 category images by using this weight …So can anyone suggest me how i can do this ..?

    • Adrian Rosebrock August 24, 2016 at 12:14 pm #

      This process is called “finetuning”. I’ll be doing a blog post on this concept soon.

  11. Dawer August 28, 2016 at 4:59 am #

    Hi, great post.

    I was able to classify images successfully but how do we control the output of the classification? Like what if I want to go to the base word. Rather than classifying “beagle” how do I tune the ImageNet to output only “dog”? Is there any reference guide for that.

    Also in comments you mentioned freezing lower-layers of the network to classify only part of the ImageNet. how do we do that too?

    • Adrian Rosebrock August 29, 2016 at 1:59 pm #

      Freezing the lower layers of the network and then training the upper layers is called “finetuning”. I can’t explain how to do that in a single blog post, I’ll have to create a separate tutorial for that.

      As for ImageNet, keep in mind that it’s built on the WordNet synsets. Therefore, you can just follow the WordNet hierarchy.

  12. Nasa September 9, 2016 at 11:20 pm #

    Hi Adrian,

    Great post. I just want to ask if this tutorial could be use with raspberry pi? Instead showing pictures taken from camera, I want to use raspberry pi and webcam to classify the image.

    • Adrian Rosebrock September 12, 2016 at 12:55 pm #

      Networks such as VGG and ResNet are too large for the Pi. You could use smaller CNNs for sure — I would highly recommend using SqueezeNet which is actually intended to run on embedded devices.

  13. Wassim El Youssoufi September 25, 2016 at 9:27 am #

    Hi Adrian,
    If I may, I would add that you can encounter issues if your default backend is tensorflow and not theano.
    If you have false predictions, it can be that your code is using the wrong backend.
    To correct that just change the ~/.keras/keras.json to change the “tf” to “th”.
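A hedged sketch of that configuration change, using only the standard library, is shown below. The example file contents are an assumption (the exact keys in keras.json can differ between Keras versions), and `use_theano_settings` is a made-up helper that works on the text of the file instead of touching your real configuration.

```python
import json

# Example contents only; the keys in your own keras.json may differ.
config_text = (
    '{"image_dim_ordering": "tf", "backend": "tensorflow", '
    '"epsilon": 1e-07, "floatx": "float32"}'
)


def use_theano_settings(text):
    """Return keras.json text with backend and dimension ordering kept consistent."""
    config = json.loads(text)
    config["backend"] = "theano"
    config["image_dim_ordering"] = "th"
    return json.dumps(config, indent=4, sort_keys=True)


print(use_theano_settings(config_text))
```

The point is to change both settings together, since several readers here were bitten by a backend and an ordering that did not match.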

  14. Grant October 2, 2016 at 8:55 pm #

    Hi Adrian,
    Thank you for your great post!
    It took me a looooooong time to try to download the pre-trained data, and python failed several times.
    At last I used a download tool to get all the data files and copy them to the directory.
    To those who might encounter same issue, the directory is:
    Finally I can get the system run smoothly. Thank you!

    • Adrian Rosebrock October 3, 2016 at 7:12 am #

      Thanks for sharing Grant. I know the files are served from GitHub’s CDN which is normally very reliable. Do you have a strong internet connection?

      • Grant October 3, 2016 at 9:40 am #

        Well I can access most website at a fast speed. But I don’t know why the connection between GitHub is very unstable. Maybe because of GFW, I guess…

  15. Alexandru Paiu October 9, 2016 at 12:03 am #

    Hey great post as always!

    In the newest version of keras the models are loaded directly so you don’t have to clone the github repository. You can just do: from keras.applications.resnet50 import ResNet50 Pretty awesome! Also decode predictions now has a top feature that allows you to see top n predicted probabilities.

    • Adrian Rosebrock October 11, 2016 at 1:06 pm #

      Awesome, thanks for sharing this Alexandru! I didn’t realize there was now an applications module. I’ll be sure to play around with this.

  16. Jason October 18, 2016 at 7:27 am #

    I’ve got a bit of a problem, I ran the tutorial at home and everything was as expected however I’ve come into uni and installed , and the images are being misclassified is the beagle is a pug and the rocket is a barrow. Not sure what to make of it… Is it a conflict with the model weights being downloaded automatically now?

    Actually it was a conflict in the keras.json file: I had ‘tf’ and ‘theano’. Oops.

    • Adrian Rosebrock October 20, 2016 at 8:56 am #

      Nice job resolving the issue Jason!

  17. abby November 9, 2016 at 7:28 am #

    Hello Adrian ,
    Thank you so much for the amazing tutorials.
    I was wondering if we could use the pre-trained models by Chollet (VGG16, VGG19, and ResNet50) for transfer learning, so that we can fine-tune the models trained on imagenet to work with another dataset?

    • Adrian Rosebrock November 10, 2016 at 8:39 am #

      You absolutely can fine-tune these pre-trained networks. This is a topic I’ll be covering in my next book. More details to come in late-November/early-December.

  18. Walid Ahmed November 14, 2016 at 1:48 pm #

    Thanks a lot Adrian
    I can not wait for your post on Object localization.

  19. Walid Ahmed November 16, 2016 at 3:08 pm #

    Hi Adrian.

    I want to share with you that I think

    1-results from all models are not always the same as you would notice one image classified as a desk by ResNet50 and the as a keyboard by VGG16.

    2-all models are limited by having the identified object consuming most of the space of the image

    am I right?

    • Adrian Rosebrock November 18, 2016 at 9:04 am #

      Different network architectures that were trained using different optimizers can certainly obtain different results on a per-image basis. What matters is on the aggregate.

      And yes, for this specific type of setup the classification is normally dependent on the object consuming a large portion of the image. However, with that said, we can apply localization methods to find various objects in an image.

  20. Rors November 17, 2016 at 4:54 am #

    Hi Adrian,

    Once you have a trained neural net is it possible to use a webcam to capture video and send those images through the net for classification like with a haar classifier ?


    • Adrian Rosebrock November 18, 2016 at 8:57 am #

      Absolutely. You likely want to “skip frames” and send only every N-th frame to the NN. But yes, the same techniques still apply. Just access your webcam, read the frame, and pass it to your network.
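Frame skipping itself is only a few lines. In this sketch, strings stand in for webcam frames so it runs without a camera, and `every_nth` is a hypothetical helper name.

```python
def every_nth(frames, n):
    """Yield (index, frame) for every n-th item of an iterable of frames."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for index, frame in enumerate(frames):
        if index % n == 0:
            yield index, frame


# Strings stand in for webcam frames so the sketch runs without a camera.
fake_frames = ["frame{}".format(i) for i in range(10)]
picked = [index for index, _ in every_nth(fake_frames, 4)]
print(picked)  # [0, 4, 8]
```

With a real camera you would pass each selected frame through the same preprocessing and prediction steps used for a single image.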

  21. Nurman November 23, 2016 at 7:29 am #

    I have managed to run the tutorial successfully but when I tried to change the setting to ResNet50, and the run, I got the following error:
    ValueError: CorrMM images and kernel must have the same stack size.

    I have not made any changes to the code apart from changing the VGG to ResNet.

    Do you have any ideas what went wrong?

    • Adrian Rosebrock November 23, 2016 at 8:32 am #

      I’m not sure regarding that error, I have not encountered that before. I would suggest opening an issue on GitHub.

    • Bob Haffner February 17, 2017 at 2:34 pm #


      I believe that’s the error I was getting when I was on Keras 1.0.7 I just updated to 1.2.2 so I could use the built-in Resnet50 model i.e. keras.applications.resnet50 and the error went away.


  22. Dinesh Vadhia November 24, 2016 at 3:21 pm #

    To extract the dense feature vector of an image, the recommendation is to get it from the penultimate layer. But what is the name of this layer for the respective pre-trained models, i.e. VGG16, VGG19 and InceptionV3?

    The Keras docs have one example for VGG19 (‘block4_pool’), but I don’t know if this is the penultimate layer. Thanks for the help.

    • Adrian Rosebrock November 28, 2016 at 10:44 am #

      You need to look at the source code for VGG16, VGG19, Inception, etc. Each layer in the respective architecture has a name attribute.

  23. Atti December 1, 2016 at 4:16 am #

    hey Adrian, great post as always.

    i ran into a little problem: “too many values to unpack” at this line
    (inID, label) = decode_predictions(preds)[0]

    which i replaced with
    (inID, label, probability) = decode_predictions(preds)[0][0]

    and it started working

    you might want to take a look at this. Maybe it’s because I’m using a newer version of Keras (1.1.2)

    • Adrian Rosebrock December 1, 2016 at 7:21 am #

      Thanks Atti — I was just about to update the code for this change, thank you for pointing this out.

  24. Walid December 1, 2016 at 12:38 pm #

    Can you please advise how to apply localization?

    • Adrian Rosebrock December 5, 2016 at 1:48 pm #

      I’ll be discussing detection/localization in my upcoming deep learning book (stay tuned).

  25. Joey Sidesmith December 16, 2016 at 4:10 pm #

    Struggling to follow along here…

    img_path = '/path/'

    import numpy as np
    from keras.preprocessing import image

    # load the image and convert it to a NumPy array
    x = image.load_img(img_path, target_size=(250, 250))
    x = image.img_to_array(x)
    print(x.shape)
    >> (250, 250, 3)

    # add the batch dimension
    x = np.expand_dims(x, axis=0)
    print(x.shape)
    >> (1, 250, 250, 3)

    However, I’m under the impression my output should be (1, 3, 250, 250)…


    • Adrian Rosebrock December 18, 2016 at 8:42 am #

      This is entirely dependent on the image_dim_ordering in your ~/.keras/keras.json file. A “tf” value will produce a shape of (h, w, d) while a “th” ordering will be (d, h, w). Be sure to double-check which backend you are using along with which image dimension ordering you are using.
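
      To make the two layouts concrete, here is a minimal NumPy sketch. It uses an array of zeros as a stand-in for the output of img_to_array, and the manual transpose is only there to show how the shapes relate; in practice the ordering is controlled by keras.json rather than by transposing by hand.

```python
import numpy as np

# Stand-in for the output of img_to_array with "tf" ordering: (h, w, d).
x_tf = np.zeros((250, 250, 3), dtype="float32")

# Move the channel axis to the front to get "th" ordering: (d, h, w).
x_th = np.transpose(x_tf, (2, 0, 1))

# Add the batch dimension in both cases.
batch_tf = np.expand_dims(x_tf, axis=0)
batch_th = np.expand_dims(x_th, axis=0)

print(batch_tf.shape)  # (1, 250, 250, 3)
print(batch_th.shape)  # (1, 3, 250, 250)
```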

  26. RAFAEL FIGUEROA January 8, 2017 at 12:23 am #

    Hello Adrian, greetings from Brasil 🙂

    Thanks for the model, it’s very instructional.

    I managed to switch between VGG16 and VGG19, but when I try to load ResNet50 it says that there is no such model.

    Can you explain how to load it, please?

    Thanks !

    • Adrian Rosebrock January 9, 2017 at 9:14 am #

      Which version of Keras are you using? If it’s Keras 1.1 or greater you can just do:

      from keras.applications import ResNet50

  27. Thomas January 12, 2017 at 2:38 am #

    Great post!

    I ran into an error message running it though:

    Seems like decode_predictions(preds)[0] returns a list of five tuples for each of the classifications that has any probability at all.

    Changing to:

    decode_predictions(preds)[0][0] returns the tuple of the classification with largest probability. This is a tuple consisting of three variables, the id, the classification and the probability.

    So if I change to (inID, label, prob) = decode_predictions(preds)[0][0] we can print the probability as well.

    Maybe this is due to some recent changes in the classification index that is downloaded?

    • Adrian Rosebrock January 12, 2017 at 7:53 am #

      Hey Thomas — you are indeed correct. The error is due to an update to Keras. I’ll also update this blog post to reflect the change. Thank you for pointing it out!
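
      The structure Thomas describes can be illustrated without running a network. The sketch below uses a hand-written mock in place of the real decode_predictions(preds) output; the ids, labels, and probabilities are placeholders, not actual model results.

```python
# Mock of what decode_predictions(preds) returns for a batch of one image,
# following the structure described above: a list (one entry per image) of
# lists of (id, label, probability) tuples, highest probability first.
# The ids, labels, and numbers are placeholders, not real model output.
mock_decoded = [[
    ("id_a", "label_a", 0.70),
    ("id_b", "label_b", 0.20),
    ("id_c", "label_c", 0.10),
]]

# Top-1 prediction for the first (and only) image.
(inID, label, prob) = mock_decoded[0][0]
print("{}: {:.2f}".format(label, prob))

# All returned predictions for the first image.
for (_id, name, p) in mock_decoded[0]:
    print("{} {} {:.2f}".format(_id, name, p))
```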

  28. Lars January 12, 2017 at 9:36 am #

    First of all: Thanks for this tutorial !!! Now to my problem :
    I tried to predict multiple images in a batch, but I can’t seem to get it to work.
    I tried to make a batch like this :
    image = np.array([np.array(image_utils.load_img(fname, target_size=(224, 224))) for fname in filelist]).
    Or should I just do a for loop and load the images one after another ?

    • Adrian Rosebrock January 13, 2017 at 8:43 am #

      It looks like you’re forgetting to call .img_to_array and preprocess_input on each image. You’ll also need to expand the dimensions of each image. Since that would make for a very long list comprehension, I would suggest using just a simple for loop.
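
      A rough sketch of the for-loop approach follows. The names build_batch and fake_loader are hypothetical; fake_loader returns an array of zeros so the snippet runs without Keras or image files.

```python
import numpy as np


def build_batch(paths, load_fn):
    """Load each path with load_fn, add a batch axis, and stack the results."""
    images = []
    for path in paths:
        image = load_fn(path)                   # one (h, w, d) array per file
        image = np.expand_dims(image, axis=0)   # -> (1, h, w, d)
        images.append(image)
    return np.vstack(images)                    # -> (N, h, w, d)


def fake_loader(path):
    # Hypothetical stand-in so the sketch runs without Keras or image files.
    return np.zeros((224, 224, 3), dtype="float32")


batch = build_batch(["a.jpg", "b.jpg", "c.jpg"], fake_loader)
print(batch.shape)  # (3, 224, 224, 3)
```

      In a real script, the loader would wrap image_utils.load_img(path, target_size=(224, 224)) followed by image_utils.img_to_array, with preprocess_input applied before the batch is passed to the model.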

  29. nobit January 12, 2017 at 10:00 am #


    I would like to know how difficult it is to classify two variants of the same concept. For example, if I already know that what is in the image is a door, how would I train a network to determine whether the door is open or closed?

    What approach would you recommend?

    • Adrian Rosebrock January 13, 2017 at 8:41 am #

      Is your camera fixed and non-moving? If so, this is a very easy problem to solve (and you don’t need machine learning, just basic computer vision techniques).

      However, if you’re looking to determine if any given door is open or closed, that is much more challenging and would certainly require a large dataset and likely deep learning techniques.
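
      For the fixed-camera case, one possible basic technique (an assumption here, not necessarily the one meant above) is to compare each frame against a reference image of the closed door. The sketch below uses synthetic grayscale arrays, and the function name and threshold value are illustrative only.

```python
import numpy as np


def differs_from_reference(frame, reference, threshold=25.0):
    """Return True if the mean absolute pixel difference exceeds threshold."""
    diff = np.abs(frame.astype("float32") - reference.astype("float32"))
    return float(diff.mean()) > threshold


# Synthetic grayscale images standing in for camera frames.
closed_reference = np.full((100, 100), 200, dtype="uint8")  # door closed
same_scene = closed_reference.copy()
changed_scene = closed_reference.copy()
changed_scene[:, 30:70] = 20  # a dark opening where the door used to be

print(differs_from_reference(same_scene, closed_reference))     # False
print(differs_from_reference(changed_scene, closed_reference))  # True
```

      Real footage would need a threshold tuned to the scene, plus some handling of lighting changes.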

  30. DJ March 10, 2017 at 12:57 pm #

    error: the following arguments are required: -i/--image

    How do I fix this?

    • Adrian Rosebrock March 10, 2017 at 3:42 pm #

      You need to supply the --image command line argument as I do in the example in the blog post.

      • DJ March 10, 2017 at 4:08 pm #

        Where? What does the syntax look like?
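
        As a rough illustration of the syntax, the sketch below defines a required -i/--image argument with argparse and parses an explicit argument list, which mimics what happens when the flag is typed on the command line. The script name in the comment is a placeholder.

```python
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
                help="path to the input image")

# Passing an explicit list mimics typing: python your_script.py --image dog.jpg
# ("your_script.py" is a placeholder name, not the post's script).
args = vars(ap.parse_args(["--image", "dog.jpg"]))
print(args["image"])  # dog.jpg
```

        Calling ap.parse_args() with no list reads the arguments from the command line instead, and omitting --image there produces the "arguments are required" error.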

      • DJ March 13, 2017 at 11:25 am #

        odels>python --image dog.jpg

        Traceback (most recent call last):
        File “”, line 2, in
        from keras.preprocessing import image as image_utils
        ModuleNotFoundError: No module named ‘keras’

        I have Keras version 1.2.2 installed and it runs in PyCharm. The command line does not recognize Keras, as shown above.

        • Adrian Rosebrock March 13, 2017 at 12:06 pm #

          If you are using a Python virtual environment, make sure you have accessed it before running your Python script. You’ll want to make sure your command line environment matches your PyCharm environment.
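
          One quick way to compare the two environments is to print which interpreter is running and whether it can find Keras. This is a minimal Python 3 sketch; the helper name is_importable is hypothetical.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if the current interpreter can locate module_name."""
    return importlib.util.find_spec(module_name) is not None


# The interpreter actually running this script; compare it with the one
# configured in PyCharm.
print(sys.executable)
print("keras importable:", is_importable("keras"))
```

          Running this from both the command line and PyCharm should show whether they point at the same Python installation.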

  31. Jose Luis Verdugo March 12, 2017 at 8:30 pm #

    Hi everyone, I configured Keras with Theano as backend, following in the footsteps of an earlier Adrian post. Now, I cloned the git mentioned above and created the script, but when I tried to run the script, the following error was returned: “AttributeError: The ‘module’ object has no ‘image_data_format’ “. I tried to add the following row to the keras.json file but it does not work: “image_data_format”: “channels_first”

    Does anyone know this issue or have a solution?

    • Max March 15, 2017 at 3:48 pm #

      Hi, Jose.
      I got the same error at first. Then I downloaded the ready-made code (red button at the end of blog post) and ran it with success.
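
      One possible explanation, offered as an assumption rather than a confirmed diagnosis: Keras 2 renamed the image_dim_ordering setting (“tf”/“th”) to image_data_format (“channels_last”/“channels_first”), so code written for one version can fail against the other regardless of what keras.json contains. The sketch below reads either key from a keras.json-style dictionary; the function name and sample configs are illustrative.

```python
import json

# Mapping between the older "image_dim_ordering" values and the newer
# "image_data_format" values.
OLD_TO_NEW = {"tf": "channels_last", "th": "channels_first"}


def data_format_from_config(config):
    """Return the channel layout from a keras.json-style dict, old or new key."""
    if "image_data_format" in config:
        return config["image_data_format"]
    if "image_dim_ordering" in config:
        return OLD_TO_NEW[config["image_dim_ordering"]]
    return None


# Example configs as JSON text (contents are illustrative).
old_style = json.loads('{"backend": "theano", "image_dim_ordering": "th"}')
new_style = json.loads(
    '{"backend": "theano", "image_data_format": "channels_first"}')

print(data_format_from_config(old_style))  # channels_first
print(data_format_from_config(new_style))  # channels_first
```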

  32. amell March 22, 2017 at 7:33 am #

    It is interesting, but what if I want to classify an object that is not included in ImageNet? Briefly, I want to automatically recognize a business card, but there is no business card class.

    • Adrian Rosebrock March 22, 2017 at 8:29 am #

      You would want to consider using transfer learning, either via feature extraction or fine-tuning. I’ll be covering both in my upcoming deep learning book.


