Keras Tutorial: How to get started with Keras, Deep Learning, and Python

Inside this Keras tutorial, you will discover how easy it is to get started with deep learning and Python. You will use the Keras deep learning library to train your first neural network on a custom image dataset, and from there, you’ll implement your first Convolutional Neural Network (CNN) as well.

The inspiration for this guide came from PyImageSearch reader, Igor, who emailed me a few weeks ago and asked:

Hey Adrian, thanks for the PyImageSearch blog. I’ve noticed that nearly every “getting started” guide I come across for Keras and image classification uses either the MNIST or CIFAR-10 datasets which are built into Keras. I just call one of those functions and the data is automatically loaded for me.

But how do I go about using my own image dataset with Keras?

What steps do I have to take?

Igor has a great point: most Keras tutorials you come across will try to teach you the basics of the library using an image classification dataset such as MNIST (handwriting recognition) or CIFAR-10 (basic object recognition).

These image datasets are standard benchmarks in the computer vision and deep learning literature, and sure, they will absolutely get you started using Keras…

…but they aren’t necessarily practical in the sense that they don’t teach you how to work with your own set of images residing on disk. Instead, you’re just calling helper functions to load pre-compiled datasets.

I’m going with a different take on an introductory Keras tutorial.

Instead of teaching you how to utilize one of these pre-compiled datasets, I’m going to teach you how to train your first neural network and Convolutional Neural Network using a custom dataset — because let’s face it, your goal is to apply deep learning to your own dataset, not one built into Keras, am I right?

To learn how to get started with Keras, Deep Learning, and Python, just keep reading!



Today’s Keras tutorial is designed with the practitioner in mind — it is meant to be a practitioner’s approach to applied deep learning.

That means that we’ll learn by doing.

We’ll be getting our hands dirty.

Writing some Keras code.

And then training our networks on our custom datasets.

This tutorial is not meant to be a deep dive into the theory surrounding deep learning.

If you’re interested in studying deep learning in depth, including both (1) hands-on implementations and (2) a discussion of theory, I would suggest you check out my book, Deep Learning for Computer Vision with Python.

Overview of what’s going to be covered

Training your first simple neural network with Keras doesn’t require a lot of code, but we’re going to start slow, taking it step-by-step, ensuring you understand the process of how to train a network on your own custom dataset.

The steps we’ll cover today include:

  1. Installing Keras and other dependencies on your system
  2. Loading your data from disk
  3. Creating your training and testing splits
  4. Defining your Keras model architecture
  5. Compiling your Keras model
  6. Training your model on your training data
  7. Evaluating your model on your test data
  8. Making predictions using your trained Keras model

I’ve also included an additional section on training your first Convolutional Neural Network.

This may seem like a lot of steps, but I promise you, once we start getting into the example you’ll see that the examples are linear, make intuitive sense, and will help you understand the fundamentals of training a neural network with Keras.

Our example dataset

Figure 1: In this Keras tutorial, we won’t be using CIFAR-10 or MNIST for our dataset. Instead, I’ll show you how you can organize your own dataset of images and train a neural network using deep learning with Keras.

Most Keras tutorials you come across for image classification will utilize MNIST or CIFAR-10 — I’m not going to do that here.

To start, MNIST and CIFAR-10 aren’t very exciting examples.

These tutorials don’t actually cover how to work with your own custom image datasets. Instead, they simply call built-in Keras utilities that magically return the MNIST and CIFAR-10 datasets as NumPy arrays. In fact, your training and testing splits have already been pre-split for you!

Secondly, if you want to use your own custom datasets you really don’t know where to start. You’ll find yourself scratching your head and asking questions such as:

  • Where are those helper functions loading the data from?
  • What format should my dataset on disk be?
  • How can I load my dataset into memory?
  • What preprocessing steps do I need to perform?

Let’s be honest — your goal in studying Keras and deep learning isn’t to work with these pre-baked datasets.

Instead, you want to work with your own custom datasets.

And those introductory Keras tutorials you’ve come across only take you so far.

That’s why, inside this Keras tutorial, we’ll be working with a custom dataset called the “Animals dataset” I created for my book, Deep Learning for Computer Vision with Python:

Figure 2: In this Keras tutorial we’ll use an example animals dataset straight from my deep learning book. The dataset consists of dogs, cats, and pandas.

The purpose of this dataset is to correctly classify an image as containing either:

  • Cats
  • Dogs
  • Pandas

Containing only 3,000 images, the Animals dataset is meant to be an introductory dataset that we can quickly train a deep learning model on using either our CPU or GPU (and still obtain reasonable accuracy).

Furthermore, using this custom dataset enables you to understand:

  1. How you should organize your dataset on disk
  2. How to load your images and class labels from disk
  3. How to partition your data into training and testing splits
  4. How to train your first Keras neural network on the training data
  5. How to evaluate your model on the testing data
  6. How you can reuse your trained model on data that is brand new and outside your training and testing splits

By following the steps in this Keras tutorial you’ll be able to swap out my Animals dataset for any dataset of your choice, provided you utilize the project/directory structure detailed below.

Need data? If you need to scrape images from the internet to create a dataset, check out how to do it the easy way with Bing Image Search, or the slightly more involved way with Google Images.

Project structure

There are a number of files associated with this project. Grab the zip from the “Downloads” section and then use the tree command to show the project structure in your terminal (I’ve provided two command line argument flags to tree to make the output nice and clean):

As previously discussed, today we’ll be working with the Animals dataset. Notice how animals is organized in the project tree. Inside of animals/, there are three class directories: cats/, dogs/, panda/. Within each of those directories are 1,000 images pertaining to the respective class.

If you work with your own dataset, just organize it the same way! Ideally you’ll gather 1,000 images per class at a minimum. This isn’t always possible, but you should at least have class balance. Significantly more images in one class folder could cause model bias.
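Before training on your own images, it can help to confirm that the folders really are balanced. The sketch below is only an illustration: the helper names count_images_per_class and is_balanced, the list of file extensions, and the 10% tolerance are my own assumptions, not part of the project download.

```python
import os
from collections import Counter

# Assumed set of image extensions; adjust to match your own files.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def count_images_per_class(dataset_dir):
    """Return {class_name: image_count} for dataset_dir/<class_name>/<images>."""
    counts = Counter()
    for class_name in sorted(os.listdir(dataset_dir)):
        class_dir = os.path.join(dataset_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files sitting next to the class folders
        counts[class_name] = sum(
            1 for name in os.listdir(class_dir)
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
    return dict(counts)


def is_balanced(counts, tolerance=0.1):
    """True when the smallest class has at least (1 - tolerance) times
    as many images as the largest class."""
    if not counts:
        return False
    smallest, largest = min(counts.values()), max(counts.values())
    return largest > 0 and smallest >= (1 - tolerance) * largest
```

Calling count_images_per_class("animals") on the layout described above should report one entry per class folder, and is_balanced gives a quick yes/no answer before you spend time training.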

Next is the images/  directory. This directory contains three images for testing purposes which we’ll use to demonstrate how to (1) load a trained model from disk and then (2) classify an input image that is not part of our original dataset.

The output/  folder contains three types of files which are generated by training:

  • .model : A serialized Keras model file is generated after training and can be used in future inference scripts.
  • .pickle : A serialized label binarizer file. This file contains an object which contains class names. It accompanies a model file.
  • .png : I always place my training/validation plot images in the output folder as it is an output of the training process.

The pyimagesearch/  directory is a module. Contrary to the many questions I receive, pyimagesearch  is not a pip-installable package. Instead it resides in the project folder and classes contained within can be imported into your scripts. It is provided in the “Downloads” section of this Keras tutorial.

Today we’ll be reviewing four .py files:

  • In the first half of the blog post, we’ll train a simple model. The training script is .
  • We’ll advance to training SmallVGGNet  using the  script.
  • The pyimagesearch.smallvggnet module contains our SmallVGGNet class, a Convolutional Neural Network.
  • What good is a serialized model unless we can deploy it? In , I’ve provided sample code for you to load a serialized model + label file and make an inference on an image. The prediction script is only useful after we have successfully trained a model with reasonable accuracy. It is always useful to run this script to test with images that are not contained within the dataset.

1. Install Keras on your system

Figure 3: We’ll use Keras with the TensorFlow backend in this introduction to Keras for deep learning blog post.

For today’s tutorial, you will need to have Keras, TensorFlow, and OpenCV installed.

If you don’t have this software on your system yet, don’t run for the hills! I’ve written a number of easy-to-follow installation guides. I also update them on a regular basis. Here is what you need:

  • OpenCV Installation Guides: This launchpad links to tutorials that will help you install OpenCV on Ubuntu, macOS, or Raspberry Pi.
  • Install Keras with TensorFlow: You’ll be up and running with Keras and TensorFlow in less than two minutes, thanks to pip. You can install these packages on a Raspberry Pi; however, I advise against training with your Pi. Pre-trained and reasonably sized models (such as the two we’re covering today) can easily run on a Pi, but make sure you train them first!
  • Install imutils, scikit-learn, and matplotlib: Be sure to install these packages as well (ideally into your virtual environment). It is easy to install each with pip:

2. Load your data from disk

Figure 4: Step #2 of our Keras tutorial involves loading images from disk into memory.

Now that Keras is installed on our system we can start implementing our first simple neural network training script using Keras. We’ll later implement a full-blown Convolutional Neural Network, but let’s start easy and work our way up.

Open up  and insert the following code:

Lines 2-19 import our required packages. As you can see there are quite a few tools this script is taking advantage of. Let’s review the important ones:

  • matplotlib : This is the go-to plotting package for Python. That said, it does have its nuances, and if you’re having trouble with it, refer to this blog post. On Line 3, we instruct matplotlib  to use the "Agg"  backend enabling us to save plots to disk — that’s your first nuance!
  • sklearn : The scikit-learn  library will help us with binarizing our labels, splitting data for training/testing, and generating a training report in our terminal.
  • keras : You’re reading this tutorial to learn about Keras — it is our high level frontend into TensorFlow and other deep learning backends.
  • imutils : My package of convenience functions. We’ll use the paths  module to generate a list of image file paths for training.
  • numpy : NumPy is for numerical processing with Python. It is another go-to package. If you have OpenCV for Python and scikit-learn installed, then you’ll have NumPy as it is a dependency.
  • cv2 : This is OpenCV. At this point, it is both tradition and a requirement to tack on the 2 even though you’re likely using OpenCV 3 or higher.
  • …the remaining imports are built into your installation of Python!

Whew! That was a lot, but having a good idea of what each import is used for will aid your understanding as we walk through these scripts.

Let’s parse our command line arguments with argparse:

Our script will dynamically handle additional information provided via the command line when we execute our script. The additional information is in the form of command line arguments. The argparse  module is built into Python and will handle parsing the information you provide in your command string. For additional explanation, refer to this blog post.

We have four command line arguments to parse:

  • --dataset : The path to our dataset of images on disk.
  • --model : Our model will be serialized and output to disk. This argument contains the path to the output model file.
  • --label-bin : Dataset labels are serialized to disk for easy recall in other scripts. This is the path to the output label binarizer file.
  • --plot : The path to the output training plot image file. We’ll review this plot to check for over/underfitting of our data.
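As a rough sketch of how these four arguments could be declared with argparse: the short flags, help strings, and example file names below are my assumptions for illustration. The list passed to parse_args stands in for a real command line so the snippet runs on its own.

```python
import argparse


def build_parser():
    """Build a parser for the four arguments described above."""
    ap = argparse.ArgumentParser(description="Train a simple neural network")
    ap.add_argument("-d", "--dataset", required=True,
                    help="path to input dataset of images")
    ap.add_argument("-m", "--model", required=True,
                    help="path to output trained model")
    ap.add_argument("-l", "--label-bin", required=True,
                    help="path to output label binarizer")
    ap.add_argument("-p", "--plot", required=True,
                    help="path to output accuracy/loss plot")
    return ap


# A real script would call build_parser().parse_args() with no arguments,
# which reads sys.argv. The explicit list below uses hypothetical paths.
args = vars(build_parser().parse_args([
    "--dataset", "animals",
    "--model", "output/simple_nn.model",
    "--label-bin", "output/simple_nn_lb.pickle",
    "--plot", "output/simple_nn_plot.png",
]))

# argparse turns "--label-bin" into the dictionary key "label_bin".
print(args["label_bin"])
```

Note the dash-to-underscore conversion: that is why later code would look up args["label_bin"] rather than args["label-bin"].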

With the dataset information in hand, let’s load our images and class labels:

Here we:

  • Initialize lists for our data  and labels  (Lines 35 and 36). These will later become NumPy arrays.
  • Grab imagePaths  and randomly shuffle them (Lines 39-41). The paths.list_images  function conveniently will find all the paths to all input images in our --dataset  directory before we sort and shuffle  them. I set a seed  so that the random reordering is reproducible.
  • Begin looping over all imagePaths  in our dataset (Line 44).

For each imagePath , we proceed to:

  • Load the image  into memory (Line 48).
  • Resize the image to 32x32  pixels (ignoring aspect ratio) as well as flatten  the image (Line 49). It is critical to resize  our images properly because this neural network requires these dimensions. Each neural network will require different dimensions, so just be aware of this. Flattening the data allows us to pass the raw pixel intensities to the input layer neurons easily. You’ll see later that for VGGNet we pass the volume to the network since it is convolutional. Keep in mind that this example is just a simple non-convolutional network — we’ll be looking at a more advanced example later in the post.
  • Append the resized image to data  (Line 50).
  • Extract the class label  of the image from the path (Line 54) and add it to the labels  list (Line 55). The labels  list contains the classes that correspond to each image in the data list.

Now in one fell swoop, we can apply array operations to the data and labels:

On Line 58 we scale pixel intensities from the range [0, 255] to [0, 1] (a common preprocessing step).

We also convert the labels  list to a NumPy array (Line 59).
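To make the loading loop concrete without needing the dataset on disk, here is a minimal sketch. The file paths are hypothetical, and a random array stands in for the result of reading and resizing each image with OpenCV. The label extraction, flattening, and scaling steps follow the description above.

```python
import os
import random

import numpy as np

# Hypothetical paths standing in for what paths.list_images would return.
imagePaths = sorted([
    os.path.join("animals", "cats", "cats_00001.jpg"),
    os.path.join("animals", "dogs", "dogs_00001.jpg"),
    os.path.join("animals", "panda", "panda_00001.jpg"),
])
random.seed(42)  # fixed seed so the shuffle is reproducible
random.shuffle(imagePaths)

rng = np.random.default_rng(42)
data, labels = [], []

for imagePath in imagePaths:
    # Stand-in for loading the image and resizing it to 32x32:
    # a random 32x32 image with 3 channels and 8-bit pixel values.
    image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

    # Flatten the 32x32x3 volume into a vector of 3072 raw intensities.
    data.append(image.flatten())

    # The class label is the name of the directory holding the image.
    labels.append(imagePath.split(os.path.sep)[-2])

# Scale pixel intensities from [0, 255] to [0, 1] in one array operation.
data = np.array(data, dtype="float") / 255.0
labels = np.array(labels)

print(data.shape)
print(sorted(labels.tolist()))
```

Because the label comes from the folder name, renaming or reorganizing the class directories changes the labels without touching any code.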

3. Construct your training and testing splits

Figure 5: Before fitting a deep learning or machine learning model you must split your data into training and testing sets. Scikit-learn is employed in this blog post to split our data.

Now that we have loaded our image data from disk, next we need to construct our training and testing splits:

It is typical to allocate a percentage of your data for training and a smaller percentage of your data for testing. Scikit-learn provides a handy train_test_split function which will split the data for us.

Both trainX and testX make up the image data itself while trainY and testY make up the labels.
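Here is a small, self-contained sketch of the split using random stand-in data. The 75/25 ratio mirrors the split used later in this tutorial, and the random_state value is simply an assumption that makes the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 12 flattened "images" and their string labels.
data = np.random.rand(12, 3072)
labels = np.array(["cats", "dogs", "panda"] * 4)

# Hold out 25% of the data for testing and train on the remaining 75%.
(trainX, testX, trainY, testY) = train_test_split(
    data, labels, test_size=0.25, random_state=42)

print(trainX.shape, testX.shape)
print(trainY.shape, testY.shape)
```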

Our class labels are currently represented as strings; however, Keras will assume that both:

  1. Labels are encoded as integers
  2. And furthermore, one-hot encoding is performed on these labels making each label represented as a vector rather than an integer

To accomplish this encoding, we can use the LabelBinarizer  class from scikit-learn:

On Line 70, we initialize the LabelBinarizer  object.

A call to fit_transform  finds all unique class labels in trainY  and then transforms them into one-hot encoded labels.

A call to just .transform  on testY  performs just the one-hot encoding step — the unique set of possible class labels was already determined by the call to .fit_transform .

Here’s an example:

Notice how only one of the array elements is “hot” which is why we call this “one-hot” encoding.
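If you would like to reproduce this kind of encoding yourself, the following sketch runs LabelBinarizer on a handful of made-up labels (the label arrays are illustrative, not taken from the dataset).

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

trainY = np.array(["cats", "dogs", "panda", "dogs"])
testY = np.array(["panda", "cats"])

lb = LabelBinarizer()

# fit_transform learns the unique class labels, then one-hot encodes them.
trainY_encoded = lb.fit_transform(trainY)

# transform reuses the classes learned above; it does not re-fit.
testY_encoded = lb.transform(testY)

print(lb.classes_)       # ['cats' 'dogs' 'panda']
print(trainY_encoded)    # rows: [1 0 0], [0 1 0], [0 0 1], [0 1 0]
print(testY_encoded)     # rows: [0 0 1], [1 0 0]
```

The column order follows lb.classes_, which is why keeping the fitted binarizer around matters when you later map a predicted index back to a class name.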

4. Define your Keras model architecture

Figure 6: Our simple neural network is created using Keras in this deep learning tutorial.

The next step is to define our neural network architecture using Keras. Here we will be using a network with one input layer, two hidden layers, and one output layer:

Since our model is really simple, we go ahead and define it in this script (typically I like to make a separate class in a separate file for the model architecture).

The input layer and first hidden layer are defined on Line 76. The network will have an input_shape of 3072 as there are 32x32x3=3072 values in a flattened input image. The first hidden layer will have 1024 nodes.

The second hidden layer will have 512  nodes (Line 77).

Finally, the number of nodes in the final output layer (Line 78) will be the number of possible class labels — in this case, the output layer will have three nodes, one for each of our class labels (“cats”, “dogs”, and “panda”, respectively).
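A 3072-1024-512-3 architecture like the one described could be sketched as follows. Two things here are assumptions on my part: the sigmoid activations on the hidden layers (the text above does not name them) and the use of an explicit Input object, which is one of several ways to declare the input shape.

```python
from keras.layers import Dense, Input
from keras.models import Sequential

model = Sequential([
    Input(shape=(3072,)),               # 32 * 32 * 3 flattened pixel values
    Dense(1024, activation="sigmoid"),  # first hidden layer (activation assumed)
    Dense(512, activation="sigmoid"),   # second hidden layer (activation assumed)
    Dense(3, activation="softmax"),     # one output node per class label
])

model.summary()
```

The summary is a handy sanity check: most of the parameters live in the first Dense layer, since every one of the 3072 inputs connects to every one of the 1024 nodes.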

5. Compile your Keras model

Figure 7: Step #5 of our Keras tutorial requires that we compile our model with an optimizer and loss function.

Once we have defined our neural network architecture, the next step is to “compile” it:

First, we initialize our learning rate and total number of epochs to train for  (Lines 81 and 82).

Then we compile  our model using the Stochastic Gradient Descent ( SGD ) optimizer with "categorical_crossentropy"  as the loss  function.

Categorical cross-entropy is used as the loss for nearly all networks trained to perform classification. The only exception is for 2-class classification where there are only two possible class labels. In that event you would want to swap out "categorical_crossentropy"  for "binary_crossentropy" .
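To get a feel for what categorical cross-entropy measures, here is a small NumPy sketch of the formula for one-hot labels. It is a hand-rolled illustration, not the Keras implementation, and the probability vectors are made up.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between one-hot labels and predicted probabilities."""
    # Clip to avoid taking log(0) for a probability of exactly zero.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred), axis=-1)


y_true = np.array([[1.0, 0.0, 0.0]])        # the image really is a cat
confident = np.array([[0.90, 0.05, 0.05]])  # network is fairly sure it's a cat
unsure = np.array([[0.40, 0.30, 0.30]])     # network is hedging its bets

loss_confident = categorical_crossentropy(y_true, confident)[0]
loss_unsure = categorical_crossentropy(y_true, unsure)[0]

print(round(float(loss_confident), 3))  # 0.105
print(round(float(loss_unsure), 3))     # 0.916
```

Only the probability assigned to the correct class contributes to the loss, so a confident correct prediction is rewarded with a value close to zero.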

6. Fit your Keras model to the data

Figure 8: In Step #6 of this Keras tutorial, we train a deep learning model using our training data and compiled model.

Now that our Keras model is compiled, we can “fit” (i.e., train) it on our training data:

We’ve discussed all the inputs except batch_size . The batch_size  controls the size of each group of data to pass through the network. Larger GPUs would be able to accommodate larger batch sizes. I recommend starting with 32  or 64  and going up from there.
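The compile and fit steps can be sketched end to end on random stand-in data. Everything numeric here is a placeholder chosen so the example finishes in seconds: the learning rate, the two epochs, the tiny hidden layer, and the random arrays are my assumptions, not the values used to produce the results in this post.

```python
import numpy as np
from keras.layers import Dense, Input
from keras.models import Sequential
from keras.optimizers import SGD

INIT_LR = 0.01   # placeholder learning rate
EPOCHS = 2       # placeholder; real training runs for many more epochs
BATCH_SIZE = 32

# Random stand-ins for the flattened images and one-hot encoded labels.
rng = np.random.default_rng(42)
trainX = rng.random((96, 3072)).astype("float32")
testX = rng.random((32, 3072)).astype("float32")
trainY = np.eye(3, dtype="float32")[rng.integers(0, 3, size=96)]
testY = np.eye(3, dtype="float32")[rng.integers(0, 3, size=32)]

# A deliberately small network so the sketch runs quickly.
model = Sequential([
    Input(shape=(3072,)),
    Dense(64, activation="sigmoid"),
    Dense(3, activation="softmax"),
])

# Compile with SGD and categorical cross-entropy, tracking accuracy.
model.compile(loss="categorical_crossentropy",
              optimizer=SGD(learning_rate=INIT_LR),
              metrics=["accuracy"])

# Fit on the training data, validating on the held-out data each epoch.
H = model.fit(trainX, trainY, validation_data=(testX, testY),
              epochs=EPOCHS, batch_size=BATCH_SIZE, verbose=0)

print(sorted(H.history.keys()))
```

The returned history object records one value per epoch for each metric, which is exactly what gets plotted to check for over/underfitting.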

7. Evaluate your Keras model

Figure 9: After we fit our model, we can use our testing data to make predictions and generate a classification report.

We’ve trained our actual model but now we need to evaluate it on our testing data.

It’s important that we evaluate on our testing data so we can obtain an unbiased (or as close to unbiased as possible) representation of how well our model is performing with data it has never been trained on.

To evaluate our Keras model we can use a combination of the .predict  method of the model along with the classification_report  from scikit-learn:
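The evaluation step boils down to comparing the most probable predicted class with the true class for every test image. In this sketch the predictions array is made up by hand to stand in for the output of the model’s .predict method.

```python
import numpy as np
from sklearn.metrics import classification_report

class_names = ["cats", "dogs", "panda"]

# One-hot ground-truth labels for four test images.
testY = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])

# Made-up class probabilities standing in for model.predict(testX).
predictions = np.array([
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.1, 0.2, 0.7],
    [0.3, 0.6, 0.1],   # a cat mistaken for a dog
])

# argmax turns one-hot vectors and probability rows into class indexes.
y_true = testY.argmax(axis=1)
y_pred = predictions.argmax(axis=1)

report = classification_report(y_true, y_pred, target_names=class_names)
print(report)
```

With three of the four stand-in images classified correctly, the report shows 75% overall accuracy along with per-class precision and recall.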

When running this script you’ll notice that our Keras neural network will start to train, and once training is complete, we’ll evaluate the network on our testing set:

This network is small, and when combined with a small dataset, takes only 2 seconds per epoch on my CPU.

Here you can see that our network is obtaining 61% accuracy.

Since we would have a 1/3 chance of randomly picking the correct label for a given image we know that our network has actually learned patterns that can be used to discriminate between the three classes.

We also save a plot of our:

  • Training loss
  • Validation loss
  • Training accuracy
  • Validation accuracy

…ensuring that we can easily spot overfitting or underfitting in our results.
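A plot along these lines could be produced as follows. The history values are synthetic numbers I made up for illustration, and the output path in the temporary directory is an assumption; in a training script the lists would come from the history recorded during fitting and the path from the --plot argument.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so plots can be saved to disk
import matplotlib.pyplot as plt
import numpy as np

# Synthetic per-epoch values, for illustration only.
history = {
    "loss": [1.10, 0.95, 0.85, 0.78, 0.72],
    "val_loss": [1.08, 0.97, 0.90, 0.88, 0.89],
    "accuracy": [0.38, 0.48, 0.55, 0.60, 0.64],
    "val_accuracy": [0.40, 0.47, 0.52, 0.55, 0.56],
}

N = np.arange(0, len(history["loss"]))
plt.style.use("ggplot")
plt.figure()
for key in ("loss", "val_loss", "accuracy", "val_accuracy"):
    plt.plot(N, history[key], label=key)
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend()

# Hypothetical output location for this sketch.
plot_path = os.path.join(tempfile.gettempdir(), "training_plot_sketch.png")
plt.savefig(plot_path)
plt.close()
```

When the training curve keeps improving while the validation curve flattens or worsens, as in the last epochs of the synthetic data, that widening gap is the visual signature of overfitting.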

Figure 10: Our simple neural network training script (created with Keras) generates an accuracy/loss plot to help us spot under/overfitting.

Looking at our plot we see a small amount of overfitting start to occur past epoch ~45 where our training and validation losses start to diverge and a pronounced gap appears.

Finally, we can save our model to disk so we can reuse it later without having to retrain it:

8. Make predictions on new data using your Keras model

At this point our model is trained — but what if we wanted to make predictions on images after our network has already been trained?

What would we do then?

How would we load the model from disk?

How can we load an image and then preprocess it for classification?

Inside the  script, I’ll show you how, so open it and insert the following code:

First, we’ll import our required packages and modules.

You’ll need to explicitly import load_model  from keras.models  whenever you write a script to load a Keras model from disk. OpenCV will be used for annotation and display. The pickle  module will be used to load our label binarizer.

Next, let’s parse our command line arguments:

  • --image : The path to our input image.
  • --model : Our trained and serialized Keras model path.
  • --label-bin : Path to the serialized label binarizer.
  • --width : The width of the input shape for our CNN. Remember — you can’t just specify anything here. You need to specify the width that the model is designed for.
  • --height : The height of the image input to the CNN. The height specified must also match the network’s input shape.
  • --flatten : Whether or not we should flatten the image. By default, we won’t flatten the image. If you need to flatten the image, you should pass a 1  for this argument.

Next, let’s load the image and resize it based on the command line arguments:

And then we’ll flatten  the image if necessary:

Flattening the image for standard fully-connected networks is straightforward (Lines 33-35).

In the case of a CNN, we also add the batch dimension, but we do not flatten the image (Lines 39-41). An example CNN is covered in the next section.
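The two preprocessing paths differ only in the final shape handed to the network. Below is a minimal sketch using blank NumPy arrays in place of real images; the helper name prepare_image is hypothetical, and I am assuming the same [0, 1] scaling that was applied during training.

```python
import numpy as np


def prepare_image(image, flatten):
    """Scale an image and add a batch dimension, optionally flattening it."""
    # Same scaling as during training (assumed here): [0, 255] -> [0, 1].
    image = image.astype("float") / 255.0

    if flatten:
        # Fully-connected network: one row of raw pixel intensities.
        image = image.flatten()
        return image.reshape((1, image.shape[0]))

    # CNN: keep the volume intact and only add the batch dimension.
    return image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))


small = np.zeros((32, 32, 3), dtype="uint8")   # resized for the simple network
large = np.zeros((64, 64, 3), dtype="uint8")   # resized for the CNN

print(prepare_image(small, flatten=True).shape)    # (1, 3072)
print(prepare_image(large, flatten=False).shape)   # (1, 64, 64, 3)
```

In both cases the leading 1 is the batch dimension: the network always expects a batch, even when that batch contains a single image.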

From there, let’s load the model + label binarizer into memory and make a prediction:

Our model and label binarizer are loaded via Lines 45 and 46.

We can make predictions on the input image  by calling  model.predict  (Line 49).

What does the  preds  array look like?

The 2D array is indexed first by (1) the position of the image in the batch (here there is only one index as there was only one image passed into the NN for classification) and then by (2) class label, with each value being the predicted probability for that class, as shown by querying the variable in my Python debugger:

  • cats: 54.6%
  • dogs: 45.4%
  • panda: ~0%

In other words, our network “thinks” that it sees “cats” and it sure as hell “knows” that it doesn’t see a “panda”.

Line 53 finds the index of the max value (the 0-th “cats” index).

And Line 54 extracts the “cats” string label from the label binarizer.

Easy right?

Now let’s display the results:

We format our text  string on Line 57. This includes the label  and the prediction value in percentage format.
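Going from the preds array to a readable label takes only a few lines. In this sketch the probabilities are the ones listed above, the class_names array stands in for the classes stored in the label binarizer, and the exact format string is my assumption.

```python
import numpy as np

# Stand-in for the class names stored in the label binarizer.
class_names = np.array(["cats", "dogs", "panda"])

# One image in the batch, three class probabilities.
preds = np.array([[0.546, 0.454, 0.0]])

# Index of the largest probability for the first (and only) image.
i = preds.argmax(axis=1)[0]
label = class_names[i]

# Label plus the probability expressed as a percentage.
text = "{}: {:.2f}%".format(label, preds[0][i] * 100)
print(text)  # cats: 54.60%
```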

Then we place the text  on the output  image (Lines 58 and 59).

Finally, we show the output image on the screen and wait until the user presses any key on Lines 62 and 63 (watch Homer Simpson try to locate the “any” key).

Our prediction script was rather straightforward.

Once you’ve used the “Downloads” section of this tutorial to download the code, you can open up a terminal and try running our trained network on custom images:

Be sure that you copy/pasted or typed the entire command (including command line arguments) from within the folder relative to the script. If you’re having trouble with the command line arguments, give this blog post a read.

Figure 11: A cat is correctly classified with a simple neural network in our Keras tutorial.

Here you can see that our simple Keras neural network has classified the input image as “cats” with 55.87% probability, despite the cat’s face being partially obscured by a piece of bread.

9. BONUS: Training your first Convolutional Neural Network with Keras

Admittedly, using a standard feedforward neural network to classify images is not a wise choice.

Instead, we should leverage Convolutional Neural Networks (CNNs) which are designed to operate over the raw pixel intensities of images and learn discriminating filters that can be used to classify images with high accuracy.

The model we’ll be discussing here today is a smaller variant of VGGNet which I have named “SmallVGGNet”.

VGGNet-like models share two common characteristics:

  1. Only 3×3 convolutions are used
  2. Convolution layers are stacked on top of each other deeper in the network architecture prior to applying a destructive pooling operation

Let’s go ahead and implement SmallVGGNet now.

Open up the file for the pyimagesearch.smallvggnet module and insert the following code:

As you can see from the imports on Lines 2-10, everything needed for the SmallVGGNet  comes from keras . I encourage you to familiarize yourself with each in the Keras documentation and in my deep learning book.

We then begin to define our SmallVGGNet  class and the build  method:

Our class is defined on Line 12 and the sole build  method is defined on Line 14.

Four parameters are required for build: the width of the input images, the height of the input images, the depth, and the number of classes.

The depth  can also be thought of as the number of channels. Our images are in the RGB color space, so we’ll pass a depth  of 3  when we call the build  method.

First, we initialize a Sequential  model (Line 17).

Then, we determine channel ordering. Keras supports "channels_last"  (i.e. TensorFlow) and "channels_first"  (i.e. Theano) ordering. Lines 18-25 allow our model to support either type of backend.
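The channel-ordering logic can be illustrated without Keras at all. In the sketch below the helper name input_shape_for is hypothetical, and the data_format string is passed in by hand; in a Keras script it would typically be read from the backend configuration.

```python
def input_shape_for(width, height, depth, data_format="channels_last"):
    """Return (input_shape, channel_axis) for the given channel ordering."""
    if data_format == "channels_first":
        # Channels come before the spatial dimensions.
        return (depth, height, width), 1
    if data_format == "channels_last":
        # Channels come after the spatial dimensions.
        return (height, width, depth), -1
    raise ValueError("unknown data_format: {}".format(data_format))


print(input_shape_for(64, 64, 3, "channels_last"))   # ((64, 64, 3), -1)
print(input_shape_for(64, 64, 3, "channels_first"))  # ((3, 64, 64), 1)
```

The second value is the axis along which the channels live, which is what a batch normalization layer needs to know in order to normalize per channel.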

Now, let’s add some layers to the network:

Our first CONV => RELU => POOL  layers are added by this block.

Our first CONV  layer has 32  filters of size 3x3 .

It is very important that we specify the inputShape  for the first layer as all subsequent layer dimensions will be calculated using a trickle-down approach.

We’ll use the ReLU (Rectified Linear Unit) activation function in this network architecture. There are a number of activation methods and I encourage you to familiarize yourself with the popular ones inside Deep Learning for Computer Vision with Python where pros/cons and tradeoffs are discussed.

Batch Normalization, MaxPooling, and Dropout are also applied.

Batch Normalization is used to normalize the activations of a given input volume before passing it to the next layer in the network. It has been proven to be very effective at reducing the number of epochs required to train a CNN as well as stabilizing training itself.

POOL layers have a primary function of progressively reducing the spatial size (i.e. width and height) of the input volume to a layer. It is common to insert POOL layers between consecutive CONV layers in a CNN architecture.

Dropout is an interesting concept not to be overlooked. In an effort to force the network to be more robust we can apply dropout, the process of disconnecting random neurons between layers. This process is proven to reduce overfitting, increase accuracy, and allow our network to generalize better for unfamiliar images. As denoted by the parameter, 25% of the node connections are randomly disconnected (dropped out) between layers during each training iteration.
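A first layer set of this kind might look like the sketch below. The 32 filters of size 3x3, the ReLU activation, and the 25% dropout come from the description above; the "same" padding, the 2x2 pooling window, the 64x64x3 channels-last input, and the exact ordering of the layers are my assumptions.

```python
from keras.layers import (Activation, BatchNormalization, Conv2D, Dropout,
                          Input, MaxPooling2D)
from keras.models import Sequential

model = Sequential([
    Input(shape=(64, 64, 3)),            # height, width, depth (channels last)
    Conv2D(32, (3, 3), padding="same"),  # 32 filters, each 3x3
    Activation("relu"),
    BatchNormalization(axis=-1),         # normalize along the channel axis
    MaxPooling2D(pool_size=(2, 2)),      # halves the width and height
    Dropout(0.25),                       # drop 25% of connections in training
])

print(model.output_shape)  # (None, 32, 32, 32)
```

Under these assumptions the convolution keeps the 64x64 spatial size and produces 32 channels, and the pooling layer then halves the width and height.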

Note: If you’re new to deep learning, this may all sound like a different language to you. Just like learning a new spoken language, it takes time, study, and practice. If you’re yearning to learn the language of deep learning, why not grab my highly rated book, Deep Learning for Computer Vision with Python? I promise that I break down these concepts in the book and reinforce them via practical examples.

Moving on, we reach our next block of (CONV => RELU) * 2 => POOL  layers:

Notice that our filter dimensions remain the same ( 3x3 , which is common for VGG-like networks); however, we’re increasing the total number of filters learned from 32 to 64.

This is followed by a (CONV => RELU) * 3 => POOL layer set:

Again, notice how all CONV layers learn 3x3  filters but the total number of filters learned by the CONV layers has doubled from 64 to 128. Increasing the total number of filters learned the deeper you go into a CNN (and as your input volume size becomes smaller and smaller) is common practice.
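A little arithmetic shows how the volume shrinks spatially while the number of filters grows. This sketch assumes that each of the three layer sets ends in a single 2x2 pooling operation and that the convolutions themselves preserve width and height; neither detail is stated above, so treat the numbers as illustrative.

```python
def trace_volume(size, filters_per_block, pool=2):
    """Track (height, width, channels) after each pooled layer set."""
    shapes = []
    for filters in filters_per_block:
        size //= pool  # pooling halves the spatial dimensions
        shapes.append((size, size, filters))
    return shapes


shapes = trace_volume(64, [32, 64, 128])
print(shapes)  # [(32, 32, 32), (16, 16, 64), (8, 8, 128)]

# Number of values handed to the fully connected layers after flattening.
height, width, channels = shapes[-1]
print(height * width * channels)  # 8192
```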

And finally we have a set of FC => RELU  layers:

Fully connected layers are denoted by Dense  in Keras. The final layer is fully connected with three outputs (since we have three classes  in our dataset). The softmax  layer returns the class probabilities for each label.
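To see how a softmax turns the raw scores of the final layer into class probabilities, here is a small NumPy sketch. It is a hand-written illustration rather than the Keras implementation, and the scores are made up.

```python
import numpy as np


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the max keeps the exponentials numerically stable.
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)


# Made-up raw outputs for the three classes (cats, dogs, panda).
scores = np.array([2.0, 1.0, -1.0])
probs = softmax(scores)

print(probs)
print(probs.sum())
```

The largest score always receives the largest probability, which is why taking the argmax of the softmax output gives the predicted class.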

Now that SmallVGGNet  is implemented, let’s write the driver script that will be used to train it on our Animals dataset.

Much of the code here will be similar to the previous example, but I’ll:

  1. Review the entire script as a matter of completeness
  2. And call out any differences along the way

Open up the  script and let’s get started:

The imports are the same as our previous training script with two exceptions:

  1. Instead of from keras.models import Sequential, this time we import SmallVGGNet via from pyimagesearch.smallvggnet import SmallVGGNet. Scroll up slightly to see the SmallVGGNet implementation.
  2. We will be augmenting our data with ImageDataGenerator. Data augmentation is almost always recommended and leads to models that generalize better. Data augmentation involves applying random rotations, shifts, shears, and scaling to existing training data. You won’t see a bunch of new .png and .jpg files because it is done on the fly as the script executes.

You should recognize the other imports at this point. If not, just refer to the bulleted list above.

Let’s parse our command line arguments:

We have four command line arguments to parse:

  • --dataset : The path to our dataset of images on disk. This can be the path to animals/  or another dataset organized the same way.
  • --model : Our model will be serialized and output to disk. This argument contains the path to the output model file. Be sure to name your model accordingly so you don’t overwrite any previously trained models (such as the simple neural network one).
  • --label-bin : Dataset labels are serialized to disk for easy recall in other scripts. This is the path to the output label binarizer file.
  • --plot : The path to the output training plot image file. We’ll review this plot to check for over/underfitting of our data. Each time you train your model with changes to parameters, you should specify a different plot filename in the command line so that you’ll have a history of plots corresponding to training notes in your notebook or notes file. This tutorial makes deep learning seem easy, but keep in mind that I went through several iterations of training before I settled on all parameters to share with you in this script.

Let’s load and preprocess our data:

Exactly as in the simple neural network script, here we:

  • Initialize lists for our data  and labels  (Lines 35 and 36).
  • Grab imagePaths  and randomly shuffle  them (Lines 39-41). The paths.list_images  function conveniently will find all images in our input dataset directory before we sort and shuffle  them.
  • Begin looping over all imagePaths  in our dataset (Line 44).

As we loop over each imagePath , we proceed to:

  • Load the image  into memory (Line 48).
  • Resize the image to 64x64, the required input spatial dimensions of SmallVGGNet (Line 49). One key difference is that we are not flattening our data for this neural network, because it is convolutional.
  • Append the resized image  to data  (Line 50).
  • Extract the class label  of the image from the imagePath  and add it to the labels  list (Lines 54 and 55).

On Line 58 we scale pixel intensities from the range [0, 255] to [0, 1] in array form.

We also convert the labels  list to a NumPy array format (Line 59).

Then we’ll split our data and binarize our labels:

We perform a 75/25 training and testing split on the data (Lines 63 and 64). An experiment I would encourage you to try is to change the training split to 80/20 and see if the results change significantly.

Label binarizing takes place on Lines 70-72. This allows for one-hot encoding as well as serializing our label binarizer to a pickle file later in the script.
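To make the split and the one-hot encoding concrete, here is a plain NumPy stand-in for what scikit-learn's splitting and label binarizing do for us. The eight labels and the integer "images" are invented for illustration; the real script works on the full dataset.

```python
import numpy as np

labels = np.array(["cats", "dogs", "panda", "dogs",
                   "cats", "panda", "panda", "cats"])
data = np.arange(len(labels))  # stand-in for the image arrays

# Shuffle the indices, then hold out 25% for testing
rng = np.random.default_rng(42)
idx = rng.permutation(len(labels))
n_test = int(0.25 * len(labels))
test_idx, train_idx = idx[:n_test], idx[n_test:]
trainX, testX = data[train_idx], data[test_idx]

# One-hot encode: one column per class, in sorted class order
classes = np.unique(labels)
one_hot = (labels[:, None] == classes[None, :]).astype(int)
trainY, testY = one_hot[train_idx], one_hot[test_idx]

print(classes)     # class order used for the columns
print(one_hot[0])  # encoding of the first label, "cats"
```

Changing 0.25 to 0.20 is all it takes to try the 80/20 experiment suggested above.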

Now comes the data augmentation:

On Lines 75-77, we initialize our image data generator to perform image augmentation.

Image augmentation allows us to construct “additional” training data from our existing training data by randomly rotating, shifting, shearing, zooming, and flipping.

Data augmentation is often a critical step in:

  1. Avoiding overfitting
  2. Ensuring your model generalizes well

I recommend that you always perform data augmentation unless you have an explicit reason not to.
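If you'd like a feel for what augmentation does to a single image, the idea can be sketched in a few lines of NumPy. This is not the Keras image data generator used in the script, just a simplified illustration with two of the transformations (flipping and shifting); the augment function and its max_shift parameter are hypothetical names.

```python
import numpy as np

def augment(image, rng, max_shift=4):
    """Return a randomly flipped and shifted copy of an HxWxC image."""
    out = image.copy()
    # Flip left-right half of the time
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    # Shift by a few pixels; np.roll wraps pixels around the border,
    # which is a simplification of what a real augmenter does
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))  # stand-in for a training image
augmented = [augment(image, rng) for _ in range(5)]
print(all(a.shape == image.shape for a in augmented))
```

Each call produces a slightly different view of the same image, which is why the network effectively sees "additional" training data.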

We then build our SmallVGGNet, passing in the necessary parameters (Lines 80 and 81).

Let’s compile and train our model:

First, we establish our learning rate, number of epochs, and batch size (Lines 85-87).

Then we initialize our Stochastic Gradient Descent (SGD) optimizer (Line 92).

We’re now ready to compile and train our model (Lines 93-99). Since we’re performing data augmentation, we call model.fit_generator and pass the generator with our training data as the first parameter. The generator will produce batches of augmented training data according to the settings we previously made.
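A generator in this sense is simply something that keeps handing back batches for as long as training asks for them. The sketch below is a simplified stand-in (batch_generator is a hypothetical helper, and it drops any final partial batch), meant only to show the shape of the idea.

```python
import numpy as np

def batch_generator(X, y, batch_size, rng):
    """Yield shuffled (X, y) batches forever, reshuffling each pass."""
    n = len(X)
    while True:
        order = rng.permutation(n)
        for start in range(0, n - batch_size + 1, batch_size):
            idx = order[start:start + batch_size]
            yield X[idx], y[idx]

# Toy data: 10 samples with one feature each
X = np.arange(10).reshape(10, 1)
y = np.arange(10)
gen = batch_generator(X, y, batch_size=4, rng=np.random.default_rng(1))

# Number of full batches drawn per epoch
steps_per_epoch = len(X) // 4
first_epoch = [next(gen) for _ in range(steps_per_epoch)]
print([bx.shape for bx, _ in first_epoch])
```

Because the generator never ends on its own, training needs to be told how many batches make up one epoch.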

Finally, we’ll evaluate our model, plot the loss/accuracy curves, and save the model:

We make predictions on the testing set and then use scikit-learn to calculate and print our classification_report (Lines 103-105).
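The numbers in a classification report are less mysterious than they look. Here is a pure-Python sketch of how precision, recall, and F1 are computed per class; per_class_metrics is a hypothetical helper and the labels are made up, so treat it as an illustration rather than scikit-learn's implementation.

```python
def per_class_metrics(y_true, y_pred, label):
    """Compute precision, recall and F1 for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Made-up labels, purely for illustration
y_true = ["cats", "cats", "dogs", "dogs", "panda", "panda"]
y_pred = ["cats", "dogs", "dogs", "dogs", "panda", "cats"]

for name in sorted(set(y_true)):
    p, r, f = per_class_metrics(y_true, y_pred, name)
    print(f"{name:>6} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```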

Matplotlib is used to plot the loss/accuracy curves; Lines 108-118 demonstrate my typical plot setup. Line 119 saves the figure to disk.

Finally, we save our model and label binarizer to disk (Lines 123-126).
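Serializing the label binarizer is an ordinary pickle round trip. The sketch below uses a small dictionary of class names as a hypothetical stand-in for the fitted binarizer and keeps the bytes in memory instead of writing them to the --label-bin path.

```python
import pickle

# Hypothetical stand-in for the fitted label binarizer: its class names
label_info = {"classes": ["cats", "dogs", "panda"]}

# Serialize to bytes (the training script would write these to the
# --label-bin path instead of keeping them in memory)
payload = pickle.dumps(label_info)

# Later, for example in a prediction script, restore the object
restored = pickle.loads(payload)
print(restored["classes"])
```

Restoring the same object later is what lets another script turn a predicted class index back into a human-readable label.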

Let’s go ahead and train our model.

Make sure you’ve used the “Downloads” section of this blog post to download the source code and the example dataset.

From there, open up a terminal and execute the following command:

When you paste the command, ensure that you have all the command line arguments to avoid a “usage” error. If you are new to command line arguments, make sure you read about them before continuing.

Training on a CPU will take some time — each of the 75 epochs requires over one minute. Training will take well over an hour.

A GPU will finish the process in a matter of minutes, as each epoch requires only 2 seconds, as demonstrated!

Let’s take a look at the resulting training plot that is in the output/ directory:

Figure 12: Our deep learning with Keras accuracy/loss plot demonstrates that we have obtained 78% accuracy on our animals data with our SmallVGGNet model.

As our results demonstrate, we are achieving 78% accuracy on our Animals dataset using a Convolutional Neural Network, significantly higher than the previous accuracy of 61% using a standard fully-connected network.

We can also apply our newly trained Keras CNN to example images:

Figure 13: Our deep learning with Keras tutorial has demonstrated how we can confidently recognize pandas in images.

Our CNN is very confident that this is a “panda”. I am too, but I just wish he would stop staring at me!

Let’s try a cute little beagle:

Figure 14: A beagle is recognized as a dog using Keras, TensorFlow, and Python. Our Keras tutorial has introduced the basics for deep learning, but has just scratched the surface of the field.

A couple of beagles have been part of my family and childhood. I’m glad that this beagle picture I found online is recognized as a dog!

I could use a similar CNN to find photos of my beagles on my computer.

In fact, in Google Photos, if you type “dog” in the search box, pictures of dogs in your photo library will be returned — I’m pretty sure a CNN has been used for that image search engine feature. Image search engines aren’t the only use case for CNNs — I bet your mind is starting to come up with all sorts of ideas upon which to apply deep learning.



In today’s tutorial, you learned how to get started with Keras, Deep Learning, and Python.

Specifically, you learned the seven key steps to working with Keras and your own custom datasets:

  1. How to load your data from disk
  2. How to create your training and testing splits
  3. How to define your Keras model architecture
  4. How to compile and prepare your Keras model
  5. How to train your model on your training data
  6. How to evaluate your model on testing data
  7. How to make predictions using your trained Keras model

From there you also learned how to implement a Convolutional Neural Network, enabling you to obtain higher accuracy than a standard fully-connected network.

If you have any questions regarding Keras be sure to leave a comment — I’ll do my best to answer.


156 Responses to Keras Tutorial: How to get started with Keras, Deep Learning, and Python

  1. mohamed September 10, 2018 at 1:54 pm #

    Great: Adrian!
    always forward
    Thank you and thank you Igor

    I have a suggestion as to how to apply some basic concepts of deep learning.
    About how to write those equations in Python.
    Many people know the concepts but there is a barrier between them and the application.

    • Adrian Rosebrock September 10, 2018 at 5:24 pm #

      Hey Mohamed — is there a particular algorithm/equation that you’re struggling with? Or are you speaking in more general terms? If you’re speaking more generically, then Deep Learning for Computer Vision with Python covers the basic concepts of both machine learning and deep learning, including some basic equations and theory before moving into actual applications and code.

      • mohamed September 10, 2018 at 6:00 pm #

        Thanks Adrian Yes, there are certain things that are facing me. But anyway I meant in general

        The book is really wonderful I will work on getting the rest of the versions of it.

  2. Newman September 10, 2018 at 1:58 pm #

    Not Working, here, different numbers on training, and a lot of wrong detection.

                   precision    recall  f1-score   support

           cats         0.46      0.66      0.54       244
           dogs         0.49      0.22      0.30       242
          panda         0.69      0.78      0.73       264

    avg / total         0.55      0.56      0.53       750

    second method:
                   precision    recall  f1-score   support

           cats         0.66      0.77      0.71       244
           dogs         0.76      0.55      0.63       242
          panda         0.85      0.95      0.90       264

    avg / total         0.76      0.76      0.75       750

    everything seems fine but not the results.

    • Adrian Rosebrock September 10, 2018 at 5:23 pm #

      Could you share which version of Keras and TensorFlow (assuming a TF backend) you are running? Secondly, keep in mind that NNs are stochastic algorithms; there will naturally be variations in results and you should not expect your results to 100% match mine. The effects of random weight initializations are even more pronounced due to the fact that we’re working with such a small dataset.

  3. Enrique September 11, 2018 at 3:44 am #

    Hi Adrian:
    I tested an image like this; however, the result shown is “Panda 100%”. Why did this happen?

    • Adrian Rosebrock September 11, 2018 at 8:04 am #

      Pandas are largely white and black and the dog itself is dark brown and white. The network could have overfit to the panda class. The example used here is just that — an example. It’s not meant to be a model that can correctly classify each image class 100% of the time. For that, you will need more data and more advanced techniques. I would encourage you to take a look at Deep Learning for Computer Vision with Python for more information.

  4. Aiden Ralph September 11, 2018 at 3:46 am #

    Brilliant post Adrian!

    • Adrian Rosebrock September 11, 2018 at 8:02 am #

      Thanks Aiden, I’m glad you liked it!

  5. Aline September 11, 2018 at 4:29 pm #

    Amazing tutorial! So clear!

    • Adrian Rosebrock September 12, 2018 at 2:10 pm #

      Thanks so much, Aline! I’m glad you found it helpful 🙂

  6. Reed Guo September 12, 2018 at 12:40 am #

    Hi, Adrian

    Excellent post.

    Can we improve its accuracy?

    • Adrian Rosebrock September 12, 2018 at 2:03 pm #

      You could improve the accuracy by:

      1. Using more images
      2. Applying transfer learning

      Models trained on ImageNet already have a panda, dog, and cat class as well so you could even use an out-of-the-box classifier.

  7. Viktor September 12, 2018 at 3:49 am #

    Hello! How can I download the Animals dataset?

    • Adrian Rosebrock September 12, 2018 at 1:58 pm #

      Just use the “Downloads” section of the blog post and you will be able to download the code and “Animals” dataset.

  8. Hamid September 12, 2018 at 5:59 am #

    Nice one Adrian ! I really appreciate it .

    Such a wonderful post with elegant and simple explanation .

    I wonder if increasing the no.of hidden layers and making dropout to 0.5 would further increase the accuracy from 78

    • Adrian Rosebrock September 12, 2018 at 1:55 pm #

      It may as those are hyperparameters. Give it a try and see!

  9. Marcelo Mota September 12, 2018 at 2:06 pm #

    My friend, this is the best tutorial so far I have ever seen!! thank you so much.

    I am struggling just in one point: I have a binary problem and have to use the to_categorical function from keras. As I am seeing, I can just use it with integers categories, not strings. Is this true?

    And how do I use the pickle file to write this integer binary categories (from to_categorical) and also how do I use it in the classification_report (the code uses the “lb”)?

    Thank you again and congratulations for such good and complete explanations!

    • Adrian Rosebrock September 12, 2018 at 2:14 pm #

      Thanks Marcelo, I’m glad you found the tutorial helpful!

      For a binary problem you should use the LabelEncoder class instead of LabelBinarizer. The LabelEncoder will integer-encode the labels, which you can then pass into to_categorical to obtain the vector representation of the labels.

      The LabelEncoder can be serialized to disk and convert labels just like the LabelBinarizer does.
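The two-step idea (strings to integers, then integers to vectors) can be sketched with NumPy alone. This is a stand-in for LabelEncoder followed by to_categorical rather than the libraries themselves, and the four labels are made up.

```python
import numpy as np

labels = np.array(["cats", "dogs", "dogs", "cats"])

# Integer-encode: each class name becomes its index in sorted order
classes, encoded = np.unique(labels, return_inverse=True)

# Expand the integers into two-column one-hot vectors
one_hot = np.eye(len(classes), dtype=int)[encoded]

# Map the integers back to class names, e.g. for a report
decoded = classes[encoded]
print(encoded, one_hot.tolist(), decoded.tolist())
```

Keeping the classes array around (for example by pickling it) is what allows the integer predictions to be turned back into string labels later.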

  10. Mutlucan Tokat September 12, 2018 at 2:29 pm #

    Hi Adrian,
    Range of the pixels are same. Every pixel gets values between 0-255. Why we need to scale it between 0 and 1 ?

    • Adrian Rosebrock September 14, 2018 at 9:53 am #

      Most (but not all) neural networks expect inputs to be in the range [0, 1]. You may see other scaling and normalization techniques, such as mean subtraction, but those are more advanced methods. Your goal here is to bring every input to a common range. Exactly which range you use may depend on:

      1. Weight initialization technique
      2. Activation technique
      3. Whether or not you are performing mean subtraction

      In general, scaling to [0, 1] gives your variables less “freedom” and makes gradient or overflow errors less likely than if you keep larger value ranges (such as [0, 255]). [0, 1] scaling is typically the “first” scaling technique that you’ll see.

  11. Bob de Graaf September 13, 2018 at 9:28 am #

    Hi Adrian,

    Great tutorial as always! I’m wondering though, isn’t this almost the same tutorial as the Pokemon one? The one where you classify 5 different Pokemon in images? The code seems mostly the same 🙂

    I do see some small differences though, for example in the Pokemon one you use the Adam optimizer instead of SGD, and the initial learning rate is 0.001 instead of 0.01.

    Are these changes things you’ve learned these past months to achieve better results? Or were these randomly chosen?

    • Adrian Rosebrock September 14, 2018 at 9:33 am #

      The code is similar but not the same. This tutorial is meant to be significantly more introductory and is intended for readers with little to no experience with Keras and deep learning.

      The parameters were also not randomly chosen. They were determined via experiments to find the optimal values for this particular post.

      • Bob de Graaf September 15, 2018 at 8:42 am #

        Ah ok, good to know, thanks! I wasn’t trying to be offensive or anything, just curious. Apologies if I came across that way!

        • Adrian Rosebrock September 17, 2018 at 2:31 pm #

          You certainly were not being offensive, Bob. I just wanted to clarify, that’s all 🙂 Have a great day, friend!

  12. Mutlu September 15, 2018 at 7:39 am #

    Hi Adrian,

    What is chanDim = -1 and chanDim = 1 in beginning of the smallVGGNET?

    Great tutorial BTW.

    • Adrian Rosebrock September 17, 2018 at 2:51 pm #

      It’s the dimension of the channel. For channels-first ordering (ex. Theano) the channels are ordered first, but with channels-last ordering (like TensorFlow) the channels are ordered last. A “-1” value when indexing with Python means the “last dimension/value”.
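A quick NumPy check makes the two orderings concrete. The arrays below are empty stand-ins for a batch containing one 64x64 RGB image.

```python
import numpy as np

# One 64x64 RGB image in each ordering, with a leading batch dimension
channels_last = np.zeros((1, 64, 64, 3))   # (batch, height, width, channels)
channels_first = np.zeros((1, 3, 64, 64))  # (batch, channels, height, width)

# Axis -1 is the last axis; axis 1 is the one right after the batch axis
print(channels_last.shape[-1])   # number of channels, channels-last
print(channels_first.shape[1])   # number of channels, channels-first
```

In both cases the chosen axis is the one that holds the channels.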

  13. Roshan September 15, 2018 at 4:27 pm #

    Hi Adrian,
    Thank you for the excellent tutorial.
    I have a basic question:
    During validation, we are considering a train/test split of 75% and 25% respectively.
    So while testing, the network randomly picks 25% of images.
    But if I want to find out which images are used for testing, how can I find out?
    I want to know the names of the images used for testing.
    Please help me

    • Adrian Rosebrock September 17, 2018 at 2:25 pm #

      The names of the images won’t be returned by scikit-learn. Instead, if you want the exact image names I would suggest you split your image paths rather than the raw images/labels. That will enable you to still perform the split and have the image paths.
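A minimal sketch of that suggestion, using only the standard library: shuffle and split the list of paths first, so the test file names are known up front. The file names are hypothetical; a real script would list them from the dataset directory and then load the images for each split.

```python
import random

# Hypothetical file names; a real script would list them from disk
image_paths = [f"animals/{c}/{c}_{i:05d}.jpg"
               for c in ("cats", "dogs", "panda") for i in range(4)]

random.seed(42)
random.shuffle(image_paths)

# Hold out 25% of the paths for testing
n_test = int(0.25 * len(image_paths))
test_paths = image_paths[:n_test]
train_paths = image_paths[n_test:]

# The test file names are now known before any image is loaded
print(test_paths)
```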

  14. andreas September 18, 2018 at 7:47 pm #

    Hi Adrian,
    This was an excellent tutorial, very well presented and clear. I have a question, how would I add the bounding box using either nms or my own algorithm to show boxes around a image, like what is done in face detection?

    • Adrian Rosebrock October 8, 2018 at 1:35 pm #

      We are performing image classification in this post. What you are looking to perform is called object detection. I would suggest you read this tutorial to get you started.

  15. merly September 20, 2018 at 9:09 am #

    never seen a simple and better tutorial..

    • Adrian Rosebrock October 8, 2018 at 1:14 pm #

      Thank you for the kind words Merly 🙂 Congratulations on getting your start with Keras!

  16. merly September 21, 2018 at 3:08 am #

    UserWarning: Trying to unpickle estimator LabelBinarizer from version 0.19.1 when using version 0.19.2. This might lead to breaking code or invalid results. Use at your own risk.

    I am getting this error ..what should i do?

    • Adrian Rosebrock October 8, 2018 at 1:10 pm #

      Hey there, it’s not an error, it’s a warning. I would suggest you train the model first before you try to run it and make predictions.

  17. Hashir September 22, 2018 at 3:42 am #

    Hi Adrian,
    This blog was awesome. I really appreciate you for this great effort and am a big fan.
    After reading this Keras + TF tutorial I understood lots of things. But I have to initialize my model or weights manually using my own method, like random initialization. So what steps should I take in order to initialize this model manually…

    Thanks in advance

    • Adrian Rosebrock October 8, 2018 at 1:06 pm #

      The model weights are automatically initialized when the layers are built (not during the call to .compile). You can change the initialization method by choosing one of the Keras initializers.

  18. Salman Sajid September 22, 2018 at 4:51 am #

    Thanx Adrian
    Can we use this technique for activity recognition, or is this technique only for static object detection?

    • Adrian Rosebrock October 8, 2018 at 1:04 pm #

      The method covered here is only for image classification, not activity recognition or object detection.

  19. Wilf September 25, 2018 at 5:44 am #

    Trained using keras 2.2.2 and tensorflow 1.10.0

    Prediction for both simple_nn and smallvggnet failed on the dog.jpg image.

    My question: How do you analyze/understand what went wrong? Is it overfitting, too few training images, poor training image selection, or something else?

    • Adrian Rosebrock October 8, 2018 at 12:45 pm #

      In order to help get everyone up and running with Keras and deep learning we used a very small dataset for this example. Typically, we would have at least 1,000 images per class. Our network is also far from perfect. We can increase the accuracy of our model by introducing regularization methods, such as L2 weighting, additional data augmentation, etc. If you’re interested in learning more about overfitting/underfitting, including how to detect them, I would suggest you read through Deep Learning for Computer Vision with Python.

  20. Mattia September 27, 2018 at 6:37 am #

    Hi Adrian,
    After many issues with installing opencv, finally i got started with opencv.
    I was trying this tutorial and when i launch the program with this command

    python --dataset animals --model output/smallvggnet.model \
    --label-bin output/smallvggnet_lb.pickle \
    --plot output/smallvggnet_plot.png

    this is the result:
    > /home/luca/Scrivania/keras-tutorial/
    -> model = Sequential()

    And it doesn’t move on,
    what should i do?

    • Adrian Rosebrock October 8, 2018 at 12:32 pm #

      How are you trying to execute the script? Via the command line?

      • Kirill October 10, 2018 at 4:51 pm #

        Got same problem. Run via command line (using fish, virtualenv, python 3.6.5, mac)

        • Adrian Rosebrock October 12, 2018 at 9:10 am #

          Does bash produce the same error as fish?

    • inf111 October 23, 2018 at 3:07 pm #

      just execute “continue” command

  21. Hélder Ribeiro September 27, 2018 at 7:07 am #

    Hi Adrian,
    so using the vgg train as you describe everything goes smoothly and i’m getting the 70% plus accuracy, but when i try to predict something using the I’m always getting the panda prediction.
    After doing some research I think it might be something related to the preprocessing of images??
    But i’m not sure. One thing is clear when I add:
    image = image.astype('float32')
    image = image/255
    after the image read I start to have some better results, but not sure if this is the way, can you help me?

    • Adrian Rosebrock October 8, 2018 at 12:32 pm #

      It sounds like the network is overfitting to the “panda” class. One method to increase accuracy would be to introduce more regularization, including additional data augmentation.

  22. Stonez October 2, 2018 at 4:56 am #

    Thanks for the great tutorial! Can I add more classes into the file structure, say, adding cow class?
    There should be no changes in the code for this code to recognize dogs, cats, pandas, and cow, correct?



    • Adrian Rosebrock October 8, 2018 at 10:36 am #

      As long as you follow my directory structure for the project and add a directory named “cow” with “cow” images to the dataset directory then yes, no code changes are required.

  23. Balaji October 3, 2018 at 3:41 pm #

    I am getting following error when i try run predict script.

    line 294, in from_config
    model = cls(name=name)
    UnboundLocalError: local variable 'name' referenced before assignment

    • Adrian Rosebrock October 8, 2018 at 10:20 am #

      Hi Balaji, could you clarify which version of Keras and TensorFlow you are using? Additionally, did you train your model before trying to run the prediction script?

      • Alan October 8, 2018 at 11:42 am #

        Hi Adrian.
        I am with the same problem using tensorflow 1.5.0 (because my computer does not support AVX instructions) and Keras 2.2.3, and tensorflow 1.10.0 and Keras 2.2.3 in other machine.
        I am trying:
        python --image images/cat.jpg --model output/simple_nn.model \
        --label-bin output/simple_nn_lb.pickle --width 32 --height 32 --flatten 1

        python --image images/panda.jpg --model output/smallvggnet.model \
        --label-bin output/smallvggnet_lb.pickle --width 64 --height 64

        python --image images/dog.jpg --model output/smallvggnet.model \
        --label-bin output/smallvggnet_lb.pickle --width 64 --height 64

        Do I need to train the model? I have downloaded your files and I am trying to execute them without training.


        • Adrian Rosebrock October 8, 2018 at 1:42 pm #

          Yes, make sure you train the model before you try to make predictions on images.

  24. Niranjan A October 4, 2018 at 11:54 pm #

    Hello Adrian,

    Every article that i check out on pyImageSearch always leaves me impressed. Great work.

    I noticed that in every tutorial you use “argparse”. I wish to know if it makes any difference if we directly load our image into a variable directly instead of using argparse. If so, can you let me know what the difference is?


    • Adrian Rosebrock October 8, 2018 at 9:55 am #

      Hey Niranjan — I think your confusion can be resolved by reading this guide on how argparse works. As you’ll find out, argparse just allows us to supply arguments via the command line instead of manually hardcoding them 🙂

  25. jacob October 5, 2018 at 11:53 am #

    hi adrian,
    can you give me an example of a path that can be add in help”……” ?
    Because when I start the train simple example it arrives to “[INFO] loading images…”
    and then it doesn’t go on!
    Ah , thanks for these amazing tutorials!!!!!!

    • Adrian Rosebrock October 8, 2018 at 9:53 am #

      Hey Jacob, I think your confusion is related to how command line arguments work. Make sure you read this tutorial to help you clear up your confusion.

  26. 北凉徐凤年 October 12, 2018 at 5:58 am #

    hi adrian,
    I dont understand when training there is 70/70 [==================]
    Where 70 come from? What does 70 mean? 70 pictures every epoch?
    Thanks for your articles!

    • Adrian Rosebrock October 12, 2018 at 8:46 am #

      That is actually the number of batches per epoch. There are 70 batches of images per epoch.

      • YewBoon April 15, 2019 at 5:10 am #

        Hi Adrian,

        Could you please explain further about how the 70 is calculated?
        I assume is from this line of code, right?

        model.fit_generator(aug.flow(trainX, trainY, batch_size=BS), validation_data=(testX, testY), steps_per_epoch=len(trainX) // BS, epochs=EPOCHS)

        • Adrian Rosebrock April 18, 2019 at 7:15 am #

          What specifically in Line 70 are you asking how is calculated? The steps_per_epoch value?
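For what it's worth, the 70 can be reproduced from numbers already in this thread. The classification reports posted earlier list 750 test images, which is the 25% split, so the training set holds 2,250 images. The batch size of 32 below is an assumption, since it is not stated in this section.

```python
test_images = 750               # "support" total in the reports above
total_images = test_images * 4  # the test set is 25% of the data
train_images = total_images - test_images

BS = 32  # assumed batch size
steps_per_epoch = train_images // BS
print(train_images, steps_per_epoch)
```

With those numbers, integer division gives 70 batches per epoch, matching the 70/70 shown in the progress bar.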

  27. Jorge October 17, 2018 at 9:44 pm #

    Hello Adrian. Thank you very much for this excellent tutorial. I have a little question. Could you please describe what would be the architecture in keras code of the convolutional network if the dataset only had two categories? for example cats and dogs (eliminating the pandas folder) and always in the RGB color space. If you could make a parallel with the code of this tutorial for the three categories. I know it is a question that may be basic but it would help me to understand the architecture of the cnn in keras that I still do not have very clear. Thank you very much for your excellent work and congratulations for your wedding. Jorge from Argentina.

    • Adrian Rosebrock October 20, 2018 at 7:52 am #

      The architecture itself would not change except for the final FC layer where there will be two nodes rather than three nodes. Other than that, there will be no other changes to the architecture itself. If you want to train a network for binary classification just make sure you use “binary_crossentropy” for your loss.

  28. Yuthika Shekhar October 21, 2018 at 1:56 pm #

    Hi Adrian,

    Thanks for the amazing explanation. I have few doubts.

    If we try implementing with another dataset, is it supposed to be organized in the way you have organized?

    And also, what does 32x32x3=3072 pixels in a flattened input image mean I am not able to understand the multiplication of 3?

    • Adrian Rosebrock October 22, 2018 at 7:57 am #

      I would recommend you use the same directory structure that I use in the blog post. It will ensure that the code doesn’t have to be changed at all and you can just run the script to train on your own custom dataset.

      As for your second question, each image is represented as a 3-channel RGB image. Thus, for a 32×32 RGB image there are a total of 32x32x3=3072 values.
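The count is easy to verify with NumPy, using an empty array as a stand-in for a 32×32 RGB image:

```python
import numpy as np

image = np.zeros((32, 32, 3), dtype="uint8")  # height x width x channels
flat = image.flatten()
print(image.shape, flat.shape)
```

Flattening simply lays out every value from all three channels in one long vector.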

  29. Nick October 21, 2018 at 11:56 pm #

    Hi, this tutorial is self-explanatory I have just started learning machine learning and this image recognition sounds really interesting and cool. I have downloaded all the required files and code from your site. I have Spyder install on Anaconda. I want to run these files. I need help in how can i start integrating these scripts.
    How can i run this model on Spyder?

    Thank You.

    • Adrian Rosebrock October 22, 2018 at 7:52 am #

      Hey Nick, you can certainly use an IDE if you would like, but I don’t recommend it if you are new to computer vision and deep learning. Take the time to invest in your ability to execute the scripts via the command line. We use the command line quite a bit, so become comfortable with it now. Additionally, while I don’t use the Spyder IDE, you can use this tutorial on how to use an IDE with Python.

  30. Megan October 23, 2018 at 7:23 pm #

    In Section 9, how do you choose 512 in the model.add(Dense(512)) line 60 of code after you’re done the CONV –> RELU steps?

    • Adrian Rosebrock October 29, 2018 at 2:07 pm #

      It’s a hyperparameter of the model architecture. You run experiments to tune the hyperparameters of the network. I discuss my best practices, tips, and suggestions for hyperparameter tuning inside my book, Deep Learning for Computer Vision with Python.

  31. Farshad October 28, 2018 at 3:26 am #

    Hi Adrian. Thanks for nice explanation. Is there any way to create a CNN model from scratch for object detection or object localization using keras? Can keras do it at all? I searched many posts in websites and all of them used keras for image classification only. If yes, I hope you publish a blog post tutorial about object detection by keras. Thanks for your amazing works.

  32. Juanlu October 28, 2018 at 5:19 pm #

    Great post, but there is one thing missing which is making the predictions fail.
    The same way we divide the inputs by 255.0, we need to do the same thing on the predictor before providing image as input on the NN.

    • Adrian Rosebrock October 30, 2018 at 6:25 am #

      Thanks so much for pointing this out, Juanlu! It was a typo on my part. I have fixed the typo as well as the code download so the issue no longer exists.
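The point generalizes: whatever preprocessing was applied during training has to be repeated at prediction time. Here is a minimal NumPy sketch of that step. The prepare helper is a hypothetical name, and the probabilities are made up to stand in for the network's output.

```python
import numpy as np

def prepare(image):
    """Scale to [0, 1] and add a batch dimension, mirroring training."""
    image = image.astype("float") / 255.0
    return image.reshape((1,) + image.shape)

# Stand-in for a resized 64x64 RGB image
image = np.full((64, 64, 3), 255, dtype=np.uint8)
batch = prepare(image)

# Made-up class probabilities standing in for the network's output
preds = np.array([[0.05, 0.15, 0.80]])
class_names = ["cats", "dogs", "panda"]
i = preds.argmax(axis=1)[0]
print(batch.shape, class_names[i], f"{preds[0][i] * 100:.2f}%")
```

Skipping the division by 255.0 would feed the network values up to 255 when it was trained on values up to 1, which is the kind of mismatch that makes predictions fail.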

  33. andreas November 13, 2018 at 10:32 pm #

    Hi Adrian,

    How do we add the object detection bounding boxes to the images?

  34. andreas November 15, 2018 at 4:41 am #

    Hi Adrian,

    I get this error..any suggestions?

    (-215) ssize.width > 0 && ssize.height > 0 in function cv::resize

    • Adrian Rosebrock November 15, 2018 at 11:52 am #

      Double-check the path to your input dataset. Your path is likely incorrect and the cv2.imread function is returning “None”.

  35. Ctibor November 24, 2018 at 6:08 pm #

    Hi Adrian.
    Thank you for your excellent tutorial. But it’s just for pictures. How could cnn be used to recognize sounds?

    • Adrian Rosebrock November 25, 2018 at 8:55 am #

      Sorry, I don’t have any experience with deep learning for audio applications. I only work with computer vision here. Sorry I couldn’t be of more help!

  36. Zachary Miller November 28, 2018 at 3:21 pm #

    This is by far the most simple to understand and useful tutorial on Keras that I have ever seen. You do a great job of explaining BOTH the concepts behind how the neural network works and what the different functions in the libraries are doing for us (the last part is often left out). Thank you so much!

    • Adrian Rosebrock November 30, 2018 at 9:05 am #

      Thank you so much for the kind words, Zachary — I really appreciate that 🙂

  37. moh December 14, 2018 at 10:10 am #

    Hi, Adrian
    Excellent work
    If I have 1 channel images (ex medical images) and I wanted to apply this program to classify them, what should I change in this program especially the input_shape ?

    • Adrian Rosebrock December 18, 2018 at 9:22 am #

      First, convert your images to grayscale when you load them:

      image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

      Secondly, set depth=1 when initializing SmallVGGNet.
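One detail worth sketching: a grayscale image is a 2D array with no channel axis, while a network built with depth=1 (assuming channels-last ordering) would typically expect a shape of height x width x 1. The array below is an empty stand-in for a resized grayscale image.

```python
import numpy as np

# Stand-in for a 64x64 grayscale image, which has no channel axis
gray = np.zeros((64, 64), dtype="uint8")

# Add a trailing channel axis so the shape becomes (64, 64, 1)
gray = np.expand_dims(gray, axis=-1)
print(gray.shape)
```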

  38. David December 20, 2018 at 8:45 am #

    please send me source code of this post..

    • Adrian Rosebrock December 27, 2018 at 11:09 am #

      You can use the “Downloads” section of the post to download the source code.

  39. Akshay January 8, 2019 at 8:03 am #

    where can I find the dataset for this tutorial?

    • Adrian Rosebrock January 11, 2019 at 10:05 am #

      You can use the “Downloads” section of this post to download the source code and dataset.

  40. Jarvis January 8, 2019 at 4:54 pm #

    Hi Adrian
    Thank you for this tutorial.
    I have two doubts :
    1. At the data = np.array(data, dtype="float")
    I am getting a sequence error; in short, I am not able to convert the data dtype('O') into float.
    I got rid of this error by copying it into another float array. But after trying for many hours I couldn’t solve this error.
    2. I am getting loss =NaN
    I have checked my input data and I am sure that none of the input values have nan value.

    Any help will be appreciated.
    thank you

    • Adrian Rosebrock January 11, 2019 at 9:59 am #

      Are you using the exact code and datasets from this tutorial? Or are you working with your own custom dataset?

      • Jarvis January 20, 2019 at 6:10 pm #

        Hi thanks Adrian for the post again.
        I figured the errors myself. I was using a custom dataset and some of the images were corrupted due to which i was getting these errors.

        • Adrian Rosebrock January 22, 2019 at 9:29 am #

          Congrats on resolving the issue!

  41. AHMED ARUP KAMAL January 21, 2019 at 9:53 am #

    Hi Adrian,

    I managed to run it. But after 1200 epoch and 0.01 learning rate, my training accuracy is ~1.0 and validation accuracy is ~0.50!
    What’s happening??!!

    • Adrian Rosebrock January 22, 2019 at 9:19 am #

      Your network is overfitting to the data. Training for longer isn’t necessarily going to give you better accuracy. Instead, you need to learn how to properly set the hyperparameters of the network. To improve your accuracy and learn my tips, suggestions, and best practices to improve the accuracy of your networks, make sure you refer to Deep Learning for Computer Vision with Python.

  42. Saptarshi January 31, 2019 at 1:41 pm #

    Hey! Loved the tutorial.
    Can the same VGG network be used for a hand gesture recognition system for classifying gestures from A-Z (26 classes)?

    • Adrian Rosebrock February 1, 2019 at 6:46 am #

      Not as it stands. VGG is a classification network and, presuming you are referring to the VGG network pre-trained on ImageNet, there are no hand gesture classes. You would need to fine-tune VGG to recognize hand gestures. For what it’s worth, I cover transfer learning and fine-tuning inside my book, Deep Learning for Computer Vision with Python.

  43. fishwolf February 8, 2019 at 6:59 am #

    Is it possible to know where the cat/dog/panda is in the image?
    Is it possible to run this process in real time on a video stream?


  44. riyaz February 18, 2019 at 2:24 am #

    Can you do the same problem for binary classification? I got stuck doing that. I have only 2 classes, and I also want to save the model.

    • Adrian Rosebrock February 20, 2019 at 12:34 pm #

      You’ll want to change your loss function to “binary_crossentropy” for 2-class, binary classification. This tutorial covers how to save and load your models with Keras. For more details on deep learning, including how to get started, I would suggest working through Practical Python and OpenCV.

  45. sruthi February 26, 2019 at 2:37 am #

    I didn’t understand how feature extraction is done in this code. I have applied the same code for gender recognition and the only difference from your code is the training set images. I would like to know how feature extraction is done and what features have been extracted.

    • Adrian Rosebrock February 27, 2019 at 5:44 am #

      If you’re interested in feature extraction via pre-trained CNNs (including gender recognition) then definitely take a look at Deep Learning for Computer Vision with Python where I cover the topic in detail.

  46. sruthi February 26, 2019 at 7:41 am #

    I have used this exact same code for gender recognition. It is working, but I would like to know what features are being extracted as well as how feature extraction is done. Can you please reply asap? The only difference from your code is the dataset used.

  47. Nguyen Anh Duy March 16, 2019 at 10:20 pm #

    Hi Adrian,

    I only want to classify dogs and cats, so I changed “categorical_crossentropy” to “binary_crossentropy”, and then I got this error:

    “expected activation_2 to have shape (2 ) but got array with shape (1 )”

    Then I changed it to “sparse_categorical_crossentropy” and it works.

    But if I want to classify grayscale images, for example “number 0” and “number 1” in the MNIST dataset, I use “binary_crossentropy” and change the input_shape to:

    “model =, height=28, depth=1, classes=len(lb.classes_))

    Then it shows a similar error:

    “expected activation_2 to have shape (2 ) but got array with shape (1 )”

    Could you help me?

    Thank you very much.

    • Adrian Rosebrock March 19, 2019 at 10:13 am #

      1. See my note on Lines 66-69 about using Keras’ “to_categorical” function.

      2. You should be using “binary_crossentropy” as your loss.

      Once you switch both of those you will be able to train the network.

      • Daniel April 5, 2019 at 11:33 am #

        How do you use the “to_categorical” function in this context? I don’t have a lot of coding experience with Python. I googled for examples but they did not work for me.
        Also, in previous related questions you mentioned changing to LabelEncoder (instead of LabelBinarizer). I tried my hand at that but it did not work:
        I changed:
        lb = LabelBinarizer() to lb = LabelEncoder()

        I also changed the loss function to binary_crossentropy. But I am getting an error like the one Nguyen mentioned above.
        Many thanks for the tutorial. It is REALLY helpful.

        • Adrian Rosebrock April 12, 2019 at 1:00 pm #

          You first encode using LabelEncoder and then call to_categorical, similar to the following:
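The snippet from the original reply is not preserved in this copy of the thread. As a rough sketch of the two steps it describes, the NumPy code below mimics what scikit-learn's LabelEncoder (string labels to integers) and Keras' to_categorical (integers to one-hot rows) do. The toy labels are made up, numClasses is simply the number of unique labels, and NumPy stands in for the two libraries only to keep the example self-contained.

```python
import numpy as np

labels = np.array(["cats", "dogs", "dogs", "cats"])  # toy labels (assumption)

# Step 1: map each string label to an integer,
# comparable to LabelEncoder().fit_transform(labels) in scikit-learn
classes, int_labels = np.unique(labels, return_inverse=True)
numClasses = len(classes)

# Step 2: one-hot encode the integers,
# comparable to to_categorical(int_labels, numClasses) in Keras
one_hot = np.eye(numClasses, dtype="float32")[int_labels]

print(classes)        # class names in sorted order
print(one_hot.shape)  # one row per sample, one column per class
```

With two classes this produces labels of shape (N, 2), which matches a two-unit softmax output. LabelBinarizer on its own returns a single column for a two-class problem, which is consistent with the "expected shape (2) but got shape (1)" error reported above.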

          • bramata vikana November 20, 2019 at 6:24 am #

            Hi Adrian, thank you for your explanation, but can you explain what numClasses is? Thank you so much.

          • Adrian Rosebrock November 21, 2019 at 9:03 am #

            The “numClasses” is the total number of unique class labels. For example, suppose you had a three class dataset: dogs, cats, and pandas. Then “numClasses=3” since you have three total classes.

        • Beatrice van Eden May 15, 2019 at 5:38 am #

          Did you get this working? I checked this out today but obviously, I do not understand exactly what is going on. I keep on getting errors even after using well-explained code on this.

          • Adrian Rosebrock May 15, 2019 at 2:30 pm #

            Hi Beatrice — did you see my previous comment? I provided you with code you could use.

  48. Robert March 20, 2019 at 5:22 am #


    This is a great tutorial.

    I’ve purchased your book and its supporting material and look forward to reading and learning even more about this topic.

    Keep up the good work.


    • Adrian Rosebrock March 22, 2019 at 9:41 am #

      Thanks so much, Robert! I hope you are enjoying it. By all means, feel free to reach out if you have any questions on it 🙂

    • Beatrice van Eden May 20, 2019 at 4:26 am #

      Yes, I did. I made the modifications but then I get an error when the neural network is supposed to start training. The shape of the array is no longer what it is expecting.

  49. Alexander March 21, 2019 at 7:03 am #

    Hello and nice guide!

    I have a question: is this tutorial for Windows or Linux?

    • Adrian Rosebrock March 22, 2019 at 8:41 am #

      Provided you have Keras properly installed this tutorial will work on Linux, macOS, and Windows.

      • Alexander March 28, 2019 at 6:37 am #

        Thank you for the response!
        I have another question. I tried running the train_vgg script and it takes about 3-4 minutes per epoch on my computer. How do I tell TensorFlow to use my GPU instead of my CPU? I assume it uses my CPU since the timers are well over 1 minute.

        • Adrian Rosebrock April 2, 2019 at 6:31 am #

          You can use the “nvidia-smi” command to check and see if your GPU is being utilized. You’ll also want to ensure the “tensorflow-gpu” package is installed.

  50. Sky April 21, 2019 at 1:06 am #

    I learned that the evaluation dataset is used for tuning the hyperparameters.
    In this blog, what are the hyperparameters?

    • Adrian Rosebrock April 25, 2019 at 9:12 am #

      The hyperparameters include the learning rate, number of nodes/filters for each layer, and any regularization. I would definitely suggest reading through Deep Learning for Computer Vision with Python where I cover hyperparameters (and how to properly tune them) in detail.

  51. Wadson Garbes May 9, 2019 at 4:01 pm #

    Great tutorial!
    Can you give me some hints on how to deploy this on a Flask Web App ? It would be great!

  52. Beatrice van Eden May 10, 2019 at 7:50 am #

    Thank you for sharing this with us. I found it to be of great benefit for me.

    Do you have a similar tutorial for RGB-D data? I know you add the extra channel but I suppose my struggle is even before that, with the preprocessing of the data. I recorded a ROS bag with the RGB-D data, then I can extract the RGB into one folder and the D into another, and then I get confused when trying to give it as an input to the convnet. (I struggle with the coding.)

    • Adrian Rosebrock May 15, 2019 at 3:12 pm #

      Sorry, I do not have any tutorials for RGB-D data.

  53. Tuan Anh Nguyen May 19, 2019 at 4:59 am #

    Hello! thank you for sharing this with us!!!
    I still do not understand how you label dogs, cats and pandas. Could you please explain the labeling to me?

    • Adrian Rosebrock May 23, 2019 at 9:53 am #

      I manually labeled those images myself. I created a directory for each of the dogs, cats, and pandas classes, then placed each image into its corresponding directory.
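With one directory per class, the label of an image can be read straight from its path. Here is a minimal sketch of that idea; the file names and the helper label_from_path are hypothetical and only illustrate the directory layout described above.

```python
from pathlib import Path

def label_from_path(image_path):
    """Return the name of the directory that directly contains the image."""
    return Path(image_path).parent.name

# hypothetical paths following a <dataset>/<class name>/<file> layout
paths = [
    "animals/cats/cats_00001.jpg",
    "animals/dogs/dogs_00001.jpg",
    "animals/panda/panda_00001.jpg",
]
labels = [label_from_path(p) for p in paths]
```

Adding a new class then only requires creating another directory and placing its images there.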

  54. Beatrice van Eden May 20, 2019 at 4:21 am #

    Thank you.

    • Adrian Rosebrock May 23, 2019 at 9:48 am #

      No problem, I’m glad you found it helpful!

  55. mary May 20, 2019 at 4:25 pm #

    Hey, this tutorial is awesome. The code for the non-CNN network worked just fine, but when I ran it for the CNN with smallvggnet it gave me the error:
    ImportError: No module named ‘pyimagesearch’
    How do I resolve this?
    Secondly, if I use this line: image = cv2.resize(image, (64, 64))
    will it resize all my images to 64×64 no matter what the original size is? Plus, how do I know that the images being fed into the neural network are fine for training? Won’t the larger images be distorted like that (so that the details cannot be observed for training)?
    My last question is about this line in the smallvggnet script: inputShape = (width, height, depth)
    Do I write the dimensions I want the image to have, or the ones the image already has? (In a dataset with many images, how can I tell this for one image?)

    • Adrian Rosebrock May 23, 2019 at 9:45 am #

      Hey Mary, make sure you use the “Downloads” section of this post to download the source code. It sounds like you may have copied and pasted the code, which likely caused the error.

      Secondly, I would recommend you read Deep Learning for Computer Vision with Python so you can learn the fundamentals of deep learning. That book will help you understand how we preprocess images and better enable you to train your own CNNs.

  56. Chinmaya Panda July 9, 2019 at 9:20 am #

    Dear Sir,

    This is the best material I have come across on the internet for ML implementation.
    It is exactly what I needed for my assigned work.

    I started this morning at 9am and finished everything by 8pm.
    I have understood the concepts, implemented them in a Jupyter Notebook, and got the result after a few changes.
    Testing of the CNN-based model is pending, but I will do this from your other blog.

    Such a nice way of explanation and detailed code needs lots of appreciation, so I am dropping this message.

    Thanks a lot for your contribution for society and Human Race.

    • Adrian Rosebrock July 10, 2019 at 9:38 am #

      Thanks Chinmaya, I really appreciate the kind words 🙂 Congrats on training your own NNs and CNNs!

  57. Ali July 10, 2019 at 3:07 am #

    Hi dear Adrian!
    Can you help me train it for two classes only?

  58. Henrique July 23, 2019 at 12:53 pm #

    Hi Adrian,

    Can I use this code to train on 1 class only?

    I’m trying to identify an object in a photo. If the object is there I will receive an “ok”, and if it’s not I will receive a “nok”.

  59. Ammu August 12, 2019 at 6:12 am #

    Hi. How do I download the animals dataset?
    I couldn’t find it in the downloads section.


    • Adrian Rosebrock August 16, 2019 at 5:44 am #

      Download the .zip of the file using the “Downloads” section of the tutorial. You’ll find the “animals” dataset there.

  60. Andres August 20, 2019 at 1:33 am #

    This was a very detailed tutorial. If I wanted to use TensorFlow 2.0 with the new Keras interface, would I need to simply do something like: “import tensorflow.keras as keras” and the rest would work the same?


    • Adrian Rosebrock August 20, 2019 at 10:02 am #

      You are absolutely correct! Since TensorFlow 2.0 is making big moves to use the “tensorflow.keras” package you can just import all Keras classes/functions directly from “tensorflow.keras”.

  61. Abdullah October 13, 2019 at 3:29 am #

    Hi Adrian, if I am adding another class “cow” for example, isn’t it necessary to change the epochs number?

    • Adrian Rosebrock October 17, 2019 at 8:00 am #

      Not necessarily. The number of epochs doesn’t depend on the number of classes, or vice versa. Try training using the same number of epochs. Additionally, you should read Deep Learning for Computer Vision with Python to learn my best practices, tips, and suggestions when training your own deep learning models.

  62. Agnes October 17, 2019 at 3:11 am #

    Hi Adrian,

    I would like to know if there is an explanation for fixing the number of neurons in the first hidden layer at 1024, given the input shape of 3072, in Line 76 in file. I understand that in every hidden layer, due to dimensionality reduction, the image size gets reduced to one half of the original image size. Hence from Hidden Layer 1 -> Hidden Layer 2 the pixels get reduced from 1024 to 512. But how does it change from 3072 to 1024?

    Thanks in Advance…..

  63. Srinivas and Mangipudi October 28, 2019 at 12:13 pm #

    Hi I got an error after the training and network evaluation finished. The error was in generating the plot:

    Traceback (most recent call last):
    File “”, line 111, in
    plt.plot(N, H.history[“acc”], label=”train_acc”)
    KeyError: ‘acc’

    • Srinivas and Mangipudi October 28, 2019 at 1:49 pm #

      Hi, I managed to get rid of the error by using metrics=["acc"] in model.compile.

      But after running the training, I notice that the accuracy is below 50%, which means it is performing worse than random chance. In fact, I gave it a cat image to predict but it predicted it was a dog with 63% accuracy.

      I don’t understand why it’s doing this.

    • Adrian Rosebrock November 7, 2019 at 10:37 am #

      In TensorFlow 2.0 the “acc” and “val_acc” keys were changed to “accuracy” and “val_accuracy”, respectively.
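A plotting script that has to run on both older and newer versions can look up whichever key is present instead of hard-coding one. The helper below is a small sketch of that idea; accuracy_keys is a hypothetical name, and the two dictionaries merely imitate the shape of H.history.

```python
def accuracy_keys(history):
    """Return (train_key, val_key) for whichever naming the history dict uses."""
    if "accuracy" in history:
        return "accuracy", "val_accuracy"
    return "acc", "val_acc"

# toy dictionaries imitating H.history under the old and new naming
old_style = {"loss": [0.9], "acc": [0.5], "val_loss": [1.0], "val_acc": [0.4]}
new_style = {"loss": [0.9], "accuracy": [0.5], "val_loss": [1.0], "val_accuracy": [0.4]}

train_key, val_key = accuracy_keys(new_style)
```

In the plotting code, H.history[train_key] and H.history[val_key] would then replace the hard-coded "acc" and "val_acc" lookups.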

  64. Arif November 11, 2019 at 11:46 pm #

    Hi Adrian,

    If I would like to implement a face recognition application based on your code, what should I do besides adding face detection?

    Thank you

  65. Aditi December 10, 2019 at 6:47 am #

    Hi Adrian

    Thanks for your post. I am using Keras 2.3.1 and TensorFlow 2.0.0. I read the previous comments, changed “acc” to “accuracy”, and got my plot as a PNG. But the other two files, the model file and the pickle file, still do not appear in the output folder.


    • Adrian Rosebrock December 12, 2019 at 10:07 am #

      Make sure you train your model first. Once the model is trained you can then make predictions using it.

  66. Anja December 23, 2019 at 9:32 am #

    Hi Adrian,

    I created my own model with and it works great. 🙂

    With I can check individual images. However, I would like to check a live video with the created model.
    That’s why I changed the code so that it checks the frames of the webcam – Alternatively, video files.
    Unfortunately, the recognition (labeling) does not work well here, although I use the same model as the one
    Checking individual images.

    I extracted individual pictures from a video file and I check them with
    Result: Everything is recognized correctly.
    If I now check the video file with, nothing is correctly recognized.

    Is it because you cannot use the model for live or video file recognition?
    Do I have to train the model differently?

    Thanks a lot!

    • Adrian Rosebrock December 26, 2019 at 9:55 am #

      It’s hard to say what the issue is without seeing your code or video, but I would suggest you start with this tutorial to help you learn how to apply a Keras model to a video stream.

      Secondly, double and triple-check that your preprocessing steps are the same for inference/prediction as they are for training. A common mistake I see beginners make is forgetting to preprocess their images in the same manner as training.
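One way to keep training and inference consistent is to route both through a single preprocessing function. The sketch below assumes pixel values are scaled to [0, 1], as is common for image classifiers; the function name preprocess is hypothetical, and any resizing (for example with cv2.resize) would also have to match the training script but is left out to keep the example NumPy-only.

```python
import numpy as np

def preprocess(image, flatten=False):
    """Scale pixel values to [0, 1]; optionally flatten for a fully connected network."""
    image = image.astype("float") / 255.0
    if flatten:
        image = image.flatten()
    # add a batch dimension so a single image or video frame can be fed to a model
    return np.expand_dims(image, axis=0)

frame = np.full((64, 64, 3), 255, dtype="uint8")  # stand-in for an already resized frame
batch = preprocess(frame)
flat_batch = preprocess(frame, flatten=True)
```

Calling the same function on still images and on video frames removes one common source of mismatched predictions.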

  67. teimoor January 4, 2020 at 2:17 am #

    Hi, how do I supply a trained model when I don’t have any trained model on my disk? It is a required argument as per your code.

    • Adrian Rosebrock January 16, 2020 at 10:59 am #

      You need to train your model before you serialize it to disk. From there you can use it to classify new input images.

  68. Tharumudu January 6, 2020 at 12:08 am #

    Hi Adrian,

    This is a great tutorial and made everything easy for me as always. I would like to know a robust way to predict when I have like around 50,000 images. I’m currently looping through the images with ” tensorflow.keras.backend.clear_session() ” line after the prediction line.

    Is there any way to predict all the images at once and then loop through them and save?

    • Adrian Rosebrock January 16, 2020 at 10:52 am #

      You mean make predictions on all 50,000 images? Yes, absolutely, just use Keras’ predict_generator function.
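The general idea behind generator-based prediction is to feed the model one batch at a time rather than one image at a time. The sketch below shows only the batching logic in plain Python; predict_all, batched, and fake_predict are hypothetical names, and fake_predict merely stands in for a real model call. As far as I know, newer TensorFlow releases fold predict_generator into model.predict, so check the documentation for the version you have installed.

```python
def batched(items, batch_size):
    """Yield successive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def predict_all(items, predict_fn, batch_size=32):
    """Run `predict_fn` on each batch and collect the per-item results in order."""
    results = []
    for batch in batched(items, batch_size):
        results.extend(predict_fn(batch))
    return results

# stand-in for a model call (assumption): returns one fake score per item
fake_predict = lambda batch: [len(name) for name in batch]
scores = predict_all(["a.jpg", "bb.jpg", "ccc.jpg"], fake_predict, batch_size=2)
```

Because results come back in input order, they can be zipped with the file names afterwards and saved in one loop.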

Before you leave a comment...

Hey, Adrian here, author of the PyImageSearch blog. I'd love to hear from you, but before you submit a comment, please follow these guidelines:

  1. If you have a question, read the comments first. You should also search this page (i.e., ctrl + f) for keywords related to your question. It's likely that I have already addressed your question in the comments.
  2. If you are copying and pasting code/terminal output, please don't. Reviewing another programmer’s code is a very time consuming and tedious task, and due to the volume of emails and contact requests I receive, I simply cannot do it.
  3. Be respectful of the space. I put a lot of my own personal time into creating these free weekly tutorials. On average, each tutorial takes me 15-20 hours to put together. I love offering these guides to you and I take pride in the content I create. Therefore, I will not approve comments that include large code blocks/terminal output as it destroys the formatting of the page. Kindly be respectful of this space.
  4. Be patient. I receive 200+ comments and emails per day. Due to spam, and my desire to personally answer as many questions as I can, I hand moderate all new comments (typically once per week). I try to answer as many questions as I can, but I'm only one person. Please don't be offended if I cannot get to your question.
  5. Do you need priority support? Consider purchasing one of my books and courses. I place customer questions and emails in a separate, special priority queue and answer them first. If you are a customer of mine you will receive a guaranteed response from me. If there's any time left over, I focus on the community at large and attempt to answer as many of those questions as I possibly can.

Thank you for keeping these guidelines in mind before submitting your comment.
