# A simple neural network with Python and Keras

If you’ve been following along with this series of blog posts, then you already know what a huge fan I am of Keras.

Keras is a powerful, easy-to-use Python library for building neural networks and deep learning models.

In the remainder of this blog post, I’ll demonstrate how to build a simple neural network using Python and Keras, and then apply it to the task of image classification.


## A simple neural network with Python and Keras

To start this post, we’ll quickly review the most common neural network architecture — feedforward networks.

We’ll then write some Python code to define our feedforward neural network and specifically apply it to the Kaggle Dogs vs. Cats classification challenge. The goal of this challenge is to correctly classify whether a given image contains a dog or a cat.

Finally, we’ll review the results of our simple neural network architecture and discuss methods to improve it.

### Feedforward neural networks

While there are many, many different neural network architectures, the most common architecture is the feedforward network:

Figure 1: An example of a feedforward neural network with 3 input nodes, a hidden layer with 2 nodes, a second hidden layer with 3 nodes, and a final output layer with 2 nodes.

In this type of architecture, a connection between two nodes is only permitted from nodes in layer i to nodes in layer i + 1 (hence the term feedforward; there are no backward or intra-layer connections allowed).

Furthermore, the nodes in layer i are fully connected to the nodes in layer i + 1. This implies that every node in layer i connects to every node in layer i + 1. For example, in the figure above, there are a total of 2 x 3 = 6 connections between layer 0 and layer 1 — this is where the term “fully connected” or “FC” for short, comes from.

We normally use a sequence of integers to quickly and concisely describe the number of nodes in each layer.

For example, the network above is a 3-2-3-2 feedforward neural network:

• Layer 0 contains 3 inputs, our $x_{i}$ values. These could be raw pixel intensities or entries from a feature vector.
• Layers 1 and 2 are hidden layers, containing 2 and 3 nodes, respectively.
• Layer 3 is the output layer or the visible layer — this is where we obtain the overall output classification from our network. The output layer normally has as many nodes as class labels; one node for each potential output. In our Kaggle Dogs vs. Cats example, we have two output nodes — one for “dog” and another for “cat”.
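To make the "fully connected" arithmetic concrete, here is a small sketch that counts the connections between adjacent layers given the sequence of layer sizes. The helper name `count_connections` is made up for this illustration.

```python
def count_connections(layer_sizes):
    """Return the number of connections between each pair of adjacent layers."""
    # Every node in layer i connects to every node in layer i + 1,
    # so the count is simply the product of the two layer sizes.
    return [a * b for a, b in zip(layer_sizes, layer_sizes[1:])]


layers = [3, 2, 3, 2]  # the 3-2-3-2 network from Figure 1
per_layer = count_connections(layers)
print(per_layer)       # [6, 6, 6]
print(sum(per_layer))  # 18
```

The same helper works for any layer sequence, including the much larger network we define later in this post.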

### Implementing our own neural network with Python and Keras

Now that we understand the basics of feedforward neural networks, let’s implement one for image classification using Python and Keras.

To start, you’ll want to follow this tutorial to ensure you have Keras and the associated prerequisites installed on your system.

From there, open up a new file, name it simple_neural_network.py , and we’ll get coding:

We start off by importing our required Python packages. We’ll be using a number of scikit-learn implementations along with Keras layers and activation functions. If you do not already have your development environment configured for Keras, please see this blog post.

We’ll be also using imutils, my personal library of OpenCV convenience functions. If you do not already have imutils  installed on your system, you can install it via pip :

Next, let’s define a method to accept an image and describe it. In previous tutorials, we’ve extracted color histograms from images and used these distributions to characterize the contents of an image.

This time, let’s use the raw pixel intensities instead. To accomplish this, we define the image_to_feature_vector  function which accepts an input image  and resizes it to a fixed size , ignoring the aspect ratio:

We resize our image  to fixed spatial dimensions to ensure each and every image in the input dataset has the same “feature vector” size. This is a requirement when utilizing our neural network — each image must be represented by a vector.

In this case, we resize our image to 32 x 32 pixels and then flatten the 32 x 32 x 3 image (where we have three channels, one for each Red, Green, and Blue channel, respectively) into a 3,072-d feature vector.
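As a rough sketch of the idea, the snippet below implements `image_to_feature_vector` with a plain NumPy nearest-neighbor resize. That resize is an assumption made to keep the example dependency-free; it stands in for whatever resize routine (for example, OpenCV's) you would normally use. The `size=(32, 32)` default simply mirrors the dimensions discussed above.

```python
import numpy as np


def image_to_feature_vector(image, size=(32, 32)):
    """Resize an H x W x 3 image to a fixed size and flatten it.

    The resize here is a nearest-neighbor stand-in written in NumPy
    (an assumption for this sketch); the aspect ratio is ignored.
    """
    height, width = image.shape[:2]
    new_w, new_h = size
    # Pick, for each output row/column, the nearest source row/column.
    rows = np.arange(new_h) * height // new_h
    cols = np.arange(new_w) * width // new_w
    resized = image[rows[:, None], cols[None, :]]
    # 32 x 32 x 3 -> 3,072-d feature vector
    return resized.flatten()


image = np.zeros((303, 400, 3), dtype=np.uint8)  # dummy image for illustration
features = image_to_feature_vector(image)
print(features.shape)  # (3072,)
```

Whatever the input dimensions, the output always has 32 x 32 x 3 = 3,072 entries, which is exactly what the network's input layer expects.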

The next code block handles parsing our command line arguments and taking care of a few initializations:

We only need a single switch here, --dataset , which is the path to the input directory containing the Kaggle Dogs vs. Cats images. This dataset can be downloaded from the official Kaggle Dogs vs. Cats competition page.

Line 28 grabs the paths to our --dataset  of images residing on disk. We then initialize the data  and labels  lists, respectively, on Lines 31 and 32.
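For reference, a minimal version of this step could look like the sketch below. Only the `--dataset` switch comes from this post; the `build_parser` helper and the example path `kaggle_dogs_vs_cats/train` are placeholders chosen for illustration, and the argument list is passed explicitly so the snippet runs without a real command line.

```python
import argparse


def build_parser():
    """Build a parser with the single --dataset switch (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Train a simple neural network")
    parser.add_argument("--dataset", required=True,
                        help="path to the input directory of images")
    return parser


# The path below is a placeholder, not a path taken from the tutorial.
args = vars(build_parser().parse_args(["--dataset", "kaggle_dogs_vs_cats/train"]))

# Lists that will hold the feature vectors and their class labels.
data = []
labels = []
print(args["dataset"])
```

In a real script you would call `parse_args()` with no arguments so the values are read from the command line.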

Now that we have our imagePaths , we can loop over them individually, load them from disk, convert the images to feature vectors, and then update the data  and labels  lists:
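One part of this loop worth spelling out is how a class label can be recovered from each path. The sketch below assumes the Kaggle training files are named in the form `{class}.{number}.jpg` (for example, `cat.0.jpg`); the helper name `label_from_path` and the two example paths are hypothetical.

```python
import os


def label_from_path(image_path):
    """Extract the class label from a path like .../dog.42.jpg (assumed naming)."""
    filename = os.path.basename(image_path)
    # "dog.42.jpg" -> "dog"
    return filename.split(".")[0]


# Placeholder paths for illustration only.
image_paths = [os.path.join("train", "cat.0.jpg"),
               os.path.join("train", "dog.0.jpg")]
labels = [label_from_path(p) for p in image_paths]
print(labels)  # ['cat', 'dog']
```

If your filenames do not contain "dog" or "cat", the labels extracted this way will be wrong, so it is worth printing a few of them before training.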

The data  list now contains the flattened 32 x 32 x 3 = 3,072-d representations of every image in our dataset. However, before we can train our neural network, we first need to perform a bit of preprocessing:

Lines 59 and 60 handle scaling the input data to the range [0, 1], followed by converting the labels  from a set of integers to a set of vectors (a requirement for the cross-entropy loss function we will apply when training our neural network).
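To make the two preprocessing steps concrete, here is a minimal NumPy sketch. The `preprocess` helper is hypothetical, and it assumes 8-bit pixel values in the range 0 to 255. The one-hot encoding via `np.unique` and `np.eye` is just one way to turn labels into vectors, not necessarily the approach used in the script.

```python
import numpy as np


def preprocess(data, labels):
    """Scale pixel values to [0, 1] and one-hot encode the labels."""
    # Assumes 8-bit pixel intensities (0-255).
    data = np.asarray(data, dtype="float32") / 255.0
    # Map each distinct label to an integer index (sorted alphabetically).
    classes, indices = np.unique(labels, return_inverse=True)
    # Row i of the identity matrix is the one-hot vector for class i.
    one_hot = np.eye(len(classes), dtype="float32")[indices]
    return data, one_hot, classes


data, one_hot, classes = preprocess([[0, 255], [51, 102]], ["dog", "cat"])
print(classes)  # ['cat' 'dog']
print(one_hot)  # dog -> [0, 1], cat -> [1, 0]
```

With two classes, each label becomes a 2-d vector, which lines up with the two output nodes of the network.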

We then construct our training and testing splits on Lines 65 and 66, using 75% of the data for training and the remaining 25% for testing.
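The 75%/25% split itself is easy to sketch. The version below is a plain NumPy stand-in with a hypothetical helper name and an arbitrary seed; a library routine such as scikit-learn's `train_test_split` would normally do this job.

```python
import numpy as np


def split_75_25(data, labels, test_fraction=0.25, seed=42):
    """Shuffle the samples and hold out a fraction of them for testing."""
    rng = np.random.default_rng(seed)  # the seed is an arbitrary choice
    order = rng.permutation(len(data))
    n_test = int(round(len(data) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return data[train_idx], data[test_idx], labels[train_idx], labels[test_idx]


data = np.arange(200, dtype="float32").reshape(100, 2)
labels = np.arange(100)
train_x, test_x, train_y, test_y = split_75_25(data, labels)
print(len(train_x), len(test_x))  # 75 25
```

Shuffling before splitting matters here: if the images are loaded in filename order, an unshuffled split could put most of one class into the test set.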

For a more detailed review of the data preprocessing stage, please see this blog post.

We are now ready to define our neural network using Keras:

On Lines 69-74 we construct our neural network architecture — a 3072-768-384-2 feedforward neural network.

Our input layer has 3,072 nodes, one for each of the 32 x 32 x 3 = 3,072 raw pixel intensities in our flattened input images.

We then have two hidden layers with 768 and 384 nodes, respectively. These node counts were determined via a cross-validation and hyperparameter tuning experiment performed offline.

The output layer has 2 nodes — one for each of the “dog” and “cat” labels.

We then apply a softmax  activation function on top of the network — this will give us our actual output class label probabilities.
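To show what a 3072-768-384-2 network with a softmax on top actually computes, here is a NumPy sketch of the forward pass. This is not the Keras code from the script: the ReLU activation on the hidden layers, the small random weight initialization, and all helper names are assumptions made for this illustration.

```python
import numpy as np


def init_layers(sizes, seed=0):
    """Create (weights, bias) pairs for each pair of adjacent layers."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes, sizes[1:])]


def softmax(z):
    """Turn raw scores into probabilities that sum to 1 per row."""
    z = z - z.max(axis=1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(x, layers):
    """Run a batch of feature vectors through the network."""
    for W, b in layers[:-1]:
        x = np.maximum(0.0, x @ W + b)  # hidden layers; ReLU is an assumption
    W, b = layers[-1]
    return softmax(x @ W + b)           # output layer: class probabilities


layers = init_layers([3072, 768, 384, 2])
batch = np.random.default_rng(1).random((4, 3072))  # 4 fake flattened images
probs = forward(batch, layers)
print(probs.shape)  # (4, 2)
```

Each row of `probs` holds two probabilities, one per class, and the larger of the two determines the predicted label.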

The next step is to train our model using Stochastic Gradient Descent (SGD):

To train our model, we’ll set the learning rate parameter of SGD to 0.01. We’ll use the binary_crossentropy  loss function for the network as well.

In most cases, you’ll want to use categorical_crossentropy , but since there are only two class labels, we use binary_crossentropy . For > 2 class labels, make sure you use categorical_crossentropy .
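To see how the binary and categorical forms of cross-entropy relate, the sketch below writes out both formulas in NumPy. These are textbook definitions used for illustration, not Keras's internal implementation, and the example targets and predictions are made up.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """-sum_k y_k * log(p_k), computed per sample."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred), axis=1)


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Treat each output as its own yes/no problem, then average over outputs."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_output = -(y_true * np.log(y_pred)
                   + (1.0 - y_true) * np.log(1.0 - y_pred))
    return per_output.mean(axis=1)


# One-hot targets and softmax-style predictions (each row sums to 1).
y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_pred = np.array([[0.8, 0.2], [0.3, 0.7]])

print(categorical_crossentropy(y_true, y_pred))
print(binary_crossentropy(y_true, y_pred))
```

With exactly two classes, one-hot targets, and predictions that sum to 1, the two values coincide. With three or more classes they no longer match, which is why the categorical form is the one to use there.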

The network is then allowed to train for a total of 50 epochs, meaning that the model “sees” each individual training example 50 times in an attempt to learn an underlying pattern.
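The idea of an epoch is easy to see in a stripped-down training loop. The sketch below runs mini-batch SGD on a toy linear model rather than on our network; the batch size of 128, the toy data, and the `sgd_train` helper are all assumptions for this illustration. Only the learning rate of 0.01 and the 50 epochs come from the text above.

```python
import numpy as np


def sgd_train(X, y, lr=0.01, epochs=50, batch_size=128, seed=0):
    """Mini-batch SGD on a linear model with squared-error loss (toy example)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    seen = np.zeros(len(X), dtype=int)  # how often each example was visited
    for _ in range(epochs):
        order = rng.permutation(len(X))  # reshuffle once per epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            error = X[idx] @ w - y[idx]
            grad = X[idx].T @ error / len(idx)
            w -= lr * grad               # the SGD update: step against the gradient
            seen[idx] += 1
    return w, seen


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
w, seen = sgd_train(X, X @ true_w)
print(seen.min(), seen.max())  # 50 50
```

Every example is visited exactly once per epoch, so after 50 epochs each one has contributed to 50 weight updates.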

The final code block evaluates our Keras neural network on the testing data:
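Under the hood, classification accuracy boils down to comparing the most probable predicted class against the true class for each test example. Here is a minimal NumPy sketch of that computation; the `accuracy` helper and the example arrays are made up, and this is not the Keras evaluation call itself.

```python
import numpy as np


def accuracy(y_true_one_hot, y_prob):
    """Fraction of samples whose most probable class matches the true class."""
    predicted = np.argmax(y_prob, axis=1)
    actual = np.argmax(y_true_one_hot, axis=1)
    return float(np.mean(predicted == actual))


y_true = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
y_prob = np.array([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.3, 0.7]])
print(accuracy(y_true, y_prob))  # 0.75
```

The same `np.argmax` step is what you would apply to the output of a prediction call to turn per-class probabilities into class labels.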

### Classifying images using neural networks with Python and Keras

To execute our simple_neural_network.py  script, make sure you have downloaded the Kaggle Dogs vs. Cats dataset from the Kaggle competition page.

The following command can be used to train our neural network using Python and Keras:

Note: You might need to rename your Kaggle dataset directory (or simply update the path supplied to --dataset ) before executing the command above.

The output of our script can be seen in the screenshot below:

Figure 2: Training a simple neural network using the Keras deep learning library and the Python programming language.

On my Titan X GPU, the entire process of feature extraction, training the neural network, and evaluation took a total of 1m 15s, with each epoch taking less than a second to complete.

At the end of the 50th epoch, we see that we are getting ~76% accuracy on the training data and 67% accuracy on the testing data.

This ~9% difference in accuracy implies that our network is overfitting a bit; however, it is very common to see ~10% gaps in training versus testing accuracy, especially if you have limited training data.

You should start to become very worried regarding overfitting when your training accuracy reaches 90%+ and your testing accuracy is substantially lower than that.

In either case, this 67.376% is the highest accuracy we’ve obtained thus far in this series of tutorials. As we’ll find out later on, we can easily obtain > 95% accuracy by utilizing Convolutional Neural Networks.

## Summary

In today’s blog post, I demonstrated how to train a simple neural network using Python and Keras.

We then applied our neural network to the Kaggle Dogs vs. Cats dataset and obtained 67.376% accuracy utilizing only the raw pixel intensities of the images.

Starting next week, I’ll begin discussing optimization methods such as gradient descent and Stochastic Gradient Descent (SGD). I’ll also include a tutorial on backpropagation to help you understand the inner workings of this important algorithm.


### 56 Responses to A simple neural network with Python and Keras

1. Stan September 26, 2016 at 9:48 pm #

That is awesome. Thanks. Please keep posting that stuff.

• Adrian Rosebrock September 27, 2016 at 6:36 am #

Thanks Stan! I’ll certainly be doing more neural network and deep learning tutorials in the future.

2. Bogomil September 27, 2016 at 4:01 pm #

I used the default engine for keras – tensorflow and got the following:

Epoch 50/50
18750/18750 [==============================] – 12s – loss: 0.4859 – acc: 0.7707
[INFO] evaluating on testing set…
6250/6250 [==============================] – 1s
[INFO] loss=0.6020, accuracy: 68.0960%

is this difference normal ?

• Adrian Rosebrock September 28, 2016 at 10:42 am #

Absolutely. Keep in mind that neural networks are stochastic algorithms meaning there is a level of randomness involved with them (specifically the weight initializations). It’s totally normal to see a bit of variance between training runs.

3. Gilad September 28, 2016 at 1:06 am #

Hi,
wonderful post!
I have a question – how did you manage to pick your parameters (including the NN scheme)?
No matter what I did (and I did a lot – including adding 2 more NN levels, adding dropout, changing the SGD parameters and all other parameters), I didn’t manage to get more than your 67%.
Especially I wonder why adding more levels and increasing the depth of each, didn’t contribute to my score (but as expected contribute to my run time ;-))
Only when I increased the resolution to 64×64, and the depth of the 2 NN levels, I manage to get 68%, and I wonder why it is so low.

• Adrian Rosebrock September 28, 2016 at 10:37 am #

Hey Gilad — as the blog post states, I determined the parameters to the network using hyperparameter tuning.

Regarding the accuracy, keep in mind that this is a simple feedforward neural network. 68% accuracy is actually quite good for only considering the raw pixel intensities. And again, as the blog post states, we require a more powerful network architecture (i.e., Convolutional Neural Networks) to obtain higher accuracy. I’ll be covering how to apply CNNs to the Dogs vs. Cats dataset in a future blog post. In the meantime, I would suggest reading this blog post on MNIST + LeNet to help you get started with CNNs.

4. Max Kostka September 28, 2016 at 2:10 pm #

Yes, absolutely awesome Adrian, i am already totally eager for a simple convolutional neural network. I love your blog 🙂 Been following it for a year now. Keep up the great work.
Btw, i did this simple neural network on a raspberry Pi 2 and FYI it took almost 5 hours 😀

• Adrian Rosebrock September 30, 2016 at 6:51 am #

Thanks for the kind words Max, I’m happy the tutorial helped you (and that you’ve been a long time reader)!

If you would like a simple CNN, take a look at this blog post on LeNet to help you get started. Future posts will discuss each of the layer types in detail, etc.

• Max Kostka September 30, 2016 at 1:29 pm #

i did that right away, another awesome post:D and fyi, the training there on a raspi 2 took almost about 19 hours.

5. Marios September 29, 2016 at 11:09 pm #

You could also do an implementation of your NN using TensorFlow!

• Adrian Rosebrock September 30, 2016 at 6:40 am #

Keras can use either Theano or TensorFlow as a backend — it’s really your choice. I personally like using Keras because it adds a layer of abstraction over what would otherwise be a lot more code to accomplish the same task. In future blog posts I’m planning on continuing using Keras, but I’ll also consider the “nitty-gritty” with TensorFlow as well!

6. roberto October 1, 2016 at 7:52 am #

I run the code, but i would like to use it to classify some images, but i dont want to run it every time. How can i save the model and use it to classify?

ps: I’ll be waiting for next post to improve the accuracy!

Regards!!!

• Adrian Rosebrock October 2, 2016 at 9:02 am #

Once your model is trained you can serialize it to disk using `model.save` and then load it again via `load_model`. Take a look at the Keras documentation for more information and a code example.

7. Atti November 29, 2016 at 10:53 am #

great article thanks for all the insights

8. Alberto Franzaroli December 1, 2016 at 6:06 am #

Now there is also an open-source library, The Microsoft Cognitive Toolkit
would you like to try it and compare with Keras ?

• Adrian Rosebrock December 1, 2016 at 7:20 am #

I haven’t used the Microsoft Cognitive Toolkit before, but I’ll look into it. I don’t normally use Microsoft products.

9. Dharma KC December 11, 2016 at 9:06 am #

Please can you provide the link to the tutorial with convolutional neural network to solve this problem with 95% accuracy. Thank you.

• Adrian Rosebrock December 11, 2016 at 10:46 am #

I will be covering how to obtain 95%+ accuracy in the Dogs vs. Cats challenge in my upcoming deep learning book. Stay tuned!

10. UDAY December 12, 2016 at 7:25 am #

How long will it take to train without a GPU,
and how can we get a GPU for a trial?

11. Tajj kasem December 14, 2016 at 4:57 pm #

How I can use model.predict() after training my neural network .
I have this error :

Exception: Error when checking : expected dense_input_1 to have 2 dimensions, but got array with shape (303, 400, 3)
How i fixed it ?

• Adrian Rosebrock December 18, 2016 at 9:10 am #

You need to call `image_to_feature_vector` on your image before passing it into `model.predict`.

12. azhng December 26, 2016 at 10:23 pm #

Thank you so much for this awesome tutorial. However, when I run the code on my laptop, the process is terminated with an exit code of 137.
Any idea what does that mean?

13. Yunhwan Kim January 10, 2017 at 11:06 pm #

Thank you for awesome tutorial.
I just wonder how you could use Titan X GPU on your (seemingly) OSX machine. I see “ssh” in the top of the terminal window figure, and I guess that you access other (probably linux) machine with GPU from your OSX machine via ssh.
Then, do you have any plan to post about that process? It would be much helpful if I (and other readers) could use GPU in other machine from OSX machine.
Thank you again.

Yunhwan

• Adrian Rosebrock January 11, 2017 at 10:35 am #

You are correct, Yunhwan — I am ssh’ing into my Ubuntu GPU box and then running any scripts over the SSH session. Does that help clarify your question? If you are looking to learn more about SSH and how to SSH into machines I would suggest reading up on SSH basics.

14. Vincent FOUCAULT January 15, 2017 at 1:48 pm #

didn’t you forget, in picture1, connection from first node in layer2 to second in layer3 ?

I’m impatient to see your next books.

CU.
Vincent

15. Foobar March 10, 2017 at 11:07 pm #

Hi, I am training on an ARM-based device (4 cores, 1GB RAM), but I am getting a memory error when running the script. It gets up to processing 24,000 images and then crashes with a memory error, even though there is still 100MB of free space. What am I doing wrong, and how do I fix this?

• Adrian Rosebrock March 13, 2017 at 12:20 pm #

It’s hard to say without knowing which device you are using. I would confirm that the script will run on your desktop/laptop before moving on to other devices.

• Foobar March 17, 2017 at 7:21 pm #

It works on my laptop but I have been trying to run it on an Odroid C1. My Odroid is running a headless debian jessie for the odroid.

• Adrian Rosebrock March 21, 2017 at 7:40 am #

I haven’t used Odroid before, so I’m not sure about the specifics. Debian Jessie seems like it would work just fine; however, I don’t have any experience with the Odroid so I’m not sure what the exact problem would be. Again, we typically don’t train networks on such small devices — only deploy them if memory allows.

16. DL March 25, 2017 at 11:51 am #

Getting this error while using Keras 1.0.7 in Anaconda

Any idea?

Thanks!

• Adrian Rosebrock March 28, 2017 at 1:12 pm #

Your class labels are not getting encoded properly. 99% of the time this is due to invalid paths to your training images. Double-check the path to the training images and ensure it’s correct. Also make sure you are not using the paths to the Kaggle testing data, as these filenames do not have “dog” or “cat” in them.

17. naitik March 26, 2017 at 6:59 am #

It’s really informative article you posted , but just curious that instead of having accuracy can i have detailed result of each image classified as wrong or right ?

• Adrian Rosebrock March 28, 2017 at 1:07 pm #

Can you clarify what you mean by “detailed result of each image classified as wrong or right”? I’m not sure what you mean.

18. Werner April 19, 2017 at 1:31 pm #

Nice post really love the work! I just have a question regarding the feedforward idea. From my understanding, feedforward network uses delta rule to learn, and does not backpropagate. How is this specified using Keras? If you were to write this feedforward Keras code using backpropagation, how would it be different?

Thanks!

• Adrian Rosebrock April 21, 2017 at 11:03 am #

I think you might be confusing the standard Perceptron algorithm with multi-layer feedforward networks. The Perceptron uses the delta rule to learn while multi-layer feedforward networks use backpropagation. If you’re interested in learning more about these algorithms, how to train neural networks, and even build Convolutional Neural Networks that can understand the contents of an image, be sure to take a look at Deep Learning for Computer Vision with Python.

19. maxtri April 20, 2017 at 4:53 am #

20. chris May 11, 2017 at 2:05 am #

Simple question…got it to work but lets say I want to load in the test data (the pictures with just numbers) any trick to doing that?

• Adrian Rosebrock May 11, 2017 at 8:44 am #

Hi Chris — can you elaborate more on what you mean by “the pictures with just numbers”? I’m not sure I understand.

• chris May 11, 2017 at 11:58 am #

Sure 🙂 so in the data set of dogs and cats there is the training data that is labeled either a cat or a dog and its corresponding image number. Using this data we train and test our model (correct me if i’m wrong anywhere). Once this is done, model is trained and tested for accuracy, we could use it to predict if an image is a cat or a dog. So at kaggles site there is a set of images that you can download that is a mix of cats and dogs but minus the label of a cat or a dog. Its simply just numbered images. So how do we take that data and feed it into our model to predict if those images are cats or dogs. I want to use the model now to do actual predictions. Thanks for the prompt response.

• Adrian Rosebrock May 15, 2017 at 8:57 am #

Thank you for the clarification on the question. Basically, you would like to know how to take a test set and then pass it through the network to obtain the output classifications.

All you need to do is (1) pre-process your input data in the same way as your training data and (2) call the `.predict` method of the network.

I would suggest taking a look at this blog post on LeNet where I demonstrate how to classify individual images.

21. Richard May 19, 2017 at 7:00 am #

Hi, fantastic posts – I love your blog.

I have a question, though!

In ‘SGD(lr=0.01)’, where does the 0.01 come from? You don’t say how you chose that value. Was it pie-in-the-sky, or was there some secret to your choice?

Thanks!

• Adrian Rosebrock May 21, 2017 at 5:19 am #

Hi Richard — typical initial learning rates include 0.1, 0.01, and 0.001. It really depends on the specific architecture and your dataset, but those are typical starting points.

22. Will May 25, 2017 at 5:47 pm #

I am having trouble getting the SGD algorithm to converge. The algorithm generally does well and decreases the loss, but sometimes (generally after a few epochs) the loss explodes in a few steps (by a factor of 10 or so) and does not recover. It does not simply seem to be fluctuations from navigating local minima of the objective function; it seems that there is something pathological going on. Which is bizarre because I am using the same code and hyperparameters.

• Adrian Rosebrock May 28, 2017 at 1:17 am #

It sounds like you’re overfitting. Are you using the dataset in this blog post or your own custom dataset?

23. Shrinidhi Rao June 7, 2017 at 12:30 am #

Hi, Thanks for the tutorial. but for some reason I am not able to enter the for loop, which is giving me the error

• Adrian Rosebrock June 9, 2017 at 1:53 pm #

What is the error you are getting, Shrinidhi?

24. Foobar June 20, 2017 at 6:41 am #

I have managed to use the .predict function with this but I don’t know how to understand the data given by .predict

• Adrian Rosebrock June 20, 2017 at 10:43 am #

The `.predict` method will return a NumPy array with shape `(N, M)` where N is the total number of data points passed into `.predict` and M is your total number of class labels. You can use the `np.argmax` function to find the index with the largest class label probability for each row.

• Foobar June 20, 2017 at 4:49 pm #

Thanks

25. Rolando June 29, 2017 at 2:11 pm #

One question: I want to build a probabilistic neural network in Python. What would you recommend?

• Adrian Rosebrock June 30, 2017 at 8:08 am #

Do you mean a neural network that predicts probabilities? This implementation can do that as well. Just use the `model.predict_proba` function.

• Rolando June 30, 2017 at 2:50 pm #

Thanks, I mean a PNN type neural network

26. Niki July 14, 2017 at 2:57 pm #

Nice work! I used your code with the exact same data, but I could never reach an accuracy better than 50% on both training and test data. I tried to increase the resolution of the images, but it didn’t work. No matter what I tried I kept seeing the same pattern in the results; in the first few epochs the accuracy improved and the loss decreased, but then all of a sudden the loss became 8 or 9 times larger. I thought this might be the result of overfitting, so finally I reduced both the learning rate and the number of epochs. The best result that I got was 65.7% on training and 64.59% on the test data, and this was achieved by setting learning rate = .005 and number of epochs = 25. I understand that using raw pixels as the input features does not make a strong feature vector, so I shouldn’t expect a high accuracy, but I was wondering why using the exact same data and exact same code could result in such a big difference in the final result (in particular, my first results that couldn’t get any better than 50%, and this happened when I ran the code at least 7 or 8 times before making any changes)?

• Adrian Rosebrock July 18, 2017 at 10:13 am #

It depends on (1) what version of Keras you are using and (2) whether you are using Theano or TensorFlow as your backend. Both of these can have different impacts on your accuracy. However, without physical access to your machine I can’t be 100% sure what the issue is.

• Niki July 19, 2017 at 10:15 am #

Thank you for your response! These are the python, Keras, and Tensorflow (my keras backend engine) versions that I use:

Python 2.7.13 (anaconda 1.6.3), Tensorflow 1.2.1, Keras 2.0.5

• Adrian Rosebrock July 21, 2017 at 8:57 am #

Hi Niki — thank you for sharing your Python and library versions. Unfortunately, I’m not sure what the exact issue is here. I wish I could be of more help, but like I said, without physical access to your machine, I can’t diagnose why there are such big discrepancies between the accuracies.