Fashion MNIST with Keras and Deep Learning

In this tutorial you will learn how to train a simple Convolutional Neural Network (CNN) with Keras on the Fashion MNIST dataset, enabling you to classify images of fashion items into categories.

The Fashion MNIST dataset is meant to be a slightly more challenging drop-in replacement for the MNIST digit dataset.

Similar to the MNIST digit dataset, the Fashion MNIST dataset includes:

  • 60,000 training examples
  • 10,000 testing examples
  • 10 classes
  • 28×28 grayscale/single channel images

The ten fashion class labels include:

  1. T-shirt/top
  2. Trouser/pants
  3. Pullover shirt
  4. Dress
  5. Coat
  6. Sandal
  7. Shirt
  8. Sneaker
  9. Bag
  10. Ankle boot

Throughout this tutorial, you will learn how to train a simple Convolutional Neural Network (CNN) with Keras on the Fashion MNIST dataset, giving you not only hands-on experience working with the Keras library but also your first taste of clothing/fashion classification.

To learn how to train a Keras CNN on the Fashion MNIST dataset, just keep reading!


Fashion MNIST with Keras and Deep Learning

In the first part of this tutorial, we will review the Fashion MNIST dataset, including how to download it to your system.

From there we’ll define a simple CNN network using the Keras deep learning library.

Finally, we’ll train our CNN model on the Fashion MNIST dataset, evaluate it, and review the results.

Let’s go ahead and get started!

The Fashion MNIST dataset

Figure 1: The Fashion MNIST dataset was created by the e-commerce company Zalando as a drop-in replacement for MNIST Digits. It is a great dataset to practice with when using Keras for deep learning.

The Fashion MNIST dataset was created by the e-commerce company Zalando.

As they note on their official GitHub repo for the Fashion MNIST dataset, there are a few problems with the standard MNIST digit recognition dataset:

  1. It’s far too easy for standard machine learning algorithms to obtain 97%+ accuracy.
  2. It’s even easier for deep learning models to achieve 99%+ accuracy.
  3. The dataset is overused.
  4. MNIST cannot represent modern computer vision tasks.

Zalando, therefore, created the Fashion MNIST dataset as a drop-in replacement for MNIST.

The Fashion MNIST dataset is identical to the MNIST dataset in terms of training set size, testing set size, number of class labels, and image dimensions:

  • 60,000 training examples
  • 10,000 testing examples
  • 10 classes
  • 28×28 grayscale images

If you’ve ever trained a network on the MNIST digit dataset then you can essentially change one or two lines of code and train the same network on the Fashion MNIST dataset!

How to install Keras

If you’re reading this tutorial, I’ll be assuming you have Keras installed. If not, be sure to follow Installing Keras for deep learning.

You’ll also need OpenCV and imutils installed. Pip is suitable and you can follow my pip install opencv tutorial to get started.

The last tools you’ll need are scikit-learn and matplotlib, both of which can be installed with pip.
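Before moving on, it can help to confirm that everything is importable. The following is a minimal sketch, not part of the original project: the helper name check_packages is hypothetical, and the import names (cv2 for OpenCV, sklearn for scikit-learn) are the usual ones for those packages.

```python
from importlib.util import find_spec

# Import names for the packages this tutorial relies on.
# The import name can differ from the pip package name
# (OpenCV imports as cv2, scikit-learn as sklearn).
REQUIRED = ["keras", "cv2", "imutils", "sklearn", "matplotlib"]


def check_packages(names):
    """Return a dict mapping each import name to True if it can be found."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(REQUIRED).items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Anything reported as MISSING needs to be installed before the training script will run.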

Obtaining the Fashion MNIST dataset

Figure 2: The Fashion MNIST dataset is built right into Keras. Alternatively, you can download it from GitHub.

There are two ways to obtain the Fashion MNIST dataset.

If you are using the Keras deep learning library, the Fashion MNIST dataset is actually built directly into the datasets module of Keras as fashion_mnist.

Otherwise, if you are using another deep learning library, you can download it directly from the official Fashion MNIST GitHub repo.
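If you go the GitHub route, the files use the same gzip-compressed IDX format as the original MNIST files. Here is a minimal sketch of a reader for that format, assuming unsigned-byte data. The function names parse_idx and load_idx_gz are hypothetical, and the file name in the usage note is an assumption about what the download contains.

```python
import gzip
import struct

import numpy as np


def parse_idx(raw):
    """Decode the bytes of an uncompressed IDX file into a uint8 NumPy array."""
    # Header: two zero bytes, a type code (0x08 = unsigned byte), and the
    # number of dimensions, followed by one big-endian uint32 per dimension.
    zero, dtype_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or dtype_code != 0x08:
        raise ValueError("not an unsigned-byte IDX file")
    dims = struct.unpack(">" + "I" * ndim, raw[4:4 + 4 * ndim])
    # Everything after the header is the raw pixel or label data.
    data = np.frombuffer(raw, dtype=np.uint8, offset=4 + 4 * ndim)
    return data.reshape(dims)


def load_idx_gz(path):
    """Read a gzip-compressed IDX file from disk."""
    with gzip.open(path, "rb") as f:
        return parse_idx(f.read())
```

Assuming the training images were saved as train-images-idx3-ubyte.gz, load_idx_gz would return an array of shape (60000, 28, 28) for that file.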

A big thanks to Margaret Maynard-Reid for putting together the awesome illustration in Figure 2.

Project structure

To follow along, be sure to grab the “Downloads” for today’s blog post.

Once you’ve unzipped the files, your directory structure will look like this:

Our project today is rather straightforward. We’re reviewing two Python files:

  • The MiniVGGNet implementation inside the pyimagesearch/ module: A simple CNN based on VGGNet.
  • The training script: Performs Fashion MNIST classification with Keras and deep learning. This script will load the data (remember, it is built into Keras) and train our MiniVGGNet model. A classification report and montage will be generated upon training completion.

Defining a simple Convolutional Neural Network (CNN)

Today we’ll be defining a very simple Convolutional Neural Network to train on the Fashion MNIST dataset.

We’ll call this CNN “MiniVGGNet” since:

  • The model is inspired by its bigger brother, VGGNet
  • The model has VGGNet characteristics, including:
    • Only using 3×3 CONV filters
    • Stacking multiple CONV layers before applying a max-pooling operation

We’ve used the MiniVGGNet model before a handful of times on the PyImageSearch blog but we’ll briefly review it here today as a matter of completeness.

Open up a new file inside the pyimagesearch module for the MiniVGGNet implementation and insert the following code:

Our Keras imports are listed on Lines 2-10. Our Convolutional Neural Network model is relatively simple, but we will be taking advantage of batch normalization and dropout, two methods I nearly always recommend. For further reading, please take a look at Deep Learning for Computer Vision with Python.

Our MiniVGGNet class and its build method are defined on Lines 12-14. The build function accepts four parameters:

  • width: Image width in pixels.
  • height: Image height in pixels.
  • depth: Number of channels. Typically this value is 3 for color images and 1 for grayscale (the Fashion MNIST dataset is grayscale).
  • classes: The number of types of fashion articles we can recognize. The number of classes affects the final fully-connected output layer. For the Fashion MNIST dataset there are a total of 10 classes.

Our model is initialized on Line 17 using the Sequential API.

From there, our inputShape is defined (Line 18). We’re going to use "channels_last" ordering since our backend is TensorFlow, but in case you’re using a different backend, Lines 23-25 will accommodate.
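The channel-ordering logic boils down to where the depth value sits in the shape tuple. The sketch below is a plain-Python illustration of that idea rather than the listing referenced above; the function name input_shape is hypothetical. In Keras, the active format is reported by the backend's image_data_format() function.

```python
def input_shape(width, height, depth, data_format="channels_last"):
    """Return the per-image shape tuple for the given channel ordering."""
    if data_format == "channels_first":
        # Channels come before the spatial dimensions.
        return (depth, height, width)
    # Default: channels come after the spatial dimensions.
    return (height, width, depth)


# Fashion MNIST images are 28x28 with a single grayscale channel.
print(input_shape(28, 28, 1))
print(input_shape(28, 28, 1, data_format="channels_first"))
```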

Now let’s add our layers to the CNN:

Our model has two sets of (CONV => RELU => BN) * 2 => POOL layers (Lines 28-46). These layer sets also include batch normalization and dropout.
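To see how the volume shrinks as it moves through these layer sets, here is a small shape-tracing sketch. It rests on several assumptions that are not stated in the text: 32 filters in the first set and 64 in the second, "same" padding on every 3×3 CONV layer, 2×2 max pooling with a stride of 2, and 512 nodes in the fully-connected layer. The function name minivggnet_shapes is hypothetical.

```python
def minivggnet_shapes(width=28, height=28, depth=1, classes=10,
                      filters=(32, 64), dense_units=512):
    """Trace output shapes (channels last) through a MiniVGGNet-style stack."""
    shapes = [("input", (height, width, depth))]
    h, w = height, width
    for i, f in enumerate(filters, start=1):
        # Two 3x3 CONV layers with "same" padding keep height and width.
        shapes.append((f"block{i}: (CONV => RELU => BN) * 2", (h, w, f)))
        # 2x2 max pooling with stride 2 halves each spatial dimension.
        h, w = h // 2, w // 2
        shapes.append((f"block{i}: POOL", (h, w, f)))
    shapes.append(("flatten", (h * w * filters[-1],)))
    shapes.append(("fully-connected", (dense_units,)))
    shapes.append(("softmax", (classes,)))
    return shapes


for name, shape in minivggnet_shapes():
    print(f"{name:36s} {shape}")
```

Under those assumptions, a 28×28×1 input is reduced to 14×14 and then 7×7 spatially before being flattened.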

Convolutional layers, including their parameters, are described in detail in this previous post.

Pooling layers help to progressively reduce the spatial dimensions of the input volume.
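As an illustration of that reduction, here is a NumPy sketch of 2×2 max pooling with a stride of 2. It is a stand-in for what a Keras pooling layer computes, not the Keras implementation, and the function name max_pool_2x2 is hypothetical.

```python
import numpy as np


def max_pool_2x2(volume):
    """2x2 max pooling with stride 2 on a (height, width, channels) array."""
    h, w, c = volume.shape
    # Drop a trailing row/column if the size is odd, then group into 2x2 tiles.
    trimmed = volume[:h - h % 2, :w - w % 2, :]
    tiles = trimmed.reshape(h // 2, 2, w // 2, 2, c)
    # Keep the largest value inside each tile.
    return tiles.max(axis=(1, 3))


volume = np.zeros((28, 28, 32), dtype="float32")
print(max_pool_2x2(volume).shape)
```

Each pooling step halves the height and width while leaving the number of channels untouched.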

Batch normalization, as the name suggests, seeks to normalize the activations of a given input volume before passing it into the next layer. It has been shown to be effective at reducing the number of epochs required to train a CNN at the expense of an increase in per-epoch time.
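The normalization step itself is simple. Below is a minimal NumPy sketch of training-mode batch normalization on a batch of feature vectors, assuming a scalar scale (gamma) and shift (beta). A real layer learns gamma and beta per feature and uses running averages at inference time, which this sketch leaves out. The function name batch_norm is hypothetical.

```python
import numpy as np


def batch_norm(activations, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch normalization over the batch axis (axis 0)."""
    mean = activations.mean(axis=0, keepdims=True)
    var = activations.var(axis=0, keepdims=True)
    # Shift to zero mean and scale to unit variance, feature by feature.
    normalized = (activations - mean) / np.sqrt(var + eps)
    # gamma and beta let the network undo the normalization if that helps.
    return gamma * normalized + beta


rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(64, 8))
normalized = batch_norm(batch)
print(normalized.mean(axis=0).round(3), normalized.std(axis=0).round(3))
```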

Dropout is a form of regularization that aims to prevent overfitting. During training, connections between layers are randomly dropped so that no single node in the network becomes solely responsible for activating when presented with a given pattern.
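A common way to implement this is "inverted" dropout: zero a random subset of activations during training and rescale the survivors so the expected activation stays the same. The NumPy sketch below illustrates that technique under the assumption of a fixed drop rate. It is not the Keras implementation, and the function name dropout is hypothetical.

```python
import numpy as np


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of values, rescale the rest."""
    if not training or rate == 0.0:
        # At inference time the layer passes values through unchanged.
        return activations
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    # Dividing by keep_prob keeps the expected value of each activation fixed.
    return activations * mask / keep_prob


rng = np.random.default_rng(42)
activations = np.ones((4, 8))
print(dropout(activations, rate=0.25, rng=rng))
```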

What follows is a fully-connected layer and softmax classifier (Lines 49-57). The softmax classifier is used to obtain output classification probabilities.
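For reference, softmax turns the raw scores from the final fully-connected layer into values that are positive and sum to one. Here is a minimal NumPy sketch; the example scores are made up for illustration.

```python
import numpy as np


def softmax(logits):
    """Convert raw scores to probabilities along the last axis."""
    # Subtracting the max avoids overflow without changing the result.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


scores = np.array([[2.0, 1.0, 0.1]])  # hypothetical scores for three classes
probs = softmax(scores)
print(probs, probs.sum())
```

The predicted class is simply the index with the largest probability.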

The model is then returned on Line 60.

For further reading about building models with Keras, please refer to my Keras Tutorial and Deep Learning for Computer Vision with Python.

Implementing the Fashion MNIST training script with Keras

Now that MiniVGGNet is implemented we can move on to the driver script which:

  1. Loads the Fashion MNIST dataset.
  2. Trains MiniVGGNet on Fashion MNIST + generates a training history plot.
  3. Evaluates the resulting model and outputs a classification report.
  4. Creates a montage visualization allowing us to see our results visually.

Create a new file for the training script, open it up, and insert the following code:

We begin by importing necessary packages, modules, and functions on Lines 2-15:

  • The "Agg" backend is used for Matplotlib so that we can save our training plot to disk (Line 3).
  • Our MiniVGGNet CNN (defined in the previous section) is imported on Line 6.
  • We’ll use scikit-learn’s classification_report to print final classification statistics/accuracies (Line 7).
  • Our Keras imports, including our fashion_mnist dataset, are grabbed on Lines 8-11.
  • The build_montages function from imutils will be used for visualization (Line 12).
  • Finally, matplotlib, numpy, and OpenCV (cv2) are also imported (Lines 13-15).

Three hyperparameters are set on Lines 19-21, including our:

  1. Learning rate
  2. Batch size
  3. Number of epochs we’ll train for

Let’s go ahead and load the Fashion MNIST dataset and reshape it if necessary:

The Fashion MNIST dataset we’re using is loaded from disk on Line 26. If this is the first time you’ve used the Fashion MNIST dataset then Keras will automatically download and cache Fashion MNIST for you.

Additionally, Fashion MNIST is already organized into training/testing splits, so today we aren’t using scikit-learn’s train_test_split function that you’d normally see here.

From there we go ahead and re-order our data based on "channels_first" or "channels_last" image data formats (Lines 31-39). The ordering largely depends upon your backend. I’m using TensorFlow as the backend to Keras, which I presume you are using as well.
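The dataset arrives as a stack of 2D grayscale images, so the re-ordering amounts to adding a channel axis in the right position. A minimal NumPy sketch, using zero-filled stand-in data in place of the real images and a hypothetical function name add_channel_axis:

```python
import numpy as np


def add_channel_axis(images, data_format="channels_last"):
    """Reshape (num_images, height, width) grayscale data to a 4D layout."""
    n, h, w = images.shape
    if data_format == "channels_first":
        return images.reshape((n, 1, h, w))
    return images.reshape((n, h, w, 1))


# Stand-in for a handful of 28x28 Fashion MNIST images.
images = np.zeros((5, 28, 28), dtype="uint8")
print(add_channel_axis(images).shape)
print(add_channel_axis(images, "channels_first").shape)
```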

Let’s go ahead and preprocess + prepare our data:

Here our pixel intensities are scaled to the range [0, 1] (Lines 42 and 43). We then one-hot encode the labels (Lines 46 and 47).

Here is an example of one-hot encoding based on the labelNames on Lines 50 and 51:

  • “T-shirt/top”: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
  • “Bag”: [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
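Both preprocessing steps can be reproduced in a few lines of NumPy. The sketch below is an illustration rather than the original listing (Keras provides a to_categorical utility for the encoding step). The label order follows the class list at the top of this post, and the function names are hypothetical.

```python
import numpy as np

# Class names in label order (index 0 through 9).
LABEL_NAMES = ["T-shirt/top", "Trouser/pants", "Pullover shirt", "Dress",
               "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]


def scale_pixels(images):
    """Map uint8 pixel intensities from [0, 255] to floats in [0, 1]."""
    return images.astype("float32") / 255.0


def one_hot(labels, num_classes=10):
    """Turn integer labels into one-hot row vectors."""
    return np.eye(num_classes, dtype="float32")[labels]


bag = LABEL_NAMES.index("Bag")
print(one_hot(np.array([bag])))
```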

Let’s go ahead and fit our model:

On Lines 55-58 our model is initialized and compiled with the Stochastic Gradient Descent (SGD) optimizer and learning rate decay.
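To get a feel for what learning rate decay does, the sketch below applies the time-based rule that the decay argument of the legacy Keras SGD optimizer used, where the rate shrinks after every batch update. The specific values (initial rate 1e-2, 25 epochs, batch size 32, decay set to the initial rate divided by the number of epochs) are assumptions for illustration, since the hyperparameter values are not given in the text. Newer Keras releases favor learning rate schedule objects instead.

```python
def decayed_learning_rate(initial_lr, decay, iteration):
    """Time-based decay: lr = initial_lr / (1 + decay * iteration)."""
    return initial_lr / (1.0 + decay * iteration)


# Assumed hyperparameters (not taken from the original listing).
INIT_LR = 1e-2
NUM_EPOCHS = 25
BS = 32
steps_per_epoch = 60000 // BS  # batch updates per pass over the training set
decay = INIT_LR / NUM_EPOCHS

for epoch in (0, 1, 5, 25):
    lr = decayed_learning_rate(INIT_LR, decay, epoch * steps_per_epoch)
    print(f"after epoch {epoch:2d}: lr = {lr:.6f}")
```

Larger steps early on and smaller steps later tend to help the optimizer settle into a good minimum.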

From there the model is trained via the training call on Lines 62-64.

After training for NUM_EPOCHS epochs, we’ll go ahead and evaluate our network and generate a training plot:

To evaluate our network, we’ve made predictions on the testing set (Line 67) and then printed a classification_report in our terminal (Lines 71 and 72).
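The key step before any report can be computed is converting probabilities and one-hot labels back into class indices with argmax. The NumPy sketch below shows that step along with overall accuracy and per-class recall. It is a simplified stand-in for scikit-learn's classification_report, which also reports precision and F1. The function name and the tiny example arrays are hypothetical.

```python
import numpy as np


def accuracy_and_recall(one_hot_truth, probabilities):
    """Return overall accuracy and a dict of per-class recall values."""
    true_idx = one_hot_truth.argmax(axis=1)
    pred_idx = probabilities.argmax(axis=1)
    recall = {}
    for cls in np.unique(true_idx):
        members = true_idx == cls
        # Fraction of this class's examples that were predicted correctly.
        recall[int(cls)] = float((pred_idx[members] == cls).mean())
    accuracy = float((pred_idx == true_idx).mean())
    return accuracy, recall


# Four made-up predictions over three classes.
truth = np.eye(3)[[0, 0, 1, 2]]
probs = np.array([[0.90, 0.05, 0.05],
                  [0.20, 0.70, 0.10],
                  [0.10, 0.80, 0.10],
                  [0.10, 0.20, 0.70]])
print(accuracy_and_recall(truth, probs))
```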

Training history is plotted and output to disk (Lines 75-86).

As if what we’ve done so far hasn’t been fun enough, we’re now going to visualize our results!

To do so, we:

  • Sample a set of the testing images via random sampling, looping over them individually (Line 92).
  • Make a prediction on each of the random testing images and determine the label name (Lines 94-96).
  • Based on channel ordering, grab the image itself (Lines 100-105).
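One detail worth spelling out is why a single image is indexed as testX[np.newaxis, i] when it is passed to the model: prediction expects a batch, so a leading batch axis of size 1 has to be added. The sketch below uses random stand-in data instead of the real test set, and sampling 16 images without replacement is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the real test images (channels last).
testX = rng.random((100, 28, 28, 1)).astype("float32")

# Pick 16 distinct test indices at random.
indices = rng.choice(np.arange(len(testX)), size=16, replace=False)

for i in indices:
    single = testX[i]             # shape (28, 28, 1): one image, no batch axis
    batch = testX[np.newaxis, i]  # shape (1, 28, 28, 1): a batch of one image

print(single.shape, batch.shape)
```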

Now let’s add a colored label to each image and arrange them in a montage:

Here we:

  • Initialize our label color as green for “correct” and red for “incorrect” classification (Lines 108-112).
  • Create a 3-channel image by merging the grayscale image three times (Line 117).
  • Enlarge the image (Line 118) and draw a label on it (Lines 119-120).
  • Add each image to the images list (Line 123).

Once the images have all been annotated via the steps in the for loop, our OpenCV montage is built via Line 126.
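The array manipulation behind these steps can be sketched with NumPy alone. The helpers below are simplified stand-ins for OpenCV's channel merging and resizing and for the build_montages function from imutils, not their actual implementations. The function names, the enlargement factor of 3, and the 4×4 grid are hypothetical, and drawing the text label is omitted.

```python
import numpy as np


def to_color(gray):
    """Stack a (h, w) grayscale image three times to get a (h, w, 3) image."""
    return np.dstack([gray] * 3)


def enlarge(image, factor):
    """Nearest-neighbor upscaling by an integer factor."""
    return image.repeat(factor, axis=0).repeat(factor, axis=1)


def montage(images, rows, cols):
    """Tile equally sized (h, w, 3) images into a rows x cols grid."""
    h, w, c = images[0].shape
    canvas = np.zeros((rows * h, cols * w, c), dtype=images[0].dtype)
    for idx, img in enumerate(images[:rows * cols]):
        r, col = divmod(idx, cols)
        canvas[r * h:(r + 1) * h, col * w:(col + 1) * w] = img
    return canvas


gray = np.full((28, 28), 128, dtype="uint8")  # stand-in for one test image
tiles = [enlarge(to_color(gray), 3) for _ in range(16)]
print(montage(tiles, rows=4, cols=4).shape)
```

A 3-channel image is needed so that the green or red label can be drawn in color on top of the grayscale content.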

Finally, the visualization is displayed until a keypress is detected (Lines 129 and 130).

Fashion MNIST results

We are now ready to train our Keras CNN on the Fashion MNIST dataset!

Make sure you have used the “Downloads” section of this blog post to download the source code and project structure.

From there, open up a terminal, navigate to where you downloaded the code, and execute the following command:

Figure 3: Our Keras + deep learning Fashion MNIST training plot contains the accuracy/loss curves for training and validation.

Here you can see that our network obtained 94% accuracy on the testing set.

The model classified the “trouser” class 100% correctly but seemed to struggle quite a bit with the “shirt” class (~81% accurate).

According to our plot in Figure 3, there appears to be very little overfitting.

A deeper architecture with data augmentation would likely lead to higher accuracy.

Below I have included a sample of fashion classifications:

Figure 4: The results of training a Keras deep learning model (based on VGGNet, but smaller in size/complexity) using the Fashion MNIST dataset.

As you can see our network is performing quite well at fashion recognition.

Will this model work for fashion images outside the Fashion MNIST dataset?

Figure 5: In a previous tutorial I covered using Keras for multi-output deep learning classification on fashion images. Be sure to give it a look if you want to build a more robust fashion recognition model.

At this point, you are probably wondering whether the model we just trained on the Fashion MNIST dataset would be directly applicable to images outside the Fashion MNIST dataset.

The short answer is “No, unfortunately not.”

The longer answer requires a bit of explanation.

To start, keep in mind that the Fashion MNIST dataset is meant to be a drop-in replacement for the MNIST dataset, implying that our images have already been processed.

Each image has been:

  • Converted to grayscale.
  • Segmented, such that all background pixels are black and all foreground pixels are some gray, non-black pixel intensity.
  • Resized to 28×28 pixels.

For real-world fashion and clothing images, you would have to preprocess your data in the same manner as the Fashion MNIST dataset.

And furthermore, even if you could preprocess your dataset in the exact same manner, the model still might not be transferable to real-world images.

Instead, you should train a CNN on example images that will mimic the images the CNN “sees” when deployed to a real-world situation.

To do that you will likely need to utilize multi-label classification and multi-output networks.

For more details on both of these techniques be sure to refer to the following tutorials:

  1. Multi-label classification with Keras
  2. Keras: Multiple outputs and multiple losses


Summary

In this tutorial, you learned how to train a simple CNN on the Fashion MNIST dataset using Keras.

The Fashion MNIST dataset is meant to be a drop-in replacement for the standard MNIST digit recognition dataset, and it includes:

  • 60,000 training examples
  • 10,000 testing examples
  • 10 classes
  • 28×28 grayscale images

While the Fashion MNIST dataset is slightly more challenging than the MNIST digit recognition dataset, unfortunately, it cannot be used directly in real-world fashion classification tasks, unless you preprocess your images in the exact same manner as Fashion MNIST (segmentation, thresholding, grayscale conversion, resizing, etc.).

In most real-world fashion applications, mimicking the Fashion MNIST preprocessing steps will be nearly impossible.

You can and should use Fashion MNIST as a drop-in replacement for the MNIST digit dataset; however, if you are interested in actually recognizing fashion items in real-world images you should refer to the following two tutorials:

  1. Multi-label classification with Keras
  2. Keras: Multiple outputs and multiple losses

Both of the tutorials linked to above will guide you in building a more robust fashion classification system.

I hope you enjoyed today’s post!



16 Responses to Fashion MNIST with Keras and Deep Learning

  1. David Bonn February 11, 2019 at 6:03 pm #

    Hi Adrian,

    Another great blog post!

    I’m finding this script and others very helpful, not just for the nuts-and-bolts technical benefit, but because they are very useful for exercising my new Keras+tensorflow+gpu installation and verifying that it is working more-or-less correctly.

    Normally I wouldn’t feel the need (although I’m concluding that this kind of verification testing is a Good Idea) but I needed to install a later release of CUDA &c and tensorflow to be compatible with RTX GPUs. So I wanted some reassurance that things were more-or-less working before I start working on my buggy code.

    So far so good.

    When I think about it, a Keras Test Suite would be a most excellent project.

    • Adrian Rosebrock February 11, 2019 at 7:11 pm #

      Thanks David! Good luck configuring your deep learning machine 🙂

  2. Anmol Darak February 11, 2019 at 7:12 pm #

    Will this code run on a 16GB RAM Macbook pro?

    • Adrian Rosebrock February 14, 2019 at 1:25 pm #

      Yes, although one of the best parts of learning computer vision and deep learning is experimenting. Jump in! Download the code and give it a try.

  3. Atul February 12, 2019 at 5:56 am #

    Hi Adrian,
    Great post again …..and helpful in implementing the same 🙂
    I would love if you can cover any of the following problem statements in upcoming blogs :

    1. Identify pests or diseases on plants
    2. Identify weight of animals (cow, goat or sheep) by processing its image

    I am not very sure how much easy or difficult to solve these problems, but even if you can discuss challenges then that would also be great.
    Thanks again,

  4. Brian Meehan February 12, 2019 at 9:25 am #

    Great example and very well explained code. Thank you Adrian. I just need to know how to get my own images and labels into the model and then how to deploy the learnt model.

    • Adrian Rosebrock February 14, 2019 at 1:16 pm #

      Take a look at this tutorial which will teach you how to train your own NNs with Keras.

  5. Ayşe July 3, 2019 at 5:56 pm #

    Great tutorial. How can I see the dataset and create my own dataset similar to it? Do you have a tutorial for that purpose?
    Thanks in advance

  6. Emir July 10, 2019 at 4:56 pm #

    Hi Adrian
    Thanks for the tutorial
    I want to train a CNN with CIFAR-10. Is MiniVGGNet deep enough to get >90% accuracy? I tried it and did many experiments but 80% was the best I could get.

    • Adrian Rosebrock July 25, 2019 at 10:19 am #

      Hey Emir — Deep learning for Computer Vision with Python will teach you how to obtain > 90% accuracy on CIFAR-10. It will also teach you my tips, suggestions, and best practices when training CNNs as well. Give it a read, I believe it will help you quite a bit.

  7. Walid August 21, 2019 at 5:41 pm #

    Thanks a lot
    Can you please explain the role of “np.newaxis”

    in probs = model.predict(testX[np.newaxis, i]) ?

  8. Joaquim Augusto August 31, 2019 at 8:36 pm #

    Why this architecture is considered a miniVgg?

    • Adrian Rosebrock September 5, 2019 at 10:33 am #

      VGGNet uses only 3×3 CONV filters with multiple CONVs stacked on top of each other prior to pooling. This is a smaller implementation of a VGG-like architecture, hence MiniVGGNet.

  9. Rahul gupta November 19, 2019 at 9:15 am #

    How to reduce the size of images from 28*28 to some lower dimension e.g. 20*20 .

    • Adrian Rosebrock November 21, 2019 at 9:07 am #

      I would suggest using the “cv2.resize” function to resize the images.
