Keras ImageDataGenerator and Data Augmentation

In today’s tutorial, you will learn how to use Keras’ ImageDataGenerator class to perform data augmentation. I’ll also dispel common confusion surrounding what data augmentation is, why we use data augmentation, and what it does/does not do.

Knowing that I was going to write a tutorial on data augmentation, two weekends ago I decided to have some fun and purposely post a semi-trick question on my Twitter feed.

The question was simple — data augmentation does which of the following?

  1. Adds more training data
  2. Replaces training data
  3. Does both
  4. I don’t know

Here are the results:

Figure 1: My @PyImageSearch Twitter poll on the concept of data augmentation.

Only 5% of respondents answered this trick question “correctly” (at least if you’re using Keras’ ImageDataGenerator class).

Again, it’s a trick question so that’s not exactly a fair assessment, but here’s the deal:

While the word “augment” means to make something “greater” or “increase” something (in this case, data), the Keras ImageDataGenerator class actually works by:

  1. Accepting a batch of images used for training.
  2. Taking this batch and applying a series of random transformations to each image in the batch (including random rotation, resizing, shearing, etc.).
  3. Replacing the original batch with the new, randomly transformed batch.
  4. Training the CNN on this randomly transformed batch (i.e., the original data itself is not used for training).
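
To make this replace-not-add behavior concrete, here is a framework-free sketch of the batch flow (the flip/rotate logic is a toy stand-in for Keras’ actual transforms, not its implementation):

```python
import numpy as np

def augment_batch(batch, rng):
    """Toy stand-in for ImageDataGenerator: return a randomly
    transformed copy of the batch, same size, originals discarded."""
    out = batch.copy()
    for i in range(len(out)):
        if rng.random() < 0.5:                  # random horizontal flip
            out[i] = out[i][:, ::-1]
        k = int(rng.integers(0, 4))             # random 90-degree rotation
        out[i] = np.rot90(out[i], k=k, axes=(0, 1))
    return out

rng = np.random.default_rng(42)
batch = rng.random((32, 64, 64, 3))             # a batch of 32 "images"
augmented = augment_batch(batch, rng)

print(augmented.shape)                          # (32, 64, 64, 3)
```

Note that a same-size batch comes out, not 64 images: the transformed batch replaces the original one.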

That’s right — the Keras ImageDataGenerator class is not an “additive” operation. It’s not taking the original data, randomly transforming it, and then returning both the original data and transformed data.

Instead, the ImageDataGenerator accepts the original data, randomly transforms it, and returns only the new, transformed data.

But remember how I said this was a trick question?

Technically, all the answers are correct — but the only way you know if a given definition of data augmentation is correct is via the context of its application.

I’ll help you clear up some of the confusion regarding data augmentation (and give you the context you need to successfully apply it).

Inside the rest of today’s tutorial you will:

  • Learn about three types of data augmentation.
  • Dispel any confusion you have surrounding data augmentation.
  • Learn how to apply data augmentation with Keras and the ImageDataGenerator class.

To learn more about data augmentation, including using Keras’ ImageDataGenerator class, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Keras ImageDataGenerator and Data Augmentation

We’ll start this tutorial with a discussion of data augmentation and why we use it.

I’ll then cover the three types of data augmentation you’ll see when training deep neural networks:

  1. Dataset generation and data expansion via data augmentation (less common)
  2. In-place/on-the-fly data augmentation (most common)
  3. Combining dataset generation and in-place augmentation

From there I’ll teach you how to apply data augmentation to your own datasets (using all three methods) using Keras’ ImageDataGenerator class.

What is data augmentation?

Data augmentation encompasses a wide range of techniques used to generate “new” training samples from the original ones by applying random jitters and perturbations (but at the same time ensuring that the class labels of the data are not changed).

Our goal when applying data augmentation is to increase the generalizability of the model.

Given that our network is constantly seeing new, slightly modified versions of the input data, the network is able to learn more robust features.

At testing time we do not apply data augmentation and simply evaluate our trained network on the unmodified testing data — in most cases, you’ll see an increase in testing accuracy, perhaps at the expense of a slight dip in training accuracy.

A simple data augmentation example

Figure 2: Left: A sample of 250 data points that follow a normal distribution exactly. Right: Adding a small amount of random “jitter” to the distribution. This type of data augmentation increases the generalizability of our networks.

Let’s consider Figure 2 (left) of a normal distribution with zero mean and unit variance.

Training a machine learning model on this data may result in us modeling the distribution exactly — however, in real-world applications, data rarely follows such a nice, neat distribution.

Instead, to increase the generalizability of our classifier, we may first randomly jitter points along the distribution by adding some random values ε drawn from a random distribution (right).

Our plot still follows an approximately normal distribution, but it’s no longer the perfect distribution shown on the left.

A model trained on this modified, augmented data is more likely to generalize to example data points not included in the training set.
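
The jitter shown in Figure 2 takes only a few lines of NumPy (the sample size matches the figure; the 0.1 jitter scale is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(loc=0.0, scale=1.0, size=250)   # "clean" normal samples (left plot)

epsilon = rng.normal(loc=0.0, scale=0.1, size=250)  # small random perturbations
augmented = points + epsilon                        # jittered samples (right plot)

# The augmented data is still approximately normal, but it no
# longer matches the original distribution exactly.
```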

Computer vision and data augmentation

Figure 3: In computer vision, data augmentation performs random manipulations on images. It is typically applied in three scenarios discussed in this blog post.

Data augmentation lends itself naturally to the context of computer vision.

For example, we can obtain augmented data from the original images by applying simple geometric transforms, such as random:

  1. Translations
  2. Rotations
  3. Changes in scale
  4. Shearing
  5. Horizontal (and in some cases, vertical) flips

Applying a small number of these transformations to an input image will change its appearance slightly, but it does not change the class label — thereby making data augmentation a very natural, easy method to apply to computer vision tasks.

Three types of data augmentation

There are three types of data augmentation you will likely encounter when applying deep learning in the context of computer vision applications.

Exactly which definition of data augmentation is “correct” is entirely dependent on the context of your project/set of experiments.

Take the time to read this section carefully as I see many deep learning practitioners confuse what data augmentation does and does not do.

Type #1: Dataset generation and expanding an existing dataset (less common)

Figure 4: Type #1 of data augmentation consists of dataset generation/dataset expansion. This is a less common form of data augmentation.

The first type of data augmentation is what I call dataset generation or dataset expansion.

As you know, machine learning models, and especially neural networks, can require quite a bit of training data — but what if you don’t have very much training data in the first place?

Let’s examine the most trivial case where you only have one image and you want to apply data augmentation to create an entire dataset of images, all based on that one image.

To accomplish this task, you would:

  1. Load the original input image from disk.
  2. Randomly transform the original image via a series of random translations, rotations, etc.
  3. Take the transformed image and write it back out to disk.
  4. Repeat steps 2 and 3 a total of N times.

After performing this process you would have a directory full of randomly transformed “new” images that you could use for training, all based on that single input image.
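
Sketched in code, the four steps look something like this (NumPy flips and rotations stand in for the Keras transforms, and .npy files for the .jpg output, so the sketch stays dependency-free):

```python
import os
import tempfile
import numpy as np

def generate_dataset(image, output_dir, total, rng):
    """Type #1 augmentation sketch: write `total` randomly
    transformed copies of a single input image to `output_dir`."""
    os.makedirs(output_dir, exist_ok=True)
    for i in range(total):
        transformed = np.rot90(image, k=int(rng.integers(0, 4)))  # random rotation
        if rng.random() < 0.5:                                    # random flip
            transformed = transformed[:, ::-1]
        np.save(os.path.join(output_dir, f"image_{i:04d}.npy"), transformed)

rng = np.random.default_rng(1)
image = rng.random((64, 64, 3))                 # stand-in for a loaded cat.jpg
output_dir = os.path.join(tempfile.mkdtemp(), "cats")
generate_dataset(image, output_dir, total=100, rng=rng)
print(len(os.listdir(output_dir)))              # 100
```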

This is, of course, an incredibly simplified example.

You more than likely have more than a single image — you probably have 10s or 100s of images and now your goal is to turn that smaller set into 1000s of images for training.

In those situations, dataset expansion and dataset generation may be worth exploring.

But there’s a problem with this approach — we haven’t exactly increased the ability of our model to generalize.

Yes, we have increased our training data by generating additional examples, but all of these examples are based on a super small dataset.

Keep in mind that our neural network is only as good as the data it was trained on.

We cannot train a NN on a small amount of data and then expect it to generalize to data it was never trained on and has never seen before.

If you find yourself seriously considering dataset generation and dataset expansion, you should take a step back and instead invest your time gathering additional data or looking into methods of behavioral cloning (and then applying the type of data augmentation covered in the “Combining dataset generation and in-place augmentation” section below).

Type #2: In-place/on-the-fly data augmentation (most common)

Figure 5: Type #2 of data augmentation consists of on-the-fly image batch manipulations. This is the most common form of data augmentation with Keras.

The second type of data augmentation is called in-place data augmentation or on-the-fly data augmentation. This type of data augmentation is what Keras’ ImageDataGenerator class implements.

Using this type of data augmentation we want to ensure that our network, when trained, sees new variations of our data at each and every epoch.

Figure 5 demonstrates the process of applying in-place data augmentation:

  1. Step #1: An input batch of images is presented to the ImageDataGenerator.
  2. Step #2: The ImageDataGenerator transforms each image in the batch by a series of random translations, rotations, etc.
  3. Step #3: The randomly transformed batch is then returned to the calling function.

There are two important points that I want to draw your attention to:

  1. The ImageDataGenerator is not returning both the original data and the transformed data — the class only returns the randomly transformed data.
  2. We call this “in-place” and “on-the-fly” data augmentation because this augmentation is done at training time (i.e., we are not generating these examples ahead of time/prior to training).

When our model is being trained, we can think of our ImageDataGenerator class as “intercepting” the original data, randomly transforming it, and then returning it to the neural network for training, all the while the NN has no idea the data was modified!

I’ve written previous tutorials on the PyImageSearch blog where readers think that Keras’ ImageDataGenerator class is an “additive operation”, similar to the following (incorrect) figure:

Figure 6: How Keras data augmentation does not work.

In the above illustration the ImageDataGenerator accepts an input batch of images, randomly transforms the batch, and then returns both the original batch and modified data — again, this is not what the Keras ImageDataGenerator does. Instead, the ImageDataGenerator class returns just the randomly transformed data.

When I explain this concept to readers the next question is often:

But Adrian, what about the original training data? Why is it not used? Isn’t the original training data still useful for training?

Keep in mind that the entire point of the data augmentation technique described in this section is to ensure that the network sees “new” images that it has never “seen” before at each and every epoch.

If we included the original training data along with the augmented data in each batch, then the network would “see” the original training data multiple times, effectively defeating the purpose. Secondly, recall that the overall goal of data augmentation is to increase the generalizability of the model.

To accomplish this goal we “replace” the training data with randomly transformed, augmented data.

In practice, this leads to a model that performs better on our validation/testing data but perhaps slightly worse on our training data (due to the variations in data caused by the random transforms).

You’ll learn how to use the Keras ImageDataGenerator class later in this tutorial.

Type #3: Combining dataset generation and in-place augmentation

The final type of data augmentation seeks to combine both dataset generation and in-place augmentation — you may see this type of data augmentation when performing behavioral cloning.

A great example of behavioral cloning can be seen in self-driving car applications.

Creating self-driving car datasets can be extremely time consuming and expensive — a way around the issue is to instead use video games and car driving simulators.

Video game graphics have become so life-like that it’s now possible to use them as training data.

Therefore, instead of driving an actual vehicle, you can instead:

  • Play a video game
  • Write a program to play a video game
  • Use the underlying rendering engine of the video game

…all to generate actual data that can be used for training.

Once you have your training data you can go back and apply Type #2 data augmentation (i.e., in-place/on-the-fly data augmentation) to the data you gathered via your simulation.

Project structure

Before we dive into the code let’s first review our directory structure for the project:

First, there are two dataset directories, which should not be confused:

  • dogs_vs_cats_small/ : A subset of the popular Kaggle Dogs vs. Cats competition dataset. In my curated subset, only 2,000 images (1,000 per class) are present (as opposed to the 25,000 images for the challenge).
  • generated_dataset/ : We’ll create this generated dataset using the cat.jpg and dog.jpg images which are in the parent directory. We’ll utilize data augmentation Type #1 to generate this dataset automatically and fill this directory with images.

Next, we have our pyimagesearch module, which contains our implementation of the ResNet CNN classifier.

Today we’ll review two Python scripts:

  • train.py : Used to train models for both Type #1 and Type #2 (and optionally Type #3 if the user so wishes) data augmentation techniques. We’ll perform three training experiments resulting in each of the three plot*.png files in the project folder.
  • generate_images.py : Used to generate a dataset from a single image using Type #1.

Let’s begin.

Implementing our training script

In the remainder of this tutorial we’ll be performing three experiments:

  1. Experiment #1: Generate a dataset via dataset expansion and train a CNN on it.
  2. Experiment #2: Use a subset of the Kaggle Dogs vs. Cats dataset and train a CNN without data augmentation.
  3. Experiment #3: Repeat the second experiment, but this time with data augmentation.

All of these experiments will be accomplished using the same Python script.

Open up the train.py script and let’s get started:

On Lines 2-18 our necessary packages are imported. Line 10 is our ImageDataGenerator import from the Keras library — a class for data augmentation.

Let’s go ahead and parse our command line arguments:

Our script accepts three command line arguments via the terminal:

  • --dataset : The path to the input dataset.
  • --augment : Whether “on-the-fly” data augmentation should be used (refer to type #2 above). By default, this method is not performed.
  • --plot : The path to the output training history plot.
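
A sketch of the argument parsing (the short flags and the plot.png default are assumptions for illustration; the list passed to parse_args simulates a terminal invocation):

```python
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
    help="path to input dataset")
ap.add_argument("-a", "--augment", type=int, default=-1,
    help="whether or not 'on the fly' data augmentation should be used")
ap.add_argument("-p", "--plot", type=str, default="plot.png",
    help="path to output loss/accuracy plot")

# Simulate: python train.py --dataset dogs_vs_cats_small --augment 1
args = vars(ap.parse_args(["--dataset", "dogs_vs_cats_small", "--augment", "1"]))
print(args["dataset"], args["augment"] > 0)     # dogs_vs_cats_small True
```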

Let’s proceed to initialize hyperparameters and load our image data:

Training hyperparameters, including initial learning rate, batch size, and number of epochs to train for, are initialized on Lines 32-34.

From there Lines 39-53 grab imagePaths, load images, and populate our data and labels lists. The only image preprocessing we perform at this point is to resize each image to 64×64px.

Next, let’s finish preprocessing, encode our labels, and partition our data:

On Line 57, we convert data to a NumPy array as well as scale all pixel intensities to the range [0, 1]. This completes our preprocessing.

From there we perform “one-hot encoding” of our labels (Lines 61-63). This method of encoding our labels results in an array that may look like this:
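
For a hypothetical sample of seven labels, the encoded array would be:

```python
import numpy as np

# Hypothetical one-hot encoded labels for two cats and five dogs.
labels = np.array([
    [1., 0.],   # cat
    [1., 0.],   # cat
    [0., 1.],   # dog
    [0., 1.],   # dog
    [0., 1.],   # dog
    [0., 1.],   # dog
    [0., 1.],   # dog
])
```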

For this sample of data, there are two cats ( [1., 0.] ) and five dogs ( [0., 1.] ) where the label corresponding to the image is marked as “hot”.

From there we partition our data into training and testing splits, marking 75% of our data for training and the remaining 25% for testing (Lines 67 and 68).

Now, we are ready to initialize our data augmentation object:

Line 71 initializes our empty data augmentation object (i.e., no augmentation will be performed). This is the default operation of this script.

Let’s check if we’re going to override the default with the --augment command line argument:

Line 75 checks to see if we are performing data augmentation. If so, we re-initialize the data augmentation object with random transformation parameters (Lines 77-84). As the parameters indicate, random rotations, zooms, shifts, shears, and flips will be performed during in-place/on-the-fly data augmentation.
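
When the flag is set, the re-initialized augmentation object looks along these lines (the tensorflow.keras import path assumes TF 2.x; the parameter values are representative choices, not necessarily the exact ones used in the post’s experiments):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random rotations, zooms, shifts, shears, and horizontal flips,
# matching the transforms described above.
aug = ImageDataGenerator(
    rotation_range=20,
    zoom_range=0.15,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.15,
    horizontal_flip=True,
    fill_mode="nearest")
```

Each parameter defines a range for a random draw per image; fill_mode="nearest" controls how pixels revealed by a shift or rotation are filled in.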

Let’s compile and train our model:

Lines 88-92 construct our ResNet model using Stochastic Gradient Descent optimization and learning rate decay. We use "binary_crossentropy" loss for this 2-class problem. If you have more than two classes, be sure to use "categorical_crossentropy".

Lines 96-100 then train our model. The aug object handles data augmentation in batches (although be sure to recall that the aug object will only perform data augmentation if the --augment command line argument was set).
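
A condensed sketch of the compile/train step (a tiny placeholder network stands in for the post’s ResNet so the snippet is self-contained; the hyperparameters and dummy data are illustrative):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Flatten, Dense
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Tiny stand-in for the post's ResNet classifier.
model = Sequential([
    Conv2D(8, (3, 3), activation="relu", input_shape=(64, 64, 3)),
    Flatten(),
    Dense(2, activation="softmax"),
])
model.compile(loss="binary_crossentropy",
    optimizer=SGD(learning_rate=0.01), metrics=["accuracy"])

# aug.flow() feeds randomly transformed batches to the network; with a
# default ImageDataGenerator() it would pass batches through unchanged.
aug = ImageDataGenerator(rotation_range=20, horizontal_flip=True)
trainX = np.random.rand(16, 64, 64, 3).astype("float32")    # dummy images
trainY = np.eye(2)[np.random.randint(0, 2, 16)].astype("float32")

H = model.fit(aug.flow(trainX, trainY, batch_size=8),
    steps_per_epoch=2, epochs=1, verbose=0)
```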

Finally, we’ll evaluate our model, print statistics, and generate a training history plot:

Line 104 makes predictions on the test set for evaluation purposes. A classification report is printed via Lines 105 and 106.

From there, Lines 109-120 generate and save an accuracy/loss training plot.

Generating a dataset/dataset expansion with data augmentation and Keras

In our first experiment, we will perform dataset expansion via data augmentation with Keras.

Our dataset will contain 2 classes and initially, the dataset will trivially contain only 1 image per class:

  • Cat: 1 image
  • Dog: 1 image

We’ll utilize Type #1 data augmentation (see the “Type #1: Dataset generation and expanding an existing dataset” section above) to generate a new dataset with 100 images per class:

  • Cat: 100 images
  • Dog: 100 images

Again, this is meant to be an example — in a real-world application you would have 100s of example images, but we’re keeping it simple here so you can learn the concept.

Generating the example dataset

Figure 7: Data augmentation with Keras performs random manipulations on images.

Before we can train our CNN we first need to generate an example dataset.

From our “Project Structure” section above you know that we have two example images in our root directory: cat.jpg and dog.jpg. We will use these example images to generate 100 new training images per class (200 images in total).

To see how we can use data augmentation to generate new examples, open up the generate_images.py file and follow along:

Lines 2-6 import our necessary packages. Our ImageDataGenerator is imported on Line 2 and will handle our data augmentation with Keras.

From there, we’ll parse three command line arguments:

  • --image : The path to the input image. We’ll generate additional random, mutated versions of this image.
  • --output : The path to the output directory to store the data augmentation examples.
  • --total : The number of sample images to generate.

Let’s go ahead and load our image and initialize our data augmentation object:

Our image is loaded and prepared for data augmentation via Lines 21-23. Image loading and processing is handled via Keras functionality (i.e., we aren’t using OpenCV).

From there, we initialize the ImageDataGenerator object. This object will facilitate performing random rotations, zooms, shifts, shears, and flips on our input image.

Next, we’ll construct a Python generator and put it to work until all of our images have been produced:

We will use the imageGen to randomly transform the input image (Lines 39 and 40). This generator saves images as .jpg files to the specified output directory contained within args["output"].

Finally, we’ll loop over examples from our image data generator and count them until we’ve reached the required total number of images.
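
The generator loop itself can be sketched as follows (a random array stands in for the image that would normally be read from args["image"], and a temp directory for args["output"]; writing .jpg files requires Pillow):

```python
import os
import tempfile
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# A random uint8 "image" stands in for the loaded input photo.
image = (np.random.rand(64, 64, 3) * 255).astype("uint8")
image = np.expand_dims(image, axis=0)        # add the batch dimension

aug = ImageDataGenerator(rotation_range=30, zoom_range=0.15,
    width_shift_range=0.2, height_shift_range=0.2, shear_range=0.15,
    horizontal_flip=True, fill_mode="nearest")

output_dir = tempfile.mkdtemp()              # stand-in for args["output"]
TOTAL = 10                                   # stand-in for args["total"]

# Each pass through the generator writes one transformed .jpg to disk.
imageGen = aug.flow(image, batch_size=1, save_to_dir=output_dir,
    save_prefix="image", save_format="jpg")

total = 0
for _ in imageGen:
    total += 1
    if total == TOTAL:                       # stop once we hit the target count
        break

print(len(os.listdir(output_dir)))           # 10
```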

To run the generate_images.py script make sure you have used the “Downloads” section of the tutorial to download the source code and example images.

From there open up a terminal and execute the following command:

Check the output of the generated_dataset/cats directory and you will now see 100 images:

Let’s do the same now for the “dogs” class:

And now check for the dog images:

A visualization of the dataset generation via data augmentation can be seen in Figure 7 at the top of this section — notice how we have accepted a single input image (of me — not of a dog or cat) and then created 100 new training examples (48 of which are visualized) from that single image.

Experiment #1: Dataset generation results

We are now ready to perform our first experiment:

Figure 8: Data augmentation with Keras Experiment #1 training accuracy/loss results.

Our results show that we were able to obtain 100% accuracy with little effort.

Of course, this is a trivial, contrived example. In practice, you would not be taking only a single image and then building a dataset of 100s or 1000s of images via data augmentation. Instead, you would have a dataset of 100s of images and then you would apply dataset generation to that dataset — but again, the point of this section was to demonstrate on a simple example so you could understand the process.

Training a network with in-place data augmentation

The more popular form of (image-based) data augmentation is called in-place data augmentation (see the “Type #2: In-place/on-the-fly data augmentation” section of this post for more details).

When performing in-place augmentation our Keras ImageDataGenerator will:

  1. Accept a batch of input images.
  2. Randomly transform the input batch.
  3. Return the transformed batch to the network for training.

We’ll explore how data augmentation can reduce overfitting and increase the ability of our model to generalize via two experiments.

To accomplish this task we’ll be using a subset of the Kaggle Dogs vs. Cats dataset:

  • Cats: 1,000 images
  • Dogs: 1,000 images

We’ll then train a variation of ResNet, from scratch, on this dataset with and without data augmentation.

Experiment #2: Obtaining a baseline (no data augmentation)

In our first experiment we’ll perform no data augmentation:

Looking at the raw classification report you’ll see that we’re obtaining 64% accuracy, but there’s a problem!

Take a look at the plot associated with our training:

Figure 9: For Experiment #2 we did not perform data augmentation. The result is a plot with strong indications of overfitting.

There is dramatic overfitting occurring — at approximately epoch 15 we see our validation loss start to rise while training loss continues to fall. By epoch 20 the rise in validation loss is especially pronounced.

This type of behavior is indicative of overfitting.

The solution is to (1) reduce model capacity, and/or (2) perform regularization.

Experiment #3: Improving our results (with data augmentation)

Let’s now investigate how data augmentation can act as a form of regularization:

We’re now up to 68% accuracy, an increase from our previous 64% accuracy.

But more importantly, we are no longer overfitting:

Figure 10: For Experiment #3, we performed data augmentation with Keras on batches of images in-place. Our training plot shows no signs of overfitting with this form of regularization.

Note how validation and training loss are falling together with little divergence. Similarly, classification accuracy for both the training and validation splits is growing together as well.

By using data augmentation we were able to combat overfitting!

In nearly all situations, unless you have very good reason not to, you should be performing data augmentation when training your own neural networks.

What’s next?

Figure 11: Deep Learning for Computer Vision with Python is the book I wish I had when I was getting started in the field of deep learning a number of years ago.

If you’d like to learn more about data augmentation, including:

  1. More details on the concept of data augmentation.
  2. How to perform data augmentation on your own datasets.
  3. Other forms of regularization to improve your model accuracy.
  4. My tips/tricks, suggestions, and best practices for training CNNs.

…then you’ll definitely want to refer to Deep Learning for Computer Vision with Python.

Data augmentation is just one of the sixty-three chapters in the book. You’ll also find:

  • Super practical walkthroughs that present solutions to actual, real-world image classification, object detection, and instance segmentation problems.
  • Hands-on tutorials (with lots of code) that not only show you the algorithms behind deep learning for computer vision but their implementations as well.
  • A no-nonsense teaching style that is guaranteed to help you master deep learning for image classification, object detection, and segmentation.

To learn more about the book, and to grab the table of contents + free sample chapters, just click here!

Summary

In this tutorial, you learned about data augmentation and how to apply data augmentation via Keras’ ImageDataGenerator class.

You also learned about three types of data augmentation, including:

  1. Dataset generation and data expansion via data augmentation (less common).
  2. In-place/on-the-fly data augmentation (most common).
  3. Combining the dataset generator and in-place augmentation.

By default, Keras’ ImageDataGenerator class performs in-place/on-the-fly data augmentation, meaning that the class:

  1. Accepts a batch of images used for training.
  2. Takes this batch and applies a series of random transformations to each image in the batch.
  3. Replaces the original batch with the new, randomly transformed batch.
  4. Trains the CNN on this randomly transformed batch (i.e., the original data itself is not used for training).

All that said, we actually can take the ImageDataGenerator class and use it for dataset generation/expansion as well — we just need to use it to generate our dataset before training.

The final method of data augmentation, combining both in-place and dataset expansion, is rarely used. In those situations, you likely have a small dataset, need to generate additional examples via data augmentation, and then apply additional augmentation/preprocessing at training time.

We wrapped up the guide by performing a number of experiments with data augmentation, noting that data augmentation is a form of regularization, enabling our network to generalize better to our testing/validation set.

This claim of data augmentation as regularization was verified in our experiments when we found that:

  1. Not applying data augmentation at training time caused overfitting.
  2. Applying data augmentation allowed for smooth training, no overfitting, and higher accuracy/lower loss.

You should apply data augmentation in all of your experiments unless you have a very good reason not to.

To learn more about data augmentation, including my best practices, tips, and suggestions, be sure to take a look at my book, Deep Learning for Computer Vision with Python.

I hope you enjoyed today’s tutorial!

To download the source code to this post (and receive email updates when future tutorials are published here on PyImageSearch), just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!


47 Responses to Keras ImageDataGenerator and Data Augmentation

  1. Nikesh July 8, 2019 at 11:36 am #

    Hi Adrian,

    This article is awesome. It was a great selection of topic by you, and the Twitter poll says it all: most people were under a misconception, including me, which is why the result totally astonished me. But at the end I have a couple of questions:

    For the Keras ImageDataGenerator() class, if we provide a batch of size 32 (images) from the original dataset and suppose it applies 5 transformations per image, then:
    1. Will it train on 160 images for that batch? If yes, then what happens in subsequent epochs?
    2. Does on-the-fly data augmentation discard the generated images after training?

    • Adrian Rosebrock July 8, 2019 at 12:06 pm #

      Hey Nikesh,

      1. you should go back and re-read the “Type #2: In-place/on-the-fly data augmentation (most common)” section. If you use the ImageDataGenerator class with a batch size of 32, you’ll put 32 images into the object and get 32 randomly transformed images back out. You will NOT have 160 images. The same size batch comes out, only with the randomly transformed images.

      2. Yes, all data augmentation operations are done in memory so the generated images are discarded.

      • Nikesh July 9, 2019 at 9:58 am #

        Ok, so the ImageDataGenerator() object returns 32 images, produced by applying random transformations, which are then used for training. In the next epoch there are 32 differently transformed images for the same batch. Am I getting it correct?

        Does this introduce some kind of regularization too?

        • Adrian Rosebrock July 10, 2019 at 8:12 am #

          1. Your understanding is correct.

          2. See the post. I mention multiple times how image augmentation IS a form of regularization.

  2. Peter Shanks July 8, 2019 at 7:36 pm #

    Hi Adrian,

    I’ve been dabbling with CNNs for a while now, and really enjoy reading your posts.

    Recently I’ve started building an image classifier for a collection of plankton images that we have here where I work, and it struck me that my data could be made more useful with another form of augmentation: defining where the joints and shell sections of the creatures are (and possibly how many degrees of rotation might be allowed for each joint) and then generating new images by randomly manipulating the exoskeleton.

    Have you come across anything that does this or am I better off doing a crash course in object detection and Blender scripting?

    • Adrian Rosebrock July 10, 2019 at 8:11 am #

      You might want to take a look at the imgaug library. It provides WAY more options for augmentation than Keras’ standard ImageDataGenerator class — it might have something in the implementation that will help you.

  3. Rémi July 9, 2019 at 3:58 am #

    Hey Adrian,

    As usual, very good tutorial. I was among the people who didn’t know that the original batch is not used after data augmentation… Thanks for the information!

    I was wondering: is there a way to display the new batch after data augmentation? Like:

    for img, label in augmented_data:
        cv2.imshow("XXX", img)
        cv2.waitKey()

    It could be an interesting point: for instance, when I rescale the MNIST dataset, I don’t want to zoom too much and generate inoperable images!

    Have a good day 🙂

    • Adrian Rosebrock July 10, 2019 at 8:10 am #

      You don’t get to see the batch before it’s passed into the network for training. Displaying the image wouldn’t help training and it would only slow it down due to the I/O operations. If you want to visualize the output of data augmentation see the generate_images.py script.

      • Rémi July 11, 2019 at 8:39 am #

        Adrian,

        I think that you did not understand my question the first time ^^. I do not want to see batches during training but before launching my training, so that I will be able to say “Ok Rémi, digits are over-cropped, you put the scale parameter too high, it won’t be pertinent to send that type of ‘over-zoomed’ pictures to your CNN.”
        Your second answer is what I was expecting, I will check out !

        Thanks
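For readers wondering the same thing, here is one way to preview augmented batches before training starts. This is a minimal sketch: the toy random array stands in for real image data, and the rotation/zoom values are illustrative, not recommendations.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# toy stand-in data: 4 grayscale "images" of shape 28x28x1 (e.g. MNIST-sized)
images = np.random.rand(4, 28, 28, 1).astype("float32")

aug = ImageDataGenerator(rotation_range=20, zoom_range=0.15)

# pull one augmented batch and inspect it BEFORE training;
# passing save_to_dir="preview" to flow() would also write the
# transformed images to disk for visual review
gen = aug.flow(images, batch_size=4, shuffle=False)
batch = next(gen)
print(batch.shape)  # (4, 28, 28, 1)
```

Each array in `batch` can then be shown with cv2.imshow or matplotlib to sanity-check that the transformation ranges are not too aggressive.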

  4. Gautam Kumar July 10, 2019 at 7:26 am #

    Hi Adrian,
    Thanks for explaining data augmentation clearly. When I answered your question, I found myself in the 65% of people, but now I get your point. I was working on an iris recognition system using a CNN with a limited number of images in each class (5 to 10), and I was really confused about whether I should use data augmentation or not, as iris images come from close captures of the ocular region and data augmentation will translate and shear the eye shape. Can you please suggest whether I should use data augmentation for iris recognition or not?

    • Adrian Rosebrock July 10, 2019 at 8:13 am #

      With only 5-10 example images you should invest your time in gathering additional training data rather than trying to perform data augmentation. 5-10 images realistically isn’t enough.

      • Gautam Kumar July 11, 2019 at 2:51 am #

        Thanks for your quick reply. Actually, I am using UBIRIS.v1, which is a well-known, standard dataset, so I can’t add images to it. What I wanted to know is: if I perform data augmentation on these images, the random transformations may crop the iris region (which is important for my experiment) or leave it only partially available for feature extraction. It would be helpful if you could suggest, or write a blog post stating, in which cases and with what types of images data augmentation helps train the network, and where it should not be used. Thanks

  5. Nurhan July 10, 2019 at 2:33 pm #

    Hi Adrian,

    Thank you very much for this nice tutorial!

    I have two questions:

    1. Does the ImageDataGenerator also augment the annotations of images? For example, the masks of Mask R-CNN in JSON?

    2. Is the imgaug library able to generate augmented images and save them on the fly as well?

    • Adrian Rosebrock July 15, 2019 at 1:12 pm #

      No, it does not; you would want to use imgaug for augmenting annotations for object detection, instance segmentation, etc.

  6. Denys July 11, 2019 at 10:10 am #

    Hi Adrian, thanks for this article. I wish I had seen it before I started ML.
    I was developing a mobile app, Frutolo, for the recognition of fruits and vegetables.
    I was really limited in the number of training images, so that’s how I started with image augmentation.
    It is especially important for such applications, where the shapes and colours of the objects are very similar.

    • Adrian Rosebrock July 15, 2019 at 1:11 pm #

      Thanks Denys, I’m glad you enjoyed the article. And congrats on building your DL application 🙂

  7. Patrick Ryan July 11, 2019 at 8:41 pm #

    Hi Adrian
    Thank you for this great article and all of your articles. I am a big fan of your work: I have read the Python and OpenCV book (excellent!), I am working through your Deep Learning book (fantastic!), and I signed up for the Raspberry Pi Kickstarter.

    My question is whether image data generation and augmentation make sense when creating a training set for facial recognition. If I understand correctly, facial recognition tries to normalize a face (straight on, no rotation, etc.), so if I create an augmented set, will the facial recognition software just try to undo the small transformations?

    I did try it with my family members’ LinkedIn pictures. I took one profile picture from LinkedIn, generated 30 images from that one picture, then used the techniques you describe in this post:

    https://www.pyimagesearch.com/2018/06/18/face-recognition-with-opencv-python-and-deep-learning/

    and the face_recognition package was able to recognize each person. I know you are busy; thank you for taking the time. I would love your thoughts on this approach.

    Again – thank you.

    • Adrian Rosebrock July 15, 2019 at 1:10 pm #

      It won’t undo it per se, but it will re-align the face. It’s still valuable to apply data augmentation to face recognition.

  8. M Sudhakara July 12, 2019 at 1:19 am #

    Hi Adrian,

    Can I use color transformations as a form of data augmentation? For example, I have an RGB color image. If I convert it from RGB to HSV or LAB, I get the same image in another color space.

    Can these converted images act as data augmentation?

    • Adrian Rosebrock July 15, 2019 at 1:09 pm #

      You typically wouldn’t change the color space and use JUST the modified color space for your training data. You pick a color space for training and stick with that color space. You may decide to adjust contrast, brightness, etc. For that take a look at the imgaug library.
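As a concrete illustration of the kind of photometric adjustment mentioned above, a brightness/contrast jitter can be sketched in plain NumPy. This is a hypothetical stand-in for what libraries like imgaug provide; the function name, parameters, and ranges are illustrative, not any library's API.

```python
import numpy as np

def jitter_brightness_contrast(image, brightness=0.2, contrast=0.2, rng=None):
    """Randomly adjust contrast and brightness of a float image in [0, 1].

    alpha scales pixel values around their current level (contrast),
    beta shifts them uniformly (brightness); the result is clipped
    back into the valid [0, 1] range.
    """
    rng = np.random.default_rng() if rng is None else rng
    alpha = 1.0 + rng.uniform(-contrast, contrast)  # contrast scale factor
    beta = rng.uniform(-brightness, brightness)     # brightness shift
    return np.clip(alpha * image + beta, 0.0, 1.0)
```

In practice imgaug bundles many such augmenters (contrast, brightness, hue/saturation) with finer control, which is why it is recommended above over hand-rolling them.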

  9. monty July 12, 2019 at 2:45 am #

    Why doesn’t such a big network identify that the images are linear combinations of each other, or that some images are derived from other images?

  10. Xu Zhang July 12, 2019 at 6:42 pm #

    Thank you so much for your great post. Finally, I understood how augmentation works.
    For 2D images with 3 channels, Keras’ ImageDataGenerator() can do in-place/on-the-fly data augmentation. Can it do 3D augmentation with more than 3 channels on the fly? If not, do any such implementations exist? Many thanks

    • Adrian Rosebrock July 15, 2019 at 1:08 pm #

      I’m not sure about that one. You may want to look into the “imgaug” library. That library might be able to help you out more.

    • asere August 6, 2019 at 10:55 am #

      Also interested in this: were you able to find any existing implementations, or did the imgaug library work for you on 3D images?

  11. KV Subbaiah Setty July 12, 2019 at 9:03 pm #

    Hello Adrian,

    I am thinking of a trick: since in “type #2” data augmentation the original images are not seen by the model during training (all training examples are transformed images), can we use the original images (the original training set) as the validation dataset? Then there would be no need for a train/test split, as all the original images are unseen during training. What I am thinking of is something like the “out-of-bag” (OOB) samples usually used in bagging.

    What is your opinion on this? Is my thinking correct?

    • Adrian Rosebrock July 15, 2019 at 1:07 pm #

      No, don’t do that. Keep your training, testing, and validation sets entirely independent and sequestered from each other. Trying to mix them could increase the chance of overfitting and if you were to publish a paper it could even invalidate your results. Keep them separate.

  12. Nurhan July 13, 2019 at 11:20 am #

    Hey Adrian,

    How do I use data augmentation with YOLO and SSD? Where does the data augmentation fit in there?

  13. Future_Vision July 22, 2019 at 6:24 pm #

    Hi Adrian,
    I’m really happy I found your blog. You cover much of what I am trying to accomplish, but I am running into an issue with data augmentation and I’m just not sure what the right approach is. Here is a high-level explanation of my project: I have a set of 52 playing cards of a specific design. Since finding images of this design is very difficult, my thinking was to generate images for each of the 52 cards (classes). I have 3 directories (train, validate, test), and in each of those there is a directory for each card containing an image of the original, unadulterated art for that card. I’ve been able to generate images from a single image, but only certain transformations work (that is probably another question). I’m just trying to figure out how to batch-generate images for all 52 classes without manually doing it for each class in each set (train, validate, and test). Any guidance you can give me? I’d greatly appreciate it!

    • Adrian Rosebrock July 25, 2019 at 9:26 am #

      Hey there — have you taken a look at Deep Learning for Computer Vision with Python? That book covers my tips, suggestions, and best practices for how to build your own datasets and generators. All of my guidance and suggestions are covered in that book. Definitely take a look if you’re serious about your project.

      • Future_Vision July 25, 2019 at 10:10 am #

        Thanks for the feedback. This is just a class project I am a little stumped on so I’ll need to pass on buying the book.

        • Adrian Rosebrock July 25, 2019 at 10:20 am #

          Good luck with the class project! I hope it turns out well.

  14. Joe July 26, 2019 at 11:16 am #

    Hi Adrian,

    Well, I don’t need to say how important this blog has been to my short ML life. Thank you.

    I’ve got a question, a dummy question, you know.
    I am new to Python. I wonder if you could explain this part to me:
    label = imagePath.split(os.path.sep)[-2]

    Thank you
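For anyone else with the same question: that line extracts the class label from the name of the directory containing the image. A quick illustration with a hypothetical dataset path (the directory names below are made up):

```python
import os

# hypothetical path layout: dataset/<class_name>/<image_file>
imagePath = os.path.sep.join(["dataset", "dogs", "image_0001.jpg"])

# splitting on the OS path separator yields ["dataset", "dogs", "image_0001.jpg"];
# index -2 picks the second-to-last component, i.e. the class directory name
label = imagePath.split(os.path.sep)[-2]
print(label)  # dogs
```

This convention works whenever the dataset stores each class's images in its own subdirectory.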

  15. Walid August 2, 2019 at 10:40 am #

    Thanks a lot. I appreciate all the effort you put into this post.

    I have a question not directly related to the post: why do you always prefer a last layer with 2 neurons when you are building a binary classifier?
    Wouldn’t a single neuron do the trick?

    Best Regards,

    Walid

  16. Sammy August 9, 2019 at 4:23 am #

    Hi Adrian,

    What code could you use if you wanted to include both the original images and the augmented images in training?

    • Adrian Rosebrock August 16, 2019 at 6:00 am #

      You would need to implement your own custom data generator and inside the generator have it apply data augmentation to the images. Then, concatenate the original images with the augmented images and yield it.
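A minimal sketch of such a generator, assuming the data lives in NumPy arrays and using a simple horizontal flip as a stand-in for a real augmentation pipeline (the function name is made up for illustration):

```python
import numpy as np

def originals_plus_augmented(images, labels, batch_size=32):
    """Yield batches containing each original image AND one randomly
    selected transformed copy of it (here a horizontal flip stands in
    for a full augmentation pipeline)."""
    num_samples = len(images)
    while True:
        indices = np.random.permutation(num_samples)
        for start in range(0, num_samples, batch_size):
            batch_idx = indices[start:start + batch_size]
            originals = images[batch_idx]
            augmented = originals[:, :, ::-1, :]  # flip along the width axis
            # concatenate originals and augmented copies; labels are duplicated
            yield (np.concatenate([originals, augmented], axis=0),
                   np.concatenate([labels[batch_idx]] * 2, axis=0))
```

Each yielded batch is then twice `batch_size`; the generator can be passed to `model.fit` (or `fit_generator` in older Keras versions) like any other Python generator.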

  17. Pankaj August 16, 2019 at 6:46 am #

    Hey Adrain,
    I am a fan of your style, and I have now started following you through this blog.
    Have a look at this thread: https://www.kaggle.com/questions-and-answers/96043#latest-555722

    • Adrian Rosebrock September 5, 2019 at 9:59 am #

      Thanks Pankaj, I’m happy the tutorial helped you 🙂

  18. Ashish September 4, 2019 at 2:32 am #

    Hi Adrian,
    Can you please help with how we can implement ImageDataGenerator and data augmentation for multi-label image classification, where one image can contain multiple classes?

  19. Adarsh September 7, 2019 at 3:53 am #

    Is there a crude way to estimate how many sample images we need for training for a given number of classes? Or should we keep trying until we reach the required accuracy?

  20. kaki October 2, 2019 at 2:55 pm #

    Hi Adrian, this was a lot of help! I’ve got one question:

    If we use method #2, the in-place augmentation in Keras, does this mean our validation split of 0.25 in the train_test_split call will remain true even after augmenting the training data?

    • Adrian Rosebrock October 3, 2019 at 12:18 pm #

      I’m not sure I fully understand your question. You don’t apply data augmentation to your validation set. And furthermore, your validation set should always be separate from your training set. You should never mix the two.

      • Kaki October 3, 2019 at 8:23 pm #

        Sorry, let me clarify. Does the 0.25 split ratio still stand even after we’ve split the training and validation sets AND then augmented the training data using method #2?

  21. Pand October 4, 2019 at 9:14 am #

    Hey Adrian! One question: I have a model which inputs an image and outputs 5 arrays (heatmaps). Is there a way to adapt the outputs according to the transformation each input image received? Thanks and great work!!

  22. Eduard Bulava October 14, 2019 at 9:11 am #

    Hello everyone, I trained a model based on Adrian’s tutorials. Unfortunately, my model now predicts the same result regardless of the incoming data. I’ve spent a lot of time trying to figure out what I did wrong. Could someone point out my mistake? I would be very grateful.

Before you leave a comment...

Hey, Adrian here, author of the PyImageSearch blog. I'd love to hear from you, but before you submit a comment, please follow these guidelines:

  1. If you have a question, read the comments first. You should also search this page (i.e., ctrl + f) for keywords related to your question. It's likely that I have already addressed your question in the comments.
  2. If you are copying and pasting code/terminal output, please don't. Reviewing another programmer's code is a very time-consuming and tedious task, and due to the volume of emails and contact requests I receive, I simply cannot do it.
  3. Be respectful of the space. I put a lot of my own personal time into creating these free weekly tutorials. On average, each tutorial takes me 15-20 hours to put together. I love offering these guides to you and I take pride in the content I create. Therefore, I will not approve comments that include large code blocks/terminal output as it destroys the formatting of the page. Kindly be respectful of this space.
  4. Be patient. I receive 200+ comments and emails per day. Due to spam, and my desire to personally answer as many questions as I can, I hand moderate all new comments (typically once per week). I try to answer as many questions as I can, but I'm only one person. Please don't be offended if I cannot get to your question
  5. Do you need priority support? Consider purchasing one of my books and courses. I place customer questions and emails in a separate, special priority queue and answer them first. If you are a customer of mine you will receive a guaranteed response from me. If there's any time left over, I focus on the community at large and attempt to answer as many of those questions as I possibly can.

Thank you for keeping these guidelines in mind before submitting your comment.
