Fine-tuning with Keras and Deep Learning

In this tutorial, you will learn how to perform fine-tuning with Keras and Deep Learning.

We will take a CNN pre-trained on the ImageNet dataset and fine-tune it to perform image classification and recognize classes it was never trained on.

Today is the final post in our three-part series on fine-tuning:

  1. Part #1: Transfer learning with Keras and Deep Learning
  2. Part #2: Feature extraction on large datasets with Keras and Deep Learning
  3. Part #3: Fine-tuning with Keras and Deep Learning (today’s post)

I would strongly encourage you to read the previous two tutorials in the series if you haven’t yet — understanding the concept of transfer learning, including performing feature extraction via a pre-trained CNN, will better enable you to understand (and appreciate) fine-tuning.

When performing feature extraction we did not re-train the original CNN. Instead, we treated the CNN as an arbitrary feature extractor and then trained a simple machine learning model on top of the extracted features.

Fine-tuning, on the other hand, requires that we not only update the CNN architecture but also re-train it to learn new object classes.

Fine-tuning is a multi-step process:

  1. Remove the fully connected nodes at the end of the network (i.e., where the actual class label predictions are made).
  2. Replace the fully connected nodes with freshly initialized ones.
  3. Freeze the CONV layers earlier in the network (ensuring that any robust features previously learned by the CNN are not destroyed).
  4. Start training, but only train the FC layer heads.
  5. Optionally unfreeze some/all of the CONV layers in the network and perform a second pass of training.

If you are new to deep learning and CNNs, I would recommend you stop here and learn how to train your first CNN.

Fine-tuning with Keras is a more advanced technique with plenty of gotchas and pitfalls that will trip you up along the way (for example, it tends to be very easy to overfit a network when performing fine-tuning if you are not careful).

To learn how to perform fine-tuning with Keras and deep learning, just keep reading.

Looking for the source code to this post?
Jump right to the downloads section.


Note: Many of the fine-tuning concepts I’ll be covering in this post also appear in my book, Deep Learning for Computer Vision with Python. Inside the book, I go into considerably more detail (and include more of my tips, suggestions, and best practices). If you would like more detail on fine-tuning with Keras after going through this guide, definitely take a look at my book.

In the first part of this tutorial, we’ll discuss the concept of fine-tuning and how we can re-train a neural network to recognize classes it was not originally trained to recognize.

From there we’ll review the dataset we are using for fine-tuning.

I’ll then discuss our project directory structure.

Once we have a good handle on the dataset we’ll then switch to implementing fine-tuning with Keras.

After you have finished going through this tutorial you will be able to:

  1. Fine-tune networks with Keras.
  2. Make predictions using the fine-tuned network.

Let’s get started!

What is fine-tuning?

Figure 1: Fine-tuning with Keras and deep learning using Python involves retraining the head of a network to recognize classes it was not originally intended for.

Note: The following section has been adapted from my book, Deep Learning for Computer Vision with Python. For the full set of chapters on transfer learning and fine-tuning, please refer to the text.

Earlier in this series of posts on transfer learning, we learned how to treat a pre-trained Convolutional Neural Network as a feature extractor.

Using this feature extractor, we forward propagated our dataset of images through the network, extracted the activations at a given layer (treating the activations as a feature vector), and then saved the values to disk.

A standard machine learning classifier (in our case, Logistic Regression) was trained on top of the CNN features, exactly as we would do with hand-engineered features such as SIFT, HOG, and LBPs.

This approach to transfer learning is called feature extraction.

But there is another type of transfer learning, one that can actually outperform the feature extraction method. This method is called fine-tuning and requires us to perform “network surgery”.

First, we take a scalpel and cut off the final set of fully connected layers (i.e., the “head” of the network where the class label predictions are returned) from a pre-trained CNN (typically VGG, ResNet, or Inception).

We then replace the head with a new set of fully connected layers with random initializations.

From there, all layers below the head are frozen so their weights cannot be updated (i.e., the backward pass in backpropagation does not reach them).

We then train the network using a very small learning rate so the new set of fully connected layers can learn patterns from the previously learned CONV layers earlier in the network — this process is called allowing the FC layers to “warm up”.

Optionally, we may unfreeze the rest of the network and continue training. Applying fine-tuning allows us to utilize pre-trained networks to recognize classes they were not originally trained on.

And furthermore, this method can lead to higher accuracy than transfer learning via feature extraction.

Fine-tuning and network surgery

Note: The following section has been adapted from my book, Deep Learning for Computer Vision with Python. For the full set of chapters on transfer learning and fine-tuning, please refer to the text.

As we discussed earlier in this series on transfer learning via feature extraction, pre-trained networks (such as ones trained on the ImageNet dataset) contain rich, discriminative filters. These filters can be leveraged on new datasets to predict class labels outside the set the network was originally trained on.

However, instead of simply applying feature extraction, we are going to perform network surgery and modify the actual architecture so we can re-train parts of the network.

If this sounds like something out of a bad horror movie, don’t worry: there won’t be any blood and gore, but we’ll have some fun and learn a lot about transfer learning via our Dr. Frankenstein-esque network experiments.

To understand how fine-tuning works, consider the following figure:

Figure 2: Left: The original VGG16 network architecture. Middle: Removing the FC layers from VGG16 and treating the final POOL layer as a feature extractor. Right: Removing the original FC Layers and replacing them with a brand new FC head. These FC layers can then be fine-tuned to a specific dataset (the old FC Layers are no longer used).

On the left we have the layers of the VGG16 network.

As we know, the final set of layers (i.e., the “head”) consists of our fully connected layers along with our softmax classifier.

When performing fine-tuning, we actually sever the head of the network, just as in feature extraction (Figure 2, middle).

However, unlike feature extraction, when we perform fine-tuning we actually build a new fully connected head and place it on top of the original architecture (Figure 2, right).

The new FC layer head is randomly initialized (just like any other layer in a new network) and connected to the body of the original network.

However, there is a problem:

Our CONV layers have already learned rich, discriminative filters while our FC layers are brand new and totally random.

If we allow the gradient to backpropagate from these random values all the way through the network, we risk destroying these powerful features.

To circumvent this problem, we instead let our FC head “warm up” by (ironically) “freezing” all layers in the body of the network (I told you the horror/cadaver analogy works well here) as depicted in Figure 3 (left).

Figure 3: Left: When we start the fine-tuning process, we freeze all CONV layers in the network and only allow the gradient to backpropagate through the FC layers. Doing this allows our network to “warm up”. Right: After the FC layers have had a chance to warm up, we may choose to unfreeze all or some of the layers earlier in the network and allow each of them to be fine-tuned as well.

Training data is forward propagated through the network as we usually would; however, the backpropagation is stopped after the FC layers, which allows these layers to start to learn patterns from the highly discriminative CONV layers.

In some cases, we may decide to never unfreeze the body of the network as our new FC head may obtain sufficient accuracy.

However, for some datasets it is often advantageous to allow the original CONV layers to be modified during the fine-tuning process as well (Figure 3, right).

After the FC head has started to learn patterns in our dataset, we can pause training, unfreeze the body, and continue training, but with a very small learning rate — we do not want to alter our CONV filters dramatically.

Training is then allowed to continue until sufficient accuracy is obtained.

Fine-tuning is a super-powerful method to obtain image classifiers on your own custom datasets from pre-trained CNNs (and is even more powerful than transfer learning via feature extraction).

If you’d like to learn more about transfer learning via deep learning, including:

  • Deep learning-based feature extraction
  • Training models on top of extracted features
  • Fine-tuning networks on your own custom datasets
  • My personal tips, suggestions, and best practices for transfer learning

…then you’ll want to take a look at my book, Deep Learning for Computer Vision with Python, where I cover these algorithms and techniques in detail.

The Food-11 Dataset

Figure 4: The Food-11 dataset is curated by the Multimedia Signal Processing Group (MSPG) of the Swiss Federal Institute of Technology. (image source)

The dataset we’ll be using for fine-tuning is the Food-11 dataset from the Multimedia Signal Processing Group (MSPG) of the Swiss Federal Institute of Technology.

The dataset consists of 16,643 images belonging to 11 major food categories:

  1. Bread (1,724 images)
  2. Dairy product (721 images)
  3. Dessert (2,500 images)
  4. Egg (1,648 images)
  5. Fried food (1,461 images)
  6. Meat (2,206 images)
  7. Noodles/pasta (734 images)
  8. Rice (472 images)
  9. Seafood (1,505 images)
  10. Soup (2,500 images)
  11. Vegetable/fruit (1,172 images)

Using the Food-11 dataset we can train a deep learning model capable of recognizing each major food group — such a model could be used, for example, in a mobile fitness application that automatically tracks estimated food group and caloric intake.

To train such a model, we’ll be utilizing fine-tuning with the Keras deep learning library.

Downloading the Food-11 dataset

Go ahead and grab the zip from the “Downloads” section of this blog post.

Once you’ve downloaded the source code, change directory into fine-tuning-keras :
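Assuming the archive from the “Downloads” section is named fine-tuning-keras.zip, that looks like:

$ unzip fine-tuning-keras.zip
$ cd fine-tuning-keras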

Now let’s create a Food-11/  directory to house our unaltered dataset:
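$ mkdir Food-11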

In my experience, I’ve found that downloading the Food-11 dataset is unreliable.

Therefore I’m presenting two options to download the dataset:

Option 1: Use wget  in your terminal

The wget  application comes pre-installed on Ubuntu and other Linux distros. On macOS, you must install it:
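One way to do so is via Homebrew, assuming you have it installed:

$ brew install wget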

To download the Food-11 dataset, let’s use wget  in our terminal:
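Using the FTP credentials listed under Option 2 below, the download command looks like this (a reader noted in the comments that adding --prefer-family=ipv4  helps if the connection stalls):

$ wget --prefer-family=ipv4 --ftp-user=FoodImage@grebvm2.epfl.ch --ftp-password=Cahc1moo ftp://tremplin.epfl.ch/Food-11.zip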

Note: At least on macOS, I’ve found that if the wget  command fails once, just run it again and then the download will start.

Option 2: Use FileZilla

FileZilla is a GUI application for FTP and SFTP connections. You may download it for your OS here.

Once you’ve installed and launched the application, enter the credentials:

  • Host: tremplin.epfl.ch
  • Username: FoodImage@grebvm2.epfl.ch
  • Password: Cahc1moo

You can then connect and download the file into the appropriate destination.

Figure 5: Downloading the Food-11 dataset with FileZilla.

The username and password combination was obtained from the official Food-11 dataset website. If the username/password combination stops working for you, check to see if the dataset curators changed the login credentials.

Once downloaded (hopefully with no issues), we can go ahead and unzip the dataset inside of the Food-11/  directory:
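Assuming you saved the zip into the Food-11/  directory, that's:

$ cd Food-11
$ unzip Food-11.zip
$ cd ..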

Project structure

Now that we’ve downloaded the project and dataset, go ahead and navigate back to the project root. From there let’s analyze the project structure:
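You should see a layout along these lines (the output/  filenames come from the config we'll review below, and the Food-11/  and dataset/  trees are collapsed here for brevity):

.
├── Food-11 [...]
├── dataset [...]
├── output
│   ├── food11.model
│   ├── unfrozen.png
│   └── warmup.png
├── pyimagesearch
│   ├── __init__.py
│   └── config.py
├── build_dataset.py
├── predict.py
└── train.py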

Our project structure is similar to last week’s.

Our original dataset is in the Food-11/  directory.

Executing build_dataset.py  enables us to organize the Food-11 images into the dataset/  directory.

From there, we’ll use train.py  to perform fine-tuning.

Finally, we’ll use predict.py  to make predictions on sample images using our fine-tuned network.

Each of the aforementioned scripts takes advantage of a configuration file named config.py . Let’s go ahead and learn more about the configuration script now.

Understanding our configuration file

Before we can actually fine-tune our network, we first need to create our configuration file to store important variables, including:

  • Paths to the input dataset
  • Class labels
  • Batch size/training parameters
  • Output paths, including model files, label encoders, plot histories, etc.

Since there are so many parameters that we need, I’ve opted to use a configuration file to keep our code nice and organized (versus having to utilize many command line arguments).

Our configuration file, config.py, lives in a Python module named pyimagesearch .

We keep the config.py  file there for two reasons:

  1. To ensure we can import the configuration into our own Python scripts
  2. To keep our code tidy and organized

Note: This config file is similar to the one in last week’s and the prior week’s tutorials.

Let’s fill our config.py  file now — open it up in your favorite code editor and insert the following lines:
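The full file is available in the “Downloads” (and the line references in this section are to that file); here is a condensed sketch of its contents. The split directory names match the folders the Food-11 archive ships with, while the output filenames are simply my choices:

# import the necessary packages
import os

# path to the original input dataset, as extracted above
ORIG_INPUT_DATASET = "Food-11"

# path to the base directory that will contain our organized
# training/validation/testing splits
BASE_PATH = "dataset"

# names of the data split directories (Food-11 ships with these
# exact split names)
TRAIN = "training"
TEST = "evaluation"
VAL = "validation"

# the names of the eleven Food-11 classes, listed in the same order
# as the dataset's integer class IDs (which is also alphabetical);
# I've dropped the "/" from two names so they are safe to use as
# directory names
CLASSES = ["Bread", "Dairy product", "Dessert", "Egg", "Fried food",
    "Meat", "Noodles", "Rice", "Seafood", "Soup", "Vegetable"]

# batch size used when training
BATCH_SIZE = 32

# path to the serialized label encoder (carried over from the earlier
# posts in this series), along with the model + training plot paths
LE_PATH = os.path.sep.join(["output", "label_encoder.cpickle"])
MODEL_PATH = os.path.sep.join(["output", "food11.model"])
WARMUP_PLOT_PATH = os.path.sep.join(["output", "warmup.png"])
UNFROZEN_PLOT_PATH = os.path.sep.join(["output", "unfrozen.png"])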

First, we import os , enabling us to build file/directory paths directly in this config.

The original dataset path where we extracted the Food-11 dataset is contained in ORIG_INPUT_DATASET .

Then we specify the BASE_PATH  where our organized dataset will soon reside.

From there we define the names of our TRAIN , TEST , and VAL  directories.

We then list the eleven CLASSES  of our Food-11 dataset.

Finally, we specify our batch size along with our model and plot paths.

Our BATCH_SIZE  of 32  represents the number of images in each chunk of data that will flow through our CNN at a time.

We’ll store our fine-tuned serialized Keras model in the  MODEL_PATH .

Similarly, we specify the paths where our warmup and unfrozen plot images will be stored.

Building our image dataset for fine-tuning

If we were to store the entire Food-11 dataset in memory, it would occupy ~10GB of RAM.

Most deep learning rigs should be able to handle that amount of data, but nevertheless, I’ll be showing you how to use the .flow_from_directory  function with Keras to only load small batches of data from disk at a time.

However, before we can actually get to fine-tuning and re-training a network, we first must (correctly) organize our dataset of images on disk.

In order to use the .flow_from_directory  function, Keras requires that we have our dataset organized using the following template:

dataset_name/class_label/example_of_class_label.jpg

And since the Food-11 dataset also provides pre-supplied data splits, our final directory structure will have the form:

dataset_name/split_name/class_label/example_of_class_label.jpg

Having the above directory structure ensures that:

  1. The .flow_from_directory  function will properly work.
  2. Our dataset is organized into a neat, easy to follow directory structure.

In order to take the original Food-11 images and then copy them into our desired directory structure, we need the build_dataset.py  script.

Let’s review that script now:
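The full script ships with the “Downloads” (the line numbers cited below refer to that file). A condensed sketch looks like this; note that Food-11 encodes the class label as the integer prefix of each filename (e.g., 0_123.jpg belongs to class 0, “Bread”):

# import the necessary packages
from pyimagesearch import config
from imutils import paths
import shutil
import os

# loop over the data splits
for split in (config.TRAIN, config.TEST, config.VAL):
    # grab all image paths in the current split
    print("[INFO] processing '{} split'...".format(split))
    imagePaths = list(paths.list_images(
        os.path.sep.join([config.ORIG_INPUT_DATASET, split])))

    # loop over the image paths
    for imagePath in imagePaths:
        # extract the class label from the filename's integer prefix
        filename = imagePath.split(os.path.sep)[-1]
        label = config.CLASSES[int(filename.split("_")[0])]

        # construct the output directory, creating it if necessary
        dirPath = os.path.sep.join([config.BASE_PATH, split, label])
        if not os.path.exists(dirPath):
            os.makedirs(dirPath)

        # copy the image into its destination
        shutil.copy2(imagePath, os.path.sep.join([dirPath, filename]))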

Lines 2-5 import our necessary packages, in particular, our config .

From there we loop over data splits beginning on Line 8. Inside, we:

  • Extract imagePaths  and each class label  (Lines 11-18).
  • Create a directory structure for our organized image files (Lines 21-25).
  • Copy the image files into the appropriate destination (Lines 28 and 29).

This script has been reviewed in more detail inside the Transfer learning with Keras and deep learning post. If you would like more detail on the inner workings of build_dataset.py , please refer to the previous tutorial.


Before continuing, make sure you have used the “Downloads” section of the tutorial to download the source code associated with this blog post.

From there, open up a terminal and execute the following command:
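$ python build_dataset.py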

If you investigate the dataset/  directory you’ll see three directories, one for each of our respective data splits:

Inside each of the data split directories you’ll also find class label subdirectories:

And inside each of the class label subdirectories you’ll find images associated with that label:

Implementing fine-tuning with Keras

Now that our images are in the proper directory structure, we can perform fine-tuning with Keras.

Let’s implement the fine-tuning script inside train.py :
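The complete train.py ships with the “Downloads”, and the line numbers cited in this section refer to that file rather than to the condensed sketches below. The imports might look like this:

# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
from keras.preprocessing.image import ImageDataGenerator
from keras.applications import VGG16
from keras.layers import Input, Flatten, Dense, Dropout
from keras.models import Model
from keras.optimizers import SGD
from sklearn.metrics import classification_report
from pyimagesearch import config
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import os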

Lines 2-20 import required packages. Let’s briefly review those that are most important to the fine-tuning concepts in today’s post:

  • matplotlib : We’ll be plotting our frozen and unfrozen training efforts. Line 3 sets the backend, ensuring that we can save our plots to disk as image files.
  • ImageDataGenerator : Allows for data augmentation. Be sure to refer to DL4CV and this blog post for more information on this class.
  • VGG16 : The seminal network trained on ImageNet that we’ll be slicing and dicing with our scalpel for the purposes of fine-tuning.
  • classification_report : Calculates basic statistical information upon evaluation of our model.
  • config : Our custom configuration file which we reviewed in the “Understanding our configuration file” section.

Be sure to familiarize yourself with the rest of the imports as well.

With the packages at our fingertips, we’re now ready to move on. Let’s start by defining a function for plotting training history:
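A sketch of that helper follows; depending on your Keras version, the history keys may be "accuracy" / "val_accuracy"  instead of "acc" / "val_acc" :

def plot_training(H, N, plotPath):
    # construct a plot of the training/validation loss and accuracy,
    # then save the figure to disk
    plt.style.use("ggplot")
    plt.figure()
    plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
    plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
    plt.plot(np.arange(0, N), H.history["acc"], label="train_acc")
    plt.plot(np.arange(0, N), H.history["val_acc"], label="val_acc")
    plt.title("Training Loss and Accuracy")
    plt.xlabel("Epoch #")
    plt.ylabel("Loss/Accuracy")
    plt.legend(loc="lower left")
    plt.savefig(plotPath)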

The plot_training  function is defined on Lines 22-34. This helper function will be used to construct and save a plot of our training history.

Let’s determine the total number of images in each of our splits:
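Along these lines, using imutils’ paths.list_images  to count the files on disk:

# derive the paths to the training, validation, and testing directories
trainPath = os.path.sep.join([config.BASE_PATH, config.TRAIN])
valPath = os.path.sep.join([config.BASE_PATH, config.VAL])
testPath = os.path.sep.join([config.BASE_PATH, config.TEST])

# determine the total number of image paths in each split
totalTrain = len(list(paths.list_images(trainPath)))
totalVal = len(list(paths.list_images(valPath)))
totalTest = len(list(paths.list_images(testPath)))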

Lines 38-40 define paths to training, validation, and testing directories, respectively.

Then, we determine the total number of images for each split via Lines 44-46 — these values will enable us to calculate the steps per epoch.

Let’s initialize our data augmentation object and establish our mean subtraction value:
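A sketch of both objects follows. The exact augmentation ranges here are reasonable starting values rather than anything sacred, and depending on your version of keras_preprocessing you may also need to pass featurewise_center=True  for the assigned mean to be applied:

# initialize the training data augmentation object
trainAug = ImageDataGenerator(
    rotation_range=30,
    zoom_range=0.15,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.15,
    horizontal_flip=True,
    fill_mode="nearest")

# initialize the validation/testing data augmentation object (no
# augmentation parameters: it will only perform mean subtraction)
valAug = ImageDataGenerator()

# define the ImageNet mean (RGB order) and attach it to both objects
# so mean subtraction is applied to every batch
mean = np.array([123.68, 116.779, 103.939], dtype="float32")
trainAug.mean = mean
valAug.mean = mean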

The process of data augmentation is important for small datasets. In fact, it is nearly always recommended. Lines 49-56 define our training data augmentation object. The parameters specify random rotations, zooms, translations, shears, and flips to the training data as we train.

Note: A common misconception I see about data augmentation is that the randomly transformed images are added to the original training data — that’s not the case. The random transformations are performed in-place, on the fly, as each batch is generated during training, meaning the dataset size does not increase.

Although our validation data augmentation object (Line 60) uses the same class, we do not supply any parameters (we don’t apply data augmentation to validation or testing data). The validation ImageDataGenerator  will only be used for mean subtraction, which is why no parameters are needed.

Next, we set the ImageNet mean subtraction values on Line 65. In this pre-processing technique, we perform a pixel-wise subtraction for all images. Mean subtraction is one of several scaling techniques I explain in the Practitioner Bundle of Deep Learning for Computer Vision with Python. In the text, we’ll even build a custom preprocessor to more efficiently accomplish mean subtraction.

Given the pixel-wise subtraction values, we prepare each of our data augmentation objects for mean subtraction (Lines 66 and 67).

Our data augmentation generators will generate data directly from their respective directories:
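Sketched out with .flow_from_directory  (note shuffle=False  for the validation and testing generators, so predictions stay aligned with testGen.classes  later):

# initialize the training generator
trainGen = trainAug.flow_from_directory(
    trainPath,
    class_mode="categorical",
    target_size=(224, 224),
    color_mode="rgb",
    shuffle=True,
    batch_size=config.BATCH_SIZE)

# initialize the validation generator
valGen = valAug.flow_from_directory(
    valPath,
    class_mode="categorical",
    target_size=(224, 224),
    color_mode="rgb",
    shuffle=False,
    batch_size=config.BATCH_SIZE)

# initialize the testing generator
testGen = valAug.flow_from_directory(
    testPath,
    class_mode="categorical",
    target_size=(224, 224),
    color_mode="rgb",
    shuffle=False,
    batch_size=config.BATCH_SIZE)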

Lines 70-94 define generators that will load batches of images from their respective training, validation, and testing splits.

Using these generators ensures that our machine will not run out of RAM by trying to load all of the data at once.

Let’s go ahead and perform network surgery:
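Condensed, the surgery looks like this. The 512-node FC layer and 50% dropout are simply values I find work well as a starting point; they are not the only reasonable choices:

# load VGG16 with pre-trained ImageNet weights, leaving off the
# head FC layer set
baseModel = VGG16(weights="imagenet", include_top=False,
    input_tensor=Input(shape=(224, 224, 3)))

# construct the new, randomly initialized head that will be placed
# on top of the base model
headModel = baseModel.output
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(512, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(len(config.CLASSES), activation="softmax")(headModel)

# place the head FC model on top of the base model (this will become
# the actual model we train)
model = Model(inputs=baseModel.input, outputs=headModel)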

First, we’ll load the VGG16 architecture (with pre-trained ImageNet weights) from disk, leaving off the fully connected layers (Lines 98 and 99). By omitting the fully connected layers, we have effectively sent the network to the guillotine and beheaded it, as in Figure 2.

From there, we define a new fully connected layer head (Lines 103-107).

Note: If you are unfamiliar with the contents on Lines 103-107, I recommend that you read my Keras tutorial or CNN tutorial. And if you would like to immerse yourself completely into the world of deep learning, be sure to check out my highly rated deep learning book.

On Line 111 we place the new FC layer head on top of the VGG16 base network. You can think of this as adding sutures to sew the head back on to the network body after surgery.

Take the time to review the above code block carefully as it is where the heart of fine-tuning with Keras begins.

Continuing on with fine-tuning, let’s freeze all of the CONV layers in the body of VGG16:
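That’s just a two-line loop over the base model’s layers:

# freeze every layer in the base model so it will *not* be updated
# during the first (warm-up) training pass
for layer in baseModel.layers:
    layer.trainable = False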

Lines 115-116 freeze all CONV layers in the VGG16 base model.

Given that the base is now frozen, we’ll go ahead and train our network (only the head weights will be updated):
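A sketch of the warm-up phase follows. The SGD optimizer, 1e-4 learning rate, and 50 warm-up epochs are typical values for this step rather than requirements, and note that newer TensorFlow/Keras releases fold fit_generator / predict_generator  into plain fit / predict :

# compile the model (done *after* freezing the base layers)
print("[INFO] compiling model...")
opt = SGD(lr=1e-4, momentum=0.9)
model.compile(loss="categorical_crossentropy", optimizer=opt,
    metrics=["accuracy"])

# train only the head of the network, allowing the new FC layers to
# "warm up" while the body stays frozen
H = model.fit_generator(
    trainGen,
    steps_per_epoch=totalTrain // config.BATCH_SIZE,
    validation_data=valGen,
    validation_steps=totalVal // config.BATCH_SIZE,
    epochs=50)

# reset the testing generator, evaluate the network, and plot the
# warm-up training history
testGen.reset()
predIdxs = model.predict_generator(testGen,
    steps=(totalTest // config.BATCH_SIZE) + 1)
predIdxs = np.argmax(predIdxs, axis=1)
print(classification_report(testGen.classes, predIdxs,
    target_names=testGen.class_indices.keys()))
plot_training(H, 50, config.WARMUP_PLOT_PATH)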

In this block, we train our model , keeping in mind that no weight updates will occur in the base. Only the head of the network will be tuned at this point.

Specifically, we:

  • Compile the model  (Lines 121-123). We use "categorical_crossentropy"  for our loss  function. If you are performing classification with only two classes, be sure to use "binary_crossentropy" .
  • Train our network while applying data augmentation, only updating the weights for the head of the network (Lines 129-134)
  • Reset our testing generator (Line 139).
  • Evaluate our network on our testing data (Lines 140-142). We’ll print classification statistics in our terminal via Lines 143 and 144.
  • Plot the training history via our plot_training  function (Line 145).

Now let’s proceed to unfreeze the final set of CONV layers in the base model:
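In VGG16 (loaded with include_top=False ), the final CONV block, block5, begins at layer index 15, so the sketch looks like:

# reset our data generators
trainGen.reset()
valGen.reset()

# unfreeze the final CONV block (block5) while leaving the rest of
# the base model frozen
for layer in baseModel.layers[15:]:
    layer.trainable = True

# loop over the layers in the model, showing which are trainable
for layer in baseModel.layers:
    print("{}: {}".format(layer.name, layer.trainable))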

We start by resetting our training and validation generators (Lines 148 and 149).

We then unfreeze the final CONV layer block in VGG16 (Lines 153 and 154). Again, only the final CONV block of VGG16 is unfrozen (not the rest of the network).

Just so there is no confusion about what is going on in our network, Lines 158 and 159 will show us which layers are frozen/not frozen (i.e., trainable). The information will print out in our terminal.

Continuing on, let’s fine-tune both the final set of CONV layers and our set of FC layers:
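Along these lines (again, 20 epochs and a 1e-4 learning rate are the values used for this experiment, not universal constants):

# for the layer changes to take effect, we must re-compile the model,
# again using SGD with a *very* small learning rate
print("[INFO] re-compiling model...")
opt = SGD(lr=1e-4, momentum=0.9)
model.compile(loss="categorical_crossentropy", optimizer=opt,
    metrics=["accuracy"])

# train again, this time fine-tuning *both* the final CONV block and
# the new FC head
H = model.fit_generator(
    trainGen,
    steps_per_epoch=totalTrain // config.BATCH_SIZE,
    validation_data=valGen,
    validation_steps=totalVal // config.BATCH_SIZE,
    epochs=20)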

Since we’ve unfrozen additional layers, we must re-compile the model (Lines 164-166).

We then train the model again, this time fine-tuning both the FC layer head and the final CONV block (Lines 170-175).

Wrapping up, let’s evaluate the network once more:
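A sketch of the final evaluation and serialization:

# reset the testing generator and make predictions once more
print("[INFO] evaluating after fine-tuning network...")
testGen.reset()
predIdxs = model.predict_generator(testGen,
    steps=(totalTest // config.BATCH_SIZE) + 1)
predIdxs = np.argmax(predIdxs, axis=1)

# print the classification report and save the unfrozen training plot
print(classification_report(testGen.classes, predIdxs,
    target_names=testGen.class_indices.keys()))
plot_training(H, 20, config.UNFROZEN_PLOT_PATH)

# serialize the model to disk so predict.py can load it later
print("[INFO] serializing network...")
model.save(config.MODEL_PATH)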

Here we:

  • Make predictions on the testing data (Lines 180-183).
  • Print a new classification report (Lines 184 and 185).
  • Save the unfrozen training plot to disk (Line 186).
  • And serialize the model to disk, allowing us to recall the model in our predict.py  script (Line 190).

Great job sticking with me on our fine-tuning journey. We’re going to put our script to work next!

Training a network via fine-tuning with Keras

Now that we’ve implemented our Python script to perform fine-tuning, let’s give it a try and see what happens.

Make sure you’ve used the “Downloads” section of this tutorial to download the source code to this post, and from there, execute the following command:
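$ python train.py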

Figure 6: Our Keras fine-tuning network is allowed to “warm up” prior to unfreezing only the final block of CONV layers in VGG16.

After fine-tuning just our newly initialized FC layer head and allowing those layers to warm up, we are obtaining ~78% accuracy, which is quite respectable.

Next, we see that we have unfrozen the final block of CONV layers in VGG16 while leaving the rest of the network weights frozen:

Once we’ve unfrozen the final CONV block, we resume fine-tuning:

Figure 7: We have unfrozen the final CONV block and resumed fine-tuning with Keras and deep learning. Training and validation loss are starting to diverge, indicating the onset of overfitting, so fine-tuning stops at epoch 20.

I decided not to train past epoch 20 for fear of overfitting. If you take a look at Figure 7 you can see our training and validation loss beginning to diverge rapidly. When you see training loss falling quickly while validation loss stagnates or even increases, you know you are overfitting.

That said, at the end of our fine-tuning process, we are now obtaining 87% accuracy, a significant increase from just fine-tuning the FC layer heads alone!

Making predictions with fine-tuning and Keras

Now that we’ve fine-tuned our Keras model, let’s see how we can use it to make predictions on images outside the training/testing set (i.e., our own custom images).

Open up predict.py  and insert the following code:
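The complete predict.py is in the “Downloads” (the line numbers cited below refer to that file); a condensed sketch begins like this:

# import the necessary packages
from keras.models import load_model
from pyimagesearch import config
import numpy as np
import argparse
import imutils
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="path to our input image")
args = vars(ap.parse_args())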

Lines 2-7 import our required packages. We’re going to use load_model  to recall our Keras fine-tuned model from disk and make predictions. This is also the first time today that we will use OpenCV ( cv2 ).

On Lines 10-13 we parse our command line argument. The --image  argument allows us to supply any image from our terminal at runtime with no modifications to the code. It makes sense to take advantage of a command line argument rather than hard-coding the value here or in our config.

Let’s go ahead and load that image from disk and preprocess it:
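A sketch of those preprocessing steps:

# load the input image, then make a copy and resize it for output
image = cv2.imread(args["image"])
output = image.copy()
output = imutils.resize(output, width=400)

# the network was trained on RGB-ordered images, but OpenCV loads
# them in BGR order, so swap the channels, then resize to the
# 224x224 dimensions the network expects
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image = cv2.resize(image, (224, 224))

# convert the image to floating point and subtract the same ImageNet
# mean we used during training
image = image.astype("float32")
mean = np.array([123.68, 116.779, 103.939], dtype="float32")
image -= mean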

Lines 16-30 load and preprocess our image . The preprocessing steps are identical to training and include:

  • Making a copy  of the image and resizing it for output  purposes (Lines 17 and 18).
  • Swapping color channels since we trained with RGB images and OpenCV loaded this image  in BGR order (Line 23).
  • Resizing the image  to 224×224 pixels for inference (Line 24).
  • Converting the image  to floating point (Line 28).
  • Performing mean subtraction (Lines 29 and 30).

Note: When we perform inference using a custom prediction script, if the results are unsatisfactory nine times out of ten it is due to improper preprocessing. Typically having color channels in the wrong order or forgetting to perform mean subtraction altogether will lead to unfavorable results. Keep this in mind when writing your own scripts.

Now that our image is ready, let’s predict its class label:
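And the inference step. This works because config.CLASSES  is listed in the same alphabetical order that Keras’ flow_from_directory  assigned the class indices during training:

# load the fine-tuned model and make predictions on the image (as a
# batch of one)
print("[INFO] loading model...")
model = load_model(config.MODEL_PATH)
preds = model.predict(np.expand_dims(image, axis=0))[0]
i = np.argmax(preds)
label = config.CLASSES[i]

# draw the top prediction and its probability on the output image,
# then display the result on our screen
text = "{}: {:.2f}%".format(label, preds[i] * 100)
cv2.putText(output, text, (3, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
    (0, 255, 0), 2)
cv2.imshow("Output", output)
cv2.waitKey(0)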

We load our fine-tuned model  via Line 34 and then perform inference. The top prediction class label  is extracted on Lines 37-39.

Finally, we annotate the output  image and display it on screen (Lines 42-48). The text  annotation contains the highest prediction along with its associated confidence.

On to the fun part — testing our script on food! I’m hungry just thinking about it and I bet you may be too.

Keras fine-tuning results

To see our fine-tuned Keras model in action, make sure you use the “Downloads” section of this tutorial to download the source code and example images.

From there, open up a terminal and execute the following command:
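The exact example filenames depend on what’s included with your download, so treat this path as a placeholder:

$ python predict.py --image images/seafood.jpg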

Figure 8: Our fine-tuned Keras deep learning network correctly recognizes oysters as “seafood”.

As you can see from Figure 8, we have correctly classified the input image as “Seafood”.

Let’s try another example:
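Again, substitute any image path you like:

$ python predict.py --image images/fried_food.jpg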

Figure 9: With 64% confidence, this image of chicken wings is classified as “fried food”. We have applied the process of fine-tuning to a pre-trained model to recognize new classes with Keras and deep learning.

Our fine-tuned network has labeled the image as “Fried food” despite it being in the “Meat” class in our dataset.

Chicken wings are typically fried, and these clearly are. They are both “Meat” and “Fried food”, which is why the network is pulled in two directions. Therefore, I’m still declaring it a “correct” classification. A fun experiment would be to apply fine-tuning with multi-label classification. I’ll leave that as an exercise for you to implement.

Below I have included a few additional results from my fine-tuning experiments:

Figure 10: Fine-tuning with Keras and deep learning on the Food-11 dataset.

What’s next — where do I learn more about transfer learning, feature extraction, and fine-tuning?

Over the past few weeks since we started this series on transfer learning with Keras, I’ve received a number of emails and comments that are some variation of the following:

  • “How can I determine the number of nodes to put in my fully connected layer head when fine-tuning?”
  • “What optimizer and learning rate should I use for fine-tuning?”
  • “Which CONV layers (and when) should I freeze and unfreeze?”
  • “How do I classify images outside my training/testing set?”
  • “How do I load an image from disk, extract features from it using a CNN, and then classify it using the neural network?”
  • “How do I correctly preprocess my input image before classification?”

Today’s tutorial is long enough as it is, so I can’t include those sections of Deep Learning for Computer Vision with Python inside this post.

If you’d like to learn more about transfer learning, including:

  1. More details on the concept of transfer learning
  2. How to perform feature extraction
  3. How to fine-tune networks
  4. How to classify images outside your training/testing set using both feature extraction and fine-tuning

…then you’ll definitely want to refer to Deep Learning for Computer Vision with Python.

Besides chapters on transfer learning, you’ll also find:

  • Super practical walkthroughs that present solutions to actual, real-world image classification, object detection, and instance segmentation problems.
  • Hands-on tutorials (with lots of code) that not only show you the algorithms behind deep learning for computer vision but their implementations as well.
  • A no-nonsense teaching style that is guaranteed to help you master deep learning for image understanding and visual recognition.

To learn more about the book, and grab the table of contents + free sample chapters, just click here!

Summary

In this tutorial, you learned how to perform fine-tuning with Keras and deep learning.

To perform fine-tuning, we:

  1. Loaded the VGG16 network architecture from disk with weights pre-trained on ImageNet.
  2. Ensured the original fully connected layer heads were removed (i.e., where the output predictions from the network are made).
  3. Replaced the original fully connected layers with brand new, freshly initialized ones.
  4. Froze all CONV layers in VGG16.
  5. Trained only the fully connected layer heads.
  6. Unfroze the final set of CONV layer blocks in VGG16.
  7. Continued training.

Overall, we were able to obtain 87% accuracy on the Food-11 dataset.

Further accuracy can be obtained by applying additional data augmentation and by adjusting our optimizer’s parameters and the number of FC layer nodes.

If you’re interested in learning more about fine-tuning with Keras, including my tips, suggestions, and best practices, be sure to take a look at Deep Learning for Computer Vision with Python where I cover fine-tuning in more detail.

I hope you enjoyed today’s tutorial on fine-tuning!

To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!


32 Responses to Fine-tuning with Keras and Deep Learning

  1. David Bonn June 3, 2019 at 1:40 pm #

    Great post, Adrian,

    I have read and re-read your sections on transfer learning in the DL4CV Practitioner Bundle. This blog post is a great introduction to a powerful technique.

    It would be interesting to see an example of fine-tuning that used a network architecture other than VGGNET. In particular, there is at least some evidence (and if you take Stack Overflow questions as “evidence” a lot more than “some evidence”) that Batch Normalization Layers can cause subtle problems with fine tuning. I’d also love to see examples of fine tuning on networks that don’t use a fully connected head, such as SqueezeNet.

    Another thing is that the PyTorch/fastai world has a different approach on fine tuning. They will unfreeze the whole network but have it learn at a very very low rate. If you were training the new head at a learn rate of R, they would train the top third of the network at 0.1R and the rest of the network at 0.01R. Back in the day the Caffe people used a similar approach. It seems to work for them and at least somewhat mitigates this Batch Normalization problem.

    • Adrian Rosebrock June 6, 2019 at 7:05 am #

      Great comment, thanks David.

      Rest assured — this is not the final post you’ll see on fine-tuning and transfer learning. We’ll be diving into some of those concepts in future posts 🙂

  2. Farshad June 3, 2019 at 2:54 pm #

    Very very great and amazing tutorial in fine tuning published on net ever. thanks a lot.

    • Adrian Rosebrock June 6, 2019 at 7:01 am #

      Thanks Farshad, I’m glad you enjoyed it!

  3. Hassan AbuHelweh June 3, 2019 at 5:34 pm #

    Great Article Dr. Adrian, Thank you very much indeed for your effort.

    Its explained in details in your deep learning practitioner book chapter 2-5 I have tried on animals and flowers dataset, it boosts the accuracy to new levels.

    I’m still not familiar with the surgery of placing the head FC model on top of the base model -this will become the actual model we will train. then freezing and unfreezing the layers is not easy tasks.

    I hope i can practice more samples and things should be more clear to me.

    Best Regards

    • Adrian Rosebrock June 6, 2019 at 6:59 am #

      I would suggest doing:

      You’ll see each of the layers in each of the respective networks and how they are combined into the final model.

  4. Troy Zuroske June 4, 2019 at 2:07 pm #

    Does this method allow the newly trained model to predict the original classifications (1000 original classes) as well as the new ones (food categories) or just the new food categories? I believe the other method of transfer learning in your last two tutorials was only able to classify the new categories.

    • Adrian Rosebrock June 6, 2019 at 6:50 am #

      No, just the food categories. The entire point of transfer learning (both feature extraction and fine-tuning) is to allow your model to predict classes it was never trained on, but utilizing the weights from a pre-trained model (such as the ImageNet dataset).

  5. Lode June 4, 2019 at 4:23 pm #

    for info: I was able to download the file when using ipv4
wget --prefer-family=ipv4 --ftp-user FoodImage@grebvm2.epfl.ch --ftp-password Cahc1moo ftp://tremplin.epfl.ch/Food-11.zip

    • Adrian Rosebrock June 6, 2019 at 6:47 am #

      Thanks for sharing, Lode!

  6. Griffey June 7, 2019 at 12:01 am #

    Dear Adrian ,

    Thanks for your great post. I read the three posts. In the third post , you teach the “flow_from_directory” function . I think the function could also be applied in the feature_extract in the second post by just removing the “ImageDataGenerator” and doing some small modification . Am I right ?

  7. Paul Zikopoulos June 15, 2019 at 9:16 am #

    Great post. Question on this …

    “Note: A common misconception I see about data augmentation is that the random transforms of the images are then added to the original training data — that’s not the case. The random transformations performed by data augmentation are performed in-place, implying that the dataset size does not increase. These transforms are performed in-place, on the fly, during training.”

    Is that ALWAYS the case. I have been working on some projects and we had a tool that generated more images for us doing all of this stuff, but it seemed to be they were persisted. Would that make it so it could train faster.

    How does above work … just keeps the image in memory I guess and applies it … or makes copies of it … loads it up, so you have virtual 10 copies or something like that.

    I could see a benefit for storing those images .. but I’m a newbie .. is this a best practice to not persist the stuff

    • Adrian Rosebrock June 19, 2019 at 2:13 pm #

      Great question, Paul. I’ve actually decided to write a dedicated blog post on image augmentation as it seems like that question is a common misconception (perhaps more misunderstood than I originally thought). That tutorial will be publishing in a couple of weeks.

  8. Tamoghna June 19, 2019 at 10:54 am #

    Hey Adrian,

    As always, a great post. I just want to clarify certain fundamental doubts:

    1. How is algorithms such as SSD, YOLO or Mask RCNN different from pre-trained NN architectures such as VGG16/VGG19, Inception? What is the fundamental difference?

    2. In the learning curve, we saw that the train_loss and val_loss curves crosses with each other at epoch 3-4, hence we should be using EarlyStopping to prevent overfitting. My question is, what is the minimum value of patience (epoch count) should we use?

    I don’t know but I feel the learning curve looks not so like a perfectly fit model, yet surprised to see that you got 87% mAP. I am hoping to use InceptionV3 since it has relatively less number of parameters as compared to VGG16. Will this prevent the overfitting at an early stage than what we witnessed in your example?

    Any good alternative suggestion(s) is appreciated.

    • Adrian Rosebrock June 19, 2019 at 1:36 pm #

      1. SSDs, YOLO, and Mask R-CNN utilize a “backbone” network such as VGG, Inception, or ResNet. The backbone can be trained from scratch in conjunction with the SSD, etc. but is more typically pre-trained on ImageNet.

      2. I don’t typically use the EarlyStopping class. I prefer to let my model train and then examine the validation/loss curves as its training. This gives me MUCH more control over the training process.

      I go into more detail regarding both of these points inside Deep Learning for Computer Vision with Python so definitely consider going through the text.

  9. AMM June 23, 2019 at 3:01 pm #

    Hi Mr., Thank you for the helpful post. I would like to ask you some questions if you allow me..

    – Can I use the source code of this post with my face dataset or the pretrained model specialized for food? If I can, what is the pretrained model for face dataset, can I use exactly the same code with only dataset change?

    – I have a limited training data dataset and as I know transfer learning by finetuning used to address limited training data that I trying to do, but the food-11 dataset is rich, so what is the benefit of using finetuning for training it when its data is dequate for training.

    • Adrian Rosebrock June 26, 2019 at 1:31 pm #

      1. For face recognition we use an entirely different NN architecture. You could fine-tune a FaceNet, for example.

      2. Sorry, I don’t understand your question. Perhaps you can elaborate.

  10. Corentin June 24, 2019 at 2:17 am #

    Hi Adrian,
    Great post! You are saying multiple times “Fine tuning is used to learn new object classes which the network having trained on”.
    Let me know if you disagree but for me it is not always the case. It would be only for the first time you used already trained network on ImageNet for example. But what about the next time? If you add more training data on your food/not food set, you would use again fine tuning but not from the ImageNet trained network but rather the network from your previous fine tuning on food/ not food data set.

    • Adrian Rosebrock June 26, 2019 at 1:24 pm #

      You could do that but you would want to ensure you are also including the previous classes/images, otherwise you are performing online/incremental learning.

      • Corentin July 24, 2019 at 1:57 am #

        Yes I am always reusing the previous classes/images plus the new ones. So in that case this also fine tuning and it’s not learning new objects classes.

  11. Manudeep Reddy June 24, 2019 at 5:05 pm #

    I fine-tuned VGG-16 to classify dog breeds using the Stanford Dog dataset.
    I removed the FC layers and added a Dense(4096) and Dense(120, softmax). First, for a few epochs, I froze the layers of base vgg-16 to warm up FC layers.
    After that, I unfroze the last block of Conv layers and trained the model.But still,I am not able to achieve a decent acc/val_acc. Is there anything that I can fix? or should I consider fine-tuning other models like resnet etc?

  12. Shraddha July 1, 2019 at 2:48 am #

    Hi Adrian,
    I have a small doubt, it might sound silly but I wanted to know the difference between .h5 extension (used to store keras model) and .model extension that you are using, is there any?
    Is the model file in hdf5 format? I am trying to convert the model into tensorflow .pb file, I’ll be grateful if you could guide me on this. And the Keras doc does not recommend the use of pickle or cpickle but you are using it, why so?

    • Adrian Rosebrock July 4, 2019 at 10:37 am #

      There isn’t any, it’s just a model serialized in HDF5 format.

  13. Marcel July 7, 2019 at 1:44 am #

    Hi, great blog!
    I would like to ask you about that “mean” extraction part. Why we need to extract it?

    • Adrian Rosebrock July 10, 2019 at 9:55 am #

      I think you meant to say “mean subtraction”. The values 123.68, 116.779, and 103.939 are the average RGB pixel intensities of the ImageNet dataset (what the CNN used here was originally trained on).

  14. oanh July 24, 2019 at 5:46 am #

    Hi!
    What version of python is used in this link?

    • Adrian Rosebrock July 25, 2019 at 9:27 am #

      I used Python 3 but it’s also compatible with Python 2.7.

  15. Mohamed TOUATI July 25, 2019 at 7:03 am #

    Hi Pyimage ! how can make my own costum images training with Faster RCNN ? i do classification of make model car , i did the labelling and annotation , jpg and xml files

  16. Ll July 29, 2019 at 3:11 am #

    Hi, is it possible to use the pre-training model for medical image classification?
