Online/Incremental Learning with Keras and Creme

In this tutorial, you will learn how to perform online/incremental learning with Keras and Creme on datasets too large to fit into memory.

A few weeks ago I showed you how to use Keras for feature extraction and online learning — we used that tutorial to perform transfer learning and recognize classes the original CNN was never trained on.

To accomplish that task we needed to use Keras to train a very simple feedforward neural network on the features extracted from the images.

However, what if we didn’t want to train a neural network?

What if we instead wanted to train a Logistic Regression, Naive Bayes, or Decision Tree model on top of the data? Or what if we wanted to perform feature selection or feature processing before training such a model?

You may be tempted to use scikit-learn — but you’ll soon realize that scikit-learn does not treat incremental learning as a “first-class citizen” — only a few online learning implementations are included in scikit-learn and they are awkward to use, to say the least.

Instead, you should use Creme, which:

  • Implements a number of popular algorithms for classification, regression, feature selection, and feature preprocessing.
  • Has an API similar to scikit-learn.
  • And makes it super easy to perform online/incremental learning.

In the remainder of this tutorial I will show you how to:

  1. Use Keras + pre-trained CNNs to extract robust, discriminative features from an image dataset.
  2. Utilize Creme to perform incremental learning on a dataset too large to fit into RAM.

Let’s get started!

To learn how to perform online/incremental learning with Keras and Creme, just keep reading!

In the first part of this tutorial, we’ll discuss situations where we may want to perform online learning or incremental learning.

We’ll then discuss why the Creme machine learning library is the appropriate choice for incremental learning.

We’ll be using Kaggle’s Dogs vs. Cats dataset for this tutorial — we’ll spend some time briefly reviewing the dataset.

From there, we’ll take a look at our directory structure from the project.

Once we understand the directory structure, we’ll implement a Python script that will be used to extract features from the Dogs vs. Cats dataset using Keras and a CNN pre-trained on ImageNet.

Given our extracted features (which will be too big to fit into RAM), we’ll use Creme to train a Logistic Regression model in an incremental learning fashion, ensuring that:

  1. We can still train our classifier, despite the extracted features being too large to fit into memory.
  2. We can still obtain high accuracy, even though we didn’t have access to “all” features at once.

Why Online Learning/Incremental Learning?

Figure 1: Multi-class incremental learning with Creme allows for machine learning on datasets which are too large to fit into memory (image source).

Whether you’re working with image data, text data, audio data, or numerical/categorical data, you’ll eventually run into a dataset that is too large to fit into memory.

What then?

  • Do you head to Amazon, NewEgg, etc. and purchase an upgraded motherboard with maxed out RAM?
  • Do you spin up a high memory instance on a cloud service provider like AWS or Azure?

You could look into one of those options — and in some cases, they are totally reasonable avenues to explore.

But my first choice would be to apply online/incremental learning.

Using incremental learning you can work with datasets too large to fit into RAM and apply popular machine learning techniques, including:

  • Feature preprocessing
  • Feature selection
  • Classification
  • Regression
  • Clustering
  • Ensemble methods
  • …and more!

Incremental learning can be super powerful — and today you’ll learn how to apply it to your own data.

Why Creme for Incremental Learning?

Figure 2: Creme is a library specifically tailored to incremental learning. Its API is similar to scikit-learn’s, which will make you feel at home while putting it to work on large datasets where incremental learning is required.

Neural networks and deep learning are a form of incremental learning — we can train such networks on one sample or one batch at a time.

However, just because we can apply neural networks to a problem doesn’t mean we should.

Instead, we need to bring the right tool to the job. Just because you have a hammer in your hand doesn’t mean you would use it to bang in a screw.

Incremental learning algorithms encompass a set of techniques used to train models in an incremental fashion.

We often utilize incremental learning when a dataset is too large to fit into memory.

The scikit-learn library does include a small handful of online learning algorithms; however:

  1. It does not treat incremental learning as a first-class citizen.
  2. The implementations are awkward to use.

Enter the Creme library — a library exclusively dedicated to incremental learning with Python.

The library itself is fairly new but last week I had some time to hack around with it.

I really enjoyed the experience and found the scikit-learn inspired API very easy to use.

After going through the rest of this tutorial, I think you’ll agree with me when I say, Creme is a great little library and I wish the developers and maintainers all the best with it — I hope that the library continues to grow.

The Dogs vs. Cats Dataset

Figure 3: In today’s example, we’re using Kaggle’s Dogs vs. Cats dataset. We’ll extract features with Keras, producing a rather large features CSV. From there, we’ll apply incremental learning with Creme.

The dataset we’ll be using here today is Kaggle’s Dogs vs. Cats dataset.

The dataset includes 25,000 examples, evenly distributed:

  • Dogs: 12,500 images
  • Cats: 12,500 images

Our goal is to apply transfer learning to:

  1. Extract features from the dataset using Keras and a pre-trained CNN.
  2. Use online/incremental learning via Creme to train a classifier on top of the features in an incremental fashion.

Setting up your Creme environment

While Creme requires a simple pip install, we have some other packages to install for today’s example too. Today’s required packages include:

  1. imutils and OpenCV (a dependency of imutils)
  2. scikit-learn
  3. TensorFlow
  4. Keras
  5. Creme

First, head over to my pip install opencv tutorial to install OpenCV in a Python virtual environment. The OpenCV installation instructions suggest an environment named cv but you can name yours whatever you’d like.

From there, install the rest of the packages in your environment:

Let’s ensure everything is properly installed by launching a Python interpreter:

Provided that there are no errors, your environment is ready for incremental learning.
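If you would rather not trigger the heavy imports just to run that check, you can ask Python whether each package can be located. This is a minimal sketch and not the original interpreter session: the import names below (cv2 for OpenCV, sklearn for scikit-learn) are the usual ones but are assumptions about your installation, and missing_packages is a hypothetical helper introduced only for illustration.

```python
import importlib.util

# Import names for the packages listed above (assumed; adjust to your setup).
REQUIRED = ["cv2", "imutils", "sklearn", "tensorflow", "keras", "creme"]


def missing_packages(names):
    """Return the names that Python cannot locate in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages: {}".format(", ".join(missing)))
else:
    print("All required packages were found.")
```

An empty result only tells you the packages can be found; importing them in an interpreter is still the real test that they load correctly.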

Project Structure

Figure 4: Download from the Kaggle Dogs vs. Cats downloads page for this incremental learning with Creme project.

To set up your project, please follow these steps:

  1. Use the “Downloads” section of this blog post and follow the instructions to download the code.
  2. Download the code to somewhere on your system. For example, you could download it to your ~/Desktop or ~/Downloads folder.
  3. Open a terminal and cd into the folder where the zip resides. Unzip/extract the files via unzip. Keep your terminal open.
  4. Log into Kaggle (required for downloading data).
  5. Head to the Kaggle Dogs vs. Cats “Data” page.
  6. Click the little download button next to only the file. Save it into ~/Desktop/keras-creme-incremental-learning/ (or wherever you extracted the blog post files).
  7. Back in your terminal, extract the dataset via unzip.

Now let’s review our project structure:

You should see a train/ directory with 25,000 files. This is where your actual dog and cat images reside. Let’s list a handful of them:

As you can see, the class label (either “cat” or “dog”) is included in the first few characters of the filename. We’ll parse the class name out later.

Back to our project tree: listed below the train/ directory are the dataset archive from Kaggle and features.csv. These files are not included with the “Downloads”. You should have already downloaded and extracted the dataset from Kaggle’s website. We will learn how to extract features and generate the large 12GB+ features.csv file in the next section.

The two Python scripts we’ll be reviewing handle feature extraction and incremental training, respectively. Let’s begin by extracting features with Keras!

Extracting Features with Keras

Before we can perform incremental learning, we first need to perform transfer learning and extract features from our Dogs vs. Cats dataset.

To accomplish this task, we’ll be using the Keras deep learning library and the ResNet50 network (pre-trained on ImageNet). Using ResNet50, we’ll allow our images to forward propagate to a pre-specified layer.

We’ll then take the output activations of that layer and treat them as a feature vector. Once we have feature vectors for all images in our dataset, we’ll then apply incremental learning.

Let’s go ahead and get started.

Open up the file and insert the following code:

On Lines 2-12, all the packages necessary for extracting features are imported. Most notably, this includes ResNet50, the convolutional neural network (CNN) we are using for transfer learning (Line 3).

Three command line arguments are then parsed via Lines 15-22:

  • --dataset: The path to our input dataset (i.e. Dogs vs. Cats).
  • --csv: File path to our output CSV file.
  • --batch-size: By default, we’ll use a batch size of 32. This will accommodate most CPUs and GPUs.

Let’s go ahead and load our model:

On Line 27, we load the model while specifying two parameters:

  • weights="imagenet": Pre-trained ImageNet weights are loaded for transfer learning.
  • include_top=False: We do not include the fully-connected head with the softmax classifier. In other words, we chop off the head of the network.

With weights loaded, and by loading our model without the head, we are now ready for transfer learning. We will use the output values of the network directly, storing the results as feature vectors.

Our feature vectors will each be 100,352-dim (i.e. 7 x 7 x 2048, which are the dimensions of the output volume of ResNet50 without the FC layer head).

From here, let’s grab our imagePaths and extract our labels:

On Lines 32-34, we proceed to grab all imagePaths and randomly shuffle them.

From there, our class labels are extracted from the paths themselves (Line 38). Each image path has the format:

  • train/cat.0.jpg
  • train/dog.0.jpg
  • etc.

In a Python interpreter, we can test Line 38 for sanity. As you develop the parsing + list comprehension, your interpreter might look like this:
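A session along those lines can be sketched as follows, assuming a few example paths in the format shown above. I use os.path.basename here to isolate the filename, which may not be the exact expression the script uses on Line 38, so treat this as an illustration of the parsing idea only.

```python
import os

# Example paths in the format described above (made up for illustration).
imagePaths = ["train/cat.0.jpg", "train/dog.0.jpg", "train/dog.12499.jpg"]

# The class label is the text before the first "." in the filename.
labels = [os.path.basename(p).split(".")[0] for p in imagePaths]
print(labels)
```

Building the expression up one piece at a time (filename first, then the split) makes it easy to spot where a path in an unexpected format would break the parsing.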

Lines 39 and 40 then instantiate and fit our label encoder, ensuring we can convert the string class labels to integers.
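To make the string-to-integer conversion concrete, here is a plain-Python stand-in for a label encoder. It is not scikit-learn’s implementation; it only mirrors the convention of sorting the distinct class names and numbering them from zero. The fit_labels helper is hypothetical.

```python
def fit_labels(labels):
    """Map each distinct class name to an integer, in sorted order."""
    classes = sorted(set(labels))
    return {name: index for (index, name) in enumerate(classes)}


labels = ["dog", "cat", "cat", "dog"]
mapping = fit_labels(labels)

# Convert every string label to its integer code.
encoded = [mapping[name] for name in labels]
print(mapping, encoded)
```

With only two classes, sorting means “cat” maps to 0 and “dog” maps to 1, which is worth remembering when you read the class column of the CSV later.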

Let’s define our CSV columns and write them to the file:

We’ll be writing our extracted features to a CSV file.

The Creme library requires that the CSV file has a header and includes a name for each of the columns, namely:

  1. The name of the column for the class label
  2. A name for each of the features

Line 43 creates column names for each of the 7 x 7 x 2048 = 100,352 features while Line 44 defines the class name column (which will store the class label).
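The header construction described above can be sketched in a few lines. The feat_ prefix and the class column name come from the CSV layout discussed in this post; the variable names are my own and may differ from the script.

```python
# Number of features per image: the 7 x 7 x 2048 output volume, flattened.
num_features = 7 * 7 * 2048

# One named column per feature, with the class label column placed first.
cols = ["feat_{}".format(i) for i in range(num_features)]
cols = ["class"] + cols

# The header row is simply the column names joined by commas.
header = ",".join(cols)
print(len(cols), cols[:3], cols[-1])
```

Note that the header alone is well over a megabyte of text, which hints at how large each data row will be.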

Thus, the first five rows and ten columns of our CSV file will look like this:

Notice how the class is the first column. Then the columns span from feat_0 all the way to feat_100351 for a total of 100,352 features. If you edit the command to print more than 10 columns (say, 5,000), you’ll see that not all the values are 0.

Moving on, let’s proceed to loop over the images in batches:

We’ll loop over imagePaths in batches of size bs (Line 52).

Lines 58 and 59 then grab the batch of paths and labels, while Line 60 initializes a list to hold the batch of images.
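The batching pattern itself is independent of Keras, so here is a small sketch of it on placeholder data. The iter_batches helper and the fake paths are assumptions made for illustration; the script may structure its loop differently.

```python
def iter_batches(paths, labels, bs):
    """Yield (batch_index, batch_paths, batch_labels) slices of size bs."""
    for (b, i) in enumerate(range(0, len(paths), bs)):
        yield (b, paths[i:i + bs], labels[i:i + bs])


# Ten placeholder paths with alternating integer labels.
paths = ["img_{}.jpg".format(i) for i in range(10)]
labels = [i % 2 for i in range(10)]

for (b, batchPaths, batchLabels) in iter_batches(paths, labels, bs=4):
    print("batch {}: {} paths".format(b, len(batchPaths)))
```

Slicing with i:i + bs means the final batch is simply shorter when the dataset size is not a multiple of the batch size, so no special case is needed.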

Let’s loop over the current batch:

Looping over paths in the batch (Line 63), we will load each image, preprocess it, and gather it into batchImages. The image itself is loaded on Line 66.

We’ll preprocess the image by:

  • Resizing to 224×224 pixels via the target_size parameter on Line 66.
  • Converting to array format (Line 67).
  • Adding a batch dimension (Line 72).
  • Performing mean subtraction (Line 73).

Note: If these preprocessing steps appear foreign, please refer to Deep Learning for Computer Vision with Python where I cover them in detail.

Finally, the image is added to the batch via Line 76.

In order to extract features, we’ll now pass the batch of images through our network:

Our batch of images is sent through the network via Lines 81 and 82.

Keep in mind that we have removed the fully-connected head layer of the network. Instead, the forward propagation stops prior to the average pooling layer. We will treat the output of this layer as a list of features, also known as a “feature vector”.

The output dimension of the volume is (batch_size, 7, 7, 2048). We can thus reshape the features into a NumPy array of shape (batch_size, 7 * 7 * 2048), treating the output of the CNN as a feature vector.

Maintaining our batch efficiency, the features and associated class labels are written to our CSV file (Lines 86-90).

Inside the CSV file, the class label is the first field in each row (enabling us to easily extract it from the row during training). The feature vec follows.
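The row layout just described (label first, features after) can be sketched as below. I write to an in-memory buffer and use three made-up features per sample so the example stays tiny; write_batch is a hypothetical helper, not a function from the script.

```python
import io


def write_batch(csv_file, batch_labels, batch_features):
    """Write one 'label,feat_0,feat_1,...' row per sample."""
    for (label, vec) in zip(batch_labels, batch_features):
        # Join the feature values into a single comma separated string.
        vec = ",".join(str(v) for v in vec)
        csv_file.write("{},{}\n".format(label, vec))


# Tiny stand-in batch: two samples with three features each.
out = io.StringIO()
out.write("class,feat_0,feat_1,feat_2\n")
write_batch(out, [0, 1], [[0.0, 1.5, 2.0], [3.0, 0.0, 0.25]])
print(out.getvalue())
```

Writing each batch as soon as it is extracted is what keeps memory usage flat: at no point does the script hold more than one batch of feature vectors.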

The features CSV file is closed via Line 93, as the last step of our script.

Applying feature extraction with Keras

Now that we’ve coded up the feature extraction script, let’s apply it to our dataset.

Make sure you have:

  1. Used the “Downloads” section of this tutorial to download the source code.
  2. Downloaded the Dogs vs. Cats dataset from Kaggle’s website.

Open up a terminal and execute the following command:

Using an NVIDIA K80 GPU the entire feature extraction process took 20m45s.

You could also use your CPU but keep in mind that the feature extraction process will take much longer.

After your script finishes running, take a look at the output size of features.csv:

The resulting file is over 12GB!

And if we were to load that file into RAM, assuming 32-bit floats for the feature vectors, we would need 10.03GB!
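The arithmetic behind that estimate is easy to check yourself. This sketch assumes nothing beyond the numbers already stated: 25,000 images, 7 x 7 x 2048 features per image, and 4 bytes per 32-bit float.

```python
num_images = 25000
num_features = 7 * 7 * 2048      # 100,352 values per image
bytes_per_float = 4              # one 32-bit float

total_bytes = num_images * num_features * bytes_per_float

print("{:,} bytes".format(total_bytes))
print("{:.4f} GB (decimal, 10^9 bytes)".format(total_bytes / 1e9))
print("{:.4f} GiB (binary, 2^30 bytes)".format(total_bytes / 2 ** 30))
```

The figure quoted above corresponds to decimal gigabytes. The CSV on disk is larger than the in-memory estimate because every float is stored as text along with its comma separator.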

Your machine may or may not have that much RAM…but that’s not the point. Eventually, you will encounter a dataset that is too large for you to work with in main memory. When that time comes, you need to use incremental learning.

Incremental Learning with Creme

If you’re at this point in the tutorial then I will assume you have extracted features from the Dogs vs. Cats dataset using Keras and ResNet50 (pre-trained on ImageNet).

But what now?

We’ve made the assumption that the entire dataset of extracted feature vectors is too large to fit into memory. So how can we train a machine learning classifier on that data?

Open up the file and let’s find out:

Lines 2-8 import packages required for incremental learning with Creme. We’ll be taking advantage of Creme’s implementation of LogisticRegression. Creme’s stream module includes a super convenient CSV data generator. Throughout training, we’ll calculate and print out our current Accuracy with Creme’s built-in metrics tool.

Let’s now use argparse to parse our command line arguments:

Our two command line arguments include:

  • --csv: The path to our input CSV features file.
  • --cols: Dimensions of our feature vector (i.e. how many columns there are in our feature vector).

Now that we’ve parsed our command line arguments, we need to specify the data types of our CSV file to use Creme’s stream module properly:

Line 21 builds a list of data types (floats) for every feature column of our CSV. We will have 100,352 floats.

Similarly, Line 22 specifies that our class column is an integer type.
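A mapping from column name to Python type, as described above, might be built like this. I wrap it in a hypothetical build_types helper so it can be reused for any feature count; the column names follow the CSV header used in this post.

```python
def build_types(num_cols):
    """Map each CSV column name to the type its values should be cast to."""
    types = {"feat_{}".format(i): float for i in range(num_cols)}
    types["class"] = int
    return types


types = build_types(7 * 7 * 2048)
print(len(types), types["feat_0"], types["class"])
```

Declaring the types up front matters because everything read from a CSV file starts out as text; without the casts, the model would receive strings rather than numbers.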

Next, let’s initialize our data generator and construct our pipeline:

Line 25 creates a CSV iterator that will stream features + class labels to our model.
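To see what streaming a CSV one row at a time looks like, here is a standard-library stand-in. It is not Creme’s iterator, just an illustration of the idea using csv.DictReader; the iter_csv_rows name and its parameters are assumptions of mine.

```python
import csv
import io


def iter_csv_rows(csv_file, target_name, types):
    """Yield (features, target) one row at a time without loading the file."""
    for row in csv.DictReader(csv_file):
        # Cast every field from text to its declared type.
        row = {name: types[name](value) for (name, value) in row.items()}
        y = row.pop(target_name)
        yield (row, y)


# Tiny in-memory stand-in for features.csv.
data = io.StringIO("class,feat_0,feat_1\n0,0.5,1.0\n1,2.0,0.0\n")
types = {"feat_0": float, "feat_1": float, "class": int}

for (X, y) in iter_csv_rows(data, "class", types):
    print(X, y)
```

Because the function is a generator, only the current row is ever held in memory, which is exactly the property we need for a 12GB+ file.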

Lines 28-32 then construct the model pipeline which:

  • First performs standard scaling (scales data to have zero mean and unit variance).
  • Then trains our Logistic Regression model in an incremental fashion (one data point at a time).
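Standard scaling normally needs the mean and variance of the whole dataset, so it is fair to ask how it can work one sample at a time. The sketch below shows one common way to do it, Welford’s online update, for a single feature. I am not claiming this is how Creme implements its scaler; RunningScaler is a hypothetical class written for illustration.

```python
import math


class RunningScaler:
    """Standardize one feature using statistics updated one value at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Welford's online update for the mean and the squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return self

    def transform(self, x):
        variance = self.m2 / self.n if self.n > 0 else 0.0
        std = math.sqrt(variance)
        # Avoid dividing by zero before any spread has been observed.
        return (x - self.mean) / std if std > 0 else 0.0


scaler = RunningScaler()
for value in [2.0, 4.0, 6.0]:
    scaler.update(value)
print(scaler.mean, scaler.transform(6.0))
```

The statistics are only estimates early in the stream and improve as more samples arrive, which is the usual trade-off when scaling incrementally.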

Logistic Regression is a binary classifier, meaning that it can be used to predict only two classes (which is exactly what the Dogs vs. Cats dataset has).

However, if you want to recognize > 2 classes, you need to wrap LogisticRegression in a OneVsRestClassifier which fits one binary classifier per class.

Note: There’s no harm in wrapping LogisticRegression in a OneVsRestClassifier for binary classification so I chose to do so here, just so you can see how it’s done — just keep in mind that it’s not required for binary classification but is required for > 2 classes.

Let’s put Creme to work to train our model:

Line 36 initializes our metric (i.e., accuracy).

From there, we begin to loop over our dataset (Line 39). Inside the loop, we:

  • Make a prediction on the current data point (Line 42). There are 25,000 data points (images), so this loop will run that many times.
  • Update the model weights using the current data point and its true class label (Line 43).
  • Update and display our accuracy metric (Lines 44 and 45).

Finally, the accuracy of the model is displayed in the terminal (Line 48).
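The predict-then-update loop is the heart of incremental learning, so here is a self-contained sketch of it. To keep it runnable without any third-party packages, I swap in a small pure-Python logistic regression trained with one gradient step per sample, and a synthetic two-feature stream instead of our CSV. The method names echo the one-sample-at-a-time style, but OnlineLogisticRegression and synthetic_stream are stand-ins written for illustration, not Creme’s classes.

```python
import math
import random


class OnlineLogisticRegression:
    """Binary logistic regression trained with one SGD step per sample."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.weights = {}
        self.bias = 0.0

    def predict_proba_one(self, x):
        z = self.bias + sum(self.weights.get(k, 0.0) * v for (k, v) in x.items())
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def predict_one(self, x):
        return 1 if self.predict_proba_one(x) >= 0.5 else 0

    def fit_one(self, x, y):
        # Gradient of the log loss for a single sample.
        error = self.predict_proba_one(x) - y
        for (k, v) in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * error * v
        self.bias -= self.lr * error
        return self


def synthetic_stream(n, seed=42):
    """Yield (features, label) pairs; the label is 1 when feat_0 > feat_1."""
    rng = random.Random(seed)
    for _ in range(n):
        x = {"feat_0": rng.uniform(-1, 1), "feat_1": rng.uniform(-1, 1)}
        yield (x, 1 if x["feat_0"] > x["feat_1"] else 0)


model = OnlineLogisticRegression()
correct = 0
total = 0

for (X, y) in synthetic_stream(2000):
    pred = model.predict_one(X)   # predict before the model sees the label
    model.fit_one(X, y)           # then update on that same sample
    correct += int(pred == y)
    total += 1

accuracy = correct / total
print("final accuracy: {:.4f}".format(accuracy))
```

Predicting before fitting means every sample is scored while it is still unseen, so the running accuracy behaves like a validation metric even though we never set aside a separate test split.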

Incremental Learning Results

We are now ready to apply incremental learning using Keras and Creme. Make sure you have:

  1. Used the “Downloads” section of this tutorial to download the source code.
  2. Downloaded the Dogs vs. Cats dataset from Kaggle’s website.

From there, open up a terminal and execute the following command:

After only 21 samples our Logistic Regression model is obtaining 76.19% accuracy.

Letting the model train on all 25,000 samples, we reach 97.412% accuracy which is quite respectable. The process took 6hr48m on my system.

Again, the key point here is that our Logistic Regression model was trained in an incremental fashion — we were not required to store the entire dataset in memory at once. Instead, we could train our Logistic Regression classifier one sample at a time.


In this tutorial, you learned how to perform online/incremental learning with Keras and the Creme machine learning library.

Using Keras and ResNet50 pre-trained on ImageNet, we applied transfer learning to extract features from the Dogs vs. Cats dataset.

We have a total of 25,000 images in the Dogs vs. Cats dataset. The output volume of ResNet50 is 7 x 7 x 2048 = 100,352-dim. Assuming 32-bit floats for our 100,352-dim feature vectors, that implies that trying to store the entire dataset in memory at once would require 10.03GB of RAM.

Not all machine learning practitioners will have that much RAM on their machines.

And more to the point — even if you do have sufficient RAM for this dataset, you will eventually encounter a dataset that exceeds the physical memory on your machine.

When that occasion arises you should apply online/incremental learning.

Using the Creme library we trained a multi-class Logistic Regression classifier one sample at a time, enabling us to obtain 97.412% accuracy on the Dogs vs. Cats dataset.

I hope you enjoyed today’s tutorial!

Feel free to use the code in this blog post as a starting point for your own projects where online/incremental learning is required.

To download the source code to this post, and be notified when future tutorials are published here on PyImageSearch, just enter your email address in the form below!




23 Responses to Online/Incremental Learning with Keras and Creme

  1. JulioCP June 17, 2019 at 11:00 am #

    Very interesting! Another way to perform an incremental training could be by means of dask (using to_delay method and an iterator python). I use this way and works very well.

    Thank you for your tutorials!! They are amazing!

    • Adrian Rosebrock June 17, 2019 at 2:02 pm #

      Thanks so much Julio!

  2. Lorenzo June 17, 2019 at 12:12 pm #

    Hi, I think there is a misconception, or bad phrasing, with this article. Online/incremental learning is not about datasets too large to fit in memory. That is a common scenario where you just read a little data at a time, with a generator for example. Training always happens in batches, it does not really matter if those batches are read from memory or from disk.
    Incremental/online learning is about deploying a model that keeps learning after the main training phase is over. You could see transfer learning as a special case of incremental learning but there is a more specific term for that.
    Adding a new class, for example a new identity in a face identification model, running a small training without losing too much of the current learning could be an example.

    At least this is what I could verify on google.

    • Adrian Rosebrock June 17, 2019 at 2:03 pm #

      Hey Lorenzo — online and incremental learning covers a large variety of techniques. In the practical sense many times it involves working with datasets too large to fit into memory. But in an even broader sense it may also encompass methods that involve adding classes, removing classes, or providing new examples for a particular set of classes.

    • Jadon June 17, 2019 at 7:15 pm #

      I think Lorenzo is right.., as far as most of the deep learning courses and training that I’ve done.

  3. Michael C. June 17, 2019 at 6:37 pm #

    Why do you use the 100,352-dim output volume of ResNet50 without the FC layer header instead of just the FC layer with 2048 dimensions?

    • Adrian Rosebrock June 19, 2019 at 2:01 pm #

      You could do that and I would encourage you do so as an experiment 🙂

      You’ll often find that the features learned by the FC layers become more “class specific” and less useful for transfer learning applications though. Not always, but usually the case.

  4. Zubair Ahmed June 18, 2019 at 2:15 am #

    Hi Lorenzo

    I was also slightly confused with Adrian’s usage of ‘online learning’ here, I have heard this term used when model is in production and needs to be updated, say a facial recognition model needs to add or remove a user, or a time series model needs to add new data and so on. But having read Adrian’s usage here it makes perfect sense to call it both incremental and online learning

    • Adrian Rosebrock June 19, 2019 at 1:55 pm #

      Online and incremental learning cover a variety of techniques and neither is limited exclusively to working with datasets too large to fit into memory or adding/removing classes.

  5. Denis Rothman June 18, 2019 at 5:55 am #

    Great material and ideas as usual, Adrian!

    • Adrian Rosebrock June 19, 2019 at 1:48 pm #

      Thanks Denis!

  6. Samuel Chung June 18, 2019 at 8:26 pm #

    Hi! This tutorial is very helpful, I am planning to try it out.
    Just some questions before I try the code: in this article I didn’t find the part after the training progress. How can I test a new cat or dog image after this training? Or is there another article for that with Keras?

    The feature extraction part is very interesting too. I was wondering whether this way one could build an image search by looking into the CSV and finding the most similar image with the nearest feature. This part I’ll explore myself to find out.

    Thank for this great tutorial.

    • Adrian Rosebrock June 19, 2019 at 1:41 pm #

      To make a prediction on a new image after training you would:

      1. Save the Creme model to disk (after training)
      2. Load the model
      3. Load your test image
      4. Extract features from it via Keras
      5. Scale it
      6. Pass the scaled features through the Creme model to obtain the final prediction

  7. bikram kachari June 19, 2019 at 3:37 am #

    Hi Adrian,

    Your tutorials are amazing. I love to read them.

    I had a question. Say I trained a logistic regression model using creme and then saved the model to disk. Now I have some new data points using which I want to update my model.
    Is it possible to load the model from disk and then update/train the model with the new data points in an incremental/online learning fashion

    • Adrian Rosebrock June 19, 2019 at 1:37 pm #

      Yes, that is actually one of the central goals of incremental learning 🙂 Just load the model and continue training.

  8. Subhom June 19, 2019 at 7:12 am #

    Hey Adrian, thanks a lot for the tutorial! I’ve been looking for something that will enable this form of incremental learning for facial recognition. Training a NN with new faces can’t be done in real-time, and using a classifier like an SVM leads to reduced accuracy as the number of faces increase. Can you point me in the right direction here?

    • Adrian Rosebrock June 19, 2019 at 1:36 pm #

      This could be more of a problem related to your actual dataset than the underlying algorithm itself. How many example faces do you have per person?

      • Subhom June 20, 2019 at 9:26 am #

        I am making do with 3 pictures per face, captured live from a webcam. The issue is that as the number of people (classes) registered increases, the probability of a face belonging to particular class drops; so I need to compensate by reducing the threshold for a positive recognition. This manifests itself as a drop in accuracy (obviously, since I’m reducing the threshold). My goal here is that once a user is “registered”, they should be recognised immediately. Training a NN can’t happen that quickly, so I’m using SVM for the recognition section of the pipeline. Am I doing something horribly wrong here?

        • Adrian Rosebrock June 24, 2019 at 1:48 pm #

          It’s hard to say without looking at your actual dataset, but 3 images per person is very low. You should read this guide where I recommend methods to increase your face recognition accuracy.

  9. Abdou June 21, 2019 at 11:16 am #

    Hi Adrian,I have a raspberry, I followed your tutorials, so I realized face detection(CNN) and recognition (128-d face embeddings then classification using svm), and to improve the performances I used neural compute stick movidius, i want to ask you a question, my problem is that i use the cnn of the face detection with movidius and the recognition is done in the raspberry, can I merge two CNN model in one so I can use both at once in movidius.

    • Adrian Rosebrock June 24, 2019 at 1:47 pm #

      Hey Abdou, this post doesn’t focus on face recognition, the RPi, or merging CNNs. I request that the comments section of the post be kept to questions related to the post. If you would like to learn about Raspberry Pi and face recognition, including detection on a single Movidius, refer to Raspberry Pi for Computer Vision.

  10. Aiho July 3, 2019 at 5:26 am #

    Hi Adrian, thanks for the post! I have a question and would be pleased by any help from you. I am planning to use creme for real-time sound-wave prediction. The thing is complicated by the fact that speed of the fit-predict iteration should be comparable with the audio stream going on from my mike. I suggest sound wave being an endless timeseries going by one digit at a time. I use a moving-window concept (forget the first digit and add the new one to the end of my window). I tried to use stateful lstm with rest_states, but the speed of its action is too slow for the sound. Tried SGDRegressor from sklearn with partial_fit but it didn`t converge and now I want to experiment with creme. May be I do something fundamentally wrong in this task or may be even this task is unbearable due to the speed of the sound. I would appreciate any suggestion from you, it would be great. Thanks!!

    • Adrian Rosebrock July 4, 2019 at 10:17 am #

      This sounds like a neat project Aiho; however, I don’t have any tutorials for sound-wave prediction nor do I do much work in that area. Best of luck with it though!
