Deep learning in production with Keras, Redis, Flask, and Apache

Shipping deep learning models to production is a non-trivial task.

If you don’t believe me, take a second and look at the “tech giants” such as Amazon, Google, Microsoft, etc. — nearly all of them provide some method to ship your machine learning/deep learning models to production in the cloud.

Going with a model deployment service is perfectly fine and acceptable…but what if you wanted to own the entire process and not rely on external services?

This type of situation is more common than you may think. Consider:

  • An in-house project where you cannot move sensitive data outside your network
  • A project that specifies that the entire infrastructure must reside within the company
  • A government organization that needs a private cloud
  • A startup that is in “stealth mode” and needs to stress test their service/application in-house

How would you go about shipping your deep learning models to production in these situations, and perhaps most importantly, making it scalable at the same time?

Today’s post is the final chapter in our three-part series on building a deep learning model server REST API:

  1. Part one (which was posted on the official blog!) is a simple Keras + deep learning REST API which is intended for single threaded use with no concurrent requests. This method is a perfect fit if this is your first time building a deep learning web server or if you’re working on a home/hobby project.
  2. In part two we demonstrated how to leverage Redis along with message queueing/message brokering paradigms to efficiently batch process incoming inference requests (but with a small caveat on server threading that could cause problems).
  3. In the final part of this series, I’ll show you how to resolve these server threading issues, further scale our method, provide benchmarks, and demonstrate how to efficiently scale deep learning in production using Keras, Redis, Flask, and Apache.

As the results of our stress test will demonstrate, our single GPU machine can easily handle 500 concurrent requests (0.05 second delay in between each one) without ever breaking a sweat — this performance continues to scale as well.

To learn how to ship your own deep learning models to production using Keras, Redis, Flask, and Apache, just keep reading.


Deep learning in production with Keras, Redis, Flask, and Apache

The code for this blog post is primarily based on our previous post, but with some minor modifications — the first part of today’s guide will review these changes along with our project structure.

From there we’ll move on to configuring our deep learning web application, including installing and configuring any packages you may need (Redis, Apache, etc.).

Finally, we’ll stress test our server and benchmark our results.

For a quick overview of our deep learning production system (including a demo) be sure to watch the video above!

Our deep learning project structure

Our project structure is as follows:

Let’s review the important files:

  • The web server script contains all our Flask web server code. Apache will load this when starting our deep learning web app.
  • The model server script will:
    • Load our Keras model from disk
    • Continually poll Redis for new images to classify
    • Classify images (batch processing them for efficiency)
    • Write the inference results back to Redis so they can be returned to the client via Flask
  • The settings file contains all Python-based settings for our deep learning production service, such as Redis host/port information, image classification settings, image queue name, etc.
  • The helpers file contains utility functions that both the web server and the model server will use (namely base64 encoding).
  • keras_rest_api_app.wsgi contains our WSGI settings so we can serve the Flask app from our Apache server.
  • A simple client script can be used to programmatically consume the results of our deep learning API service.
  • jemma.png is a photo of my family’s beagle. We’ll be using her as an example image when calling the REST API to validate it is indeed working.
  • Finally, we’ll use the stress test script to stress our server and measure image classification throughput.

As described last week, we have a single endpoint on our Flask server, /predict. This method lives in the web server script and will compute the classification for an input image on demand. Image pre-processing is also handled there.

In order to make our server production-ready, I’ve pulled out the classify_process function from last week’s single script and placed it in the model server script. This script is very important as it will load our Keras model and grab images from our image queue in Redis for classification. Results are written back to Redis (the /predict endpoint and corresponding function in the web server script monitor Redis for results to send back to the client).

But what good is a deep learning REST API server unless we know its capabilities and limitations?

In the stress test script, we test our server. We’ll accomplish this by kicking off 500 concurrent threads which will send our images to the server for classification in parallel. I recommend running this on the server localhost to start, and then running it from a client that is off site.

Building our deep learning web app

Figure 1: Data flow diagram for a deep learning REST API server built with Python, Keras, Redis, and Flask.

Nearly every single line of code used in this project comes from our previous post on building a scalable deep learning REST API; the only change is that we are moving some of the code to separate files to facilitate scalability in a production environment.

As a matter of completeness I’ll be including the source code to each file in this blog post (and in the “Downloads” section of this blog post). For a detailed review of the files, please see the previous post.

Settings and configurations

In the settings file you’ll be able to change parameters for the server connectivity, image dimensions + data type, and server queuing.
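As a rough sketch of what such a settings module might contain: the block below assumes a local Redis instance on its default port and a 224 x 224 x 3 float32 model input. Every name and value here is illustrative and not taken from the project's downloads.

```python
# Hypothetical settings module; all names and values are assumptions
# chosen for illustration, not the project's actual configuration.

# Redis connection settings (6379 is the default Redis port)
REDIS_HOST = "localhost"
REDIS_PORT = 6379
REDIS_DB = 0

# Image dimensions and data type expected by the model
IMAGE_WIDTH = 224
IMAGE_HEIGHT = 224
IMAGE_CHANS = 3
IMAGE_DTYPE = "float32"

# Server queuing settings
IMAGE_QUEUE = "image_queue"  # name of the Redis list holding pending images
BATCH_SIZE = 32              # maximum images classified per forward pass
SERVER_SLEEP = 0.25          # seconds the model server waits between polls
CLIENT_SLEEP = 0.25          # seconds the web server waits between result checks
```

Keeping these values in one module means the web server and the model server can import the same queue name and image shape, so the two processes cannot drift out of sync.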

Helper utilities

The helpers file contains two functions: one for base64 encoding and the other for decoding.

Encoding is necessary so that we can serialize + store our image in Redis. Likewise, decoding is necessary so that we can deserialize the image into NumPy array format prior to pre-processing.
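A minimal sketch of such an encode/decode pair follows. The function names and signatures are hypothetical; the idea is only that the raw array bytes become a base64 string on the way into Redis and are rebuilt into a NumPy array on the way out.

```python
import base64

import numpy as np


def base64_encode_image(a):
    # Serialize the array's raw bytes (C order) as a base64 ASCII string
    return base64.b64encode(a.tobytes()).decode("utf-8")


def base64_decode_image(s, dtype, shape):
    # The string carries no dtype or shape information, so the caller
    # must supply both to rebuild the array.
    raw = base64.b64decode(s)
    # np.frombuffer returns a read-only view over the bytes; call .copy()
    # on the result if you need to modify it in place.
    return np.frombuffer(raw, dtype=dtype).reshape(shape)


# Round-trip a random "image" batch of shape (1, 224, 224, 3)
image = np.random.rand(1, 224, 224, 3).astype("float32")
encoded = base64_encode_image(image)
decoded = base64_decode_image(encoded, "float32", image.shape)
```

Because the dtype and shape travel separately from the encoded string, both sides of the queue need to agree on them, which is one reason to keep those values in a shared settings file.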

The deep learning web server

Here in the web server script, you’ll see predict, the function associated with our REST API /predict endpoint.

The predict function pushes the encoded image into the Redis queue and then continually loops/polls until it obtains the prediction data back from the model server. We then JSON-encode the data and instruct Flask to send the data back to the client.
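The push-then-poll pattern described above can be sketched without Flask or a running Redis server. In the block below, InMemoryStore is a hypothetical stand-in that mimics only the few Redis client methods the sketch calls (rpush, get, set, delete). The queue name, the timeout, and the canned "beagle" prediction are likewise assumptions for illustration.

```python
import json
import threading
import time
import uuid


class InMemoryStore:
    """Stand-in for a Redis client, implementing only the calls used below."""

    def __init__(self):
        self.lists, self.keys = {}, {}

    def rpush(self, name, value):
        self.lists.setdefault(name, []).append(value)

    def get(self, key):
        return self.keys.get(key)

    def set(self, key, value):
        self.keys[key] = value

    def delete(self, key):
        self.keys.pop(key, None)


def predict(db, encoded_image, queue="image_queue", poll_delay=0.01, timeout=5.0):
    # Tag the request with a unique ID so the result can be matched to it
    request_id = str(uuid.uuid4())
    db.rpush(queue, json.dumps({"id": request_id, "image": encoded_image}))

    # Poll until a result appears under our ID, or give up at the deadline
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        output = db.get(request_id)
        if output is not None:
            db.delete(request_id)  # clean up so results do not accumulate
            return {"success": True, "predictions": json.loads(output)}
        time.sleep(poll_delay)
    return {"success": False}


def fake_model_server(db, queue="image_queue"):
    # Wait for one request, then answer it with a canned prediction
    while not db.lists.get(queue):
        time.sleep(0.005)
    request = json.loads(db.lists[queue].pop(0))
    db.set(request["id"], json.dumps([{"label": "beagle", "probability": 0.946}]))


db = InMemoryStore()
threading.Thread(target=fake_model_server, args=(db,), daemon=True).start()
response = predict(db, "aGVsbG8=")
print(json.dumps(response))
```

The timeout is an addition of this sketch. Without one, a request whose result never arrives (for example, because the model server is down) would hold its web server thread indefinitely.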

The deep learning model server

The model server file houses our classify_process function. This function loads our model and then runs predictions on a batch of images. This process is ideally executed on a GPU, but a CPU can also be used.

In this example, for sake of simplicity, we’ll be using ResNet50 pre-trained on the ImageNet dataset. You can modify classify_process  to utilize your own deep learning models.
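One iteration of the polling loop inside a function like classify_process might look roughly like the sketch below. It is not the project's code: classify_batch, the in-memory stand-in for the Redis client, and the placeholder model are all hypothetical, and the stand-in implements only lrange, ltrim, rpush, and set.

```python
import json


class InMemoryStore:
    """Stand-in for a Redis client, implementing only the calls used below."""

    def __init__(self):
        self.lists, self.keys = {}, {}

    def rpush(self, name, value):
        self.lists.setdefault(name, []).append(value)

    def lrange(self, name, start, end):
        # Like Redis LRANGE, the end index is inclusive
        return self.lists.get(name, [])[start:end + 1]

    def ltrim(self, name, start, end):
        # Like Redis LTRIM, keep only items start..end (-1 means the last item)
        items = self.lists.get(name, [])
        self.lists[name] = items[start:] if end == -1 else items[start:end + 1]

    def set(self, key, value):
        self.keys[key] = value


def classify_batch(db, model_fn, queue="image_queue", batch_size=32):
    # Grab up to batch_size pending requests without removing them yet
    requests = [json.loads(q) for q in db.lrange(queue, 0, batch_size - 1)]
    if not requests:
        return 0

    # Run one forward pass over the whole batch
    results = model_fn([r["image"] for r in requests])

    # Store each result under its request ID, then drop the processed items
    for request, result in zip(requests, results):
        db.set(request["id"], json.dumps(result))
    db.ltrim(queue, len(requests), -1)
    return len(requests)


def fake_model(images):
    # Placeholder for a real model's predict call
    return [{"label": "beagle", "probability": 0.946} for _ in images]


db = InMemoryStore()
for i in range(40):
    db.rpush("image_queue", json.dumps({"id": f"req-{i}", "image": "..."}))

first = classify_batch(db, fake_model)   # processes a full batch
second = classify_batch(db, fake_model)  # processes whatever is left
```

In a real model server this function would run inside a `while True:` loop with a short sleep between polls, and model_fn would decode the images and call the Keras model's predict method on the stacked batch.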

The WSGI configuration

Our next file, keras_rest_api_app.wsgi, is a new component to our deep learning REST API compared to last week.

This WSGI configuration file adds our server directory to the system path and imports the web app to kick off all the action. We point to this file in our Apache server settings file, /etc/apache2/sites-available/000-default.conf , later in this blog post.

The stress test

Our stress test script will help us to test the server and determine its limitations. I always recommend stress testing your deep learning REST API server so that you know if (and more importantly, when) you need to add additional GPUs, CPUs, or RAM. This script kicks off NUM_REQUESTS threads and POSTs to the /predict endpoint. It’s up to our Flask web app from there.
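The threading pattern behind such a stress test can be sketched as follows. To keep the example runnable without a live server, the HTTP call is passed in as a send function and replaced here by a stub. SLEEP_COUNT, run_stress_test, and fake_send are hypothetical names; NUM_REQUESTS and the 0.05 second delay come from the figures quoted in this post.

```python
import threading
import time

NUM_REQUESTS = 500  # total requests to issue
SLEEP_COUNT = 0.05  # seconds to wait between starting each thread


def call_predict_endpoint(n, send, results, lock):
    # send() is expected to POST an image to /predict and return parsed JSON
    response = send(n)
    with lock:
        results[n] = bool(response.get("success"))


def run_stress_test(send, num_requests=NUM_REQUESTS, sleep_count=SLEEP_COUNT):
    results, lock, threads = {}, threading.Lock(), []
    for n in range(num_requests):
        t = threading.Thread(
            target=call_predict_endpoint, args=(n, send, results, lock)
        )
        t.start()
        threads.append(t)
        time.sleep(sleep_count)  # stagger the requests
    for t in threads:
        t.join()  # wait for every request to finish before reporting
    return results


def fake_send(n):
    # Stub standing in for the real HTTP request
    time.sleep(0.01)
    return {"success": True}


# Small, fast run for demonstration; the defaults mirror the post's test
results = run_stress_test(fake_send, num_requests=20, sleep_count=0.001)
print(f"{sum(results.values())}/{len(results)} requests succeeded")
```

Against a real deployment, send would wrap an HTTP POST of the image file to the server's /predict URL and return the decoded JSON response.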

Configuring our deep learning production environment

This section will discuss how to install and configure the necessary prerequisites for our deep learning API server.

We’ll use my PyImageSearch Deep Learning AMI (freely available to you to use) as a base. I chose a p2.xlarge instance with a single GPU for this example.

You can modify the code in this example to leverage multiple GPUs as well by:

  1. Running multiple model server processes
  2. Maintaining an image queue for each GPU and corresponding model process
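One way to sketch the second step is to give each GPU its own queue name and have the web server spread requests across them. Round-robin assignment, the queue naming scheme, and the GPU count below are assumptions of this sketch, not something the post prescribes.

```python
import itertools

NUM_GPUS = 4  # hypothetical GPU count

# One queue name per GPU; each model server process would poll only its own
# queue (and would typically be pinned to its GPU, for example by setting
# the CUDA_VISIBLE_DEVICES environment variable before launching it).
GPU_QUEUES = [f"image_queue_gpu{i}" for i in range(NUM_GPUS)]
_next_queue = itertools.cycle(GPU_QUEUES)


def pick_queue():
    # Round-robin: hand out the per-GPU queue names in rotation
    return next(_next_queue)


assignments = [pick_queue() for _ in range(6)]
print(assignments)
```

Round-robin is the simplest policy; choosing the shortest queue instead would balance load better when some requests are slower than others, at the cost of an extra length lookup per request.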

However, keep in mind that your machine will still be limited by I/O. It may be more beneficial to use multiple machines, each with 1-4 GPUs, than to try to scale to 8 or 16 GPUs on a single machine.

Compiling and installing Redis

Redis, an efficient in-memory database, will act as our queue/message broker.

Obtaining and installing Redis is very easy:

Create your deep learning Python virtual environment

Let’s create a Python virtual environment for this project. Please see last week’s tutorial for instructions on how to install virtualenv  and virtualenvwrapper if you are new to Python virtual environments.

When you’re ready, create the virtual environment:

From there, let’s install the necessary packages:

Note: We use TensorFlow 1.4.1 since we are using CUDA 8. You should use TensorFlow 1.5 if using CUDA 9.

Install the Apache web server

Other web servers such as nginx can be used, but since I have more experience with Apache (and am therefore more familiar with it), I’ll be using Apache for this example.

Apache can be installed via:

If you’ve created a virtual environment using Python 3 you’ll want to install the Python 3 WSGI + Apache module:

Otherwise, Python 2.7 users should install the Python 2.7 WSGI + Apache module:

To validate that Apache is installed, open up a browser and enter the IP address of your web server. If you can’t see the server splash screen then be sure to open up Port 80 and Port 5000.

In my case, entering my server’s IP address (yours will be different) in a browser I see:

Figure 2: The default Apache splash screen lets us know that Apache is installed and that it can be accessed from an open port 80.

…which is the default Apache homepage.

Sym-link your Flask + deep learning app

By default, Apache serves content from /var/www/html . I would recommend creating a sym-link from /var/www/html  to your Flask web app.

I have uploaded my deep learning + Flask app to my home directory in a directory named keras-complete-rest-api :

I can sym-link it to /var/www/html  via:

Update your Apache configuration to point to the Flask app

In order to configure Apache to point to our Flask app, we need to edit /etc/apache2/sites-available/000-default.conf .

Open in your favorite text editor (here I’ll be using vi ):

At the top of the file supply your WSGIPythonHome  (path to Python bin  directory) and WSGIPythonPath  (path to Python site-packages  directory) configurations:

Since we are using Python virtual environments in this example (I have named my virtual environment keras_flask ), we supply the path to the bin  and site-packages  directory for the Python virtual environment.

Then in the body of <VirtualHost>, right after ServerAdmin and DocumentRoot, add:

Sym-link CUDA libraries (optional, GPU only)

If you’re using your GPU for deep learning and want to leverage CUDA (and why wouldn’t you), Apache unfortunately has no knowledge of CUDA’s *.so  libraries in /usr/local/cuda/lib64 .

I’m not sure what the “most correct” way to tell Apache where these CUDA libraries live is, but the “total hack” solution is to sym-link all files from /usr/local/cuda/lib64 to /usr/lib:

If there is a better way to make Apache aware of the CUDA libraries, please let me know in the comments.

Restart the Apache web server

Once you’ve edited your Apache configuration file and optionally sym-linked the CUDA deep learning libraries, be sure to restart your Apache server via:

Testing your Apache web server + deep learning endpoint

To test that Apache is properly configured to deliver your Flask + deep learning app, refresh your web browser:

Figure 3: Apache + Flask have been configured to work and I see my welcome message.

You should now see the text “Welcome to the PyImageSearch Keras REST API!” in your browser.

Once you’ve reached this stage your Flask deep learning app should be ready to go.

All that said, if you run into any problems make sure you refer to the next section…

TIP: Monitor your Apache error logs if you run into trouble

I’ve been using Python + web frameworks such as Flask and Django for years and I still make mistakes when getting my environment configured properly.

While I wish there was a bulletproof way to make sure everything works out of the gate, the truth is something is likely going to gum up the works along the way.

The good news is that WSGI logs Python events, including failures, to the server log.

On Ubuntu, the Apache server log is located in /var/log/apache2/ :

When debugging, I often keep a terminal open that runs:

…so I can see the second an error rolls in.

Use the error log to help you get Flask up and running on your server.

Starting your deep learning model server

Your Apache server should already be running. If not, you can start it via:

You’ll then want to start the Redis store:

And in a separate terminal launch the Keras model server:

From there try to submit an example image to your deep learning API service:

If everything is working, you should receive formatted JSON output back from the deep learning API model server with the class predictions + probabilities.

Figure 4: Using cURL to test our Keras REST API server. Pictured is my family beagle, Jemma. She is classified as a beagle with 94.6% confidence by our ResNet model.

Stress testing your deep learning REST API

Of course, this is just an example. Let’s stress test our deep learning REST API.

Open up another terminal and execute the following command:

In your model server output you’ll start to see the following lines logged to the terminal:

Even with a new request coming in every 0.05 seconds our batch size never gets larger than ~10-12 images per batch.

Our model server handles the load easily without breaking a sweat and it can easily scale beyond this.

If you do overload the server (perhaps your batch size is too big and you run out of GPU memory with an error message), you should stop the server, and use the Redis CLI to clear the queue:

From there you can adjust settings in the settings file and /etc/apache2/sites-available/000-default.conf. Then you may restart the server.

For a full demo, please see the video below:

Recommendations for deploying your own deep learning models to production

One of the best pieces of advice I can give is to keep your data, in particular your Redis server, close to the GPU.

You may be tempted to spin up a giant Redis server with hundreds of gigabytes of RAM to handle multiple image queues and serve multiple GPU machines.

The problem here will be I/O latency and network overhead.

Assuming 224 x 224 x 3 images represented as a float32 array, a batch of 32 images will be ~19MB of data. This implies that for each batch request from a model server, Redis will need to pull out 19MB of data and send it to the server.
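The ~19MB figure can be checked with a few lines of arithmetic. The second number below additionally assumes the images sit in Redis base64-encoded, as described in the helper utilities section; base64 output is about a third larger than the raw bytes.

```python
import math

height, width, channels = 224, 224, 3
bytes_per_value = 4  # float32
batch_size = 32

bytes_per_image = height * width * channels * bytes_per_value
bytes_per_batch = bytes_per_image * batch_size

# base64 emits 4 output characters for every 3 input bytes (rounded up)
base64_bytes_per_batch = 4 * math.ceil(bytes_per_image / 3) * batch_size

print(f"raw batch:    {bytes_per_batch / 1e6:.1f} MB")         # 19.3 MB
print(f"base64 batch: {base64_bytes_per_batch / 1e6:.1f} MB")  # 25.7 MB
```

So the payload actually moved per batch is likely closer to 26MB than 19MB once the encoding overhead is counted, which strengthens the case for keeping Redis close to the GPU.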

On fast switches this isn’t a big deal, but you should consider running both your model server and Redis on the same server to keep your data close to the GPU.


In today’s blog post we learned how to deploy a deep learning model to production using Keras, Redis, Flask, and Apache.

Most of the tools we used here are interchangeable. You could swap in TensorFlow or PyTorch for Keras. Django could be used instead of Flask. Nginx could be swapped in for Apache.

The only tool I would not recommend swapping out is Redis. Redis is arguably the best solution for in-memory data stores. Unless you have a specific reason to not use Redis, I would suggest utilizing Redis for your queuing operations.

Finally, we stress tested our deep learning REST API.

We submitted a total of 500 requests for image classification to our server with 0.05 second delays in between each; our server was not fazed (the batch size for the CNN was never more than ~37% full).

Furthermore, this method is easily scalable to additional servers. If you place these servers behind a load balancer you can easily scale this method further.

I hope you enjoyed today’s blog post!


4 Responses to Deep learning in production with Keras, Redis, Flask, and Apache

  1. Meng Lee February 7, 2018 at 1:15 am #

    Hi Adrian,

    Thanks for the post sharing a end-to-end workflow of shipping an app utilizing Deep Learning.

    I have built a app recognizing cats using Flask, TensorFlow, CNN in similar way months ago but I decided to built another one by following your post to practice again. Thanks for the material again 😀

    Let me put the link of the app for anyone want further learning:

    • Adrian Rosebrock February 8, 2018 at 8:36 am #

      Great job, thanks for sharing! 🙂

  2. Anastasios Selalmazidis February 8, 2018 at 6:34 am #

    Great article Adrian, as always. I am a flask+ apache guy myself but I found out that flask works better with nginx+gunicorn. You can give it a try if you have the time and maybe benchmark those two to see which fits better fro production environments

    • Adrian Rosebrock February 8, 2018 at 7:45 am #

      Great suggestion, thanks Anastasios. I don’t think I’ll have the time to benchmark with nginx + gunicorn, but if any other readers would like to try and post the results in the comments that would be great!
