Deep Learning with OpenCV

Two weeks ago OpenCV 3.3 was officially released, bringing with it a highly improved deep learning (dnn) module. This module now supports a number of deep learning frameworks, including Caffe, TensorFlow, and Torch/PyTorch.

Furthermore, this API for using pre-trained deep learning models is available through both the C++ API and the Python bindings, making it dead simple to:

  1. Load a model from disk.
  2. Pre-process an input image.
  3. Pass the image through the network and obtain the output classifications.

While we cannot train deep learning models using OpenCV (nor should we), this does allow us to take our models trained using dedicated deep learning libraries/tools and then efficiently use them directly inside our OpenCV scripts.

In the remainder of this blog post I’ll demonstrate the fundamentals of how to take a pre-trained deep learning network on the ImageNet dataset and apply it to input images.

To learn more about deep learning with OpenCV, just keep reading.


In the first part of this post, we’ll discuss the OpenCV 3.3 release and the overhauled dnn module.

We’ll then write a Python script that will use OpenCV and GoogLeNet (pre-trained on ImageNet) to classify images.

Finally, we’ll explore the results of our classifications.

Deep Learning inside OpenCV 3.3

The dnn module of OpenCV has been part of the opencv_contrib repository since version 3.1. As of OpenCV 3.3 it is included in the main repository.

Why should you care?

Deep learning is a fast-growing domain of machine learning, and if you’re working in the field of computer vision/image processing already (or getting up to speed), it’s a crucial area to explore.

With OpenCV 3.3, we can utilize pre-trained networks from popular deep learning frameworks. Because they are pre-trained, we don’t need to spend many hours training the network. Instead, we can complete a forward pass and utilize the output to make a decision within our application.

OpenCV is not (and does not intend to be) a tool for training networks; there are already great frameworks available for that purpose. Because a network (such as a CNN) can be used as a classifier, it makes logical sense that OpenCV has a deep learning module that we can leverage easily within the OpenCV ecosystem.

Popular network architectures compatible with OpenCV 3.3 include:

  • GoogLeNet (used in this blog post)
  • AlexNet
  • SqueezeNet
  • VGGNet (and associated flavors)
  • ResNet

The release notes for this module are available on the OpenCV repository page.

Aleksandr Rybnikov, the main contributor to this module, has ambitious plans for it, so be sure to stay on the lookout and read his release notes (in Russian, so make sure you have Google Translate enabled in your browser if Russian is not your native language).

It’s my opinion that the dnn  module will have a big impact on the OpenCV community, so let’s get the word out.

Configure your machine with OpenCV 3.3

Installing OpenCV 3.3 is on par with installing other versions. The same install tutorials can be utilized — just make sure you download and use the correct release.

Simply follow these instructions for macOS or Ubuntu while making sure to use the opencv and opencv_contrib releases for OpenCV 3.3. If you opt for the macOS + Homebrew install instructions, be sure to use the --HEAD switch (among the others mentioned) to get the bleeding edge version of OpenCV.

If you’re using virtual environments (highly recommended), you can easily install OpenCV 3.3 alongside a previous version. Just create a brand new virtual environment (and name it appropriately) as you follow the tutorial corresponding to your system.

OpenCV deep learning functions and frameworks

OpenCV 3.3 supports the Caffe, TensorFlow, and Torch/PyTorch frameworks.

Keras is currently not supported (since Keras is actually a wrapper around backends such as TensorFlow and Theano), although I imagine it’s only a matter of time until Keras is directly supported given the popularity of the deep learning library.

Using OpenCV 3.3 we can pre-process images loaded from disk into input blobs using the following functions inside dnn:

  • cv2.dnn.blobFromImage
  • cv2.dnn.blobFromImages

We can directly import models from various frameworks via the “create” methods:

  • cv2.dnn.createCaffeImporter
  • cv2.dnn.createTensorFlowImporter
  • cv2.dnn.createTorchImporter

Although I think it’s easier to simply use the “read” methods and load a serialized model from disk directly:

  • cv2.dnn.readNetFromCaffe
  • cv2.dnn.readNetFromTensorFlow
  • cv2.dnn.readNetFromTorch
  • cv2.dnn.readTorchBlob

Once we have loaded a model from disk, the .forward method is used to forward-propagate our image and obtain the actual classification.

To learn how all these OpenCV deep learning pieces fit together, let’s move on to the next section.

Classifying images using deep learning and OpenCV

In this section, we’ll be creating a Python script that can be used to classify input images using OpenCV and GoogLeNet (pre-trained on ImageNet) using the Caffe framework.

The GoogLeNet architecture (now known as “Inception” after the novel micro-architecture) was introduced by Szegedy et al. in their 2014 paper, Going deeper with convolutions.

Other architectures are also supported with OpenCV 3.3 including AlexNet, ResNet, and SqueezeNet — we’ll be examining these architectures for deep learning with OpenCV in a future blog post.

In the meantime, let’s learn how we can load a pre-trained Caffe model and use it to classify an image using OpenCV.

To begin, open up a new file, name it deep_learning_with_opencv.py, and insert the following code:

On Lines 2-5 we import our necessary packages.

Then we parse command line arguments:

On Line 8 we create an argument parser followed by establishing four required command line arguments (Lines 9-16):

  • --image : The path to the input image.
  • --prototxt : The path to the Caffe “deploy” prototxt file.
  • --model : The pre-trained Caffe model (i.e., the network weights themselves).
  • --labels : The path to ImageNet labels (i.e., “syn-sets”).

Now that we’ve established our arguments, we parse them and store them in a variable, args, for easy access later.
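As a minimal sketch of this step (not necessarily the exact listing from the downloads), the four required arguments can be declared with argparse as shown here. The build_parser helper name, the help strings, and the example paths are assumptions made purely for illustration.

```python
import argparse


def build_parser():
    # Declare the four required command line arguments described above.
    ap = argparse.ArgumentParser(
        description="Classify an image with OpenCV's dnn module")
    ap.add_argument("--image", required=True,
                    help="path to the input image")
    ap.add_argument("--prototxt", required=True,
                    help="path to the Caffe 'deploy' prototxt file")
    ap.add_argument("--model", required=True,
                    help="path to the pre-trained Caffe model (weights)")
    ap.add_argument("--labels", required=True,
                    help="path to the ImageNet labels (syn-sets)")
    return ap


# An explicit argument list (hypothetical paths) is passed here so the
# sketch runs anywhere; a real script would call parse_args() with no
# arguments so that the values come from the command line.
args = vars(build_parser().parse_args([
    "--image", "images/example.jpg",
    "--prototxt", "model.prototxt",
    "--model", "model.caffemodel",
    "--labels", "labels.txt",
]))
print(args["image"])
```

Wrapping the parsed namespace in vars() gives us a plain dictionary, so later code can look values up as args["image"], args["model"], and so on.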

Let’s load the input image and class labels:

On Line 20, we load the image from disk via cv2.imread.

Let’s take a closer look at the class label data which we load on Lines 23 and 24:

As you can see, we have a unique identifier followed by a space, some class labels, and a new-line. Parsing this file line-by-line is straightforward and efficient using Python.

First, we load the class label rows from disk into a list. To do this we strip whitespace from the beginning and end of each line while using the new-line ('\n') as the row delimiter (Line 23). The result is a list of IDs and labels:

Second, we use a list comprehension to extract the relevant class labels from rows by looking for the space (' ') after the ID, followed by delimiting class labels with a comma (','). The result is simply a list of class labels:
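The two parsing steps can be sketched in a few lines of plain Python. The parse_labels helper and the three sample rows are assumptions for illustration (the rows follow the "ID, space, comma-separated labels" format described above), and keeping only the first label of each row is one reasonable reading of "the relevant class labels".

```python
def parse_labels(text):
    # Step 1: strip surrounding whitespace and split on new-lines to get rows.
    rows = text.strip().split("\n")
    # Step 2: drop the ID (everything up to the first space), split the
    # remainder on commas, and keep the first label of each row.
    classes = [r[r.find(" ") + 1:].split(",")[0] for r in rows]
    return rows, classes


# Sample rows in the syn-set format; a real run would read the labels file.
sample = (
    "n01440764 tench, Tinca tinca\n"
    "n01443537 goldfish, Carassius auratus\n"
    "n02088364 beagle\n"
)
rows, classes = parse_labels(sample)
print(classes)  # ['tench', 'goldfish', 'beagle']
```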

Now that we’ve taken care of the labels, let’s dig into the dnn  module of OpenCV 3.3:

Taking note of the comment in the block above, we use cv2.dnn.blobFromImage to perform mean subtraction, which normalizes the input image and produces a blob with a known shape (Line 31).

We then load our model from disk:

Since we’ve opted to use Caffe, we utilize cv2.dnn.readNetFromCaffe to load our Caffe model definition prototxt and pre-trained model from disk (Line 35).

If you are familiar with Caffe, you’ll recognize the prototxt  file as a plain text configuration which follows a JSON-like structure — I recommend that you open bvlc_googlenet.prototxt  from the “Downloads” section in a text editor to inspect it.

Note: If you are unfamiliar with configuring Caffe CNNs, then this is a great time to consider the PyImageSearch Gurus course. Inside the course you’ll get an in-depth look at using deep nets for computer vision and image classification.

Now let’s complete a forward pass through the network with blob  as the input:

It is important to note at this step that we aren’t training a CNN — rather, we are making use of a pre-trained network. Therefore we are just passing the blob through the network (i.e., forward propagation) to obtain the result (no back-propagation).

First, we specify blob as our input (Line 39). Second, we make a start timestamp (Line 40), followed by passing our input image through the network and storing the predictions. Finally, we set an end timestamp (Line 42) so we can calculate the difference and print the elapsed time (Line 43).
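The timing pattern can be sketched as a small helper. The timed_forward name is hypothetical; net is assumed to be the network object returned by cv2.dnn.readNetFromCaffe, and blob the output of cv2.dnn.blobFromImage.

```python
import time


def timed_forward(net, blob):
    """Run one forward pass and return (predictions, elapsed seconds)."""
    net.setInput(blob)     # specify the blob as the network input
    start = time.time()    # start timestamp
    preds = net.forward()  # forward propagation only, no training
    end = time.time()      # end timestamp
    return preds, end - start


# Typical use, assuming `net` and `blob` were created as described above:
#   preds, elapsed = timed_forward(net, blob)
#   print("[INFO] classification took {:.5} seconds".format(elapsed))
```

Because the helper only relies on the setInput and forward methods, it works with any object that provides them.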

Let’s finish up by determining the top five predictions for our input image:

Using NumPy, we can easily sort and extract the top five predictions on Line 47.
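A minimal sketch of the sorting step, assuming the network output is a single row of class probabilities. The seven made-up values below are stand-ins; the real GoogLeNet output has 1,000 entries.

```python
import numpy as np

# Stand-in for the network output: one row of class probabilities.
preds = np.array([[0.01, 0.60, 0.05, 0.02, 0.20, 0.03, 0.09]])

# argsort returns indexes in ascending order of probability, so we
# reverse the result and keep the first five indexes.
idxs = np.argsort(preds[0])[::-1][:5]
print(idxs)  # [1 4 6 2 5]
```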

Next, we will display the top five class predictions:

The idea for this loop is to (1) draw the top prediction label on the image itself and (2) print the associated class label probabilities to the terminal.
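Here is a sketch of that loop with the drawing step left out. The describe_predictions helper and the sample values are assumptions for illustration; in the full script the overlay text for the top prediction would be drawn on the image (for example with cv2.putText) rather than returned.

```python
def describe_predictions(probs, classes, idxs):
    """Return (overlay_text, console_lines) for the ranked class indexes."""
    overlay_text = None
    lines = []
    for i, idx in enumerate(idxs):
        if i == 0:
            # (1) text for the top prediction, to be drawn on the image
            overlay_text = "Label: {}, {:.2f}%".format(
                classes[idx], probs[idx] * 100)
        # (2) label and probability for the terminal
        lines.append("[INFO] {}. label: {}, probability: {:.5}".format(
            i + 1, classes[idx], probs[idx]))
    return overlay_text, lines


# Made-up labels and probabilities, ranked from most to least likely.
classes = ["tench", "goldfish", "beagle"]
probs = [0.05, 0.15, 0.80]
text, lines = describe_predictions(probs, classes, [2, 1, 0])
print(text)  # Label: beagle, 80.00%
for line in lines:
    print(line)
```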

Lastly, we display the image to the screen (Line 64) and wait for the user to press a key before exiting (Line 65).

Deep learning and OpenCV classification results

Now that we have implemented our Python script to utilize deep learning with OpenCV, let’s go ahead and apply it to a few example images.

Make sure you use the “Downloads” section of this blog post to download the source code + pre-trained GoogLeNet architecture + example images.

From there, open up a terminal and execute the following command:

Figure 1: Using OpenCV and deep learning to predict the class label for an input image.

In the above example, we have Jemma, the family beagle. Using OpenCV and GoogLeNet we have correctly classified this image as “beagle”.

Furthermore, inspecting the top-5 results we can see that the other top predictions are also relevant, all of which are dogs that look similar to beagles.

Taking a look at the timing, we also see that the forward pass took less than one second, even though we are using our CPU.

Keep in mind that the forward pass is substantially faster than the backward pass as we do not need to compute the gradient and backpropagate through the network.

Let’s classify another image using OpenCV and deep learning:

Figure 2: OpenCV and deep learning are used to correctly label this image as “traffic light”.

OpenCV and GoogLeNet correctly label this image as “traffic light” with 100% certainty.

In this example we have a “bald eagle”:

Figure 3: The “deep neural network” (dnn) module inside OpenCV 3.3 can be used to classify images using pre-trained models.

We are once again able to correctly classify the input image.

Our final example is a “vending machine”:

Figure 4: Since our GoogLeNet model is pre-trained on ImageNet, we can classify images into any of the 1,000 labels inside the dataset using OpenCV + deep learning.

OpenCV + deep learning once again correctly classifies the image.

Summary

In today’s blog post we learned how to use OpenCV for deep learning.

With the release of OpenCV 3.3 the deep neural network (dnn) library has been substantially overhauled, allowing us to load pre-trained networks via the Caffe, TensorFlow, and Torch/PyTorch frameworks and then use them to classify input images.

I imagine Keras support will also be coming soon, given how popular the framework is. This will likely be a non-trivial implementation as Keras itself can support multiple numeric computation backends.

Over the next few weeks we’ll:

  1. Take a deeper dive into the dnn module and how it can be used inside our Python + OpenCV scripts.
  2. Learn how to modify Caffe .prototxt files to be compatible with OpenCV.
  3. Discover how we can apply deep learning using OpenCV to the Raspberry Pi.

This is a can’t-miss series of blog posts, so before you go, make sure you enter your email address in the form below to be notified when these posts go live!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 11-page Resource Guide on Computer Vision and Image Search Engines, including exclusive techniques that I don’t post on this blog! Sound good? If so, enter your email address and I’ll send you the code immediately!

43 Responses to Deep Learning with OpenCV

  1. Hermann-Marcus Behrens August 21, 2017 at 10:51 am #

    Very cool work! Thanks for your blogposts.

  2. Bayo August 21, 2017 at 11:27 am #

    hello, does the code work on raspberry pi?

    • Adrian Rosebrock August 21, 2017 at 3:37 pm #

      This method will work on the Raspberry Pi, but you’ll need a network small enough to run on the Pi. I’ll be covering this in detail in a future blog post.

    • Mas August 24, 2017 at 11:48 am #

      Strongly yes

  3. Ansh August 21, 2017 at 12:38 pm #

    This is great, cant wait to try it! It was about time that OpenCV introduced Deep Learning. I was wondering of the following though –

    It would be great to see if we can use DNN for tracking objects, like the “tracking a ball” example you had blogged. Most of the Neural Nets examples I have seen involved classification or labeling the objects. Are neural network efficient in tracking objects as well? Or does dlib’s object correlation better at it. Which CV method is good (efficient) for what….? it would be great if you can blog about the CV landscape as there are so many methods efficient for different things

    I am motivated for robotics application of CV. Also I am assuming that your consequent blogs will have methods to train a model as well?

    Thanks.

    • Adrian Rosebrock August 21, 2017 at 3:36 pm #

      It really depends on exactly what types of objects you are trying to track and under which conditions. Deep learning can be used to track objects, but typically we use correlation filters for this (like in dlib). I’ll consider doing a survey of object tracking methods in the future, thanks for the suggestion!

    • Aleksandr Rybnikov August 21, 2017 at 4:53 pm #

      Object tracking is already in OpenCV dnn. Lightweight yet accurate SSD with MobileNet backbone is in the samples directory https://github.com/opencv/opencv/blob/master/samples/dnn/mobilenet_ssd_python.py

      • Adrian Rosebrock August 22, 2017 at 10:49 am #

        Thanks for sharing (and for your contributions!) Aleksandr. What you’re referring to is actually object detection, the process of determining the (x, y)-coordinates of a given object in an image. Object tracking normally takes place after a location has been identified (which is what I assume Ansh is referring to). “Object detection” and “object tracking” are two different operations.

        Thanks again for the comment. I’ll make sure object detection with OpenCV + deep learning is covered in a future blog post as well.

  4. Steven Barnes August 21, 2017 at 12:51 pm #

    It might be useful to mention where to get the python opencv library for python3 for each platform as it is not obvious. You also mention following the install instructions but do not have a link to them, and again they are not that easy to find on the OpenCV site.

    • Adrian Rosebrock August 21, 2017 at 3:35 pm #

      Hi Steven — I actually link to this page which includes OpenCV + Python install instructions for a variety of different platforms and operating systems.

  5. Diogo Aleixo August 21, 2017 at 12:56 pm #

    Hi Adrian

    Is there a way to train another category on imageNet? The one that i want is not available.

  6. Maham Khan August 21, 2017 at 3:03 pm #

    Wow! This is the best thing ever. Deep learning will be so easy with OpenCV. And also thank you Adrian for making the tutorial so quickly, and keep us updated with the latest release. You are doing great contribution for Computer Vision community!
    Much appreciated tutorials. Just by going through your post, one can get the whole idea of the process.

    • Adrian Rosebrock August 21, 2017 at 3:34 pm #

      Thanks Maham! I’m glad you enjoyed the post. There will be plenty more on deep learning + OpenCV 🙂

      • Supra August 21, 2017 at 9:20 pm #

        It doesn’t work with raspberry pi 3 on latest version Raspbian Stretch.
        I’m using OpenCV 3.3.0. And the problem is “No module named cv2”

        • Adrian Rosebrock August 22, 2017 at 10:46 am #

          You need to install OpenCV first. It doesn’t matter if you’re using Raspbian Wheezy, Jessie, or Stretch — OpenCV must first be installed.

  7. Aleksandr Rybnikov August 21, 2017 at 4:59 pm #

    BTW, there is an error in the article. Correct name of the developer of the dnn is Aleksandr Rybnikov, actually it’s me

    • Adrian Rosebrock August 22, 2017 at 10:48 am #

      Thank you for bringing this to my attention. I have updated the blog post 🙂 Thank you again for your wonderful contributions to the OpenCV library. I look forward to helping spread the word more regarding your work!

  8. Saumya Rajen Shah August 22, 2017 at 3:29 am #

    Where can we find the imageNet labels?

    • Adrian Rosebrock August 22, 2017 at 10:44 am #

      Please use the “Downloads” section of this blog post. There you will find a .txt file containing the ImageNet labels.

  9. Vincent Thon August 22, 2017 at 5:34 am #

    Hi Adrian, love your work! Your blog is my main go-to place when it comes to computer vision. I have some models trained with tflearn. Do you think I’d be able to utilize those with the cv2.dnn.createTensorFlowImporter?

    • Adrian Rosebrock August 22, 2017 at 10:43 am #

      Hi Vincent — I haven’t tried importing a model trained via TFLearn. I would suggest giving it a try.

  10. Mansoor Nasir August 22, 2017 at 3:11 pm #

    Adrian, this is amazing work, i really appreciate all the efforts you make this step by step tutorial. My only question is, how will we use this with a model trained by TensorFlow?

    Thank you for all your help.

    • Adrian Rosebrock August 22, 2017 at 5:17 pm #

      You would replace cv2.dnn.readNetFromCaffe with cv2.dnn.readNetFromTensorFlow.

  11. knaffe August 23, 2017 at 11:45 pm #

    Thank you for your blogs. I have read all of them.
    How could I load my model trained by myself with tensorflow and use it ?
    By the way, Do you know some effective deep or traditional methods for motion detection running on raspberry PI3 with real-time performance?
    Thank you for your great job again and look forward to your new blogs!!

    • Adrian Rosebrock August 24, 2017 at 3:32 pm #

      1. Please see my reply to “Mansoor” above regarding TensorFlow.

      2. Take a look at this blog post for simple motion detection on the Raspberry Pi.

  12. oguzhan August 24, 2017 at 7:12 am #

    So cool, THX!! We are waiting raspberry pi tutorial 🙂

  13. Megha Shanbhag August 28, 2017 at 4:16 am #

    Hi, I have installed and built openCV 3.3 in my laptop. I have not built Opencv_contrib. When I run the example given in the deep-learning-opencv.zip, i get error stating

    File "deep_learning_with_opencv.py", line 34, in <module>
    blob = cv2.dnn.blobFromImage(image, 1, (224, 224), (104, 117, 123))
    AttributeError: 'module' object has no attribute 'blobFromImage'

    Can you please tell me what could be the issue?

    • Adrian Rosebrock August 28, 2017 at 4:21 pm #

      Can you confirm that you are running OpenCV 3.3? Open a Python shell, import cv2, and check cv2.__version__.

      The output should be 3.3.0.

      • Boikobo September 5, 2017 at 4:41 am #

        I have a similar issue. It is showing that it’s OpenCV 3.3.0 but saying

        blob = cv2.dnn.blobFromImage(image, 1, (224, 224), (104, 117, 123))
        AttributeError: 'module' object has no attribute 'blobFromImage'

        • Adrian Rosebrock September 5, 2017 at 9:10 am #

          Hi Boikobo — that is indeed very strange. For whatever reason it appears your version of OpenCV was not compiled with “dnn”. I would go back to installing OpenCV and ensure that “dnn” is listed in the “modules to be built” output of CMake.

  14. Smartos August 28, 2017 at 6:08 am #

    great post!

  15. Tham August 29, 2017 at 12:42 am #

    Do you know how to save the model of PyTorch?
    I train and save a simple cnn model by PyTorch, but it cannot loaded by the dnn module(I am using 3.3).

    Complete question can view at StackOverflow(https://stackoverflow.com/questions/45929573/how-should-i-save-the-model-of-pytorch-if-i-want-it-loadable-by-opencv-dnn-modul)

    • Adrian Rosebrock August 31, 2017 at 8:45 am #

      I have not used PyTorch so unfortunately I do not know the answer to this question. I hope another PyImageSearch reader can help!

  16. Imaduddin A Majid August 29, 2017 at 10:43 am #

    Really great article. Thank you for sharing this with us. I also expected this will work with Keras soon.

  17. Lg September 6, 2017 at 6:05 am #

    Thanks for this post. Really cool stuff.

    I’ve tried with other models like squeezenet, alexnet, bvlc_reference_caffenet with success, the accuracy is good as well.

    Some errors, like with a white cat jumping in a meadow recognized as an arctic fox.

    Are there caffe models trained to recognize people ?

    • Adrian Rosebrock September 7, 2017 at 7:06 am #

      Yes, I will actually be covering one for object detection that can detect people in next week’s blog post. Stay tuned 🙂

      • Lg September 9, 2017 at 4:41 am #

        Hi Adrian,

        Looking for models on the Internet, I found several articles about “OXFORD VGG Face dataset”.

        References :
        https://github.com/mzaradzki/neuralnets/blob/master/vgg_faces_keras/
        http://www.vlfeat.org/matconvnet/pretrained/#face-recognition

        Then I installed keras_vggface.

        I finally found the caffe model and prototxt. This works very well with your code: “deep-learning-with-opencv”.

        [INFO] classification took 0.66553 seconds
        [INFO] 1. label: Adelaide_Kane, probability: 0.99818
        [INFO] 2. label: Lucy_Hale, probability: 0.00031506
        [INFO] 3. label: Jamie_Gray_Hyder, probability: 0.0001969
        [INFO] 4. label: Odeya_Rush, probability: 0.00010968
        [INFO] 5. label: Sasha_Barrese, probability: 8.4347e-05

        Now, the question is how to train this model with our own pictures or add more people to the dataset.

        I am looking forward to reading your article.

  18. Komal September 7, 2017 at 4:16 am #

    Hey Adrian,
    in OpenCV 3.2 I’m getting an error while using the blobFromImage function of dnn. It says it’s not there. What are the differences between OpenCV 3.2 and OpenCV 3.3?

    • Adrian Rosebrock September 7, 2017 at 6:54 am #

      Hi Komal — the “dnn” sub-module was totally re-engineered in OpenCV 3.3. You need to upgrade to OpenCV 3.3.
