Semantic segmentation with OpenCV and deep learning

In this tutorial, you will learn how to perform semantic segmentation using OpenCV, deep learning, and the ENet architecture. After reading today’s guide, you will be able to apply semantic segmentation to images and video using OpenCV.

Deep learning has helped facilitate unprecedented accuracy in computer vision, including image classification, object detection, and now even segmentation.

Traditional segmentation involves partitioning an image into parts (Normalized Cuts, Graph Cuts, GrabCut, superpixels, etc.); however, these algorithms have no actual understanding of what the parts represent.

Semantic segmentation algorithms, on the other hand, attempt to:

  1. Partition the image into meaningful parts
  2. At the same time, associate every pixel in the input image with a class label (e.g., person, road, car, bus, etc.)

Semantic segmentation algorithms are super powerful and have many use cases, including self-driving cars — and in today’s post, I’ll be showing you how to apply semantic segmentation to road-scene images/video!

To learn how to apply semantic segmentation using OpenCV and deep learning, just keep reading!

In the first part of today’s blog post, we will discuss the ENet deep learning architecture.

From there, I’ll demonstrate how to use ENet to apply semantic segmentation to both images and video streams.

Along the way, I’ll be sharing example outputs from the segmentation so you can get a feel for what to expect when applying semantic segmentation to your own projects.

The ENet semantic segmentation architecture

Figure 1: The ENet deep learning semantic segmentation architecture. This figure is a combination of Table 1 and Figure 2 of Paszke et al.

The semantic segmentation architecture we’re using for this tutorial is ENet, which is based on Paszke et al.’s 2016 publication, ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation.

One of the primary benefits of ENet is speed: it is up to 18x faster than larger models and requires 79x fewer parameters, while offering similar or better accuracy. The model itself is only 3.2MB!

A single forward pass on a CPU (a 3 GHz Intel Xeon W) took 0.2 seconds on my machine; with a GPU, this segmentation network could run even faster. Paszke et al. trained the model on the Cityscapes dataset, a semantic, instance-wise, dense pixel annotation of 20-30 classes (depending on which model you’re using).

As the name suggests, the Cityscapes dataset includes examples of images that can be used for urban scene understanding, including self-driving vehicles.

The particular model we’re using is trained on the following 20 classes:

  • Unlabeled (i.e., background)
  • Road
  • Sidewalk
  • Building
  • Wall
  • Fence
  • Pole
  • TrafficLight
  • TrafficSign
  • Vegetation
  • Terrain
  • Sky
  • Person
  • Rider
  • Car
  • Truck
  • Bus
  • Train
  • Motorcycle
  • Bicycle

In the rest of this blog post, you’ll learn how to apply semantic segmentation to extract a dense, pixel-wise map of each of these classes in both images and video streams.

If you’re interested in training your own ENet models for segmentation on your own custom datasets, be sure to refer to this page where the authors have provided a tutorial on how to do so.

Project structure

Today’s project can be obtained from the “Downloads” section of this blog post. Let’s take a look at our project structure using the tree command:

Our project has four directories:

  • enet-cityscapes/ : Contains our pre-trained deep learning model, classes list, and color labels to correspond with the classes.
  • images/ : A selection of four sample images to test our image segmentation script.
  • videos/ : Includes two sample videos for testing our deep learning segmentation video script. Credits for these videos are listed in the “Video segmentation results” section.
  • output/ : For organizational purposes, I like to have my script save the processed videos to the output folder. I’m not including the output images/videos in the downloads as the file sizes are quite large. You’ll need to use today’s code to generate them on your own.

Today we’ll be reviewing two Python scripts:

  • The image script: Performs deep learning semantic segmentation on a single image. We’ll walk through this script to learn how segmentation works and then test it on single images before moving on to video.
  • The video script: Performs semantic segmentation on video.

Semantic segmentation in images with OpenCV

Let’s go ahead and get started: open up the image segmentation script and insert the following code:

We begin by importing the necessary packages.

For this script, I recommend OpenCV 3.4.1 or higher. You can follow one of my installation tutorials — just be sure to specify which version of OpenCV you want to download and install as you follow the steps.

You’ll also need to install my package of OpenCV convenience functions, imutils — just use pip to install the package:

If you are using Python virtual environments, don’t forget to use the workon command before using pip to install imutils!

Moving on, let’s parse our command line arguments:

This script has five command line arguments, two of which are optional:

  • --model : The path to our deep learning semantic segmentation model.
  • --classes : The path to a text file containing class labels.
  • --image : Our input image file path.
  • --colors : Optional path to a colors text file. If no file is specified, random colors will be assigned to each class.
  • --width : Optional desired image width. By default the value is 500 pixels.
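For reference, here is a minimal sketch of how the five arguments above could be declared with argparse. The long flags come from the list above, and the short flags match the usage message quoted in the comments below; the help strings and the build_image_parser name are my own, so treat this as an approximation of the downloadable script rather than a copy of it.

```python
import argparse


def build_image_parser():
    """Declare the five command line arguments used by the image script."""
    ap = argparse.ArgumentParser(
        description="ENet semantic segmentation on a single image")
    # Required: model, class labels, and the input image
    ap.add_argument("-m", "--model", required=True,
                    help="path to the pre-trained ENet model")
    ap.add_argument("-c", "--classes", required=True,
                    help="path to a text file of class labels")
    ap.add_argument("-i", "--image", required=True,
                    help="path to the input image")
    # Optional: a colors file (random colors otherwise) and the resize width
    ap.add_argument("-l", "--colors", type=str,
                    help="optional path to a text file of per-class colors")
    ap.add_argument("-w", "--width", type=int, default=500,
                    help="desired width (in pixels) of the input image")
    return ap
```

Calling vars(build_image_parser().parse_args()) with no list reads from sys.argv, so launching the script without the three required flags (for example, from inside a notebook) fails with "the following arguments are required".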

If you aren’t familiar with argparse and command line arguments, definitely review this blog post, which covers command line arguments in depth.

Next, let’s parse our class labels file and colors:

We load our CLASSES into memory from the supplied text file, whose path is stored in the command line args dictionary (Line 23).

If a pre-specified set of COLORS for each class label is provided in a text file (one per line), we load them into memory (Lines 26-29). Otherwise, we randomly generate COLORS for each label (Lines 33-40).
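The class/color loading step can be sketched as follows. This is an illustration under a few assumptions: the colors file is assumed to hold one comma-separated R,G,B triple per line, the random palette uses NumPy’s default_rng with a fixed seed (so colors stay stable between runs), and black is reserved for the first (background) class. The helper names load_classes and load_or_generate_colors are hypothetical.

```python
import numpy as np


def load_classes(path):
    """Read one class label per line (e.g. "Unlabeled", "Road", ...)."""
    with open(path) as f:
        return f.read().strip().split("\n")


def load_or_generate_colors(classes, colors_path=None, seed=42):
    """Return a (numClasses, 3) uint8 array with one color per class."""
    if colors_path is not None:
        # Assumed format: one comma-separated "R,G,B" triple per line
        with open(colors_path) as f:
            rows = f.read().strip().split("\n")
        return np.array([[int(v) for v in row.split(",")] for row in rows],
                        dtype="uint8")

    # No colors file: draw random colors with a fixed seed so they are
    # reproducible, and reserve black for the first (background) class
    rng = np.random.default_rng(seed)
    colors = rng.integers(0, 256, size=(len(classes) - 1, 3))
    return np.vstack([[0, 0, 0], colors]).astype("uint8")
```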

For testing purposes (and since we have 20 classes), let’s create a pretty color lookup legend using OpenCV drawing functions:

Here we generate a legend visualization so we can easily associate each class label with a color. The legend consists of the class label and a colored rectangle next to it. We create it by initializing a canvas (Line 43) and dynamically building the legend with a loop (Lines 46-52). Drawing basics are covered in this blog post.

Here’s the result:

Figure 2: Our deep learning semantic segmentation class color legend generated with OpenCV.

The deep learning segmentation heavy lifting takes place in the next block:

To perform deep learning semantic segmentation of an image with Python and OpenCV, we:

  • Load the model (Line 56).
  • Construct a blob (Lines 61-64). The ENet model we are using in this blog post was trained on input images with 1024×512 resolution, so we’ll use the same here. You can learn more about how OpenCV’s blobFromImage works here.
  • Set the blob as input to the network (Line 67) and perform a forward pass through the neural network (Line 69).

I surrounded the forward pass statement with timestamps. The elapsed time is printed to the terminal on Line 73.

Our work isn’t done yet: now it’s time to visualize our results. In the remaining lines of the script, we’ll generate a color map to overlay on the original image. Each pixel has a corresponding class label index, enabling us to see the results of semantic segmentation visually on our screen.

To begin, we need to extract volume dimension information from our output, followed by calculating the class map and color mask:

We determine the spatial dimensions of the output volume on Line 77.

Next, we find the class label index with the largest probability for each and every (x, y)-coordinate of the output volume (Line 83). We call this our classMap; it contains a class index for each pixel.

Given the class ID indexes, we can use NumPy array indexing to “magically” (not to mention, super efficiently) look up the corresponding visualization color for each pixel (Line 87). Our color mask will be overlaid transparently on the original image.
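Both operations reduce to two lines of NumPy. This sketch assumes the output volume has the shape (1, numClasses, height, width) and that colors is a (numClasses, 3) uint8 array; class_map_and_mask is a hypothetical helper name.

```python
import numpy as np


def class_map_and_mask(output, colors):
    """Turn ENet's (1, numClasses, H, W) score volume into IDs and colors."""
    # output[0] has shape (numClasses, H, W); argmax over axis 0 picks the
    # highest-scoring class index at every (x, y)-coordinate -> (H, W)
    class_map = np.argmax(output[0], axis=0)
    # Fancy indexing: each class ID selects its row of the (numClasses, 3)
    # colors array, producing an (H, W, 3) color mask in one step
    mask = colors[class_map]
    return class_map, mask
```

For a tiny 2x2 output where class 1 wins at the top-left pixel and class 2 at the bottom-right (all other scores tied at zero, so argmax returns class 0), class_map is [[1, 0], [0, 2]] and each mask pixel is simply the matching row of colors.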

Let’s finish the script:

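We resize the mask and classMap so that they have exactly the same dimensions as our input image (Lines 93-96). It is critical that we apply nearest neighbor interpolation rather than linear, bicubic, etc. interpolation, as we want to maintain the original class IDs/mask values: interpolating would blend neighboring class IDs (or colors) into values that may not correspond to any predicted class (for example, averaging IDs 2 and 4 yields 3).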

Now that sizing is correct, we create a “transparent color overlay” by overlaying the mask on our original image (Line 100). This enables us to easily visualize the output of the segmentation. More information on transparent overlays, and how to construct them, can be found in this post.

Finally, the legend and the original + output images are shown on screen on Lines 103-105.

Single-image segmentation results

Be sure to grab the “Downloads” for this blog post before using the commands in this section. I’ve provided the model + associated files, images, and Python scripts in a zip file for your convenience.

The command line arguments that you supply in your terminal are important to replicate my results. Learn about command line arguments here if you are new to them.

When you’re ready, open up a terminal + navigate to the project, and execute the following command:

Figure 3: Semantic segmentation with OpenCV reveals a road, sidewalk, person, bicycle, traffic sign, and more!

Notice how accurate the segmentation is — it clearly segments classes and accurately identifies the person and bicycle (a safety issue for self-driving cars). The road, sidewalk, cars, and even foliage are identified.

Let’s try another example simply by changing the --image command line argument to point to a different image:

Figure 4: Python and OpenCV are used to perform deep learning semantic segmentation of a city neighborhood road scene.

The result in Figure 4 demonstrates the accuracy and clarity of this semantic segmentation model. The cars, road, trees, and sky are clearly marked.

Here’s another example:

Figure 5: In this example of deep learning semantic segmentation with OpenCV, the road is misclassified as sidewalk, but this could be because people are walking in the road.

The above figure is a more complex scene, but ENet can still segment the people walking in front of the car. Unfortunately, the model incorrectly classifies the road as sidewalk, but this could be due to the fact that people are walking on it.

A final example:

Figure 6: The ENet semantic segmentation neural network demonstrates how deep learning can effectively be used for self driving car applications. The road, sidewalks, cars, foliage, and other classes are clearly identified by the model and displayed with OpenCV.

The final image that we’ve sent through ENet shows how the model can clearly segment a truck from a car among other scene classes such as road, sidewalk, foliage, person, etc.

Implementing semantic segmentation in video with OpenCV

Let’s continue on and apply semantic segmentation to video. Semantic segmentation in video follows the same concept as on a single image; this time we’ll loop over all frames in a video stream and process each one. At roughly 0.2 seconds per forward pass on my CPU, that works out to only about 5 FPS, so I recommend a GPU if you need to process frames in real-time.

Open up the video segmentation script and insert the following code:

Here we import our required packages and parse command line arguments with argparse. The imports are the same as in the previous script. With the exception of the following two command line arguments, the other five are the same as well:

  • --video : The path to the input video file.
  • --show : Whether or not to show the video on the screen while processing. You’ll achieve higher FPS throughput if you set this value to 0.
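For completeness, a sketch of the video script’s parser. Besides --video and --show, the script needs somewhere to write its result; the -o/--output flag below is an assumption on my part, as are the short flags, the help strings, the default of 1 for --show, and the build_video_parser name.

```python
import argparse


def build_video_parser():
    """Declare the arguments for the video segmentation script."""
    ap = argparse.ArgumentParser(description="ENet semantic segmentation on video")
    ap.add_argument("-m", "--model", required=True,
                    help="path to the pre-trained ENet model")
    ap.add_argument("-c", "--classes", required=True,
                    help="path to a text file of class labels")
    ap.add_argument("-v", "--video", required=True,
                    help="path to the input video file")
    # Hypothetical flag: where to write the segmented output video
    ap.add_argument("-o", "--output", required=True,
                    help="path to the output video file")
    # 1 = display frames while processing, 0 = skip display for higher FPS
    ap.add_argument("-s", "--show", type=int, default=1,
                    help="whether or not to display frames while processing")
    ap.add_argument("-l", "--colors", type=str,
                    help="optional path to a text file of per-class colors")
    ap.add_argument("-w", "--width", type=int, default=500,
                    help="desired width (in pixels) of each frame")
    return ap
```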

The following lines load our classes and associated colors data (or generate random colors). These lines are identical to the previous script:

After loading classes and associating a color with each class for visualization, we’ll load the model and initialize the video stream:

Our model only needs to be loaded once on Line 48 — we’ll use that same model to process each and every frame.

From there, we open a video stream pointer to the input video file and initialize our video writer object (Lines 51 and 52).

Lines 55-59 attempt to determine the total number of frames in the video; if that fails, a message indicating that the value could not be determined is printed (Lines 63 and 64). The total value will be used later to calculate the approximate runtime of this video processing script.

Let’s begin looping over video frames:

Our while loop begins on Line 68.

We grab a frame on Line 70 and subsequently check that it is valid on Line 74. If it was not grabbed properly, we’ve likely reached the end of the video, so we break out of the frame processing loop (Line 75).
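The grab-and-check pattern can be written as a small generator. This is a sketch; frames is a hypothetical helper, and vs is assumed to be a cv2.VideoCapture (or anything with a read() method returning a (grabbed, frame) tuple).

```python
def frames(vs):
    """Yield frames from a video stream until a read fails."""
    while True:
        (grabbed, frame) = vs.read()
        # A failed grab most likely means we reached the end of the video
        if not grabbed:
            break
        yield frame
```

The main loop then becomes for frame in frames(vs): followed by the per-frame inference steps.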

The next set of lines mimic what we accomplished previously with a single image, but this time we are operating on a video frame. Inference occurs here, so don’t overlook these steps where we:

  • Construct a blob from a resized frame (Lines 79-81). The ENet model we are using in this blog post was trained on input images with 1024×512 resolution, so we’ll use the same here. Learn about how OpenCV’s blobFromImage works here.
  • Set the blob as input (Line 82) and perform a forward pass through the neural network (Line 84).

Segmentation inference is now complete, but we want to post-process the data in order to visualize + output the results. The remainder of the loop handles this process over three code blocks:

Just as before:

  • Extract the spatial dimensions of the output volume on Line 89.
  • Generate our classMap by finding the class label index with the largest probability for each and every pixel of the output image array (Line 95).
  • Compute our color mask from the COLORS associated with each class label index in the classMap (Line 99).
  • Resize the mask to match the frame dimensions (Lines 103 and 104).
  • And finally, overlay the mask on the frame transparently (Line 108).

Let’s write the output frames to disk:

The first time the loop runs, the writer is None, so we need to instantiate it on Lines 111-115. Learn more about writing video to disk with OpenCV.

Using the total video frame count, we can estimate how long it will take to process the video (Lines 118-122).
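The estimate is simple arithmetic: multiply the time spent on one frame by the total number of frames. Ignoring post-processing, the 0.2-second CPU forward pass reported earlier means a 1,000-frame video would need roughly 0.2 × 1,000 = 200 seconds, a throughput of about 5 FPS. A sketch (estimate_runtime and throughput_fps are hypothetical helpers, and the -1 sentinel for an unknown frame count is an assumption):

```python
def estimate_runtime(seconds_per_frame, total_frames):
    """Estimate total processing time in seconds, or None if unknown."""
    # total_frames is -1 when the frame count could not be determined
    if total_frames <= 0:
        return None
    return seconds_per_frame * total_frames


def throughput_fps(seconds_per_frame):
    """Frames processed per second at the given per-frame cost."""
    return 1.0 / seconds_per_frame
```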

Finally, we actually write the output to disk on Line 125.

Let’s display the frame (if needed) and clean up:

In the last block, we check to see if we should display the output frame and take action accordingly (Lines 128 and 129). While we’re showing the frames in a window on the screen, if “q” is pressed, we’ll “quit” the frame processing loop (Lines 130-134). Finally, we clean up by releasing the pointers.

Video segmentation results

To perform semantic segmentation in video, grab the “Downloads” for this blog post.

Then, open up a terminal and execute the following command:

I’ve included a sample of my output below:

Credits: Thank you to Davis King from dlib for putting together a dataset of front/rear views of vehicles. Davis included the videos in his dataset which I then used for this example. Thank you J Utah and Massachusetts Dash Cam for the example videos. Audio credit to BenSound.

What if I want to train my own segmentation networks?

At this point, if you reviewed both scripts, you’ve seen that deep learning semantic segmentation with a pretrained model is quite easy for both images and video. Python and OpenCV make the process straightforward for us, but don’t be fooled by the low line count of the scripts: there are a ton of computations going on under the hood of the segmentation model.

Training a model isn’t as difficult as you’d imagine. If you would like to train your own segmentation networks on your own custom datasets, make sure you refer to the following tutorial provided by the ENet authors.

Please note that I have not trained a network from scratch using ENet, but I wanted to include the authors’ tutorial in this post (1) as a matter of completeness and (2) in case you want to give it a try.

Keep in mind, though, that labeling image data requires a ton of time and resources. The ENet authors were able to train their model thanks to the hard work of the Cityscapes team, who have graciously made their efforts available for learning and research.

Note: The Cityscapes data is for non-commercial use (i.e. academic, research, and learning). Only use the ENet model accordingly.


In today’s blog post we learned how to apply semantic segmentation using OpenCV, deep learning, and the ENet architecture.

Using the pre-trained ENet model on the Cityscapes dataset, we were able to segment both images and video streams into 20 classes in the context of self-driving cars and road scene segmentation, including people (both walking and riding bicycles), vehicles (cars, trucks, buses, motorcycles, etc.), construction (building, walls, fences, etc.), as well as vegetation, terrain, and the ground itself.

And to download the code to this guide, just enter your email address in the form below — I’ll be sure to notify you when new posts are published here on PyImageSearch as well.



111 Responses to Semantic segmentation with OpenCV and deep learning

  1. Zana September 3, 2018 at 12:44 pm #

    Hi Adrian

    Thanks for your good post

    Could you please report the processing time on your CPU? Also the model of CPU

    • Adrian Rosebrock September 3, 2018 at 1:53 pm #

      Model processing time on the CPU is already reported on this post. Be sure to refer to the terminal output for each of the respective commands where the throughput time is estimated.

  2. Darío Guzmán September 3, 2018 at 1:45 pm #

    Thanks for the great tutorial Adrian it was really helpful. I have some questions:
    Do you know how much fast this implementation works? And if yes, in which hardware?

    • Adrian Rosebrock September 3, 2018 at 1:53 pm #

      Hey Darío — I have already included speed throughput information in the tutorial. I have included inference approximation on a CPU.

      • PJ September 15, 2018 at 8:56 am #

        I noticed the inference approximation value keeps varying the number of times you run the code for the same image(example).

      • kaisar khatak September 24, 2018 at 12:25 pm #

        I am seeing much slower inference time (>1second) on a Nvidia TX1 (GPU) than the inference approximations in the blog post. What type of hardware (e.g. CPU) is used for the demo results posted? Thanks.

        • Adrian Rosebrock October 8, 2018 at 12:49 pm #

          The demo results were gathered on a 3 GHz Intel Xeon W.

  3. Bogdan September 3, 2018 at 2:07 pm #

    This would be great for background substraction in motion detection for surveillance cameras I guess

  4. Naveenkumar September 4, 2018 at 12:05 am #

    Thanks, Adrian. could you please help me to create own ENet model?

    • Adrian Rosebrock September 5, 2018 at 8:51 am #

      Refer to the “What if I want to train my own segmentation networks?” section of this post.

  5. JBeale September 4, 2018 at 1:29 am #

    I gather the algorithm is starting fresh on each frame, independent of any previous frames. Of course, that’s not the way people do it- once you identify a car or a tree, you expect to see the same objects nearby a moment later, and would not expect an object to magically change into something else. But I suppose combining the segmentation model with object tracking on every moving object would be vastly more complex.

  6. tuxkart September 4, 2018 at 3:55 am #

    Great post, again.
    I just wonder which framework Mr Paszke used to train, can you let me know, thanks so much, Adrian

    • Adrian Rosebrock September 5, 2018 at 8:44 am #

      It was Caffe. Refer to their GitHub (linked to in this post) for more information.

  7. vijay singh September 4, 2018 at 7:11 am #

    great post.
    I am interested to know what are the major area where I can implementations semantic-segmentation .

    • Adrian Rosebrock September 5, 2018 at 8:41 am #

      There are many, but some of the hottest areas for semantic segmentation right now include road scene segmentation for self-driving cars and work in pathology, such as segmenting cellular structures.

  8. Karthik September 4, 2018 at 7:41 am #

    Hey Adrian, Thanks for this article.

    Is OpenCv’s dnn layer a wrapper around caffe ?

    • Adrian Rosebrock September 5, 2018 at 8:40 am #

      No, OpenCV does not wrap around Caffe, Torch, Lua, etc. Instead, OpenCV provides methods to load these model formats without requiring those respective libraries to be installed.

  9. Ayush Agarwal September 4, 2018 at 11:09 am #

    Hi Adrian,

    Loved your post. I was actually waiting for segmentation post. The thing I need to understand how it works from the scratch. I read a few posts containing the idea of upsampling and skip connections between deconv and maxpool layers. Though, I understood the overview, I need to understand the fine details.

    And also can you explain me the concept/requirement of “blob”

    Thanks and cheers.

    • Adrian Rosebrock September 5, 2018 at 8:36 am #

      The blob concept is covered in detail in this post.

  10. Ayush Agarwal September 4, 2018 at 12:48 pm #

    Hey Adrian,

    Great tutorial. I am not able to get what exactly does the color map signifies.

    • Adrian Rosebrock September 5, 2018 at 8:34 am #

      The color map is just a visualization of the pixel-wise segmentation of the image. Each pixel in the image is associated with a class label. The color map visualizes this relationship.

  11. Larry September 4, 2018 at 2:30 pm #

    Hi Adrian!

    Thanks for the great tutorial.

    I have one question:

    Can i use it for segmentation a car license plates? Just to get something like this:

    • Adrian Rosebrock September 5, 2018 at 8:33 am #

      You would need to train a segmentation model explicitly on car license plates. For what it’s worth I cover how to perform Automatic License Plate Recognition inside the PyImageSearch Gurus course.

  12. Abe September 5, 2018 at 3:10 am #

    Hi Adrian, I’m super stoked for this tutorial, but I just gotta get over this bug I’m running into from the code:

    AttributeError: module ‘cv2.dnn’ has no attribute ‘readNet’

    When I googled around for this situation it is said that I need to build opencv from source from the opencv Master branch. I have installed opencv 3.4.1 using your instructions (Raspbian, Ubuntu, and all installs work great until this snag), and as far as I know, your install instructions do build from source, correct? But I notice that you use “wget” for the opencv zip folder, and not a “git clone: from the opencv repository, could this be the reason? Anyways, I’m about to embark on a re-install of opencv, but wondering if you have some insight on this issue. Thanks Adrian, you’re awesome!!!

    • Adrian Rosebrock September 5, 2018 at 8:30 am #

      Two possible solutions:

      1. Make sure you are using OpenCV 3.4.1 or greater.

      2. Change the function call to: cv2.dnn.readNetFromTorch(args["model"])

      • Walid September 6, 2018 at 2:45 pm #

        Changing the function call to cv2.dnn.readNetFromTorch(args["model"]) worked for me, but I am curious why it worked?

        • Adrian Rosebrock September 11, 2018 at 8:40 am #

          99% likely due to an OpenCV version difference. Which version of OpenCV are you using?

          • Mekel Mythri October 27, 2018 at 3:54 am #

            Currently i’m using opencv- 3.2.0, Does it works.? Or i need update it to latest version.?

          • Adrian Rosebrock October 29, 2018 at 1:36 pm #

            You need to update to OpenCV 3.4 or OpenCV 4.

  13. ANGENIEUX September 5, 2018 at 3:45 am #

    Dear Adrian,
    Thanks for this huge work!

    Is it possible to reduce the number of classes analyzed by the model (20 -> 5 for example) ? Directly In the py script, only in Enet model ?
    It will be interesting to measure the impact on performances.

    Can semantic segmentation be used for detection/tracking purposes like some of your other examples? Creation of bounding box ?

    Best regards 🙂

  14. PF September 5, 2018 at 7:17 am #

    I had the error:
    AttributeError: module ‘cv2.dnn’ has no attribute ‘readNet’

    Solved it by changing the line:
    net = cv2.dnn.readNet(args["model"])

    • Adrian Rosebrock September 5, 2018 at 8:27 am #

      Make sure you are using OpenCV 3.4.1 or better as well.

  15. Sundaresh September 5, 2018 at 10:11 am #

    Hi Adrian

    Thank you very much. I did notice that the readNet() method was missing on my version of Open CV (some others have mentioned this on the net – generally the answer is to re-install opencv from the master node) .

    I however was able to apply the model using readNetfromTorch() instead. I seem to get the same result as you have, for example_04.

    I just wonder if you are aware of anything I might lose if I use readNetfromTorch() instead of readNet() ? Of course, I understand I will have to use a different method for models trained in Caffe, TensorFlow etc.


  16. Sundaresh September 5, 2018 at 10:15 am #

    Hi Adrian,

    Regarding my earlier question, I noticed others asked the same this morning ( my page had not refreshed from last night) – sorry for the bother.


    • Adrian Rosebrock September 11, 2018 at 8:56 am #

      No problem at all, Sundar! I’m glad you have resolved the issue 🙂

  17. Stephen OShaughnessy September 6, 2018 at 6:02 am #

    Excellent article Adrian. I am currently researching the application of computer vision in malware classification (converting malware binaries to grayscale and then using image processing/ machine learning etc.). Do you think the methods described in your article have the potential to be applied to identifying malware?

    • Adrian Rosebrock September 11, 2018 at 8:48 am #

      How exactly are you converting the malware to a grayscale image? What process is being performed? Provided you can convert the binary to grayscale and have sufficient data, yes, I do believe you could apply image classification to the problem but I don’t know if it would be more accurate than simply applying an analysis on the binary data itself.

      • Stephen OShaughnessy September 12, 2018 at 1:50 pm #

        Hi Adrian,

        There’s 2 ways that I am converting the binaries:
        1) Byte-to-pixel mapping (by converting the binary to an 8-bit unsigned int numpy array and then saving it as a png). With this method, all the binary features are preserved.
        2) Binary converted to Hillman Curves (haven’t tested this yet)

        In method 1, I have experimented with the following feature descriptors:
        Haralick GLCM

        I’ve tested using the following classifiers:
        Decision Trees
        Random Forest

        I have converted both static binaries (for static analysis) and memory dumps (for dynamic behavioral analysis).

        Best results
        Static: LBP using KNN ~92% accuracy
        Dynamic: HOG using KNN ~94% accuracy

        More common methods of feature extraction for malware classification are either n-gram analysis or disassembling the binary and extracting API calls. Feature vectors can be calculated using either frequency of the features or sequences. These methods are generally noisy and are not robust against obfuscation techniques like encryption or compression. I was interested in also testing out deep learning (I have your book!) and when I saw your post, it really grabbed my attention! I guess my ultimate aim is to develop a system that can classify malware using both static and dynamic features.

        • Adrian Rosebrock September 12, 2018 at 2:01 pm #

          I’m more familiar with the n-gram analysis technique, I hadn’t thought of converting the 8-bit unsigned integers to an actual image. But wouldn’t the malware representation be a single row of 8-bit integers? How are you converting that into a 2D image? Also, thank you so much for picking up a copy of my book 🙂

  18. Sayan Deb Sarkar September 6, 2018 at 11:53 am #

    Hey Adrian,

    Thanks for the great tutorial !

    However, when I run with example_02.png, it gives an error :

    ‘NoneType object has no attribute shape

    Could you please help me resolve the error ?

    • Adrian Rosebrock September 11, 2018 at 8:41 am #

      Your path to the input image is not correct and “cv2.imread” is returning None. Refer to this tutorial to help you solve the problem.

  19. DexC September 6, 2018 at 3:15 pm #

    Thanks for the very helpful tutorial, Adrian! Just want to ask if you’ve tested in OpenCV the pretrained Caffe models on Ade20k?
    I’ve downloaded the caffemodel and prototxt files and am now starting to follow your Object Detection tutorial that uses the MobileNetV2 model, but I’m unsure if the same code would work on OpenCV 3.4.2 with this semantic segmentation model?
    Would I just need to change the blob values? I’d greatly appreciate your help (or anybody here with the time and experience) regarding this. Thanks in advance!

    • Adrian Rosebrock September 11, 2018 at 8:39 am #

      Sorry, what is the Abe20k? As long as you’re running OpenCV 3.4.2 the code for both tutorials should work.

      • prisilla September 13, 2018 at 1:12 am #

        Hi Adrain,

        I have opencv 3.4.1 do i need to upgrade it, should i install PyImageSearch to run the code.

        • Adrian Rosebrock September 14, 2018 at 9:42 am #

          I would recommend OpenCV 3.4.2 or OpenCV 4 for this code.

        • prisilla September 15, 2018 at 6:56 am #

          Hi Adrain,

          I got the output from command prompt. I supplied the arguments from anaconda on windows. Finally, I learned something from your tutorials.

          Thank so much!

          • Adrian Rosebrock September 17, 2018 at 2:50 pm #

            Awesome, glad to hear it! 🙂

      • Dexter January 29, 2019 at 4:51 pm #

        Hi Dr. Adrian. ADE20K is a dataset for semantic segmentation. There are available Pytorch, Caffe and Torch7 implementations in Github.
        But I find your approach to be more aligned to what I’m currently working on.
        Here’s a link to the pretrained caffemodel:
        I’ve managed to get the detection results (class IDs), but I’m stuck at filtering these based on confidence results, and to also get the startY/X and endY/X mask values of the ROI of each detected object (in relation to the input image).
        Hope you can have a tutorial blog post about this some time soon. Thanks!

  20. terry September 7, 2018 at 9:58 am #

    Hi Adrian,
    Thank you for your tutorial.
    Do you think this can be achievable on RPI3B+ and movidius stick to process picamera stream in realtime?

    Thank you

    • Adrian Rosebrock September 11, 2018 at 8:28 am #

      In full real-time as in 20+ FPS? No, that’s unrealistic. As this tutorial shows you may be able to get up to 4-6 FPS but anything higher I believe is unrealistic.

  21. Thierry September 7, 2018 at 10:44 am #

    Thank you for the amazing tutorial, once again. You always detail every step; it is just perfect!
    I have a question: how do you make OpenCV choose between CPU and GPU? How can I tell it to switch from one to the other? I suppose it is not like TensorFlow with CUDA_VISIBLE_DEVICES, right?

    Another comment: I also got the error about the missing dnn.readNet even though I use opencv-python,
    BUT I'm on Windows. Maybe this version is not exactly the same as the Linux version?
    Using readNetFromTorch() instead works perfectly.

    I see there is also a readNetFromTensorflow, so we can now import TF models too? That's very good!

    • Adrian Rosebrock September 11, 2018 at 8:27 am #

      So here’s the problem:

      OpenCV is starting to include GPU support, including OpenCL support. CUDA + Python support is not yet released but there are PRs in their GitHub repo that are working on CUDA support. I’ll be doing a blog post dedicated to CUDA + Python support once it’s fully supported.
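Thierry's workaround for the missing `cv2.dnn.readNet` can be written as a small fallback: use the generic loader when the installed build has it, otherwise call `readNetFromTorch`, which loads the same Torch-format ENet model. This is a sketch; the function name `load_enet` and the injectable `dnn` parameter (there only so the logic can be exercised without OpenCV) are hypothetical. Upgrading to OpenCV 3.4.2 or 4 remains the cleaner fix.

```python
def load_enet(model_path, dnn=None):
    """Load the Torch-format ENet model with whichever loader is available.

    `dnn` defaults to `cv2.dnn`; passing another object is only useful
    for testing the fallback logic.
    """
    if dnn is None:
        import cv2  # imported lazily so the fallback can be tested without OpenCV
        dnn = cv2.dnn
    read_net = getattr(dnn, "readNet", None)
    if read_net is not None:
        return read_net(model_path)
    # builds without the generic loader still ship the Torch importer
    return dnn.readNetFromTorch(model_path)
```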

  22. Altaf September 8, 2018 at 3:08 pm #

    Hi, great article. Please write something on how to save CNN-extracted features to HDF5, so they can later be fed to an LSTM, e.g., for human action recognition.

  23. Dl September 12, 2018 at 5:07 am #

    Hello, when I run the script,
    I get the error:
    usage: [-h] -m MODEL -c CLASSES -i IMAGE [-l COLORS] [-w WIDTH]
    : error: the following arguments are required: -m/--model, -c/--classes, -i/--image
    An exception has occurred, use %tb to see the full traceback.
    What’s my problem? Can someone help me out?
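The "%tb" line suggests the script was run inside Jupyter/IPython, where there is no command line to supply the required `--model`, `--classes`, and `--image` flags. A minimal sketch of the two options: pass the flags in a terminal, or hand `parse_args` an explicit list inside a notebook. The flags mirror the usage line above; the long names `--colors`/`--width`, the default width of 500, the script name `segment.py`, and the file names are assumptions to adapt to your download.

```python
import argparse


def build_parser():
    # Mirrors the usage line in the error; long names for -l/-w and the
    # default width are assumptions.
    ap = argparse.ArgumentParser()
    ap.add_argument("-m", "--model", required=True,
                    help="path to the segmentation model")
    ap.add_argument("-c", "--classes", required=True,
                    help="path to the .txt file of class labels")
    ap.add_argument("-i", "--image", required=True,
                    help="path to the input image")
    ap.add_argument("-l", "--colors", type=str,
                    help="optional .txt file of label colors")
    ap.add_argument("-w", "--width", type=int, default=500,
                    help="desired width (in pixels) of the input image")
    return ap


# From a terminal, argparse reads sys.argv, e.g.:
#   python segment.py --model enet-model.net --classes enet-classes.txt --image example_01.png
# In Jupyter/IPython there is no command line, so pass the arguments explicitly:
args = vars(build_parser().parse_args([
    "--model", "enet-model.net",
    "--classes", "enet-classes.txt",
    "--image", "example_01.png",
]))
```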

  24. Nishant September 13, 2018 at 2:02 am #

    I am trying to execute this program but it is giving the following error. Please help

    module ‘cv2.dnn’ has no attribute ‘readNet’

    • Nishant September 13, 2018 at 2:20 am #

      My previous query was resolved thanks to your solution mentioned above. Now I am facing this error:

      (h, w) = image.shape[:2]
      AttributeError: ‘NoneType’ object has no attribute ‘shape’

      • Adrian Rosebrock September 14, 2018 at 9:41 am #

        Your path to the input image is incorrect and “cv2.imread” is returning “None”. Double-check your path to the input image and make sure you read up on NoneType errors in this tutorial.
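Because `cv2.imread` returns `None` instead of raising when it cannot load a file, the failure only surfaces later at `image.shape`. A small guard right after loading makes the cause obvious. This is a sketch; the helper name `ensure_image` is hypothetical.

```python
import os


def ensure_image(image, path):
    """Fail fast with a clear message when cv2.imread returned None.

    cv2.imread does not raise on a bad path; it returns None, which
    otherwise surfaces later as "'NoneType' object has no attribute 'shape'".
    """
    if image is None:
        if not os.path.isfile(path):
            raise FileNotFoundError(f"no file at {path!r}; check the --image path")
        raise ValueError(f"{path!r} exists but could not be read as an image")
    return image


# Usage (assumes cv2 is imported and args holds the parsed arguments):
#   image = ensure_image(cv2.imread(args["image"]), args["image"])
#   (h, w) = image.shape[:2]
```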

  25. Nishant September 13, 2018 at 5:26 am #

    How can I use this to detect walls, ceilings, etc. in a room?

    • Adrian Rosebrock September 14, 2018 at 9:37 am #

      You would need to fine-tune this model on a dataset of walls, ceilings, etc. Do you have such a dataset? If not, you would want to research one.

  26. Utkarsh September 13, 2018 at 8:47 am #

    Thanks for the post. It was great!

    • Adrian Rosebrock September 14, 2018 at 9:33 am #

      Thanks Utkarsh, I’m glad you liked it! 🙂

  27. Haider September 14, 2018 at 7:53 am #

    Hello Adrian,

    The article is wonderful. Thanks.
    Can I perform transfer learning on this model? Could you please point me to a method?
    I want to classify some more terrains with the help of this model.


  28. Amie September 14, 2018 at 9:41 am #

    Hi, thank you for this article.
    How can I make this model detect only the fence? Is that possible using this technique?

    • Adrian Rosebrock September 14, 2018 at 9:54 am #

      This exact model won’t be able to segment fences; however, if you have a dataset of fence images you could train or fine-tune a model to detect fences.

      • Amie September 14, 2018 at 10:34 am #

        I have trained a model using my fence images, but my images don't contain only fences; they also contain the background, so my model also detects the background and considers it a fence.

        • Adrian Rosebrock September 17, 2018 at 3:00 pm #

          It sounds like you may not have annotated your dataset correctly. How did you label your images? Did you create a mask for only the fence pixels in your dataset?

  29. Rajarshi Lahiri September 15, 2018 at 12:52 am #

    Can you please tell what step I need to add to your code so that I get only the road mask?
    It would be very helpful.

    • Adrian Rosebrock September 17, 2018 at 2:53 pm #

      You would want to build a mask for your returned class IDs with the pixels of the road mask set to 1 (or 255) and all other values to zero. If you’re new to Python and OpenCV I would recommend reading up on bitwise masking and NumPy array indexing.
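Adrian's description can be sketched with NumPy alone. The class ID is simply the position of the class name in the classes file, so a hypothetical `class_mask` helper looks it up by name (the spelling, e.g. "Road", must match your classes file). If your class map is still at the network's output resolution, resize it to the image size first with nearest-neighbor interpolation (`cv2.INTER_NEAREST`) so class IDs are not blended. `apply_mask` is the NumPy equivalent of `cv2.bitwise_and(image, image, mask=mask)`.

```python
import numpy as np


def class_mask(class_map, class_names, name):
    """uint8 mask: 255 where the pixel belongs to `name`, 0 everywhere else."""
    class_id = class_names.index(name)  # line position in the classes file = class ID
    return np.where(class_map == class_id, 255, 0).astype("uint8")


def apply_mask(image, mask):
    """Zero out every pixel outside the mask (like cv2.bitwise_and with mask=)."""
    return image * (mask[..., None] > 0)  # broadcast the 2-D mask over channels
```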

  30. PJ September 15, 2018 at 1:01 pm #

    Hi Adrian,

    Will other models like VGG19, ResNet50, or U-Net work on the Cityscapes dataset? And how do I write a .net file like the one you use?

    • Adrian Rosebrock September 17, 2018 at 2:26 pm #

      If you want to use a different backbone or base network you would need to train it yourself. Make sure you read the “What if I want to train my own segmentation networks?” section of this tutorial.

  31. Bharat September 15, 2018 at 8:07 pm #

    Hi Adrian,

    Can we use this method to blur backgrounds so only the people or objects in the foreground are clear and everything else behind them is blurry? Or are there simpler methods to accomplish that?

    Thank you for all the amazing stuff you share!

    • Adrian Rosebrock September 17, 2018 at 2:23 pm #

      Yes, absolutely. You would want to:

      1. Apply semantic segmentation
      2. Grab the mask for the area you’re interested in
      3. Copy the original image
      4. Blur it
      5. Combine the two with bitwise AND operations: keep the original pixels inside the mask and the blurred pixels outside it
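The five steps above can be sketched in NumPy, assuming `fg_mask` is a per-pixel mask already built from the class map (e.g., the person class) at the image's size. To keep the sketch dependency-free, a simple box (mean) blur stands in for `cv2.GaussianBlur(image, (k, k), 0)`, and the masked combination is written as one `np.where` selection, which gives the same result as AND-ing the original with the mask, AND-ing the blurred copy with the inverted mask, and adding the two. The names `box_blur` and `blur_background` are hypothetical.

```python
import numpy as np


def box_blur(image, k=15):
    """Mean filter over a k x k window (k odd); borders replicate edge pixels.

    Stand-in for cv2.GaussianBlur so the sketch runs on NumPy alone.
    """
    pad = k // 2
    pad_width = ((pad, pad), (pad, pad)) + ((0, 0),) * (image.ndim - 2)
    padded = np.pad(image.astype(np.float64), pad_width, mode="edge")
    h, w = image.shape[:2]
    out = np.zeros(image.shape, dtype=np.float64)
    for dy in range(k):              # sum every shifted copy in the window
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return (out / (k * k)).round().astype(image.dtype)


def blur_background(image, fg_mask, k=15):
    """Keep pixels where fg_mask > 0 sharp; replace the rest with a blurred copy."""
    blurred = box_blur(image, k)
    keep = fg_mask > 0
    if image.ndim == 3:
        keep = keep[..., None]       # broadcast the mask over color channels
    return np.where(keep, image, blurred)
```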

  32. kazu September 26, 2018 at 2:53 am #

    Why can't I load the caffemodel?

  33. Mark Shostak September 28, 2018 at 2:35 pm #

    Hi Adrian,

    Are there any trained models for in-door applications?

    Thank you for all the amazing stuff you share!

    • Adrian Rosebrock October 8, 2018 at 12:21 pm #

      Do you mean segmentation of indoor scenes, such as walls, ceiling, floor, chair, etc.?

      • Mark Shostak October 12, 2018 at 5:25 am #

        Adrian Hi.

        Yes, this is exactly what I mean


        • Adrian Rosebrock October 12, 2018 at 8:47 am #

          I know I’ve seen pre-trained models for indoor scene understanding but I’m totally blanking on the name of the dataset or the model. I hope another PyImageSearch reader can help me out!

  34. eno October 1, 2018 at 2:33 am #

    When changing to Caffe, why does processing go but not segmentation?
    net = cv2.dnn.readNetFromCaffe (arga.prototxt, arga.caffemodel)

    • Adrian Rosebrock October 8, 2018 at 10:47 am #

      I’m not sure what you mean by “why does processing go but not segmentation” — could you elaborate?

  35. kelemu October 21, 2018 at 6:57 am #

    hi Adrian
    Your daily blog posts always impress me; the content and the way you write are easy to understand. But how can I use multiple .txt files for classes, and multiple images/videos?

    • Adrian Rosebrock October 22, 2018 at 8:03 am #

      Thanks Kelemu, I’m glad you are enjoying the PyImageSearch blog!

      As for your question, I’m not sure what you mean by using multiple .txt files. What is your end goal?

  36. Sai November 1, 2018 at 1:30 pm #

    As usual, very high-quality tutorials and blog! I like that you present not only the practical part but also the full references (research papers) and credits for those who want to learn more in detail. I really appreciate your effort to contribute knowledge to the community. Well done, Adrian! I am still excited and waiting for your new books to come out. 😉

    • Adrian Rosebrock November 2, 2018 at 7:16 am #

      Thank you Sai, I really appreciate your kind words 🙂

  37. townsw November 3, 2018 at 11:34 pm #

    Hi Adrian,
    Thank you for your helpful tutorial.
    I have a question: how can I use the GPU for this project?

    • Adrian Rosebrock November 6, 2018 at 1:28 pm #

      OpenCV doesn’t currently support CUDA GPUs very well for their “dnn” module. Support is coming but unless you have an Intel GPU you won’t be able to use this code with a GPU.
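For the Intel/OpenCL case Adrian mentions (and Thierry's earlier question about switching between CPU and GPU), OpenCV's `dnn` module lets you request a target per network with `net.setPreferableTarget`. A minimal sketch, assuming an OpenCV build with OpenCL enabled: it requests `cv2.dnn.DNN_TARGET_OPENCL` when `cv2.ocl.haveOpenCL()` reports a device and stays on `DNN_TARGET_CPU` otherwise. This does not enable CUDA, and whether OpenCL is actually faster depends on the device. The helper name `prefer_opencl` and its injectable `cv2_module` parameter (for testing only) are hypothetical.

```python
def prefer_opencl(net, cv2_module=None):
    """Request the OpenCL target for `net` when OpenCL is available.

    Returns True if OpenCL was requested, False if the CPU target was set.
    `cv2_module` defaults to the real cv2 and is injectable only for testing.
    """
    if cv2_module is None:
        import cv2 as cv2_module
    if cv2_module.ocl.haveOpenCL():
        net.setPreferableTarget(cv2_module.dnn.DNN_TARGET_OPENCL)
        return True
    net.setPreferableTarget(cv2_module.dnn.DNN_TARGET_CPU)
    return False


# Usage (after loading the model):
#   net = cv2.dnn.readNet(args["model"])
#   on_gpu = prefer_opencl(net)
```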

  38. Hamza November 12, 2018 at 7:41 am #

    Hi Adrian

    Thank you for this amazing post; it is impeccable.

    I want to ask whether there is a way to apply semantic segmentation with the ENet architecture to bills (like water bills). I mean, is there a dataset like the Cityscapes dataset you used, and could you give me a link to search?

    Kind Regards

    • Adrian Rosebrock November 13, 2018 at 4:42 pm #

      “Bills” as in the bills/invoices that we pay?

      • Hamza November 14, 2018 at 2:59 pm #

        Hi Adrian,
        yes, exactly.
        The aim of my project is to take a bill with a precise format (for example: Date at the top left, name of the person at the top right of the bill, total to pay at the bottom …)

        and apply semantic segmentation to that bill, so that our algorithm learns where the fields of the bill are and what each one is.

        Thank you

        • Adrian Rosebrock November 15, 2018 at 11:56 am #

          Semantic segmentation would be way overkill for such a project. I would suggest you instead look at image registration/document registration algorithms.

          • Hamza November 15, 2018 at 4:55 pm #

            Thank you, Mr. Adrian.

  39. sppp December 18, 2018 at 7:43 am #

    How do I run this on Ubuntu?

    • Adrian Rosebrock December 18, 2018 at 8:44 am #

      This code will work in Ubuntu. Just use the “Downloads” section of the tutorial to download the code and model.

  40. Ritika January 23, 2019 at 10:44 pm #

    I am working on some crop weed segmentation problem. I want to apply semantic segmentation using U-Net architecture. But UNet architecture is not clear to me. Can you please share any case study on U-Net architecture. Thanks in advance.

    • Adrian Rosebrock January 25, 2019 at 7:14 am #

      I don’t have any tutorials on U-Net but I will consider it in the future. Thanks for the suggestion.

  41. Sara March 3, 2019 at 5:27 pm #

    Hi Adrian,
    Thank you for the great post. I would like to use this code on a grayscale image, but it didn't work! :(
    I'm new to this field.
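One possible cause (an assumption, since the error isn't shown): ENet was trained on 3-channel color images, so a single-channel grayscale array produces a blob with the wrong number of input channels. Note that `cv2.imread` with its default flag already returns 3-channel BGR even for grayscale files, so this mainly bites when the image was loaded with `cv2.IMREAD_GRAYSCALE` or converted earlier. Replicating the gray channel three times, as `cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)` does, restores the expected shape; expect lower accuracy than on color road scenes. The helper name `gray_to_bgr` is hypothetical.

```python
import numpy as np


def gray_to_bgr(image):
    """Return a 3-channel copy of a grayscale image; pass color images through.

    Same result as cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR) for 2-D input.
    """
    if image.ndim == 2:                          # (H, W) -> (H, W, 3)
        return np.stack([image] * 3, axis=-1)
    if image.ndim == 3 and image.shape[2] == 1:  # (H, W, 1) -> (H, W, 3)
        return np.concatenate([image] * 3, axis=-1)
    return image
```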

  42. james March 9, 2019 at 3:33 pm #

    Hi Adrian,

    Really cool tutorial. I’ve tried running the model on some images I took with my iphone and the results are really poor compared to the examples. Any tips for possible pre-processing I should be doing?

    • Adrian Rosebrock March 13, 2019 at 3:47 pm #

      It’s hard to say what the issue is without seeing your example images. Keep in mind that deep learning algorithms, while impressive, are not magic. They are only as good as the data they were trained on. In this case your input images may be significantly different than what the model was trained on.

  43. Annie Dobbyn March 26, 2019 at 6:05 pm #


    While implementing a view of each individual class mask, I noticed that some pixels are being assigned to more than one class. Example: when run with the image “example_01.png”, the sign in the top left corner appears in both the “Person” class and the “TrafficSign” class.

    In the final colour-mapped output, the sign is correctly colour-coded, but I don't understand
    1) why it's included in two masks, as classMap = np.argmax(output[0], axis=0) shouldn't allow for this
    2) why the final colour map is correct while the individual class masks contradict it. At first I thought the “final” decision for a pixel's class would be whatever it was classed as last, but this isn't the case.

    Any help you can give would be appreciated.
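Point 1 is correct: masks built from the argmax class map cannot overlap, because each pixel holds exactly one class ID. The sketch below builds one mask per class from `classMap` and checks exclusivity. If individual masks do overlap, one plausible explanation (a guess, since the mask code isn't shown) is that they were derived some other way, for example by thresholding the raw per-class scores `output[0][c]`, where a pixel can score highly for several classes, or by matching colors in the colorized output. The function names are hypothetical.

```python
import numpy as np


def per_class_masks(class_map, num_classes):
    """Stack of boolean masks, one per class ID, built from an argmax class map."""
    return np.stack([class_map == c for c in range(num_classes)])


def masks_are_exclusive(masks):
    """True if every pixel is claimed by exactly one mask."""
    return bool((masks.sum(axis=0) == 1).all())


# Masks from the argmax map are always exclusive:
#   masks_are_exclusive(per_class_masks(np.argmax(output[0], axis=0), output.shape[1]))
# Thresholding raw scores is not: np.array([[[0.9]], [[0.8]]]) > 0.5 marks one
# pixel as belonging to both classes.
```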

  44. Hami April 20, 2019 at 11:37 pm #

    Hello Adrian, thank you for the awesome tutorial. I have a question: could this idea be used with X-ray images to detect objects inside a bag?

  45. shah August 3, 2019 at 5:24 am #

    Adrian, great tutorial.
    Can I fine-tune this model for semantic segmentation of MRI brain images?

    • Adrian Rosebrock August 7, 2019 at 12:32 pm #

      I don’t have any tutorials on semantic segmentation on MRI images but I hope to cover it in the future.

  46. sriharsha September 18, 2019 at 6:21 am #

    Is it possible to design an automatic image segmentation tool?

  47. Ankit December 18, 2019 at 1:47 am #

    Hello Sir,
    This is an awesome tutorial as always.
    I wanted to know how can I crop each segmented area?
    Thank you

    • Adrian Rosebrock December 18, 2019 at 9:42 am #

      You can use NumPy array slicing to extract the ROI and save it to disk. If you are new to using OpenCV and are unfamiliar with cropping ROIs, be sure to read Practical Python and OpenCV to first learn the basics.
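A minimal sketch of the slicing approach, assuming `class_map` has already been resized to the image's height and width (nearest-neighbor). For each class present, it takes the bounding box of that class's pixels, slices it out, and blacks out pixels inside the box that belong to other classes. Each crop could then be written with `cv2.imwrite`. The function name and the optional `background_id` skip are hypothetical. All regions of one class share a single crop; separating individual objects would need something like `cv2.connectedComponents` on each class mask.

```python
import numpy as np


def crop_class_regions(image, class_map, background_id=None):
    """Return {class_id: crop} with one bounding-box crop per class present.

    `class_map` must match the image's height and width. Pixels inside a
    crop that belong to other classes are set to 0.
    """
    crops = {}
    for class_id in np.unique(class_map):
        if background_id is not None and class_id == background_id:
            continue
        mask = class_map == class_id
        ys, xs = np.where(mask)
        y0, y1 = ys.min(), ys.max() + 1
        x0, x1 = xs.min(), xs.max() + 1
        roi = image[y0:y1, x0:x1].copy()      # NumPy slicing performs the crop
        roi[~mask[y0:y1, x0:x1]] = 0          # black out other classes' pixels
        crops[int(class_id)] = roi
    return crops


# Usage: save each crop to disk (assumes cv2 is imported):
#   for class_id, roi in crop_class_regions(image, class_map, background_id=0).items():
#       cv2.imwrite(f"class_{class_id}.png", roi)
```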

Before you leave a comment...

Hey, Adrian here, author of the PyImageSearch blog. I'd love to hear from you, but before you submit a comment, please follow these guidelines:

  1. If you have a question, read the comments first. You should also search this page (i.e., ctrl + f) for keywords related to your question. It's likely that I have already addressed your question in the comments.
  2. If you are copying and pasting code/terminal output, please don't. Reviewing another programmer’s code is a very time-consuming and tedious task, and due to the volume of emails and contact requests I receive, I simply cannot do it.
  3. Be respectful of the space. I put a lot of my own personal time into creating these free weekly tutorials. On average, each tutorial takes me 15-20 hours to put together. I love offering these guides to you and I take pride in the content I create. Therefore, I will not approve comments that include large code blocks/terminal output as it destroys the formatting of the page. Kindly be respectful of this space.
  4. Be patient. I receive 200+ comments and emails per day. Due to spam, and my desire to personally answer as many questions as I can, I hand moderate all new comments (typically once per week). I try to answer as many questions as I can, but I'm only one person. Please don't be offended if I cannot get to your question.
  5. Do you need priority support? Consider purchasing one of my books and courses. I place customer questions and emails in a separate, special priority queue and answer them first. If you are a customer of mine you will receive a guaranteed response from me. If there's any time left over, I focus on the community at large and attempt to answer as many of those questions as I possibly can.

Thank you for keeping these guidelines in mind before submitting your comment.
