Keras Mask R-CNN

In this tutorial, you will learn how to use Keras and Mask R-CNN to perform instance segmentation (both with and without a GPU).

Using Mask R-CNN we can perform both:

  1. Object detection, giving us the (x, y)-bounding box coordinates for each object in an image.
  2. Instance segmentation, enabling us to obtain a pixel-wise mask for each individual object in an image.

An example of instance segmentation via Mask R-CNN can be seen in the image at the top of this tutorial — notice how we not only have the bounding box of the objects in the image, but we also have pixel-wise masks for each object as well, enabling us to segment each individual object (something that object detection alone does not give us).

Instance segmentation, along with Mask R-CNN, powers some of the recent advances in the “magic” we see in computer vision, including self-driving cars, robotics, and more.

In the remainder of this tutorial, you will learn how to use Mask R-CNN with Keras, including how to perform instance segmentation on your own images.

To learn more about Keras and Mask R-CNN, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Keras Mask R-CNN

In the first part of this tutorial, we’ll briefly review the Mask R-CNN architecture. From there, we’ll review our directory structure for this project and then install Keras + Mask R-CNN on our system.

I’ll then show you how to implement Mask R-CNN with Keras and Python.

Finally, we’ll apply Mask R-CNN to our own images and examine the results.

I’ll also share resources on how to train a Mask R-CNN model on your own custom dataset.

The History of Mask R-CNN

Figure 1: The Mask R-CNN architecture by He et al. enables object detection and pixel-wise instance segmentation. This blog post uses Keras to work with a Mask R-CNN model trained on the COCO dataset.

The Mask R-CNN model for instance segmentation has evolved from three preceding architectures for object detection:

  • R-CNN: An input image is presented to the network, Selective Search is run on the image, and then the output regions from Selective Search are used for feature extraction and classification using a pre-trained CNN.
  • Fast R-CNN: Still uses the Selective Search algorithm to obtain region proposals, but adds the Region of Interest (ROI) Pooling module. ROI Pooling extracts a fixed-size window from the feature map, and those features are used to obtain the final class label and bounding box. The benefit is that the network is now end-to-end trainable.
  • Faster R-CNN: Introduces the Region Proposal Network (RPN) that bakes the region proposal directly into the architecture, alleviating the need for the Selective Search algorithm.

The Mask R-CNN algorithm builds on the previous Faster R-CNN, enabling the network to not only perform object detection but pixel-wise instance segmentation as well!

I’ve covered Mask R-CNN in-depth inside both:

  1. The “What is Mask R-CNN?” section of the Mask R-CNN with OpenCV post.
  2. My book, Deep Learning for Computer Vision with Python.

Please refer to those resources for more in-depth details on how the architecture works, including the ROI Align module and how it facilitates instance segmentation.

Project structure

Go ahead and use the “Downloads” section of today’s blog post to download the code and pre-trained model. Let’s inspect our Keras Mask R-CNN project structure:

Our project consists of an images/  directory of testing images as well as three files:

  • coco_labels.txt : A line-by-line listing of 81 class labels. The first label is the “background” class, so typically we say there are 80 classes.
  • mask_rcnn_coco.h5 : Our pre-trained Mask R-CNN model weights file, which will be loaded from disk.
  • The Mask R-CNN demo script: it loads the labels and model/weights, then performs inference on a testing image provided via a command line argument. You may test with your own images or any of the images in the  images/  directory included with the “Downloads”.

Before we review today’s script, we’ll install Keras + Mask R-CNN and then we’ll briefly review the COCO dataset.

Installing Keras Mask R-CNN

The Keras + Mask R-CNN installation process is quite straightforward with pip and git. I recommend you install these packages in a dedicated virtual environment for today’s project so you don’t complicate your system’s package tree.

First, install the required Python packages:
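A sketch of the installs follows (the exact package set and version pins are assumptions based on this post’s imports and the comment thread, which notes that Matterport’s implementation requires TensorFlow 1.x):

```shell
# run inside your dedicated virtual environment
pip install numpy scipy scikit-image pillow
pip install "tensorflow<2.0"   # Matterport's mrcnn requires TensorFlow 1.x
pip install keras
pip install imutils
```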

Be sure to install tensorflow-gpu  if you have a GPU, CUDA, and cuDNN installed in your machine.

From there, go ahead and install OpenCV, either via pip or compiling from source:
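Via pip, the contrib build gives you the main OpenCV modules plus the extras:

```shell
pip install opencv-contrib-python
```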

Next, we’ll install the Matterport implementation of Mask R-CNN in Keras:
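A typical way to install Matterport’s package is to clone the official repository and install it into the active environment:

```shell
# clone Matterport's implementation and install it into the environment
git clone https://github.com/matterport/Mask_RCNN.git
cd Mask_RCNN
python setup.py install
```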

Finally, fire up a Python interpreter in your virtual environment to verify that Mask R-CNN + Keras and OpenCV have been successfully installed:

Provided that there are no import errors, your environment is now ready for today’s blog post.

Mask R-CNN and COCO

The Mask R-CNN model we’ll be using here today is pre-trained on the COCO dataset.

This dataset includes a total of 80 classes (plus one background class) that you can detect and segment from an input image. I have included the labels file named coco_labels.txt  in the “Downloads” associated with this post, but for convenience, I have included the labels here for you:

  1. BG
  2. person
  3. bicycle
  4. car
  5. motorcycle
  6. airplane
  7. bus
  8. train
  9. truck
  10. boat
  11. traffic light
  12. fire hydrant
  13. stop sign
  14. parking meter
  15. bench
  16. bird
  17. cat
  18. dog
  19. horse
  20. sheep
  21. cow
  22. elephant
  23. bear
  24. zebra
  25. giraffe
  26. backpack
  27. umbrella
  28. handbag
  29. tie
  30. suitcase
  31. frisbee
  32. skis
  33. snowboard
  34. sports ball
  35. kite
  36. baseball bat
  37. baseball glove
  38. skateboard
  39. surfboard
  40. tennis racket
  41. bottle
  42. wine glass
  43. cup
  44. fork
  45. knife
  46. spoon
  47. bowl
  48. banana
  49. apple
  50. sandwich
  51. orange
  52. broccoli
  53. carrot
  54. hot dog
  55. pizza
  56. donut
  57. cake
  58. chair
  59. couch
  60. potted plant
  61. bed
  62. dining table
  63. toilet
  64. tv
  65. laptop
  66. mouse
  67. remote
  68. keyboard
  69. cell phone
  70. microwave
  71. oven
  72. toaster
  73. sink
  74. refrigerator
  75. book
  76. clock
  77. vase
  78. scissors
  79. teddy bear
  80. hair drier
  81. toothbrush

In the next section, we’ll learn how to use Keras and Mask R-CNN to detect and segment each of these classes.

Implementing Mask R-CNN with Keras and Python

Let’s get started implementing our Mask R-CNN segmentation script.

Open up a new file for the demo script and insert the following code:

Lines 2-11 import our required packages.

The mrcnn  imports are from Matterport’s implementation of Mask R-CNN. From mrcnn , we’ll use Config  to create a custom subclass for our configuration, modellib  to load our model, and visualize  to draw our masks.

Let’s go ahead and parse our command line arguments:

Our script requires three command line arguments:

  • --weights : The path to our Mask R-CNN model weights pre-trained on COCO.
  • --labels : The path to our COCO class labels text file.
  • --image : Our input image path. We’ll be performing instance segmentation on the image provided via the command line.
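As a sketch, the argument parsing for these three switches might look like this (the example values passed to parse_args are hypothetical placeholders; the real script would call ap.parse_args() so the values come from sys.argv):

```python
import argparse

# construct the argument parser
ap = argparse.ArgumentParser()
ap.add_argument("-w", "--weights", required=True,
    help="path to Mask R-CNN model weights pre-trained on COCO")
ap.add_argument("-l", "--labels", required=True,
    help="path to class labels file")
ap.add_argument("-i", "--image", required=True,
    help="path to input image to apply Mask R-CNN to")

# the real script would use ap.parse_args() so the values come from
# the command line; the explicit list here is only for illustration
args = vars(ap.parse_args([
    "--weights", "mask_rcnn_coco.h5",
    "--labels", "coco_labels.txt",
    "--image", "images/example.jpg",
]))
```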

Using the second argument, let’s go ahead and load our CLASS_NAMES  and COLORS  for each:

Line 24 loads the COCO class label names directly from the text file into a list.

From there, Lines 28-31 generate random, distinct COLORS  for each class label. The method comes from Matterport’s Mask R-CNN implementation on GitHub.
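One way to generate distinct per-class colors, following the evenly-spaced-HSV-hues idea used in Matterport’s implementation (the helper names below are my own):

```python
import colorsys
import random

def load_class_names(path):
    # one class label per line; the first entry is the "BG" class
    return open(path).read().strip().split("\n")

def make_colors(num_classes, seed=42):
    # evenly space hues around the HSV color wheel so every class gets
    # a visually distinct color, then shuffle so that adjacent class
    # IDs do not end up with near-identical hues
    hsv = [(i / num_classes, 1.0, 1.0) for i in range(num_classes)]
    colors = [colorsys.hsv_to_rgb(*c) for c in hsv]
    random.seed(seed)
    random.shuffle(colors)
    return colors

# 81 = 80 COCO classes + 1 background class
COLORS = make_colors(81)
```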

Let’s go ahead and construct our SimpleConfig  class:

Our SimpleConfig  class inherits from Matterport’s Mask R-CNN Config  (Line 33).

The configuration is given a NAME (Line 35).

From there we set the GPU_COUNT  and IMAGES_PER_GPU  (i.e., the batch size). If you have a GPU and tensorflow-gpu installed, then Keras + Mask R-CNN will automatically use your GPU. If not, your CPU will be used instead.

Note: I performed today’s experiment on a machine using a single Titan X GPU, so I set my GPU_COUNT = 1 . While my 12GB GPU could technically handle more than one image at a time (either during training or during prediction as in this script), I decided to set IMAGES_PER_GPU = 1  as most readers will not have a GPU with as much memory. Feel free to increase this value if your GPU can handle it.

Our NUM_CLASSES  is then set equal to the length of the CLASS_NAMES  list (Line 45).

Next, we’ll initialize our config and load our model:

Line 48 instantiates our config .

Then, using our config , Lines 53-55 load our Mask R-CNN model  pre-trained on the COCO dataset.

Let’s go ahead and perform instance segmentation:

Lines 59-61 load and preprocess our image . Our model expects images in RGB format, so we use cv2.cvtColor  to swap the color channels (in contrast to OpenCV’s default BGR color channel ordering).

Line 65 then performs a forward pass of the image  through the network to make both object detection and pixel-wise mask predictions.
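Putting the configuration, model-loading, and inference steps together, a minimal sketch might look like the following. It assumes Matterport’s mrcnn package is installed, the mask_rcnn_coco.h5 weights sit in the working directory, and images/example.jpg is a hypothetical test image:

```python
import os
import cv2
import imutils
from mrcnn.config import Config
from mrcnn import model as modellib

class SimpleConfig(Config):
    # give the configuration a recognizable name
    NAME = "coco_inference"
    # one image per batch on a single GPU (or the CPU fallback)
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    # 80 COCO classes + 1 background class
    NUM_CLASSES = 81

config = SimpleConfig()

# build the model in inference mode and load the pre-trained COCO
# weights; model_dir is only used for logging during training
model = modellib.MaskRCNN(mode="inference", config=config,
    model_dir=os.getcwd())
model.load_weights("mask_rcnn_coco.h5", by_name=True)

# load the input image, convert it from BGR to RGB (the network
# expects RGB ordering), and resize it to a manageable width
image = cv2.imread("images/example.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image = imutils.resize(image, width=512)

# a single forward pass yields bounding boxes ("rois", in
# (startY, startX, endY, endX) order), "class_ids", "scores",
# and boolean pixel-wise "masks" of shape (H, W, N)
r = model.detect([image], verbose=1)[0]
```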

The remaining two code blocks will process the results so that we can visualize the objects’ bounding boxes and masks using OpenCV:

In order to visualize the results, we begin by looping over object detections (Line 68). Inside the loop, we:

  • Grab the unique classID  integer (Line 71).
  • Extract the mask  for the current detection (Line 72).
  • Determine the color  used to visualize  the mask (Line 73).
  • Apply/draw our predicted pixel-wise mask on the object using a semi-transparent alpha  channel (Line 76).
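The semi-transparent overlay can be sketched in pure NumPy; Matterport’s visualize.apply_mask performs an equivalent channel-by-channel blend:

```python
import numpy as np

def apply_mask(image, mask, color, alpha=0.5):
    # blend `color` (an (r, g, b) tuple in [0, 1]) into `image`
    # wherever the boolean `mask` is True; pixels outside the mask
    # are left untouched
    image = image.astype("float32")
    color = np.array(color, dtype="float32") * 255.0
    blended = np.where(mask[..., None],
                       (1.0 - alpha) * image + alpha * color,
                       image)
    return blended.astype("uint8")
```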

From here, we’ll draw bounding boxes and class label + score texts for each object in the image:

Line 80 converts our image  back to BGR (OpenCV’s default color channel ordering).

On Line 83 we begin looping over objects. Inside the loop, we:

  • Extract the bounding box coordinates, classID , label , and score  (Lines 86-89).
  • Compute the color  for the bounding box and text (Line 90).
  • Draw each bounding box (Line 93).
  • Concatenate the class/probability text  (Line 94) and then draw it at the top of the image  (Lines 95-97).

Once the process is complete, the resulting output image  is displayed to the screen until a key is pressed (Lines 100-101).

Mask R-CNN and Keras results

Now that our Mask R-CNN script has been implemented, let’s give it a try.

Make sure you have used the “Downloads” section of this tutorial to download the source code.

You will need to know the concept of command line arguments to run the code. If it is unfamiliar to you, read up on argparse and command line arguments before you try to execute the code.

When you’re ready, open up a terminal and execute the following command:

Figure 2: The Mask R-CNN model trained on COCO created a pixel-wise map of the Jurassic Park jeep (truck), my friend, and me while we celebrated my 30th birthday.

For my 30th birthday, my wife found a person to drive us around Philadelphia in a replica Jurassic Park jeep — here my best friend and I are outside The Academy of Natural Sciences.

Notice how not only bounding boxes are produced for each object (i.e., both people and the jeep), but also pixel-wise masks as well!

Let’s give another image a try:

Figure 3: My dog, Janie, has been segmented from the couch and chair using a Keras and Mask R-CNN deep learning model.

Here is a super adorable photo of my dog, Janie, laying on the couch:

  1. Despite the vast majority of the couch not being visible, the Mask R-CNN is still able to label it as such.
  2. The Mask R-CNN is correctly able to label the dog in the image.
  3. And even though my coffee cup is barely visible, Mask R-CNN is able to label the cup as well (if you look really closely you’ll see that my coffee cup is a Jurassic Park mug!)

The only part of the image that Mask R-CNN is not able to correctly label is the back part of the couch which it mistakes as a chair — looking at the image closely, you can see how Mask R-CNN made the mistake (the region does look quite chair-like versus being part of the couch).

Here’s another example of using Keras + Mask R-CNN for instance segmentation:

Figure 4: A Mask R-CNN segmented image (created with Keras, TensorFlow, and Matterport’s Mask R-CNN implementation). This picture is of me in Page, AZ.

A few years ago, my wife and I made a trip out to Page, AZ (this particular photo was taken just outside Horseshoe Bend) — you can see how the Mask R-CNN has not only detected me but also constructed a pixel-wise mask for my body.

Let’s apply Mask R-CNN to one final image:

Figure 5: Keras + Mask R-CNN with Python of a picture from Ybor City.

One of my favorite cities to visit in the United States is Ybor City — there’s just something I like about the area (and perhaps it’s that the roosters are protected in the city and free to roam around).

Here you can see me and such a rooster — notice how each of us is correctly labeled and segmented by the Mask R-CNN. You’ll also notice that the Mask R-CNN model was able to localize each of the individual cars and label the bus!

Can Mask R-CNN run in real-time?

At this point you’re probably wondering if it’s possible to run Keras + Mask R-CNN in real-time, right?

As you know from the “The History of Mask R-CNN” section above, Mask R-CNN is based on the Faster R-CNN object detector.

Faster R-CNNs are incredibly computationally expensive, and when you add instance segmentation on top of object detection, the model becomes even more expensive. Therefore:

  • On a CPU, a Mask R-CNN cannot run in real-time.
  • But on a GPU, Mask R-CNN can get up to 5-8 FPS.

If you would like to run Mask R-CNN in semi-real-time, you will need a GPU.

How can I train a Mask R-CNN model on my own custom dataset?

Figure 6: Inside my book, Deep Learning for Computer Vision with Python, you will learn how to annotate your own training data, train your custom Mask R-CNN, and apply it to your own images. I also provide two case studies on (1) skin lesion/cancer segmentation and (2) prescription pill segmentation, a first step in pill identification.

The Mask R-CNN model we used in this tutorial was pre-trained on the COCO dataset…

…but what if you wanted to train a Mask R-CNN on your own custom dataset?

Inside my book, Deep Learning for Computer Vision with Python, I:

  1. Teach you how to train a Mask R-CNN to automatically detect and segment cancerous skin lesions — a first step in building an automatic cancer risk factor classification system.
  2. Provide you with my favorite image annotation tools, enabling you to create masks for your input images.
  3. Show you how to train a Mask R-CNN on your custom dataset.
  4. Provide you with my best practices, tips, and suggestions when training your own Mask R-CNN.

All of the Mask R-CNN chapters include a detailed explanation of both the algorithms and code, ensuring you will be able to successfully train your own Mask R-CNNs.

To learn more about my book (and grab your free set of sample chapters and table of contents), just click here.


Summary

In this tutorial, you learned how to use Keras + Mask R-CNN to perform instance segmentation.

Unlike object detection, which only gives you the bounding box (x, y)-coordinates for an object in an image, instance segmentation takes it a step further, yielding pixel-wise masks for each object.

Using instance segmentation we can actually segment an object from an image.

To perform instance segmentation we used the Matterport Keras + Mask R-CNN implementation.

We then created a Python script that:

  1. Constructed a configuration class for Mask R-CNN (both with and without a GPU).
  2. Loaded the Keras + Mask R-CNN architecture from disk.
  3. Preprocessed our input image.
  4. Detected objects/masks in the image.
  5. Visualized the results.

If you are interested in how to:

  1. Label and annotate your own custom image dataset
  2. And then train a Mask R-CNN model on top of your annotated dataset…

…then you’ll want to take a look at my book, Deep Learning for Computer Vision with Python, where I cover Mask R-CNN and annotation in detail.

I hope you enjoyed today’s post!

To download the source code (including the pre-trained Keras + Mask R-CNN model), just enter your email address in the form below! I’ll be sure to let you know when future tutorials are published here on PyImageSearch.


If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!


90 Responses to Keras Mask R-CNN

  1. abdul June 10, 2019 at 11:40 am #

    thank you

    • Adrian Rosebrock June 10, 2019 at 1:22 pm #

      You are welcome, Abdul!

  2. Tiri June 10, 2019 at 12:36 pm #

    Hi, super interesting as usual!
    I have a question: there are currently lot of changes in the libraries like keras tensorflow or pytorch, do you update the examples you do in your book, with the recent library versions?

    • Adrian Rosebrock June 10, 2019 at 1:23 pm #

      Inside my book, Deep Learning for Computer Vision with Python, I use Keras with a TensorFlow backend. The code is also compatible with TensorFlow 2.0. All examples are kept up to date with the most recent library versions.

  3. Satheesh June 10, 2019 at 1:57 pm #

    Hi, I was really waiting for this. Thanks a lot, brother. I’m also expecting YOLOv3 and SSD tutorials from you.

  4. ali June 10, 2019 at 8:01 pm #

    I can’t install tensorflow or tensorflow-gpu.
    I use PyCharm.
    What should I do?

    • Adrian Rosebrock June 12, 2019 at 1:35 pm #

      I would suggest you install via a terminal instead, then update your “Project Interpreter” setting in PyCharm.

  5. John June 10, 2019 at 11:50 pm #

    Hi Adrian,

    Thanks for another great tutorial. Just as a rule of thumb, what are some ways to improve the FPS performance of the Mask R-CNN algorithm? I am running the code on a dedicated GPU workstation but am not seeing the FPS results you have mentioned. I am getting roughly 1 frame per second.

    • Adrian Rosebrock June 12, 2019 at 1:34 pm #

      What type of GPU are you using?

      • John June 12, 2019 at 7:47 pm #

        My local machine has an NVIDIA Quadro P2000 with 5GB of VRAM. I have also deployed the code to an AWS P3 instance equipped with a NVIDIA Tesla V100 GPU which has 16GB of VRAM. With the AWS instance the inference speedup has been pretty disappointing reaching about 1.3FPS on average.

        • Adrian Rosebrock June 13, 2019 at 9:38 am #

          Thanks John. I might try to dig into that and see what’s going on but I cannot guarantee if/when I’ll be able to do that.

          • John June 13, 2019 at 7:06 pm #

            Thanks Adrian!

          • Adrian Rosebrock June 19, 2019 at 2:22 pm #

            You are welcome!

  6. Hilman June 11, 2019 at 3:59 am #

    High-quality materials for free…
    Adrian, you are really awesome and inspired me deeply.
    I want to say thank you for that!
    Please know that your blog and your book are literally helping the world!

    • Adrian Rosebrock June 12, 2019 at 1:31 pm #

      Thanks Hilman!

  7. joao June 11, 2019 at 12:23 pm #

    Hey! I think you forgot one import: pip install imutils

    Either way, very easy to set up, thanks!

    • Adrian Rosebrock June 12, 2019 at 1:30 pm #

      Thanks for catching that, Joao! I have updated the post.

  8. Arjun Sreekumar June 11, 2019 at 5:11 pm #

    In which folder should all the installations be done?

    • Adrian Rosebrock June 12, 2019 at 1:29 pm #

      The folder doesn’t matter for the pip installs. Just make sure you are in the “Mask_RCNN” directory when running the demo script, though.

  9. Arjun Sreekumar June 11, 2019 at 5:52 pm #

    Running the code causes an ERROR : Illegal instruction (core dumped). Could you please explain the reason? Thanks in advance

    • Adrian Rosebrock June 12, 2019 at 1:28 pm #

      What line of code is causing the error? Try using “print” statements or “pdb” to debug.

  10. Jeremy June 11, 2019 at 8:25 pm #

    As always great post Adrian. Keep up the good job. Thanks a bunch.

    • Adrian Rosebrock June 12, 2019 at 1:28 pm #

      Thanks Jeremy!

  11. fALAHGS June 12, 2019 at 12:44 am #

    Thank you Adrian for great post
    Is there a tutorial lesson to work on Custom instance Segmentation for our custom object…?

    thanks for help

  12. Ray C June 12, 2019 at 1:54 am #

    Thanks Adrian for this.
    Do you know how much GPU memory this will take for a single stream? I want to stream 4 separate streams and wondering.
    Currently, the test box I have has a 6GB GPU and using yolo3, I seem to be using at least 2 GB per camera stream of it, so no way I can run 4 streams. My requirement is to do object detection for at least 4 streams at around 10 fps at least. What do you recommend for this situation? Is there a way to use multithreading in python for gpu processing? (only one GPU)? I’m running the script multiple times, once for each stream

    Also, if you’re in Ybor City next, ping me, I’ll buy you lunch/dinner. I live in the general area!

    • Adrian Rosebrock June 12, 2019 at 1:27 pm #

      Most GPUs will only be able to handle a single Mask R-CNN model. You won’t be able to run four separate Mask R-CNN models on a single GPU. What you could do is batch process frames, like I do in this tutorial.

  13. Sovit Rath June 12, 2019 at 2:11 am #

    Hello Adrian, I am getting this error. Could you possibly help me out?

    ValueError: Layer #389 (named “mrcnn_bbox_fc”), weight has shape (1024, 364), but the saved weight has shape (1024, 324).

    • Koksal Chou June 20, 2019 at 9:37 am #

      Line 40: NUM_CLASSES = 81

  14. Ramachandra Babu June 12, 2019 at 2:18 am #

    Hi can you please tell me how to perform inference only on person and ignoring other classes.

    Thanks in Advance.

    • Adrian Rosebrock June 12, 2019 at 1:26 pm #

      Simply check the “label” variable. If it’s not “person”, ignore the detection.

  15. mario June 12, 2019 at 7:48 am #

    Hi! I am getting an error right at the beginning when importing things, particularly on

    from mrcnn import model as modellib

    where I get the error “module ‘tensorflow’ has no attribute ‘name_scope’”

    I am using Windows and Conda, any clue?
    Thanks in advance!

    • Adrian Rosebrock June 12, 2019 at 1:25 pm #

      What version of Keras and TensorFlow are you using?

  16. Suha Jon June 12, 2019 at 8:37 am #

    Thanks Adrian for great post
    is there tutorial about Custom segmentation ..?

  17. Oli June 12, 2019 at 11:40 am #

    Hi, this is (yet again) really interesting, thanks!
    Can this work for videos in the same way as Mask R-CNN with OpenCV?

    • Adrian Rosebrock June 12, 2019 at 1:24 pm #

      Yes, just apply the Mask R-CNN model to each frame of the video stream.

      • Daniel Trujillo June 23, 2019 at 12:00 am #

        How would you go about it? Script the separation of video to image sequence (or do it in gimp/adobe/whatever), run the rcnn on all the images in a batch, then reconvert to video? Or do you think there is a simpler route?

        I’m sure I’ll have some video version working by time you get to this comment, but I’m definitely interested in the route you’d take either way.

        BTW, I saw the insta ad for this tutorial, keep it up man. Good stuff.

        • Adrian Rosebrock June 26, 2019 at 1:36 pm #

          Have you taken a look at Deep Learning for Computer Vision with Python? That book includes examples of how to write the output of deep learning models to video files. I’ll also be covering it in a future tutorial so stay tuned.

  18. Stephen Meschke June 13, 2019 at 8:30 pm #

    Another solid tutorial. I had some trouble installing the packages. My system was screwed up because I installed several Python packages incorrectly. Eventually, I chose to re-install Ubuntu 18 and start fresh. Everything worked fine after that, although I had to use pip3 instead of just pip.

    I spend the majority of my PyImageSearch time trying to properly install the Python packages that you use. Installing and using various packages is critical and, unfortunately, often frustrating.

    • Adrian Rosebrock June 19, 2019 at 2:23 pm #

      Thanks Stephen — although I’m sorry you had to start with a fresh Ubuntu 18.04 install. When it comes to using different Python packages I strongly encourage using Python virtual environments so you don’t run into that issue again.

  19. ang June 14, 2019 at 1:37 pm #

    how to train my custom cnn model

  20. Ibrahim June 17, 2019 at 5:22 am #

    In my case it just shows an image, no detection and segmentation. Moreover I get this warning:
    RuntimeWarning: divide by zero encountered in divide

    • Adrian Rosebrock June 19, 2019 at 2:04 pm #

      What version of Keras and TensorFlow are you using? And what function call throws that error?

  21. Andrew Welham June 17, 2019 at 9:37 am #

    Great blog entry again, This may be a silly question, but if I wanted to only detect people, can i remove other categorisations from Mask R-CNN to speed it up? Or would it make no difference ?

    • Adrian Rosebrock June 19, 2019 at 2:03 pm #

      In terms of speed, no, it would make no difference. That said, if you want to detect only people, just loop over the detections and ignore anything that is not a “person” class.

  22. tigon7476 June 20, 2019 at 3:01 am #

    As a result of my test, it takes 0.4 seconds per image for inference. I installed two GPUs (RTX 2080 Ti) and set GPU_COUNT = 2. Then an error occurred. Currently there is only one GPU, so I cannot reproduce the error. I have the ImageNet Bundle. What chapter should I refer to?

    • Adrian Rosebrock June 26, 2019 at 2:00 pm #

      0.2-0.4 seconds per inference with the GPU sounds about right. As far as the error goes can you email me it so I can take a look?

  23. Hamza Atta June 21, 2019 at 1:28 am #

    Amazing tutorial a lot learnt from it.
    Would you please do a tutorial on person re identification

    • Adrian Rosebrock June 26, 2019 at 1:48 pm #

      I will consider it but cannot guarantee if/when it will be.

      • Ali Hassan July 17, 2019 at 9:07 am #

        Yes please consider it.There would be a lot learning from it.

  24. Qasim Zia June 24, 2019 at 3:06 am #

    It may be a dumb question but I am new to this field. I don’t understand the purpose of background class, what is that use for?

    • Adrian Rosebrock June 26, 2019 at 1:23 pm #

      The “background” class is basically your “ignore” class (i.e., contents of an image that have no semantic value).

  25. Miftah July 8, 2019 at 10:34 am #

    Thank you, awesome as ever! I wonder if we can get a mask that delineates the foreground from the background, as in the case of separating a person (foreground) from everything else (background). I saw some literature producing background and foreground masks with pixel values of 0 and 1, respectively, to segment out the background so they can work with just one of them, for applications such as person recognition/re-id. I wonder how I can use Mask R-CNN to produce such a mask, other than the semi-transparent mask shown in the above demo. Thank you!

  26. Emmanuel KOUPOH July 11, 2019 at 11:19 am #

    Hi Adrian, I want help and more explanations on implementing Mask R-CNN from scratch, maybe with Keras, to improve my skills.

  27. Hamza Atta July 17, 2019 at 7:43 am #

    I want to know what changes are required if I only want to use it for human segmentation and not for any other purpose.

  28. Aidan July 18, 2019 at 1:24 am #

    Hey Adrian, are you able to do a blog post on YOLACT, which is essentially a real-time instance segmentation network? (Authors claim 33 FPS on a Titan Xp and 29.8 mAP on COCO’s test-dev.) Maybe you could even update your DL4CV book with a chapter teaching us how to train our own model? Thank you!

    • Adrian Rosebrock July 25, 2019 at 9:47 am #

      Thanks for the suggestion, Aidan.

  29. Jaiganesh July 18, 2019 at 9:48 pm #

    Hi Adrian,

    I am looking for a good object detection “framework” to detect objects in a congested (hospital) environment. After going through some of your blogs on SSD, YOLO, Mask R-CNN, and Keras with Mask R-CNN, which one is the best among these? And can I train the model with custom objects?

    Also, may I know what the difference is between your two blog posts on Mask R-CNN, the earlier Mask R-CNN with OpenCV post and this Keras Mask R-CNN post?

    Is there any performance difference?

    • Adrian Rosebrock July 25, 2019 at 9:44 am #

      There is no true “best” object detector, it’s all based on your situation. For high accuracy, especially on small objects, Faster R-CNNs work really well. SSDs are faster than Faster R-CNNs but less accurate. YOLO can be very fast but harder to train and least accurate out of them. If you’re interested in my additional commentary on object detectors, including how to train them, refer to Deep Learning for Computer Vision with Python.

  30. Debal July 23, 2019 at 1:49 pm #

    hi Adrian
    thanks for yet another great tutorial.
    my question is similar to the one posted by Miftah.
    I am trying to apply this version of masked R-CNN to a foreground extraction problem.
    however this model seems to work only on the objects it is trained to identify whereas my problem requires a very generic foreground extraction with mask results as good as this model.
    do you have any suggestions for me?

    • Adrian Rosebrock July 25, 2019 at 9:29 am #

      Hey Debal — you mean you need to train your own Mask R-CNN model? If so, you should refer to Deep Learning for Computer Vision with Python.

      • Shashank August 7, 2019 at 3:04 am #

        Dear Adrian, Your book doesn’t cover the mask RCNN. It covers only the faster RCNN and SSD in the tensorflow object detection API section. Can you guide me how to prepare the dataset and make the record file when it comes to mask RCNN? I wish to do this with the tensorflow object detection api

        • Adrian Rosebrock August 7, 2019 at 12:18 pm #

          Actually, my book DOES cover Mask R-CNN. You can find it inside the ImageNet Bundle of Deep Learning for Computer Vision with Python. The ImageNet Bundle chapters includes a set of bonus guides, including Mask R-CNN chapters.

  31. Rida July 28, 2019 at 11:16 pm #

    I need help with pests/insects recognition

  32. Hung August 5, 2019 at 1:53 am #

    Could you please explain “yielding pixel-wise masks for each object” for me? How effective is it? Thank you!!!

    • Adrian Rosebrock August 7, 2019 at 12:27 pm #

      A “pixel-wise mask” is simply a mask with the same dimensions of the input image. It specifies which pixels belong to a given foreground object and which belong to the background.

  33. Patrick August 7, 2019 at 3:11 am #

    so, how can i get the source code!

    • Adrian Rosebrock August 7, 2019 at 12:18 pm #

      You use the “Downloads” section of this post.

  34. saravana kumar August 17, 2019 at 12:10 am #

    How to expand the number of classes to 300 plus and then run predictions on the images.

  35. Edgar August 22, 2019 at 6:53 pm #

    How can I apply this to a video?

  36. konate September 5, 2019 at 6:48 pm #

    Hi Adrian,

    I am very happy to be here. I was looking for a while for a website with source code as clear as yours. Thank you a lot.

    • Adrian Rosebrock September 12, 2019 at 11:13 am #

      Thanks Konate 🙂

  37. Naveen October 17, 2019 at 9:54 am #

    An error is occurring:

    AttributeError: module ‘tensorflow’ has no attribute ‘log’

    TensorFlow version: 2.0
    OpenCV version: 4.1.1
    Keras version: 2.3.1
    Can you help?

    • Adrian Rosebrock October 25, 2019 at 10:32 am #

      You cannot use the Matterport Mask R-CNN library with TensorFlow 2.0 (yet). You will need to use TensorFlow 1.x.

      • Gang November 25, 2019 at 1:21 am #

        Dear Adrian
        Is it better to downgrade the TensorFlow version, or are there alternatives?

        • Adrian Rosebrock December 5, 2019 at 10:56 am #

          You would need to downgrade your TensorFlow install.
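          Since the library only fails once it hits the removed API at runtime, a small guard can make the incompatibility explicit. A sketch (the helper function is my own, not part of the Matterport library):

```python
def tf_major_version(version_string):
    """Parse the major component out of a version string like '1.15.0'."""
    return int(version_string.split(".")[0])


# The Matterport Mask R-CNN code calls APIs such as tf.log that were removed
# in TensorFlow 2.x, so you could fail fast on an incompatible install, e.g.:
#
#   import tensorflow as tf
#   if tf_major_version(tf.__version__) >= 2:
#       raise RuntimeError("Use TensorFlow 1.x: pip install 'tensorflow<2.0'")

print(tf_major_version("1.15.0"))  # → 1
```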

  38. Efren Lopez October 17, 2019 at 7:00 pm #

    Dear Adrian, I am getting an error about my CUDA version because it is insufficient for the CUDA runtime. My NVIDIA driver is 396 and my CUDA version is 9.2. I do not understand why I am getting that error. Best

  39. Karthik November 5, 2019 at 2:53 am #

    Hi Adrian,

    Thank you for the great tutorial. I would like to purchase the ImageNet Bundle of the Deep Learning for Computer Vision with Python book. I have a few questions.

    1. Is the source code compatible with TensorFlow 2.0?
    2. If not, will you be updating it? And are the updates available for free?
    3. Are future updates free?
    4. Does the hard copy ship to India? I am from India.


    • Adrian Rosebrock November 7, 2019 at 10:20 am #

      1. Yes, all source code is compatible with TensorFlow 2.0.
      2. Yes, I update DL4CV once, sometimes twice, per year. All updates are free.
      3. Yes, all future updates to DL4CV are free.
      4. Yes, I can ship to India at no additional cost.

      If you have any other questions just let me know, otherwise just use this link to pick up your copy.

  40. lyelchuri November 21, 2019 at 2:25 am #

    I am trying to detect hand gloves and spectacles using Mask R-CNN. I am facing the following issues:
    1. For people who are not wearing gloves, it still detects a glove; I think it is keying on the hand structure.
    2. It is completely biased toward color: wherever it finds white, it predicts gloves.

    Please help me.

Before you leave a comment...

Hey, Adrian here, author of the PyImageSearch blog. I'd love to hear from you, but before you submit a comment, please follow these guidelines:

  1. If you have a question, read the comments first. You should also search this page (i.e., ctrl + f) for keywords related to your question. It's likely that I have already addressed your question in the comments.
  2. If you are copying and pasting code/terminal output, please don't. Reviewing another programmer’s code is a very time consuming and tedious task, and due to the volume of emails and contact requests I receive, I simply cannot do it.
  3. Be respectful of the space. I put a lot of my own personal time into creating these free weekly tutorials. On average, each tutorial takes me 15-20 hours to put together. I love offering these guides to you and I take pride in the content I create. Therefore, I will not approve comments that include large code blocks/terminal output as it destroys the formatting of the page. Kindly be respectful of this space.
  4. Be patient. I receive 200+ comments and emails per day. Due to spam, and my desire to personally answer as many questions as I can, I hand moderate all new comments (typically once per week). I try to answer as many questions as I can, but I'm only one person. Please don't be offended if I cannot get to your question.
  5. Do you need priority support? Consider purchasing one of my books and courses. I place customer questions and emails in a separate, special priority queue and answer them first. If you are a customer of mine you will receive a guaranteed response from me. If there's any time left over, I focus on the community at large and attempt to answer as many of those questions as I possibly can.

Thank you for keeping these guidelines in mind before submitting your comment.