Keras Mask R-CNN

In this tutorial, you will learn how to use Keras and Mask R-CNN to perform instance segmentation (both with and without a GPU).

Using Mask R-CNN we can perform both:

  1. Object detection, giving us the (x, y)-bounding box coordinates for each object in an image.
  2. Instance segmentation, enabling us to obtain a pixel-wise mask for each individual object in an image.

An example of instance segmentation via Mask R-CNN can be seen in the image at the top of this tutorial — notice how we not only have the bounding box of the objects in the image, but we also have pixel-wise masks for each object, enabling us to segment each individual object (something that object detection alone does not give us).

Instance segmentation, along with Mask R-CNN, powers some of the recent advances in the “magic” we see in computer vision, including self-driving cars, robotics, and more.

In the remainder of this tutorial, you will learn how to use Mask R-CNN with Keras, including how to perform instance segmentation on your own images.

To learn more about Keras and Mask R-CNN, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

Keras Mask R-CNN

In the first part of this tutorial, we’ll briefly review the Mask R-CNN architecture. From there, we’ll review our directory structure for this project and then install Keras + Mask R-CNN on our system.

I’ll then show you how to implement Mask R-CNN with Keras and Python.

Finally, we’ll apply Mask R-CNN to our own images and examine the results.

I’ll also share resources on how to train a Mask R-CNN model on your own custom dataset.

The History of Mask R-CNN

Figure 1: The Mask R-CNN architecture by He et al. enables object detection and pixel-wise instance segmentation. This blog post uses Keras to work with a Mask R-CNN model trained on the COCO dataset.

The Mask R-CNN model for instance segmentation has evolved from three preceding architectures for object detection:

  • R-CNN: An input image is presented to the network, Selective Search is run on the image, and then the output regions from Selective Search are used for feature extraction and classification using a pre-trained CNN.
  • Fast R-CNN: Still uses the Selective Search algorithm to obtain region proposals, but adds the Region of Interest (ROI) Pooling module. Extracts a fixed-size window from the feature map and uses the features to obtain the final class label and bounding box. The benefit is that the network is now end-to-end trainable.
  • Faster R-CNN: Introduces the Region Proposal Network (RPN) that bakes the region proposal directly into the architecture, alleviating the need for the Selective Search algorithm.

The Mask R-CNN algorithm builds on the previous Faster R-CNN, enabling the network to not only perform object detection but pixel-wise instance segmentation as well!

I’ve covered Mask R-CNN in-depth inside both:

  1. The “What is Mask R-CNN?” section of the Mask R-CNN with OpenCV post.
  2. My book, Deep Learning for Computer Vision with Python.

Please refer to those resources for more in-depth details on how the architecture works, including the ROI Align module and how it facilitates instance segmentation.

Project structure

Go ahead and use the “Downloads” section of today’s blog post to download the code and pre-trained model. Let’s inspect our Keras Mask R-CNN project structure:

Our project consists of an images/  directory of testing images as well as three files:

  • coco_labels.txt : A line-by-line listing of all 81 class labels. The first label is the “background” class, so typically we say there are 80 classes.
  • mask_rcnn_coco.h5 : Our pre-trained Mask R-CNN model weights file which will be loaded from disk.
  • : The Mask R-CNN demo script, which loads the labels and model/weights. From there, an inference is made on a testing image provided via a command line argument. You may test with one of your own images or any in the  images/  directory included with the “Downloads”.

Before we review today’s script, we’ll install Keras + Mask R-CNN and then we’ll briefly review the COCO dataset.

Installing Keras Mask R-CNN

The Keras + Mask R-CNN installation process is quite straightforward with pip and git. I recommend installing these packages in a dedicated virtual environment for today’s project so you don’t complicate your system’s package tree.

First, install the required Python packages:
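The exact commands appeared as a code block in the original post; as a sketch (package names unpinned, and the supporting-library list is an approximation — a reader in the comments also notes that imutils is required):

```shell
# install TensorFlow and Keras
# (swap in tensorflow-gpu if you have CUDA/cuDNN configured)
pip install tensorflow
pip install keras

# supporting libraries used by the demo script and Matterport's code
pip install imutils scikit-image
```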

Be sure to install tensorflow-gpu  if you have a GPU, CUDA, and cuDNN installed in your machine.

From there, go ahead and install OpenCV, either via pip or compiling from source:

Next, we’ll install the Matterport implementation of Mask R-CNN in Keras:
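The Matterport install is a clone-and-setup affair; a sketch of the commands (the URL is Matterport's public GitHub repository):

```shell
# clone the Matterport Mask R-CNN implementation and install it
git clone https://github.com/matterport/Mask_RCNN.git
cd Mask_RCNN
python setup.py install
```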

Finally, fire up a Python interpreter in your virtual environment to verify that Mask R-CNN + Keras and OpenCV have been successfully installed:

Provided that there are no import errors, your environment is now ready for today’s blog post.

Mask R-CNN and COCO

The Mask R-CNN model we’ll be using here today is pre-trained on the COCO dataset.

This dataset includes a total of 80 object classes, plus one background class, that you can detect and segment from an input image (the first label in the file is the background class). I have included the labels file named coco_labels.txt  in the “Downloads” associated with this post, but out of convenience, I have included them here for you:

  1. BG
  2. person
  3. bicycle
  4. car
  5. motorcycle
  6. airplane
  7. bus
  8. train
  9. truck
  10. boat
  11. traffic light
  12. fire hydrant
  13. stop sign
  14. parking meter
  15. bench
  16. bird
  17. cat
  18. dog
  19. horse
  20. sheep
  21. cow
  22. elephant
  23. bear
  24. zebra
  25. giraffe
  26. backpack
  27. umbrella
  28. handbag
  29. tie
  30. suitcase
  31. frisbee
  32. skis
  33. snowboard
  34. sports ball
  35. kite
  36. baseball bat
  37. baseball glove
  38. skateboard
  39. surfboard
  40. tennis racket
  41. bottle
  42. wine glass
  43. cup
  44. fork
  45. knife
  46. spoon
  47. bowl
  48. banana
  49. apple
  50. sandwich
  51. orange
  52. broccoli
  53. carrot
  54. hot dog
  55. pizza
  56. donut
  57. cake
  58. chair
  59. couch
  60. potted plant
  61. bed
  62. dining table
  63. toilet
  64. tv
  65. laptop
  66. mouse
  67. remote
  68. keyboard
  69. cell phone
  70. microwave
  71. oven
  72. toaster
  73. sink
  74. refrigerator
  75. book
  76. clock
  77. vase
  78. scissors
  79. teddy bear
  80. hair drier
  81. toothbrush

In the next section, we’ll learn how to use Keras and Mask R-CNN to detect and segment each of these classes.

Implementing Mask R-CNN with Keras and Python

Let’s get started implementing our Mask R-CNN segmentation script.

Open up the script and insert the following code:

Lines 2-11 import our required packages.

The mrcnn  imports are from Matterport’s implementation of Mask R-CNN. From mrcnn , we’ll use Config  to create a custom subclass for our configuration, modellib  to load our model, and visualize  to draw our masks.

Let’s go ahead and parse our command line arguments:

Our script requires three command line arguments:

  • --weights : The path to our Mask R-CNN model weights pre-trained on COCO.
  • --labels : The path to our COCO class labels text file.
  • --image : Our input image path. We’ll be performing instance segmentation on the image provided via the command line.
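A minimal sketch of the parser described above — the flag names match the bullets; the short flags, help strings, and sample values are my own assumptions:

```python
import argparse

# construct the argument parser for the three required inputs
ap = argparse.ArgumentParser()
ap.add_argument("-w", "--weights", required=True,
    help="path to Mask R-CNN model weights pre-trained on COCO")
ap.add_argument("-l", "--labels", required=True,
    help="path to COCO class labels text file")
ap.add_argument("-i", "--image", required=True,
    help="path to input image to apply Mask R-CNN to")

# in the real script this would be ap.parse_args() (reading sys.argv);
# here we pass a sample list so the sketch is self-contained
args = vars(ap.parse_args([
    "--weights", "mask_rcnn_coco.h5",
    "--labels", "coco_labels.txt",
    "--image", "images/example.jpg"]))
print(args["image"])  # images/example.jpg
```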

Using the second argument, let’s go ahead and load our CLASS_NAMES  and COLORS  for each:

Line 24 loads the COCO class label names directly from the text file into a list.

From there, Lines 28-31 generate random, distinct COLORS  for each class label. The method comes from Matterport’s Mask R-CNN implementation on GitHub.
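The label loading and color generation can be sketched as follows. The HSV-based spacing mirrors Matterport's random_colors utility; the four-line stand-in labels file is mine so the sketch runs on its own (the real coco_labels.txt has 81 lines):

```python
import colorsys
import os
import random
import tempfile

# write a tiny stand-in for coco_labels.txt so the sketch is
# self-contained (the real file has 81 lines, one label per line)
path = os.path.join(tempfile.mkdtemp(), "coco_labels.txt")
with open(path, "w") as f:
    f.write("BG\nperson\nbicycle\ncar\n")

# load the class label names from the labels file into a list
CLASS_NAMES = open(path).read().strip().split("\n")

# generate one visually distinct color per class by spacing hues
# evenly around the HSV color wheel, converting each to RGB,
# then shuffling so adjacent class IDs don't get similar colors
hsv = [(i / len(CLASS_NAMES), 1, 1.0) for i in range(len(CLASS_NAMES))]
COLORS = [colorsys.hsv_to_rgb(*c) for c in hsv]
random.seed(42)
random.shuffle(COLORS)
```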

Let’s go ahead and construct our SimpleConfig  class:

Our SimpleConfig  class inherits from Matterport’s Mask R-CNN Config  (Line 33).

The configuration is given a NAME (Line 35).

From there we set the GPU_COUNT  and IMAGES_PER_GPU  (i.e., batch size). If you have a GPU and tensorflow-gpu installed, then Keras + Mask R-CNN will automatically use your GPU. If not, your CPU will be used instead.

Note: I performed today’s experiment on a machine using a single Titan X GPU, so I set my GPU_COUNT = 1 . While my 12GB GPU could technically handle more than one image at a time (either during training or during prediction as in this script), I decided to set IMAGES_PER_GPU = 1  as most readers will not have a GPU with as much memory. Feel free to increase this value if your GPU can handle it.

Our NUM_CLASSES  is then set equal to the length of the CLASS_NAMES  list (Line 45).
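To show the shape of the configuration, here is a stand-in with the same attributes. The real class subclasses mrcnn.config.Config, which derives the batch size from GPU_COUNT * IMAGES_PER_GPU automatically; the stand-in computes it by hand so the sketch runs without the library, and it hardcodes NUM_CLASSES rather than taking len(CLASS_NAMES):

```python
class SimpleConfig:
    # give the configuration a recognizable name
    NAME = "coco_inference"

    # number of GPUs to use, and number of images per GPU (batch size)
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

    # 80 COCO object classes + 1 background class
    NUM_CLASSES = 81

    # mrcnn's Config derives this automatically; shown here for clarity
    BATCH_SIZE = GPU_COUNT * IMAGES_PER_GPU
```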

Next, we’ll initialize our config and load our model:

Line 48 instantiates our config .

Then, using our config , Lines 53-55 load our Mask R-CNN model  pre-trained on the COCO dataset.

Let’s go ahead and perform instance segmentation:

Lines 59-61 load and preprocess our image . Our model expects images in RGB format, so we use cv2.cvtColor  to swap the color channels (in contrast to OpenCV’s default BGR color channel ordering).
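The channel swap itself is just a reordering of the last axis. A NumPy-only illustration (the script uses cv2.cvtColor(image, cv2.COLOR_BGR2RGB), which is equivalent for this conversion):

```python
import numpy as np

# a tiny 1x1 "image" in BGR order: blue=10, green=20, red=30
bgr = np.array([[[10, 20, 30]]], dtype=np.uint8)

# reversing the channel axis converts BGR -> RGB
rgb = bgr[..., ::-1]
print(rgb[0, 0].tolist())  # [30, 20, 10]
```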

Line 65 then performs a forward pass of the image  through the network to make both object detection and pixel-wise mask predictions.

The remaining two code blocks will process the results so that we can visualize the objects’ bounding boxes and masks using OpenCV:

In order to visualize the results, we begin by looping over object detections (Line 68). Inside the loop, we:

  • Grab the unique classID  integer (Line 71).
  • Extract the mask  for the current detection (Line 72).
  • Determine the color  used to visualize  the mask (Line 73).
  • Apply/draw our predicted pixel-wise mask on the object using a semi-transparent alpha  channel (Line 76).
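The semi-transparent mask drawing boils down to alpha-blending the class color into the masked pixels. A minimal NumPy sketch of the idea behind mrcnn.visualize's mask application — the alpha, color, and image values here are chosen purely for illustration:

```python
import numpy as np

alpha = 0.5
color = np.array([0.0, 1.0, 0.0])           # pure green, scaled 0-1
image = np.zeros((2, 2, 3), dtype="float")  # black 2x2 test image
mask = np.array([[True, False],
                 [False, True]])            # pixels belonging to the object

# blend the color into the masked pixels only, leaving the rest untouched
image[mask] = (1 - alpha) * image[mask] + alpha * color
print(image[0, 0].tolist())  # [0.0, 0.5, 0.0]
```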

From here, we’ll draw bounding boxes and class label + score texts for each object in the image:

Line 80 converts our image  back to BGR (OpenCV’s default color channel ordering).

On Line 83 we begin looping over objects. Inside the loop, we:

  • Extract the bounding box coordinates, classID , label , and score  (Lines 86-89).
  • Compute the color  for the bounding box and text (Line 90).
  • Draw each bounding box (Line 93).
  • Concatenate the class/probability text  (Line 94) and then draw it at the top of the image  (Lines 95-97).

Once the process is complete, the resulting output image  is displayed to the screen until a key is pressed (Lines 100-101).

Mask R-CNN and Keras results

Now that our Mask R-CNN script has been implemented, let’s give it a try.

Make sure you have used the “Downloads” section of this tutorial to download the source code.

You will need to know the concept of command line arguments to run the code. If it is unfamiliar to you, read up on argparse and command line arguments before you try to execute the code.

When you’re ready, open up a terminal and execute the following command:

Figure 2: The Mask R-CNN model trained on COCO created a pixel-wise map of the Jurassic Park jeep (truck), my friend, and me while we celebrated my 30th birthday.

For my 30th birthday, my wife found a person to drive us around Philadelphia in a replica Jurassic Park jeep — here my best friend and I are outside The Academy of Natural Sciences.

Notice how not only bounding boxes are produced for each object (i.e., both people and the jeep), but also pixel-wise masks as well!

Let’s give another image a try:

Figure 3: My dog, Janie, has been segmented from the couch and chair using a Keras and Mask R-CNN deep learning model.

Here is a super adorable photo of my dog, Janie, laying on the couch:

  1. Despite the vast majority of the couch not being visible, the Mask R-CNN is still able to label it as such.
  2. The Mask R-CNN is correctly able to label the dog in the image.
  3. And even though my coffee cup is barely visible, Mask R-CNN is able to label the cup as well (if you look really closely you’ll see that my coffee cup is a Jurassic Park mug!)

The only part of the image that Mask R-CNN is not able to correctly label is the back part of the couch which it mistakes as a chair — looking at the image closely, you can see how Mask R-CNN made the mistake (the region does look quite chair-like versus being part of the couch).

Here’s another example of using Keras + Mask R-CNN for instance segmentation:

Figure 4: A Mask R-CNN segmented image (created with Keras, TensorFlow, and Matterport’s Mask R-CNN implementation). This picture is of me in Page, AZ.

A few years ago, my wife and I made a trip out to Page, AZ (this particular photo was taken just outside Horseshoe Bend) — you can see how the Mask R-CNN has not only detected me but also constructed a pixel-wise mask for my body.

Let’s apply Mask R-CNN to one final image:

Figure 5: Keras + Mask R-CNN with Python of a picture from Ybor City.

One of my favorite cities to visit in the United States is Ybor City — there’s just something I like about the area (and perhaps it’s that roosters are protected in the city and free to roam around).

Here you can see me and such a rooster — notice how each of us is correctly labeled and segmented by the Mask R-CNN. You’ll also notice that the Mask R-CNN model was able to localize each of the individual cars and label the bus!

Can Mask R-CNN run in real-time?

At this point you’re probably wondering if it’s possible to run Keras + Mask R-CNN in real-time, right?

As you know from the “The History of Mask R-CNN” section above, Mask R-CNN is built on the Faster R-CNN object detector.

Faster R-CNNs are incredibly computationally expensive, and when you add instance segmentation on top of object detection, the model becomes even more expensive. Therefore:

  • On a CPU, a Mask R-CNN cannot run in real-time.
  • But on a GPU, Mask R-CNN can get up to 5-8 FPS.

If you would like to run Mask R-CNN in semi-real-time, you will need a GPU.

How can I train a Mask R-CNN model on my own custom dataset?

Figure 6: Inside my book, Deep Learning for Computer Vision with Python, you will learn how to annotate your own training data, train your custom Mask R-CNN, and apply it to your own images. I also provide two case studies on (1) skin lesion/cancer segmentation and (2) prescription pill segmentation, a first step in pill identification.

The Mask R-CNN model we used in this tutorial was pre-trained on the COCO dataset…

…but what if you wanted to train a Mask R-CNN on your own custom dataset?

Inside my book, Deep Learning for Computer Vision with Python, I:

  1. Teach you how to train a Mask R-CNN to automatically detect and segment cancerous skin lesions — a first step in building an automatic cancer risk factor classification system.
  2. Provide you with my favorite image annotation tools, enabling you to create masks for your input images.
  3. Show you how to train a Mask R-CNN on your custom dataset.
  4. Provide you with my best practices, tips, and suggestions when training your own Mask R-CNN.

All of the Mask R-CNN chapters include a detailed explanation of both the algorithms and code, ensuring you will be able to successfully train your own Mask R-CNNs.

To learn more about my book (and grab your free set of sample chapters and table of contents), just click here.


Summary

In this tutorial, you learned how to use Keras + Mask R-CNN to perform instance segmentation.

Unlike object detection, which only gives you the bounding box (x, y)-coordinates for an object in an image, instance segmentation takes it a step further, yielding pixel-wise masks for each object.

Using instance segmentation we can actually segment an object from an image.

To perform instance segmentation we used the Matterport Keras + Mask R-CNN implementation.

We then created a Python script that:

  1. Constructed a configuration class for Mask R-CNN (both with and without a GPU).
  2. Loaded the Keras + Mask R-CNN architecture from disk.
  3. Preprocessed our input image.
  4. Detected objects/masks in the image.
  5. Visualized the results.

If you are interested in how to:

  1. Label and annotate your own custom image dataset
  2. And then train a Mask R-CNN model on top of your annotated dataset…

…then you’ll want to take a look at my book, Deep Learning for Computer Vision with Python, where I cover Mask R-CNN and annotation in detail.

I hope you enjoyed today’s post!

To download the source code (including the pre-trained Keras + Mask R-CNN model), just enter your email address in the form below! I’ll be sure to let you know when future tutorials are published here on PyImageSearch.



45 Responses to Keras Mask R-CNN

  1. abdul June 10, 2019 at 11:40 am #

    thank you

    • Adrian Rosebrock June 10, 2019 at 1:22 pm #

      You are welcome, Abdul!

  2. Tiri June 10, 2019 at 12:36 pm #

    Hi, super interesting as usual!
    I have a question: there are currently lot of changes in the libraries like keras tensorflow or pytorch, do you update the examples you do in your book, with the recent library versions?

    • Adrian Rosebrock June 10, 2019 at 1:23 pm #

      Inside my book, Deep Learning for Computer Vision with Python, I use Keras with a TensorFlow backend. The code is also compatible with TensorFlow 2.0. All examples are kept up to date with the most recent library versions.

  3. Satheesh June 10, 2019 at 1:57 pm #

    Hi, I was really waiting for this. Thanks a lot, brother. Also, I’m expecting YOLOv3 and SSD tutorials from you.

  4. ali June 10, 2019 at 8:01 pm #

    I can’t install tensorflow or tensorflow-gpu.
    I use PyCharm.
    What should I do?

    • Adrian Rosebrock June 12, 2019 at 1:35 pm #

      I would suggest you install via a terminal instead, then update your “Project Interpreter” setting in PyCharm.

  5. John June 10, 2019 at 11:50 pm #

    Hi Adrian,

    Thanks for another great tutorial. Just as a rule of thumb, what are some ways to improve the FPS performance of the Mask R-CNN algorithm? I am running the code on a dedicated GPU workstation but am not seeing the FPS results you have mentioned. I am getting roughly 1 frame per second.

    • Adrian Rosebrock June 12, 2019 at 1:34 pm #

      What type of GPU are you using?

      • John June 12, 2019 at 7:47 pm #

        My local machine has an NVIDIA Quadro P2000 with 5GB of VRAM. I have also deployed the code to an AWS P3 instance equipped with a NVIDIA Tesla V100 GPU which has 16GB of VRAM. With the AWS instance the inference speedup has been pretty disappointing reaching about 1.3FPS on average.

        • Adrian Rosebrock June 13, 2019 at 9:38 am #

          Thanks John. I might try to dig into that and see what’s going on but I cannot guarantee if/when I’ll be able to do that.

          • John June 13, 2019 at 7:06 pm #

            Thanks Adrian!

          • Adrian Rosebrock June 19, 2019 at 2:22 pm #

            You are welcome!

  6. Hilman June 11, 2019 at 3:59 am #

    High-quality materials for free…
    Adrian, you are really awesome and inspired me deeply.
    I want to say thank you for that!
    Please know that your blog and your book are literally helping the world!

    • Adrian Rosebrock June 12, 2019 at 1:31 pm #

      Thanks Hilman!

  7. joao June 11, 2019 at 12:23 pm #

    Hey! I think you forgot one import: pip install imutils

    Eitherway, very easy to setup, thanks!

    • Adrian Rosebrock June 12, 2019 at 1:30 pm #

      Thanks for catching that, Joao! I have updated the post.

  8. Arjun Sreekumar June 11, 2019 at 5:11 pm #

    In which folder should all the installations be done?

    • Adrian Rosebrock June 12, 2019 at 1:29 pm #

      The folder doesn’t matter for the pip installs. Just make sure you are in the “Mask_RCNN” directory when running the “” script though.

  9. Arjun Sreekumar June 11, 2019 at 5:52 pm #

    Running the code causes an ERROR : Illegal instruction (core dumped). Could you please explain the reason? Thanks in advance

    • Adrian Rosebrock June 12, 2019 at 1:28 pm #

      What line of code is causing the error? Try using “print” statements or “pdb” to debug.

  10. Jeremy June 11, 2019 at 8:25 pm #

    As always great post Adrian. Keep up the good job. Thanks a bunch.

    • Adrian Rosebrock June 12, 2019 at 1:28 pm #

      Thanks Jeremy!

  11. fALAHGS June 12, 2019 at 12:44 am #

    Thank you Adrian for great post
    Is there a tutorial lesson to work on Custom instance Segmentation for our custom object…?

    thanks for help

  12. Ray C June 12, 2019 at 1:54 am #

    Thanks Adrian for this.
    Do you know how much GPU memory this will take for a single stream? I want to stream 4 separate streams and wondering.
    Currently, the test box I have has a 6GB GPU and using yolo3, I seem to be using at least 2 GB per camera stream of it, so no way I can run 4 streams. My requirement is to do object detection for at least 4 streams at around 10 fps at least. What do you recommend for this situation? Is there a way to use multithreading in python for gpu processing? (only one GPU)? I’m running the script multiple times, once for each stream

    Also, if you’re in Ybor City next time, ping me and I’ll buy you lunch/dinner. I live in the general area!

    • Adrian Rosebrock June 12, 2019 at 1:27 pm #

      Most GPUs will only be able to handle a single Mask R-CNN model. You won’t be able to run four separate Mask R-CNN models on a single GPU. What you could do is batch process frames, like I do in this tutorial.

  13. Sovit Rath June 12, 2019 at 2:11 am #

    Hello Adrian, I am getting this error. Could you possibly help me out?

    ValueError: Layer #389 (named “mrcnn_bbox_fc”), weight has shape (1024, 364), but the saved weight has shape (1024, 324).

  14. Ramachandra Babu June 12, 2019 at 2:18 am #

    Hi can you please tell me how to perform inference only on person and ignoring other classes.

    Thanks in Advance.

    • Adrian Rosebrock June 12, 2019 at 1:26 pm #

      Simply check the “label” variable. If it’s not “person”, ignore the detection.

  15. mario June 12, 2019 at 7:48 am #

    Hi! i am getting an error just at the beggining when importing things, particulary on

    from mrcnn import model as modellib

    where i get the error “module ‘tensorflow’ has no attribute ‘name_scope'”

    I am using windows and Conda, any clue?
    Thanks in advice!

    • Adrian Rosebrock June 12, 2019 at 1:25 pm #

      What version of Keras and TensorFlow are you using?

  16. Suha Jon June 12, 2019 at 8:37 am #

    Thanks Adrian for great post
    is there tutorial about Custom segmentation ..?

  17. Oli June 12, 2019 at 11:40 am #

    Hi, this is (yet again) really interesting, thanks!
    Can this work for videos in the same way as Mask R-CNN with OpenCV?

    • Adrian Rosebrock June 12, 2019 at 1:24 pm #

      Yes, just apply the Mask R-CNN model to each frame of the video stream.

  18. Stephen Meschke June 13, 2019 at 8:30 pm #

    Another solid tutorial. I had some trouble installing the packages. My system was screwed up because I installed several Python packages incorrectly. Eventually, I chose to re-install Ubuntu 18 and start fresh. Everything worked fine after that, although I had to use pip3 instead of just pip.

    I spend the majority of my PyImageSearch time trying to properly install the Python packages that you use. Installing and using various packages is critical and, unfortunately, often frustrating.

    • Adrian Rosebrock June 19, 2019 at 2:23 pm #

      Thanks Stephen — although I’m sorry you had to start with a fresh Ubuntu 18.04 install. When it comes to using different Python packages I strongly encourage using Python virtual environments so you don’t run into that issue again.

  19. ang June 14, 2019 at 1:37 pm #

    how to train my custom cnn model

  20. Ibrahim June 17, 2019 at 5:22 am #

    In my case it just shows an image, no detection and segmentation. Moreover I get this warning:
    RuntimeWarning: divide by zero encountered in divide

    • Adrian Rosebrock June 19, 2019 at 2:04 pm #

      What version of Keras and TensorFlow are you using? And what function call throws that error?

  21. Andrew Welham June 17, 2019 at 9:37 am #

    Great blog entry again, This may be a silly question, but if I wanted to only detect people, can i remove other categorisations from Mask R-CNN to speed it up? Or would it make no difference ?

    • Adrian Rosebrock June 19, 2019 at 2:03 pm #

      In terms of speed, no, it would make no difference. That said, if you want to detect only people, just loop over the detections and ignore anything that is not a “person” class.

Leave a Reply