(Faster) Non-Maximum Suppression in Python

Fast Non-Maximum Suppression for Object Detection in Images

I have issues — I can’t stop thinking about object detection.

You see, last night I was watching The Walking Dead and instead of enjoying the zombie brutality, the forced cannibalism, or the enthralling storyline, all I wanted to do was build an object detection system to recognize zombies.

Would it be very useful? Probably not.

I mean, it’s quite obvious if a zombie is coming after you. The stench alone should be a dead (hey, look at that pun) giveaway, let alone the gnashing of teeth and outstretched arms. And we might as well throw in guttural “brainnnnsss” moans as well.

Like I said, it would be pretty obvious if a zombie was after you — you certainly wouldn’t need a computer vision system to tell you that. But this is just an example of the type of stuff that runs through my head on a day-to-day basis.

To give you some context, two weeks ago I posted about using Histogram of Oriented Gradients and a Linear Support Vector Machine to build an object detection system. Sick of using the OpenCV Haar cascades and obtaining poor performance, and not to mention, extremely long training times, I took it upon myself to write my own Python object detection framework.

So far, it’s gone extremely well and it’s been a lot of fun to implement.

But there’s an unescapable issue you must handle when building an object detection system — overlapping bounding boxes. It’s going to happen, there’s no way around it. In fact, it’s actually a good sign that your object detector is firing properly so I wouldn’t even call it an “issue” exactly.

To handle the removal overlapping bounding boxes (that refer to the same object) we can either use non-maximum suppression on the Mean-Shift algorithm. While Dalal and Triggs prefer Mean-Shift, I find Mean-Shift to give sub-par results.

After receiving advice from my friend (and expert on object detection), Dr. Tomasz Malisiewicz, I decided to port his non-maximum suppression MATLAB implementations over to Python.

Last week I showed you how to implement the Felzenszwalb et al. method. And this week I am going to show you the Malisiewicz et al. method which is over 100x faster.

Note: I intended on publishing this blog post way back in November, but due to some poor planning on my part, it’s taken awhile for me to get this post out the door. Anyway, it’s online now!

So where does the speedup come from? And how do we obtain such faster suppression times?

Read on to find out.

Looking for the source code to this post?
Jump right to the downloads section.

OpenCV and Python versions:
This example will run on Python 2.7/Python 3.4+ and OpenCV 2.4.X/OpenCV 3.0+.

(Faster) Non-Maximum Suppression in Python

Before we get started, if you haven’t read last week’s post on non-maximum suppression, I would definitely start there.

Otherwise, open up a new file in your favorite editor, name it nms.py , and let’s get started on creating a faster non-maximum suppression implementation:

Take a second to really examine this code and compare it to the Felzenszwalb et al. implementation of last week.

The code looks almost identical, right?

So you’re probably asking yourself “Where does this 100x speed-up come from?”

And the answer is the removal of an inner for  loop.

The implementation from last week required an extra inner for  loop to compute the size of bounding regions and compute the ratio of overlapped area.

Instead, Dr. Malisiewicz replaced this inner for  loop with vectorized code — and this is how we are able to achieve substantially faster speeds when applying non-maximum suppression.

Instead of repeating myself and going line-by-line through the code like I did last week, let’s just examine the important parts.

Lines 6-22 of our faster non-maximum suppression function are essentially the same as they are from last week. We start by grabbing the (x, y) coordinates of the bounding boxes, computing their area, and sorting the indexes into the boxes  list according to their bottom-right y-coordinate of each box.

Lines 31-55 contain our speed-up, where Lines 41-55 are especially important.

Instead of using an inner for  loop to loop over each of the individual boxes, we instead vectorize the code using the np.maximum  and np.minimum  functions — this allows us to find the maximum and minimum values across the axis rather than just individual scalars.

Note: You have to use the np.maximum  and np.minimum  functions here — they allow you to mix scalars and vectors. The np.max  and np.min  functions do not, and if you use them, you will find yourself with some pretty fierce bugs to find and fix. It took me quite awhile to debug this problem when I was porting the algorithm from MATLAB to Python.

Lines 47 and 48 are also vectorized — here we compute the width and height of each of the rectangles to check. Similarly, computing the overlap  ratio on Line 51 is also vectorized. From there, we just delete all entries from our idx  list that are greater than our supplied overlap threshold. Typical values for the overlap threshold normally fall in the range 0.3-0.5.

The Malisiewicz et al. method is essentially identical to the Felzenszwalb et al. method, but by using vectorized code we are able to reach a reported 100x speed-up in non-maximum suppression!

Faster Non-Maximum Suppression in Action

Let’s go ahead and explore a few examples. We’ll start with the creepy little girl zombie that we saw at the top of this image:

Figure 1: Detecting three bounding boxes in the image, but non-maximum suppression suppresses two of the overlapping boxes.

Figure 1: Detecting three bounding boxes in the image, but non-maximum suppression suppresses two of the overlapping boxes.

It’s actually really funny how well our face detector trained on real, healthy human faces generalizes to zombie faces as well. Granted, they are still “human” faces, but with all the blood and disfigurement, I wouldn’t be surprised to see some strange results.

Figure 2: It looks like our face detector didn't generalize as well. Clearly the teeth of this zombie looks like a face to the detector.

Figure 2: It looks like our face detector didn’t generalize as well — the teeth of this zombie looks like a face to the detector.

Speaking of strange results, it looks like our face detector detected the mouth/teeth region of the zombie on the right. Perhaps if I had explicitly trained the HOG + Linear SVM face detector on zombie images the results would be better.

Figure 3: Six bounding boxes are detected around the face, but by applying fast non-maximum suppression we are correctly able to reduce the number of bounding boxes to one.

Figure 3: Six bounding boxes are detected around the face, but by applying fast non-maximum suppression we are correctly able to reduce the number of bounding boxes to one.

In this last example we can again see that our non-maximum suppression algorithm is working correctly — even though six original bounding boxes were detected by the HOG + Linear SVM detector, applying non-maximum suppression has correctly suppressed five of the boxes, leaving us with the final detection.

Summary

In this blog post we reviewed the Malisiewicz et al. method for non-maximum suppression.

This method is nearly identical to the Felzenszwalb et al. method, but by removing an inner for  loop and using vectorized code we are able to obtain substantial speed-ups.

And if you have a second, be sure to thank Dr. Tomasz Malisiewicz!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

, , , , , ,

46 Responses to (Faster) Non-Maximum Suppression in Python

  1. Mau August 7, 2015 at 9:56 am #

    Sometimes, with this method I get this warning:
    invalid value encountered in divide overlap = (w * h) / area[idxs[:last]]

    Do you know what can be? I’m testing it with Hog for pedestrian people

    • Adrian Rosebrock August 8, 2015 at 6:34 am #

      Hey Mau — I’m not sure why that would happen, I have not encountered that problem before. It sounds like it might be a divide by zero error. Check the area array and see if there are any bounding boxes that somehow have zero area.

    • venkad April 10, 2017 at 3:04 pm #

      it is work fine thank you

  2. Bosmart September 14, 2015 at 12:55 pm #

    So it’s essentially a vectorized implementation of the original algorithm?

    • Adrian Rosebrock September 15, 2015 at 6:02 am #

      Yep, that’s absolutely correct! The vectorization is what leads to such a dramatic speedup.

  3. ngapweitham November 18, 2015 at 12:17 pm #

    I wonder why opencv do not provide this function, may need to implement a c++ version if I can not find a decent one

  4. Phong March 15, 2016 at 6:45 am #

    Thank for the post. However I wonder nms is more correct if bounding boxes’ scores are provided? And how to do that?

    • Adrian Rosebrock March 15, 2016 at 4:31 pm #

      Yes, you can absolutely perform NMS using bounding box probabilities. Just supply it as an extra argument to the function and then apply argsort to the probabilities rather than the coordinates.

  5. Farah June 13, 2016 at 12:56 am #

    How to apply non maximum suppression if we have more than one objects in the image

    • Adrian Rosebrock June 15, 2016 at 12:51 pm #

      NMS automatically handles multiple objects in an image, provided that the objects to not overlap more than the overlap threshold.

  6. Corey Nolet November 3, 2016 at 6:06 pm #

    Hmm. I’ve implemented your algorithm on top of a set of over 122k bounding boxes and I’m getting some very strange results.

    First, the more I raise the percentage, the less bounding boxes I end up with in the end. You would think the higher percentage lowers the supression and therefore would raise the amount of resulting bounding boxes but I’m not finding this to be the case at all.

    Second, the bounding boxes were generated using a few different algorithms and the results I’m getting seem very small- for 122k bounding boxes, I’m getting numbers in the low double digits to single digits.

    I tried pre-sorting the bounding boxes by area before passing them into the algorithm. Strangely enough, sorting by reverse order and regular ordering seem to yield different results.

    This algorithm is fast, indeed, but I question the correctness. I would love to use it though- it is certainly several orders of magnitude faster than having to use nested for loops.

    • Adrian Rosebrock November 4, 2016 at 10:04 am #

      In this implementation the bounding boxes are sorted by their size (area). However, if you have the probabilities associated with each bounding box that is much more preferable. In that case you would sort the bounding boxes according to their confidence (largest to smallest). This would require modifying the actual function.

      • Corey Nolet November 7, 2016 at 4:35 pm #

        Maybe I’m missing something. I see the bounding boxes sorted by their y2 component. My bounding boxes are the X1, Y1, X2, Y2 in euclidean space. This to me would imply that the bounding boxes are being sorted only by their y2 value in euclidean space unless there’s something going on outside of the scope of this source code which is sorting them by their actual area?

        The only other thing I can think of is that maybe X2 and Y2 are really width and height and not the euclidean coordinates? That might make more sense but it still emans that the bounding boxes are only being sorted by their heights.

        I am a java developer that moved to python only recently and specifically to numpy so it’s possible I am missing something here that’s throwing off my understnading. I would love to get this working because it is worlds faster. I do have probabilities associated with my bounding boxes but I ran three different algorithms to propose the boxes and they are all going to contain three different probabilities. It seems like trying to sort by those probabilities wouldn’t be the best idea across algorithms. I suppose I can run the NMS separately for each set of bounding boxes from each algorithm.

        • Adrian Rosebrock November 10, 2016 at 8:53 am #

          Correct, the bounding boxes are being sorted by their y2 coordinate. This implementation makes the assumption that you do not have the probabilities associated with each bounding box. If you don’t have the probabilities associated with each bounding box then you need to sort on a coordinate.

          In your case since you have the probabilities I would modify this function to accept the list of associated probabilities and then sort them. This will suppress bounding boxes with low probabilities that overlap with bounding boxes that have higher predicted probabilities.

          • mooj December 12, 2016 at 3:24 pm #

            Yeah I also think that this implementation of nms is flawed. ie: if I’m understand this right, it builds in a systematic error which lends preference for bounding boxes near the bottom right corner of an image. Am I right about the systematic error, or did I misunderstand something in the way the algorithm works? If so, there’s definitely a better way to do this.

            That said, thank you once again for your work, its well-written and nicely expository as always 🙂

          • Adrian Rosebrock December 14, 2016 at 8:45 am #

            As I mentioned in previous comments, if you don’t have a probability/confidence associated with a bounding box you need to define some method to sort on. In this case it’s arbitrarily the lower-right corner of the bounding boxes. If you instead have the probabilities associated with each prediction you should instead sort on these (with higher probabilities at the front of the list).

          • Walid January 31, 2017 at 4:44 pm #

            Thanks a lot Adrian.
            I used probability and it dramatically increased precision

  7. Zeyu February 21, 2017 at 11:14 am #

    Hi Adrian, I bought your Practical Python and OpenCV bundle 3 months ago, and I have already finished them all. The case studies are great, but I just noticed that almost every detection project in case study is implemented through opencv haar cascade classifier. I was wondering how I can write my own object detection framework, like you have mentioned in this blog. Is there any tutorials or courses for this?

    • Adrian Rosebrock February 22, 2017 at 1:35 pm #

      Hi Zeyu — thank you for picking up a copy of Practical Python and OpenCV. This book is meant to take you the fundamentals of computer vision and image processing using the OpenCV library. If you’re interested in more advanced object detection, be sure to take a look at the PyImageSearch Gurus course where I demonstrate how to implement the HOG + Linear SVM framework from scratch.

  8. Kevin March 13, 2017 at 5:46 am #

    What is the purpose of adding 1 when calculating the area and the width and height? Wouldn’t that make the calculation of the area and width inaccurate? e.g. say we have a square where the top left and bottom right coordinates of the corners were (2,2) and (4,4), then we know that the area of the square is 4. Using area = (x2-x1+1) * (y2-y1+1) would give an area of 9. Am I missing something here?

    • Adrian Rosebrock March 13, 2017 at 12:09 pm #

      A value of “1” is just a constant — it doesn’t change the calculation. It simply ensures the numerator is not zero.

  9. Bernardo Ruga April 3, 2017 at 4:56 pm #

    If i do have the probability of the boxes, the only thing i would have to change in the code would be the argsort to scores, and nothing else? Your previous replies didn’t quite cleared my doubt, thanks!

    • Adrian Rosebrock April 5, 2017 at 12:03 pm #

      Correct, sort the probabilities based on argsort in descending order (higher probabilities at the front of the list).

      • Zantsu September 11, 2017 at 5:26 pm #

        I might be completely wrong here, but, shouldn’t the probabilities be sorted in ascending order? As far as I understood this algorithm, it has bias towards the box with the largest y coordinate, which is last in the last, so in case of probabilities being used, shouldn’t the box with the highest probability be at the end of the list?

        • Adrian Rosebrock September 12, 2017 at 8:24 pm #

          This implementation assumes the probabilities are unknown so it arbitrarily sorts on the bottom y-coordinate of the bounding box. If you knew the probabilities associated with each bounding box you would want to place higher probabilities at the front of the list so the smaller probabilities are suppressed. Take a look at the updated implementation here.

          • Eric Fischer November 19, 2018 at 3:54 pm #

            In the implementation you linked, if I’m not mistaken the probabilities based on argsort are sorted in ASCENDING order (idxs = np.argsort(probs). Adrian, can you clear up the confusion on why you recommend descending order? Am I missing something–is this implementation actually descending order in the way you’re thinking about it? Because as Zantsu correctly said, the algorithm has bias towards what is last in the last, so ascending order makes sense to me as it did to Zantsu.

  10. Nick May 9, 2017 at 2:57 pm #

    Hi Adrian,

    Thank you for your interesting posts on nms, I learned a lot from them.

    I did find some peculiarities in your code. Do you mind clarifying them?

    Your text states
    ‘It is absolutely critical that we sort according to the bottom-right corner as we’ll need to compute the overlap ratio of other bounding boxes later in this function’
    Since you’re sorting on y2, I deduct that the coordinate pair (x2,y2) is the bottom right and the pair (x1,y1) the top left.
    This means that y2 is the minimal y of a certain box and x2 the maximal x of a certain box. (y2 = Ymin, x2 = Xmax)
    This results in y1 being the maximal y, x1 being the minimal X. (y1 = Ymax, x1 = Xmin)
    Since Ymin <= Ymax, the calculated area on line 26 (Xmax – Xmin)*(Ymin – Ymax) will be <= 0.
    Adding a constant 1 doesn't prevent the second factor and thus the area from being 0.

    When sorting on y2 with argssort (line 27) and taking the last one (line 34), I think it will favour the highest bottom right corner since argsort sorts ascending.
    Sorting on probs with argssort and taking the last one works correctly, as I see it, as it favours the highest prob.

    On lines 38-44, you select the largest coordinates for the start of the bounding box and the smallest for the end.
    I assume the start is the Top Left and the end is the Bottom Right.
    Then if you want the largest box, I'd take the minimum of x1 (left most of top left) and maximum of y1 (highest of top left) for the top left
    and the maximum of x2 (rightmost of bottom right) and minimum of y2 (lowest of bottom right).

    On lines 46-48 you calculate width and height the same way as you did for the area on line 26.

    From my point of view, the overlap calculation on line 51 can result in a divide by zero when y2 = y1 – 1 on line 26.

    Is there a flaw in my reasoning?

    • Frank June 22, 2017 at 9:11 am #

      Hi Nick,

      Even if I didn’t work much with OpenCV (just starting in fact), I’m pretty sure, that given a picture, of let’s say 800x600px, (0/0) is in the upper left corner and (799/599) is the lower right. Like on every computer screen. So y2 is the maximum of every box. So all you’re following conclusions are wrong.
      And since Adrian is adding a positive constant to both factors and the coordinates are zero or positive, there’s no way to get an area of size zero. Nevertheless, I would change it to area = (x2 – x1) * (y2 – y1) + 1 to reduce the calculation steps by one.

      By the way, from a complexity theory point of view, this algorithm and the Felszenszwalb have the same time complexity. The speedup comes from the optimized implementation of numpy. At the end of the day, numpy uses loops for the vector calculations, so on another platform or another python implementation (cython, ironpython, etc. pp) there might be no difference.

      If I understand the algorithm right, it’s a simple line sweep. I will take a closer look if I find the time. Maybe you can gain a speedup if you build a B-Tree or a similiar structure instead of sorting and doing all the calculations. It might be even better if you already get the boxes in a Tree from the Detector instead of a simple list.

      • Adrian Rosebrock June 22, 2017 at 9:23 am #

        You are correct Frank — the speedup comes from the highly optimized vector operations of NumPy.

  11. jsa October 24, 2017 at 5:45 am #

    Another twist to the algorithm would be to have both a score and a segmentation mask for each bounding box. Even though it’s easy to compute the overlap between boolean masks with a bitwise and, I’m still trying to figure out a fast way to do it.

    Regards

  12. Raphael Jakob February 5, 2018 at 6:36 pm #

    Is it also possible to do that in 3D ? 🙂

  13. Alan Kia February 6, 2018 at 3:01 am #

    Hi Adrian, a question here.

    How can i edit the code so that it can accommodate array with more than 4 values ?
    I’m getting error :

    for (startX, startY, endX, endY) in boundingBoxes:

    ValueError: too many values to unpack (expected 4)

    • Adrian Rosebrock February 6, 2018 at 10:04 am #

      You’ll need to create variables in the parenthesis for each of the items in the array.

  14. Jason February 25, 2018 at 4:41 am #

    I noticed one issue when testing this out on some pedestrian detection: sometimes I ended up with >1 overlapping boxes for the same pedestrian detected after applying the NMS, and I think this is because the areal overlapping ratio is computed as

    ratio = area of intersection / area of the checked box

    It could happen that the reference box (choosing the from the largest y2 coordinate as in this implementation) is relatively small, and it resides entirely or almost entirely inside a larger box, as long as the area_small / area_large <= threshold, it will pass the checking, and you end up with overlapping boxes.

    This won't be a problem if the sorting is on area instead of y2. Or use the numpy.minimum(area[i], area[idxs[:last]]) as the denominator in the ratio computation, if one chooses to sort by the probability or something.

    And another question I have: the resultant bounding box is not always that accurate, is there any easy hack on the currently NMS that could help improve this?

    • Adrian Rosebrock February 25, 2018 at 8:03 am #

      Do you have the probability of the prediction associated with the bounding box? If so, you can sort on the probability instead of an arbitrary coordinator (some models do not provide a probability and I did not want to assume one for this post).

      That said, I have an updated NMS algorithm inside my imutils library that might help you out here.

      To answer your second question: NMS cannot “improve” the bounding boxes generated by your detector. Your detector should be responsible for producing quality bounding boxes. NMS will then suppress overlapping ones.

  15. Laxminarayan Kamath March 14, 2018 at 2:03 am #

    When I look at the bounding boxes, I wondered: How effecient would it be to simply find the maxmimums of all topleft_x s and minimums of all bottomright_x s (and likewise for the Ys)

    Are there any disadvantages?

    • Laxminarayan Kamath March 16, 2018 at 2:29 am #

      [facepalm] Of course, works only when you are expecting one face in the image.

  16. Bhargav Kanakiya April 6, 2018 at 7:57 pm #

    Hi,

    This is a great article! Recently, I came across a paper which discusses an improvement to the NMS and is also published by your Alma Mater. Maybe you can also update their explanation in a simpler manner?

    Check it out: https://arxiv.org/pdf/1704.04503.pdf

    They also have the python code available.

    • Adrian Rosebrock April 10, 2018 at 12:46 pm #

      Thanks for sharing, Bhargav!

  17. Vishal April 10, 2018 at 8:52 pm #

    Hi Adrian,

    Thank you for the post. Can I implement this for multiple objects template matching? I have a code which for lower threshold for Correlation coefficient detects multiple overlapping bounding boxes until the whole section is colored (painted) with the rectangles. Can I use NMS for this problem? Could you point me in the right direction?

    Best,
    Vishal

    • Adrian Rosebrock April 11, 2018 at 9:05 am #

      This method works for multiple objects and multiple class labels (you would just run NMS on each class label). It sounds like you are getting way to many bounding boxes to begin with so you may want to consider methods to prune them down based on correlation value thresholding.

      • Vishal April 11, 2018 at 4:35 pm #

        In my case I am returning bounding boxes of same size, how would it work in that case? Also can I sort based on correlation coefficient value instead of bottom right y coordinate, would you recommend it? I am sorry for the trivial question.

        Best,
        Vishal

      • Vishal April 12, 2018 at 6:11 pm #

        But for multiple detections on template matching the w and h for all the bounding boxes would be same, i.e., based on template size. How would I incorporate this algorithm in that case?

        • Adrian Rosebrock April 13, 2018 at 6:41 am #

          You would apply NMS based on the confidence of the template matching. See my implementation of NMS in imutils (take a look at the “probs” variable).

  18. anand April 20, 2018 at 5:45 pm #

    Hello,
    Thanks for a great article, it was very useful on how NMS works.

    One thing, I tried to time the fast and slow method but for some reason in my machine don’t see a 10x machine.

    Here is the the gist – https://gist.github.com/PREM1980/93ec1298bea0495feaae77c798a345f0

    I see minor differences. Not sure if Im doing something wrong.

    • Adrian Rosebrock April 25, 2018 at 6:20 am #

      How many bounding boxes do you have for each image?

Leave a Reply