I have issues — *I can’t stop thinking about object detection*.

You see, last night I was watching *The Walking Dead* and instead of enjoying the zombie brutality, the forced cannibalism, or the enthralling storyline, *all I wanted to do was build an object detection system to recognize zombies.*

Would it be very useful? Probably not.

I mean, it’s quite obvious if a zombie is coming after you. The stench alone should be a ** dead** (hey, look at that pun) giveaway, let alone the gnashing of teeth and outstretched arms. And we might as well throw in guttural

*“brainnnnsss”*moans as well.

Like I said, it would be pretty obvious if a zombie was after you — you certainly wouldn’t need a computer vision system to tell you that. But this is just an example of the type of stuff that runs through my head on a day-to-day basis.

To give you some context, two weeks ago I posted about using Histogram of Oriented Gradients and a Linear Support Vector Machine to build an object detection system. Sick of using the OpenCV Haar cascades and obtaining poor performance, and not to mention, extremely long training times, I took it upon myself to write my own Python object detection framework.

So far, it’s gone extremely well and it’s been a lot of fun to implement.

But there’s *an unescapable issue* you must handle when building an object detection system — *overlapping bounding boxes*. It’s going to happen, there’s no way around it. In fact, it’s actually a good sign that your object detector is firing properly so I wouldn’t even call it an “issue” exactly.

To handle the removal overlapping bounding boxes (that refer to the same object) we can either use non-maximum suppression on the Mean-Shift algorithm. While Dalal and Triggs prefer Mean-Shift, I find Mean-Shift to give sub-par results.

After receiving advice from my friend (and expert on object detection), Dr. Tomasz Malisiewicz, I decided to port his non-maximum suppression MATLAB implementations over to Python.

Last week I showed you how to implement the Felzenszwalb et al. method. And this week I am going to show you the Malisiewicz et al. method which is over *100x faster*.

**Note:** I intended on publishing this blog post way back in November, but due to some poor planning on my part, it’s taken awhile for me to get this post out the door. Anyway, it’s online now!

So where does the speedup come from? And how do we obtain such faster suppression times?

Read on to find out.

Looking for the source code to this post?

Jump right to the downloads section.

**OpenCV and Python versions:**

This example will run on** Python 2.7/Python 3.4+** and **OpenCV 2.4.X/OpenCV 3.0+**.

# (Faster) Non-Maximum Suppression in Python

Before we get started, if you haven’t read last week’s post on non-maximum suppression, I would definitely start there.

Otherwise, open up a new file in your favorite editor, name it nms.py , and let’s get started on creating a faster non-maximum suppression implementation:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 |
# import the necessary packages import numpy as np # Malisiewicz et al. def non_max_suppression_fast(boxes, overlapThresh): # if there are no boxes, return an empty list if len(boxes) == 0: return [] # if the bounding boxes integers, convert them to floats -- # this is important since we'll be doing a bunch of divisions if boxes.dtype.kind == "i": boxes = boxes.astype("float") # initialize the list of picked indexes pick = [] # grab the coordinates of the bounding boxes x1 = boxes[:,0] y1 = boxes[:,1] x2 = boxes[:,2] y2 = boxes[:,3] # compute the area of the bounding boxes and sort the bounding # boxes by the bottom-right y-coordinate of the bounding box area = (x2 - x1 + 1) * (y2 - y1 + 1) idxs = np.argsort(y2) # keep looping while some indexes still remain in the indexes # list while len(idxs) > 0: # grab the last index in the indexes list and add the # index value to the list of picked indexes last = len(idxs) - 1 i = idxs[last] pick.append(i) # find the largest (x, y) coordinates for the start of # the bounding box and the smallest (x, y) coordinates # for the end of the bounding box xx1 = np.maximum(x1[i], x1[idxs[:last]]) yy1 = np.maximum(y1[i], y1[idxs[:last]]) xx2 = np.minimum(x2[i], x2[idxs[:last]]) yy2 = np.minimum(y2[i], y2[idxs[:last]]) # compute the width and height of the bounding box w = np.maximum(0, xx2 - xx1 + 1) h = np.maximum(0, yy2 - yy1 + 1) # compute the ratio of overlap overlap = (w * h) / area[idxs[:last]] # delete all indexes from the index list that have idxs = np.delete(idxs, np.concatenate(([last], np.where(overlap > overlapThresh)[0]))) # return only the bounding boxes that were picked using the # integer data type return boxes[pick].astype("int") |

Take a second to really examine this code and compare it to the Felzenszwalb et al. implementation of last week.

The code looks ** almost identical**, right?

So you’re probably asking yourself *“Where does this 100x speed-up come from?”*

**And the answer is the removal of an inner
for loop.**

The implementation from last week required an extra inner for loop to compute the size of bounding regions and compute the ratio of overlapped area.

Instead, Dr. Malisiewicz replaced this inner
for loop with vectorized code — *and this is how we are able to achieve substantially faster speeds when applying non-maximum suppression.*

Instead of repeating myself and going line-by-line through the code like I did last week, let’s just examine the important parts.

**Lines 6-22** of our faster non-maximum suppression function are essentially the same as they are from last week. We start by grabbing the *(x, y)* coordinates of the bounding boxes, computing their area, and sorting the indexes into the
boxes list according to their bottom-right y-coordinate of each box.

**Lines 31-55 **contain our speed-up, where **Lines 41-55** are especially important.

Instead of using an inner
for loop to loop over each of the individual boxes, we instead vectorize the code using the
np.maximum and
np.minimum functions — this allows us to find the maximum and minimum values across the *axis* rather than just *individual scalars*.

**Note:** You *have** *to use the
np.maximum and
np.minimum functions here — they allow you to mix scalars and vectors. The
np.max and
np.min functions do not, and if you use them, you will find yourself with some pretty fierce bugs to find and fix. It took me quite awhile to debug this problem when I was porting the algorithm from MATLAB to Python.

**Lines 47 and 48** are also vectorized — here we compute the width and height of each of the rectangles to check. Similarly, computing the
overlap ratio on **Line 51 **is also vectorized. From there, we just delete all entries from our
idx list that are greater than our supplied overlap threshold. Typical values for the overlap threshold normally fall in the range 0.3-0.5.

The Malisiewicz et al. method is essentially identical to the Felzenszwalb et al. method, but by using vectorized code we are able to reach a reported 100x speed-up in non-maximum suppression!

# Faster Non-Maximum Suppression in Action

Let’s go ahead and explore a few examples. We’ll start with the creepy little girl zombie that we saw at the top of this image:

It’s actually really funny how well our face detector trained on real, healthy human faces generalizes to zombie faces as well. Granted, they are still “human” faces, but with all the blood and disfigurement, I wouldn’t be surprised to see some strange results.

Speaking of strange results, it looks like our face detector detected the mouth/teeth region of the zombie on the right. Perhaps if I had explicitly trained the HOG + Linear SVM face detector on zombie images the results would be better.

In this last example we can again see that our non-maximum suppression algorithm is working correctly — even though **six** original bounding boxes were detected by the HOG + Linear SVM detector, applying non-maximum suppression has correctly suppressed five of the boxes, leaving us with the final detection.

# Summary

In this blog post we reviewed the Malisiewicz et al. method for non-maximum suppression.

This method is nearly identical to the Felzenszwalb et al. method, but by removing an inner for loop and using vectorized code we are able to obtain substantial speed-ups.

And if you have a second, be sure to thank Dr. Tomasz Malisiewicz!

Sometimes, with this method I get this warning:

invalid value encountered in divide overlap = (w * h) / area[idxs[:last]]

Do you know what can be? I’m testing it with Hog for pedestrian people

Hey Mau — I’m not sure why that would happen, I have not encountered that problem before. It sounds like it might be a divide by zero error. Check the

`area`

array and see if there are any bounding boxes that somehow have zero area.it is work fine thank you

So it’s essentially a vectorized implementation of the original algorithm?

Yep, that’s absolutely correct! The vectorization is what leads to such a dramatic speedup.

I wonder why opencv do not provide this function, may need to implement a c++ version if I can not find a decent one

Thank for the post. However I wonder nms is more correct if bounding boxes’ scores are provided? And how to do that?

Yes, you can absolutely perform NMS using bounding box probabilities. Just supply it as an extra argument to the function and then apply

`argsort`

to the probabilities rather than the coordinates.How to apply non maximum suppression if we have more than one objects in the image

NMS automatically handles multiple objects in an image, provided that the objects to not overlap more than the overlap threshold.

Hmm. I’ve implemented your algorithm on top of a set of over 122k bounding boxes and I’m getting some very strange results.

First, the more I raise the percentage, the less bounding boxes I end up with in the end. You would think the higher percentage lowers the supression and therefore would raise the amount of resulting bounding boxes but I’m not finding this to be the case at all.

Second, the bounding boxes were generated using a few different algorithms and the results I’m getting seem very small- for 122k bounding boxes, I’m getting numbers in the low double digits to single digits.

I tried pre-sorting the bounding boxes by area before passing them into the algorithm. Strangely enough, sorting by reverse order and regular ordering seem to yield different results.

This algorithm is fast, indeed, but I question the correctness. I would love to use it though- it is certainly several orders of magnitude faster than having to use nested for loops.

In this implementation the bounding boxes are sorted by their size (area). However, if you have the

probabilitiesassociated with each bounding box that is much more preferable. In that case you would sort the bounding boxes according to their confidence (largest to smallest). This would require modifying the actual function.Maybe I’m missing something. I see the bounding boxes sorted by their y2 component. My bounding boxes are the X1, Y1, X2, Y2 in euclidean space. This to me would imply that the bounding boxes are being sorted only by their y2 value in euclidean space unless there’s something going on outside of the scope of this source code which is sorting them by their actual area?

The only other thing I can think of is that maybe X2 and Y2 are really width and height and not the euclidean coordinates? That might make more sense but it still emans that the bounding boxes are only being sorted by their heights.

I am a java developer that moved to python only recently and specifically to numpy so it’s possible I am missing something here that’s throwing off my understnading. I would love to get this working because it is worlds faster. I do have probabilities associated with my bounding boxes but I ran three different algorithms to propose the boxes and they are all going to contain three different probabilities. It seems like trying to sort by those probabilities wouldn’t be the best idea across algorithms. I suppose I can run the NMS separately for each set of bounding boxes from each algorithm.

Correct, the bounding boxes are being sorted by their y2 coordinate. This implementation makes the assumption that you

do nothave the probabilities associated with each bounding box. If you don’t have the probabilities associated with each bounding box then you need to sort on a coordinate.In your case since you have the probabilities I would modify this function to accept the list of associated probabilities and then sort them. This will suppress bounding boxes with low probabilities that overlap with bounding boxes that have higher predicted probabilities.

Yeah I also think that this implementation of nms is flawed. ie: if I’m understand this right, it builds in a systematic error which lends preference for bounding boxes near the bottom right corner of an image. Am I right about the systematic error, or did I misunderstand something in the way the algorithm works? If so, there’s definitely a better way to do this.

That said, thank you once again for your work, its well-written and nicely expository as always 🙂

As I mentioned in previous comments, if you don’t have a probability/confidence associated with a bounding box you need to define some method to sort on. In this case it’s arbitrarily the lower-right corner of the bounding boxes. If you instead have the probabilities associated with each prediction you should instead sort on these (with higher probabilities at the front of the list).

Thanks a lot Adrian.

I used probability and it dramatically increased precision

Hi Adrian, I bought your Practical Python and OpenCV bundle 3 months ago, and I have already finished them all. The case studies are great, but I just noticed that almost every detection project in case study is implemented through opencv haar cascade classifier. I was wondering how I can write my own object detection framework, like you have mentioned in this blog. Is there any tutorials or courses for this?

Hi Zeyu — thank you for picking up a copy of Practical Python and OpenCV. This book is meant to take you the fundamentals of computer vision and image processing using the OpenCV library. If you’re interested in more advanced object detection, be sure to take a look at the PyImageSearch Gurus course where I demonstrate how to implement the HOG + Linear SVM framework from scratch.

What is the purpose of adding 1 when calculating the area and the width and height? Wouldn’t that make the calculation of the area and width inaccurate? e.g. say we have a square where the top left and bottom right coordinates of the corners were (2,2) and (4,4), then we know that the area of the square is 4. Using area = (x2-x1+1) * (y2-y1+1) would give an area of 9. Am I missing something here?

A value of “1” is just a constant — it doesn’t change the calculation. It simply ensures the numerator is not zero.

If i do have the probability of the boxes, the only thing i would have to change in the code would be the argsort to scores, and nothing else? Your previous replies didn’t quite cleared my doubt, thanks!

Correct, sort the probabilities based on argsort in descending order (higher probabilities at the front of the list).

I might be completely wrong here, but, shouldn’t the probabilities be sorted in ascending order? As far as I understood this algorithm, it has bias towards the box with the largest y coordinate, which is last in the last, so in case of probabilities being used, shouldn’t the box with the highest probability be at the end of the list?

This implementation assumes the probabilities are unknown so it arbitrarily sorts on the bottom y-coordinate of the bounding box. If you knew the probabilities associated with each bounding box you would want to place higher probabilities at the front of the list so the smaller probabilities are suppressed. Take a look at the updated implementation here.

In the implementation you linked, if I’m not mistaken the probabilities based on argsort are sorted in ASCENDING order (idxs = np.argsort(probs). Adrian, can you clear up the confusion on why you recommend descending order? Am I missing something–is this implementation actually descending order in the way you’re thinking about it? Because as Zantsu correctly said, the algorithm has bias towards what is last in the last, so ascending order makes sense to me as it did to Zantsu.

Hi Adrian,

Thank you for your interesting posts on nms, I learned a lot from them.

I did find some peculiarities in your code. Do you mind clarifying them?

Your text states

‘It is absolutely critical that we sort according to the bottom-right corner as we’ll need to compute the overlap ratio of other bounding boxes later in this function’

Since you’re sorting on y2, I deduct that the coordinate pair (x2,y2) is the bottom right and the pair (x1,y1) the top left.

This means that y2 is the minimal y of a certain box and x2 the maximal x of a certain box. (y2 = Ymin, x2 = Xmax)

This results in y1 being the maximal y, x1 being the minimal X. (y1 = Ymax, x1 = Xmin)

Since Ymin <= Ymax, the calculated area on line 26 (Xmax – Xmin)*(Ymin – Ymax) will be <= 0.

Adding a constant 1 doesn't prevent the second factor and thus the area from being 0.

When sorting on y2 with argssort (line 27) and taking the last one (line 34), I think it will favour the highest bottom right corner since argsort sorts ascending.

Sorting on probs with argssort and taking the last one works correctly, as I see it, as it favours the highest prob.

On lines 38-44, you select the largest coordinates for the start of the bounding box and the smallest for the end.

I assume the start is the Top Left and the end is the Bottom Right.

Then if you want the largest box, I'd take the minimum of x1 (left most of top left) and maximum of y1 (highest of top left) for the top left

and the maximum of x2 (rightmost of bottom right) and minimum of y2 (lowest of bottom right).

On lines 46-48 you calculate width and height the same way as you did for the area on line 26.

From my point of view, the overlap calculation on line 51 can result in a divide by zero when y2 = y1 – 1 on line 26.

Is there a flaw in my reasoning?

Hi Nick,

Even if I didn’t work much with OpenCV (just starting in fact), I’m pretty sure, that given a picture, of let’s say 800x600px, (0/0) is in the upper left corner and (799/599) is the lower right. Like on every computer screen. So y2 is the maximum of every box. So all you’re following conclusions are wrong.

And since Adrian is adding a positive constant to both factors and the coordinates are zero or positive, there’s no way to get an area of size zero. Nevertheless, I would change it to area = (x2 – x1) * (y2 – y1) + 1 to reduce the calculation steps by one.

By the way, from a complexity theory point of view, this algorithm and the Felszenszwalb have the same time complexity. The speedup comes from the optimized implementation of numpy. At the end of the day, numpy uses loops for the vector calculations, so on another platform or another python implementation (cython, ironpython, etc. pp) there might be no difference.

If I understand the algorithm right, it’s a simple line sweep. I will take a closer look if I find the time. Maybe you can gain a speedup if you build a B-Tree or a similiar structure instead of sorting and doing all the calculations. It might be even better if you already get the boxes in a Tree from the Detector instead of a simple list.

You are correct Frank — the speedup comes from the highly optimized vector operations of NumPy.

Another twist to the algorithm would be to have both a score and a segmentation mask for each bounding box. Even though it’s easy to compute the overlap between boolean masks with a bitwise and, I’m still trying to figure out a fast way to do it.

Regards

Is it also possible to do that in 3D ? 🙂

Hi Adrian, a question here.

How can i edit the code so that it can accommodate array with more than 4 values ?

I’m getting error :

for (startX, startY, endX, endY) in boundingBoxes:

ValueError: too many values to unpack (expected 4)

You’ll need to create variables in the parenthesis for each of the items in the array.

I noticed one issue when testing this out on some pedestrian detection: sometimes I ended up with >1 overlapping boxes for the same pedestrian detected after applying the NMS, and I think this is because the areal overlapping ratio is computed as

ratio = area of intersection / area of the checked box

It could happen that the reference box (choosing the from the largest y2 coordinate as in this implementation) is relatively small, and it resides entirely or almost entirely inside a larger box, as long as the area_small / area_large <= threshold, it will pass the checking, and you end up with overlapping boxes.

This won't be a problem if the sorting is on area instead of y2. Or use the

`numpy.minimum(area[i], area[idxs[:last]])`

as the denominator in the ratio computation, if one chooses to sort by the probability or something.And another question I have: the resultant bounding box is not always that accurate, is there any easy hack on the currently NMS that could help improve this?

Do you have the probability of the prediction associated with the bounding box? If so, you can sort on the probability instead of an arbitrary coordinator (some models do not provide a probability and I did not want to assume one for this post).

That said, I have an updated NMS algorithm inside my imutils library that might help you out here.

To answer your second question: NMS cannot “improve” the bounding boxes generated by your detector. Your detector should be responsible for producing quality bounding boxes. NMS will then suppress overlapping ones.

When I look at the bounding boxes, I wondered: How effecient would it be to simply find the maxmimums of all topleft_x s and minimums of all bottomright_x s (and likewise for the Ys)

Are there any disadvantages?

[facepalm] Of course, works only when you are expecting one face in the image.

Hi,

This is a great article! Recently, I came across a paper which discusses an improvement to the NMS and is also published by your Alma Mater. Maybe you can also update their explanation in a simpler manner?

Check it out: https://arxiv.org/pdf/1704.04503.pdf

They also have the python code available.

Thanks for sharing, Bhargav!

Hi Adrian,

Thank you for the post. Can I implement this for multiple objects template matching? I have a code which for lower threshold for Correlation coefficient detects multiple overlapping bounding boxes until the whole section is colored (painted) with the rectangles. Can I use NMS for this problem? Could you point me in the right direction?

Best,

Vishal

This method works for multiple objects and multiple class labels (you would just run NMS on each class label). It sounds like you are getting way to many bounding boxes to begin with so you may want to consider methods to prune them down based on correlation value thresholding.

In my case I am returning bounding boxes of same size, how would it work in that case? Also can I sort based on correlation coefficient value instead of bottom right y coordinate, would you recommend it? I am sorry for the trivial question.

Best,

Vishal

But for multiple detections on template matching the w and h for all the bounding boxes would be same, i.e., based on template size. How would I incorporate this algorithm in that case?

You would apply NMS based on the confidence of the template matching. See my implementation of NMS in imutils (take a look at the “probs” variable).

Hello,

Thanks for a great article, it was very useful on how NMS works.

One thing, I tried to time the fast and slow method but for some reason in my machine don’t see a 10x machine.

Here is the the gist – https://gist.github.com/PREM1980/93ec1298bea0495feaae77c798a345f0

I see minor differences. Not sure if Im doing something wrong.

How many bounding boxes do you have for each image?