Raspberry Pi: Deep learning object detection with OpenCV

A few weeks ago I demonstrated how to perform real-time object detection using deep learning and OpenCV on a standard laptop/desktop.

After the post was published I received a number of emails from PyImageSearch readers who were curious if the Raspberry Pi could also be used for real-time object detection.

The short answer is “kind of”…

…but only if you set your expectations accordingly.

Even when applying our optimized OpenCV + Raspberry Pi install the Pi is only capable of getting up to ~0.9 frames per second when applying deep learning for object detection with Python and OpenCV.

Is that fast enough?

Well, that depends on your application.

If you’re attempting to detect objects that are quickly moving through your field of view, likely not.

But if you’re monitoring a low traffic environment with slower moving objects, the Raspberry Pi could indeed be fast enough.

In the remainder of today’s blog post we’ll be reviewing two methods to perform deep learning-based object detection on the Raspberry Pi.

Raspberry Pi: Deep learning object detection with OpenCV

Today’s blog post is broken down into two parts.

In the first part, we’ll benchmark the Raspberry Pi for real-time object detection using OpenCV and Python. This benchmark will come from the exact code we used for our laptop/desktop deep learning object detector from a few weeks ago.

I’ll then demonstrate how to use multiprocessing to create an alternate method of object detection on the Raspberry Pi. This method may or may not be useful for your particular application, but at the very least it will give you an idea of different ways to approach the problem.

Object detection and OpenCV benchmark on the Raspberry Pi

The code we’ll discuss in this section is identical to our previous post on real-time object detection with deep learning and OpenCV; therefore, I will not be reviewing the code exhaustively.

For a deep dive into the code, please see the original post.

Instead, we’ll simply be using this code to benchmark the Raspberry Pi for deep learning-based object detection.

To get started, open up a new file, name it real_time_object_detection.py, and insert the following code:

# import the necessary packages
from imutils.video import VideoStream
from imutils.video import FPS
import numpy as np
import argparse
import imutils
import time
import cv2

We then need to parse our command line arguments:

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
	help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
	help="path to Caffe pre-trained model")
ap.add_argument("-c", "--confidence", type=float, default=0.2,
	help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

Followed by performing some initializations:

# initialize the list of class labels MobileNet SSD was trained to
# detect, then generate a set of bounding box colors for each class
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
	"bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
	"dog", "horse", "motorbike", "person", "pottedplant", "sheep",
	"sofa", "train", "tvmonitor"]
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

We initialize CLASSES, our class labels, and the corresponding COLORS for on-frame text and bounding boxes (Lines 22-26), followed by loading the serialized neural network model (Line 30).

Next, we’ll initialize the video stream object and frames per second counter:

# initialize the video stream, allow the camera sensor to warm up,
# and initialize the FPS counter
print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
# vs = VideoStream(usePiCamera=True).start()
time.sleep(2.0)
fps = FPS().start()

We initialize the video stream and allow the camera to warm up for 2.0 seconds (Lines 35-37).

On Line 35 we initialize our VideoStream using a USB camera. If you are using the Raspberry Pi camera module you’ll want to comment out Line 35 and uncomment Line 36 (which will enable you to access the Raspberry Pi camera module via the VideoStream class).

From there we start our fps counter on Line 38.

We are now ready to loop over frames from our input video stream:

# loop over the frames from the video stream
while True:
	# grab the frame from the threaded video stream and resize it
	# to have a maximum width of 400 pixels
	frame = vs.read()
	frame = imutils.resize(frame, width=400)

	# grab the frame dimensions and convert it to a blob
	(h, w) = frame.shape[:2]
	blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
		0.007843, (300, 300), 127.5)

	# pass the blob through the network and obtain the detections and
	# predictions
	net.setInput(blob)
	detections = net.forward()

Lines 41-55 simply grab and resize a frame, convert it to a blob, and pass the blob through the neural network, obtaining the detections and bounding box predictions.
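To make the blob step less of a black box, here is a minimal NumPy-only sketch of the arithmetic that the cv2.dnn.blobFromImage call performs with the arguments used above: subtract the mean (127.5), multiply by the scale factor (0.007843, roughly 1/127.5), and reorder the axes to batch x channels x height x width. The helper name to_blob and the random test frame are assumptions for illustration, and the frame is assumed to already be 300x300.

```python
import numpy as np

def to_blob(frame_bgr, scale=0.007843, mean=127.5):
	# frame_bgr: H x W x 3 uint8 image, assumed already resized to 300x300
	# subtract the mean first, then apply the scale factor
	blob = (frame_bgr.astype("float32") - mean) * scale

	# reorder HWC -> CHW, then add a leading batch axis (NCHW)
	return blob.transpose(2, 0, 1)[np.newaxis, ...]

# hypothetical stand-in for a resized camera frame
frame = np.random.randint(0, 256, size=(300, 300, 3), dtype=np.uint8)
blob = to_blob(frame)
print(blob.shape, float(blob.min()), float(blob.max()))
```

With these two constants, pixel values in [0, 255] end up in roughly [-1, 1], which is the input range this MobileNet SSD model expects.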

From there we need to loop over the detections to see what objects were detected in the frame:

	# loop over the detections
	for i in np.arange(0, detections.shape[2]):
		# extract the confidence (i.e., probability) associated with
		# the prediction
		confidence = detections[0, 0, i, 2]

		# filter out weak detections by ensuring the `confidence` is
		# greater than the minimum confidence
		if confidence > args["confidence"]:
			# extract the index of the class label from the
			# `detections`, then compute the (x, y)-coordinates of
			# the bounding box for the object
			idx = int(detections[0, 0, i, 1])
			box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
			(startX, startY, endX, endY) = box.astype("int")

			# draw the prediction on the frame
			label = "{}: {:.2f}%".format(CLASSES[idx],
				confidence * 100)
			cv2.rectangle(frame, (startX, startY), (endX, endY),
				COLORS[idx], 2)
			y = startY - 15 if startY - 15 > 15 else startY + 15
			cv2.putText(frame, label, (startX, y),
				cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)

On Lines 58-80, we loop over our detections. For each detection we examine the confidence and ensure the corresponding probability of the detection is above a predefined threshold. If it is, then we extract the class label and compute the (x, y) bounding box coordinates. These coordinates will enable us to draw a bounding box around the object in the image along with the associated class label.
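If you want to experiment with the parsing logic without a camera or a model, the sketch below runs the same indexing on a hand-built array. It assumes the layout used by the loop above: shape (1, 1, N, 7), with the class index in column 1, the confidence in column 2, and the normalized box corners in columns 3 to 6. The function name parse_detections and the fake values are made up for illustration.

```python
import numpy as np

def parse_detections(detections, frame_w, frame_h, min_conf=0.2):
	results = []

	# loop over the N candidate detections
	for i in range(detections.shape[2]):
		confidence = float(detections[0, 0, i, 2])

		# skip weak detections
		if confidence <= min_conf:
			continue

		# class index, then scale the normalized box to pixel coordinates
		idx = int(detections[0, 0, i, 1])
		dims = np.array([frame_w, frame_h, frame_w, frame_h])
		box = (detections[0, 0, i, 3:7] * dims).astype("int")
		results.append((idx, confidence, tuple(int(v) for v in box)))

	return results

# two fake detections: a confident "person" (index 15) and a weak "cat" (index 8)
fake = np.zeros((1, 1, 2, 7), dtype="float32")
fake[0, 0, 0] = [0, 15, 0.90, 0.25, 0.50, 0.75, 1.00]
fake[0, 0, 1] = [0, 8, 0.10, 0.00, 0.00, 0.50, 0.50]

parsed = parse_detections(fake, frame_w=400, frame_h=300)
print(parsed)
```

Only the first detection survives the 0.2 threshold, and its box is scaled to the 400x300 frame.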

From there we’ll finish out the loop and do some cleanup:

	# show the output frame
	cv2.imshow("Frame", frame)
	key = cv2.waitKey(1) & 0xFF

	# if the `q` key was pressed, break from the loop
	if key == ord("q"):
		break

	# update the FPS counter
	fps.update()

# stop the timer and display FPS information
fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()

Lines 82-91 close out the loop: we show each frame, break if the ‘q’ key is pressed, and update our fps counter.

The final terminal message output and cleanup is handled on Lines 94-100.

Now that our brief explanation of real_time_object_detection.py is finished, let’s examine the results of this approach to obtain a baseline.

Go ahead and use the “Downloads” section of this post to download the source code and pre-trained models.

From there, execute the following command:

$ python real_time_object_detection.py \
	--prototxt MobileNetSSD_deploy.prototxt.txt \
	--model MobileNetSSD_deploy.caffemodel
[INFO] loading model...
[INFO] starting video stream...
[INFO] elapsed time: 54.70
[INFO] approx. FPS: 0.90

As you can see from my results we are obtaining ~0.9 frames per second throughput using this method and the Raspberry Pi.

Compared to the 6-7 frames per second using our laptop/desktop we can see that the Raspberry Pi is substantially slower.

That’s not to say that the Raspberry Pi is unusable when applying deep learning object detection, but you need to set your expectations on what’s realistic (even when applying our OpenCV + Raspberry Pi optimizations).

Note: For what it’s worth, I could only obtain 0.49 FPS when NOT using our optimized OpenCV + Raspberry Pi install — that just goes to show you how much of a difference NEON and VFPV3 can make.
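To put those numbers in perspective, here is a quick back-of-the-envelope calculation using only the figures reported above (54.70 seconds elapsed, 0.90 FPS optimized, 0.49 FPS unoptimized). The variable names are my own.

```python
elapsed_s = 54.70
fps_optimized = 0.90
fps_unoptimized = 0.49

# roughly how many frames made it through the network during the run
frames_processed = fps_optimized * elapsed_s

# average time spent on each frame
seconds_per_frame = 1.0 / fps_optimized

# gain from the optimized OpenCV install
speedup = fps_optimized / fps_unoptimized

print("frames processed: ~{:.0f}".format(frames_processed))
print("seconds per frame: ~{:.2f}".format(seconds_per_frame))
print("speedup from optimizations: ~{:.2f}x".format(speedup))
```

In other words, each frame costs a little over a second, and the optimized install is a bit under twice as fast as the unoptimized one.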

A different approach to object detection on the Raspberry Pi

Using the example from the previous section we see that calling net.forward() is a blocking operation: the rest of the code in the while loop cannot run until net.forward() returns the detections.

So, what if net.forward() was not a blocking operation?

Would we be able to obtain a faster frames per second throughput?

Well, that’s a loaded question.

No matter what, it will take a little over a second for net.forward() to complete using the Raspberry Pi and this particular architecture. That cannot change.

But what we can do is create a separate process that is solely responsible for applying the deep learning object detector, thereby unblocking the main thread of execution and allowing our while loop to continue.

Moving the predictions to a separate process will give the illusion that our Raspberry Pi object detector is running faster than it actually is, when in reality the net.forward() computation is still taking a little over one second.

The only problem here is that our output object detection predictions will lag behind what is currently being displayed on our screen. If you are detecting fast-moving objects you may miss the detection entirely, or at the very least, the object will be out of the frame before you obtain your detections from the neural network.

Therefore, this approach should only be used for slow-moving objects where we can tolerate lag.
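To get a feel for how stale the drawn detections can become, here is a small, simplified simulation of the hand-off. It is not the script itself: it has no camera, no network, and no real processes, and it assumes inference always takes a fixed number of display-loop iterations (the hypothetical frames_per_inference parameter).

```python
def simulate_lag(num_frames, frames_per_inference):
	# for each displayed frame, record which frame the drawn
	# detections were computed from (None until the first result)
	shown = []
	in_flight = None   # index of the frame the worker is processing
	finish_at = None   # loop iteration at which the worker finishes
	latest = None      # frame index behind the most recent detections

	for t in range(num_frames):
		# worker finished: collect its result (like outputQueue.get())
		if in_flight is not None and t >= finish_at:
			latest = in_flight
			in_flight = None

		# worker idle: hand it the current frame (like inputQueue.put())
		if in_flight is None:
			in_flight = t
			finish_at = t + frames_per_inference

		shown.append(latest)

	return shown

print(simulate_lag(7, 3))
```

With an inference cost of 3 iterations, frame 5 is still displayed with detections computed from frame 0. In this model the drawn detections are always between one and two inference times old, which is why fast-moving objects are a poor fit.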

To see how this multiprocessing method works, open up a new file, name it pi_object_detection.py, and insert the following code:

# import the necessary packages
from imutils.video import VideoStream
from imutils.video import FPS
from multiprocessing import Process
from multiprocessing import Queue
import numpy as np
import argparse
import imutils
import time
import cv2

For the code walkthrough in this section, I’ll be pointing out and explaining the differences (there are quite a few) compared to our non-multiprocessing method.

Our imports on Lines 2-10 are mostly the same, but notice the imports of Process and Queue from Python’s multiprocessing package.

Next, I’d like to draw your attention to a new function, classify_frame:

def classify_frame(net, inputQueue, outputQueue):
	# keep looping
	while True:
		# check to see if there is a frame in our input queue
		if not inputQueue.empty():
			# grab the frame from the input queue, resize it, and
			# construct a blob from it
			frame = inputQueue.get()
			frame = cv2.resize(frame, (300, 300))
			blob = cv2.dnn.blobFromImage(frame, 0.007843,
				(300, 300), 127.5)

			# set the blob as input to our deep learning object
			# detector and obtain the detections
			net.setInput(blob)
			detections = net.forward()

			# write the detections to the output queue
			outputQueue.put(detections)

Our new classify_frame function is responsible for our multiprocessing — later on we’ll set it up to run in a child process.

The classify_frame function takes three parameters:

net: the neural network object.
inputQueue: our FIFO (first in, first out) queue of frames for object detection.
outputQueue: our FIFO queue of detections which will be processed in the main thread.

This child process will loop continuously until the parent exits and effectively terminates the child.

In the loop, if the inputQueue contains a frame, we grab it, and then pre-process it and create a blob (Lines 16-22), just as we have done in the previous script.

From there, we send the blob through the neural network (Lines 26-27) and place the detections in an outputQueue for processing by the parent.
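As a side note, classify_frame polls inputQueue.empty() in a tight loop. One possible variation, which is not what this post uses, is to let get() block until a frame arrives and to stop on a sentinel value. The sketch below is an assumption-heavy illustration: infer is a hypothetical callable standing in for the resize, blob, and net.forward() steps, and the demo drives the worker with a thread and queue.Queue (which shares the put/get interface with multiprocessing.Queue) purely to keep it self-contained.

```python
import queue
import threading

def classify_frames_blocking(infer, input_queue, output_queue):
	while True:
		# block until the parent hands us something
		frame = input_queue.get()

		# a None sentinel asks the worker to shut down cleanly
		if frame is None:
			break

		# `infer` is a hypothetical stand-in for the detector
		output_queue.put(infer(frame))

# demo: a fake "detector" that simply doubles its input
in_q = queue.Queue(maxsize=1)
out_q = queue.Queue(maxsize=1)
worker = threading.Thread(target=classify_frames_blocking,
	args=(lambda f: f * 2, in_q, out_q), daemon=True)
worker.start()

in_q.put(21)
result = out_q.get(timeout=5)

# send the sentinel and wait for the worker to exit
in_q.put(None)
worker.join(timeout=5)
print(result)
```

Whether this is worth doing depends on your application. In the script from this post the parent refills the input queue almost immediately, so the child rarely sits idle anyway.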

Now let’s parse our command line arguments:

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
	help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
	help="path to Caffe pre-trained model")
ap.add_argument("-c", "--confidence", type=float, default=0.2,
	help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

There is no difference here — we are simply parsing the same command line arguments on Lines 33-40.

Next we initialize some variables just as in our previous script:

# initialize the list of class labels MobileNet SSD was trained to
# detect, then generate a set of bounding box colors for each class
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
	"bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
	"dog", "horse", "motorbike", "person", "pottedplant", "sheep",
	"sofa", "train", "tvmonitor"]
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

This code is the same — we initialize class labels, colors, and load our model.

Here’s where things get different:

# initialize the input queue (frames), output queue (detections),
# and the list of actual detections returned by the child process
inputQueue = Queue(maxsize=1)
outputQueue = Queue(maxsize=1)
detections = None

On Lines 56-58 we initialize an inputQueue of frames, an outputQueue of detections, and a detections list.

Our inputQueue will be populated by the parent and processed by the child — it is the input to the child process. Our outputQueue will be populated by the child, and processed by the parent — it is output from the child process. Both of these queues trivially have a size of one as our neural network will only be applying object detections to one frame at a time.
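If the behavior of a size-one queue is unfamiliar, the short sketch below shows it in isolation. It uses queue.Queue from the standard library rather than multiprocessing.Queue so that it runs without spawning a process; both expose put, get, empty, and full. The string "frames" are placeholders.

```python
import queue

q = queue.Queue(maxsize=1)

# the first item fits, and the queue is now full
q.put("frame-0")
is_full = q.full()

# a second, non-blocking put is refused rather than queued up
try:
	q.put_nowait("frame-1")
	refused = False
except queue.Full:
	refused = True

# taking the item out leaves the queue empty again
item = q.get()
is_empty = q.empty()
print(is_full, refused, item, is_empty)
```

This is why the main loop checks whether the input queue is empty before handing over a frame: while the detector is busy, newer frames are simply displayed and never pile up behind it.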

Let’s initialize and start the child process:

# construct a child process *independent* from our main process of
# execution
print("[INFO] starting process...")
p = Process(target=classify_frame, args=(net, inputQueue,
	outputQueue,))
p.daemon = True
p.start()

It is very easy to construct a child process with Python’s multiprocessing module — simply specify the target function and args to the function as we have done on Lines 63 and 64.

Line 65 specifies that p is a daemon process, and Line 66 kicks the process off.

From there we’ll see some more familiar code:

# initialize the video stream, allow the camera sensor to warm up,
# and initialize the FPS counter
print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
# vs = VideoStream(usePiCamera=True).start()
time.sleep(2.0)
fps = FPS().start()

Don’t forget to change your video stream object to use the PiCamera if you desire by switching which line is commented (Lines 71 and 72).

Once our vs object and fps counters are initialized, we can loop over the video frames:

# loop over the frames from the video stream
while True:
	# grab the frame from the threaded video stream, resize it, and
	# grab its dimensions
	frame = vs.read()
	frame = imutils.resize(frame, width=400)
	(fH, fW) = frame.shape[:2]

On Lines 80-82, we read a frame, resize it, and extract the width and height.

Next, we’ll work our queues into the flow:

	# if the input queue *is* empty, give the current frame to
	# classify
	if inputQueue.empty():
		inputQueue.put(frame)

	# if the output queue *is not* empty, grab the detections
	if not outputQueue.empty():
		detections = outputQueue.get()

First we check if the inputQueue is empty — if it is empty, we put a frame in the inputQueue for processing by the child (Lines 86 and 87). Remember, the child process is running in an infinite loop, so it will be processing the inputQueue in the background.

Then we check if the outputQueue is not empty. If something is in it, we grab the detections for processing here in the parent (Lines 90 and 91). When we call get() on the outputQueue, the detections are returned and the outputQueue is momentarily empty.

If you are unfamiliar with queues or if you want a refresher, see the Python documentation for the multiprocessing module’s Queue class.

Let’s process our detections:

	# check to see if our detections are not None (and if so, we'll
	# draw the detections on the frame)
	if detections is not None:
		# loop over the detections
		for i in np.arange(0, detections.shape[2]):
			# extract the confidence (i.e., probability) associated
			# with the prediction
			confidence = detections[0, 0, i, 2]

			# filter out weak detections by ensuring the `confidence`
			# is greater than the minimum confidence
			if confidence < args["confidence"]:
				continue

			# otherwise, extract the index of the class label from
			# the `detections`, then compute the (x, y)-coordinates
			# of the bounding box for the object
			idx = int(detections[0, 0, i, 1])
			dims = np.array([fW, fH, fW, fH])
			box = detections[0, 0, i, 3:7] * dims
			(startX, startY, endX, endY) = box.astype("int")

			# draw the prediction on the frame
			label = "{}: {:.2f}%".format(CLASSES[idx],
				confidence * 100)
			cv2.rectangle(frame, (startX, startY), (endX, endY),
				COLORS[idx], 2)
			y = startY - 15 if startY - 15 > 15 else startY + 15
			cv2.putText(frame, label, (startX, y),
				cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)

If our detections list is populated (it is not None), we loop over the detections as we have done in the previous section’s code.

In the loop, we extract and check the confidence against the threshold (Lines 100-105), extract the class label index (Line 110), and draw a box and label on the frame (Lines 111-122).

From there in the while loop we’ll complete a few remaining steps, followed by printing some statistics to the terminal, and performing cleanup:

	# show the output frame
	cv2.imshow("Frame", frame)
	key = cv2.waitKey(1) & 0xFF

	# if the `q` key was pressed, break from the loop
	if key == ord("q"):
		break

	# update the FPS counter
	fps.update()

# stop the timer and display FPS information
fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()

In the remainder of the loop, we display the frame to the screen (Line 125), capture a key press, and break out of the loop if it is the quit key (Lines 126-130). We also update our fps counter.

To finish out, we stop the fps counter, print our time/FPS statistics, and finally close windows and stop the video stream (Lines 136-142).

Now that we’re done walking through our new multiprocessing code, let’s compare the method to the single thread approach from the previous section.

Be sure to use the “Downloads” section of this blog post to download the source code + pre-trained MobileNet SSD neural network. From there, execute the following command:

$ python pi_object_detection.py \
	--prototxt MobileNetSSD_deploy.prototxt.txt \
	--model MobileNetSSD_deploy.caffemodel
[INFO] loading model...
[INFO] starting process...
[INFO] starting video stream...
[INFO] elapsed time: 48.55
[INFO] approx. FPS: 27.83

Here you can see that our while loop is capable of processing 27 frames per second. However, this throughput rate is an illusion — the neural network running in the background is still only capable of processing 0.9 frames per second.
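A rough calculation makes the gap concrete. It uses the elapsed time and loop rate reported above, and it assumes the detector in the child process still runs at about the 0.9 FPS measured in the first benchmark (that rate was not measured separately in this run).

```python
elapsed_s = 48.55
display_fps = 27.83
detector_fps = 0.9   # assumed: carried over from the first benchmark

# frames shown on screen during the run
frames_displayed = display_fps * elapsed_s

# approximate number of times the network actually produced detections
inferences = detector_fps * elapsed_s

# how many displayed frames share each set of detections
frames_per_inference = display_fps / detector_fps

print("frames displayed: ~{:.0f}".format(frames_displayed))
print("inferences completed: ~{:.0f}".format(inferences))
print("displayed frames per inference: ~{:.0f}".format(frames_per_inference))
```

So under that assumption, roughly 31 consecutive frames are drawn with the same set of detections before a fresh set arrives.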

Note: I also tested this code on the Raspberry Pi camera module and was able to obtain 60.92 frames per second over 35 elapsed seconds.

The difference here is that we can obtain real-time throughput by displaying each new input frame in real-time and then drawing any previous detections on the current frame.

Once we have a new set of detections we then draw the new ones on the frame.

This process repeats until we exit the script. The downside is that we see substantial lag. There are clips in the above video where we can see that all objects have clearly left the field of view…

…however, our script still reports the objects as being present.

Therefore, you should consider only using this approach when:

Objects are slow moving and the previous detections can be used as an approximation to the new location.
Displaying the actual frames themselves in real-time is paramount to user experience.

Summary

In today’s blog post we examined using the Raspberry Pi for object detection using deep learning, OpenCV, and Python.

As our results demonstrated we were able to get up to 0.9 frames per second, which is not fast enough to constitute real-time detection. That said, given the limited processing power of the Pi, 0.9 frames per second is still reasonable for some applications.

We then wrapped up this blog post by examining an alternate method to deep learning object detection on the Raspberry Pi by using multiprocessing. Whether or not this second approach is suitable for you is again highly dependent on your application.

If your use case involves low traffic object detection where the objects are slow moving through the frame, then you can certainly consider using the Raspberry Pi for deep learning object detection. However, if you are developing an application that involves many objects that are fast moving, you should instead consider faster hardware.

Thanks for reading and enjoy!

And if you’re interested in studying deep learning in more depth, be sure to take a look at my new book, Deep Learning for Computer Vision with Python. Whether this is the first time you’ve worked with machine learning and neural networks or you’re already a seasoned deep learning practitioner, my new book is engineered from the ground up to help you reach expert status.

Just click here to start your journey to deep learning mastery.
