Recognizing digits with OpenCV and Python

Today’s tutorial is inspired by a post I saw a few weeks back on /r/computervision asking how to recognize digits in an image containing a thermostat identical to the one at the top of this post.

As Reddit users were quick to point out, utilizing computer vision to recognize digits on a thermostat tends to overcomplicate the problem — a simple data logging thermometer would give much more reliable results with a fraction of the effort.

On the other hand, applying computer vision to projects such as these is really good practice.

Whether you are just getting started with computer vision/OpenCV, or you’re already writing computer vision code on a daily basis, taking the time to hone your skills on mini-projects is paramount to mastering your trade — in fact, I find it so important that I do exercises like this one twice a month.

Every other Friday afternoon I block off two hours on my calendar and practice my basic image processing and computer vision skills on computer vision/OpenCV questions I’ve found on Reddit or StackOverflow.

Doing this exercise helps me keep my skills sharp — it also has the added benefit of making great blog post content.

In the remainder of today’s blog post, I’ll demonstrate how to recognize digits in images using OpenCV and Python.

Looking for the source code to this post?
Jump right to the downloads section.

Recognizing digits with OpenCV and Python

In the first part of this tutorial, we’ll discuss what a seven-segment display is and how we can apply computer vision and image processing operations to recognize these types of digits (no machine learning required!)

From there I’ll provide actual Python and OpenCV code that can be used to recognize these digits in images.

The seven-segment display

You’re likely already familiar with a seven-segment display, even if you don’t recognize the particular term.

A great example of such a display is your classic digital alarm clock:

Figure 1: A classic digital alarm clock that contains four seven-segment displays to represent the time of day.

Each digit on the alarm clock is represented by a seven-segment component just like the one below:

Figure 2: An example of a single seven-segment display. Each segment can be turned “on” or “off” to represent a particular digit (source: Wikipedia).

Seven-segment displays can take on a total of 128 possible states:

Figure 3: A seven-segment display is capable of 128 possible states (source: Wikipedia).

Luckily for us, we are only interested in ten of them — the digits zero to nine:

Figure 4: For the task of digit recognition we only need to recognize ten of these states.

Our goal is to write OpenCV and Python code to recognize each of these ten digit states in an image.

Planning the OpenCV digit recognizer

Just like in the original post on /r/computervision, we’ll be using the thermostat image as input:

Figure 5: Our example input image. Our goal is to recognize the digits on the thermostat using OpenCV and Python.

Whenever I am trying to recognize/identify object(s) in an image I first take a few minutes to assess the problem. Given that my end goal is to recognize the digits on the LCD display I know I need to:

  • Step #1: Localize the LCD on the thermostat. This can be done using edge detection since there is enough contrast between the plastic shell and the LCD.
  • Step #2: Extract the LCD. Given an input edge map I can find contours and look for outlines with a rectangular shape — the largest rectangular region should correspond to the LCD. A perspective transform will give me a nice extraction of the LCD.
  • Step #3: Extract the digit regions. Once I have the LCD itself I can focus on extracting the digits. Since there seems to be contrast between the digit regions and the background of the LCD I’m confident that thresholding and morphological operations can accomplish this.
  • Step #4: Identify the digits. Recognizing the actual digits with OpenCV will involve dividing the digit ROI into seven segments. From there I can apply pixel counting on the thresholded image to determine if a given segment is “on” or “off”.

To see how we can accomplish this four-step process for digit recognition with OpenCV and Python, keep reading.

Recognizing digits with computer vision and OpenCV

Let’s go ahead and get this example started.

Open up a new file, name it recognize_digits.py, and insert the following code:

Lines 2-5 import our required Python packages. We’ll be using imutils, my series of convenience functions to make working with OpenCV + Python easier. If you don’t already have imutils installed, you should take a second now to install the package on your system using pip:
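The install command itself (quoted again in the comments below) is:

```shell
$ pip install --upgrade imutils
```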

Lines 9-20 define a Python dictionary named DIGITS_LOOKUP. Inspired by the approach of /u/Jonno_FTW in the Reddit thread, we can easily define this lookup table where:

  1. The key to the table is the seven-segment array. A one in the array indicates that the given segment is on and a zero indicates that the segment is off.
  2. The value is the actual numerical digit itself: 0-9.

Once we identify the segments in the thermostat display we can pass the array into our DIGITS_LOOKUP table and obtain the digit value.

For reference, this dictionary uses the same segment ordering as in Figure 2 above.
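The table itself is not reproduced in this copy of the post, but given the key/value scheme just described and the standard seven-segment encodings, it takes a form along these lines (the tuple ordering assumed here is top, top-left, top-right, center, bottom-left, bottom-right, bottom):

```python
# Lookup table mapping seven-segment on/off patterns to digits.
# Assumed tuple order: (top, top-left, top-right, center,
# bottom-left, bottom-right, bottom) -- reconstructed from the
# description above, not copied from the original listing.
DIGITS_LOOKUP = {
    (1, 1, 1, 0, 1, 1, 1): 0,
    (0, 0, 1, 0, 0, 1, 0): 1,
    (1, 0, 1, 1, 1, 0, 1): 2,
    (1, 0, 1, 1, 0, 1, 1): 3,
    (0, 1, 1, 1, 0, 1, 0): 4,
    (1, 1, 0, 1, 0, 1, 1): 5,
    (1, 1, 0, 1, 1, 1, 1): 6,
    (1, 0, 1, 0, 0, 1, 0): 7,
    (1, 1, 1, 1, 1, 1, 1): 8,
    (1, 1, 1, 1, 0, 1, 1): 9,
}
```

For example, an eight lights every segment, so its key is all ones, while a one lights only the two right-hand segments.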

Let’s continue with our example:

Line 23 loads our image from disk.

We then pre-process the image on Lines 27-30 by:

  • Resizing it.
  • Converting the image to grayscale.
  • Applying Gaussian blurring with a 5×5 kernel to reduce high-frequency noise.
  • Computing the edge map via the Canny edge detector.

After applying these pre-processing steps our edge map looks like this:

Figure 6: Applying image processing steps to compute the edge map of our input image.

Notice how the outlines of the LCD are clearly visible — this accomplishes Step #1.

We can now move on to Step #2, extracting the LCD itself:

In order to find the LCD region, we need to extract the contours (i.e., outlines) of the regions in the edge map (Lines 35 and 36).

We then sort the contours by their area, ensuring that contours with a larger area are placed at the front of the list (Line 37).

Given our sorted contours list, we loop over them individually on Line 41 and apply contour approximation.

If our approximated contour has four vertices then we assume we have found the thermostat display (Lines 48-50). This is a reasonable assumption since the largest rectangular region in our input image should be the LCD itself.

After obtaining the four vertices we can extract the LCD via a four point perspective transform:

Applying this perspective transform gives us a top-down, birds-eye-view of the LCD:

Figure 7: Applying a perspective transform to our image to obtain the LCD region.

Obtaining this view of the LCD satisfies Step #2 — we are now ready to extract the digits from the LCD:

To obtain the digits themselves we need to threshold the warped image (Lines 59 and 60) to reveal the dark regions (i.e., digits) against the lighter background (i.e., the background of the LCD display):

Figure 8: Thresholding the LCD allows us to segment the dark regions (digits/symbols) from the lighter background (the LCD display itself).

We then apply a series of morphological operations to clean up the thresholded image (Lines 61 and 62):

Figure 9: Applying a series of morphological operations cleans up our thresholded LCD and will allow us to segment out each of the digits.

Now that we have a nice segmented image we once again need to apply contour filtering, only this time we are looking for the actual digits:

To accomplish this we find contours in our thresholded image (Lines 66 and 67). We also initialize the digitsCnts list on Line 69 — this list will store the contours of the digits themselves.

Line 72 starts looping over each of the contours.

For each contour, we compute the bounding box (Line 74), ensure the width and height are of an acceptable size, and if so, update the digitsCnts list (Lines 77 and 78).

Note: Determining the appropriate width and height constraints requires a few rounds of trial and error. I would suggest looping over each of the contours, drawing them individually, and inspecting their dimensions. Doing this process ensures you can find commonalities across digit contour properties.

If we were to loop over the contours inside digitsCnts and draw the bounding box on our image, the result would look like this:

Figure 10: Drawing the bounding box of each of the digits on the LCD.

Sure enough, we have found the digits on the LCD!

The final step is to actually identify each of the digits:

Here we are simply sorting our digit contours from left-to-right based on their (x, y)-coordinates.

This sorting step is necessary as there are no guarantees that the contours are already sorted from left-to-right (the same direction in which we would read the digits).

Next, comes the actual digit recognition process:

We start looping over each of the digit contours on Line 87.

For each of these regions, we compute the bounding box and extract the digit ROI (Lines 89 and 90).

I have included a GIF animation of each of these digit ROIs below:

Figure 11: Extracting each individual digit ROI by computing the bounding box and applying NumPy array slicing.

Given the digit ROI we now need to localize and extract the seven segments of the digit display.

Lines 94-96 compute the approximate width and height of each segment based on the ROI dimensions.

We then define a list of (x, y)-coordinates that correspond to the seven segments on Lines 99-107. This list follows the same order of segments as Figure 2 above.

Here is an example GIF animation that draws a green box over the current segment being investigated:

Figure 12: An example of drawing the segment ROI for each of the seven segments of the digit.

Finally, Line 108 initializes our on list — a value of one inside this list indicates that a given segment is turned “on” while a value of zero indicates the segment is “off”.
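The segment geometry can be sketched as follows. The fractional sizes (25% of the ROI width, 15% of the height, and a 5% half-height for the center bar) are plausible assumptions rather than the post’s exact constants:

```python
import numpy as np

# hypothetical digit ROI (height x width)
roi = np.zeros((80, 40), dtype="uint8")
(roiH, roiW) = roi.shape
(dW, dH) = (int(roiW * 0.25), int(roiH * 0.15))  # segment thickness
dHC = int(roiH * 0.05)                           # center-bar half-height

# ((x0, y0), (x1, y1)) for each segment, in the same order as Figure 2
segments = [
    ((0, 0), (roiW, dH)),                                 # top
    ((0, 0), (dW, roiH // 2)),                            # top-left
    ((roiW - dW, 0), (roiW, roiH // 2)),                  # top-right
    ((0, (roiH // 2) - dHC), (roiW, (roiH // 2) + dHC)),  # center
    ((0, roiH // 2), (dW, roiH)),                         # bottom-left
    ((roiW - dW, roiH // 2), (roiW, roiH)),               # bottom-right
    ((0, roiH - dH), (roiW, roiH)),                       # bottom
]
on = [0] * len(segments)   # all segments assumed off initially
```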

Given the (x, y)-coordinates of the seven display segments, identifying whether a segment is on or off is fairly easy:

We start looping over the (x, y)-coordinates of each segment on Line 111.

We extract the segment ROI on Line 115, followed by computing the number of non-zero pixels on Line 116 (i.e., the number of pixels in the segment that are “on”).

If the ratio of non-zero pixels to the total area of the segment is greater than 50% then we can assume the segment is “on” and update our on list accordingly (Lines 121 and 122).

After looping over the seven segments we can pass the on list to DIGITS_LOOKUP to obtain the digit itself.
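The on/off decision for a single segment, followed by the table lookup, can be sketched as below; the 50% ratio threshold is the one described above, and the segment crop and abbreviated lookup table are hypothetical:

```python
import numpy as np

# hypothetical top-segment crop: 75% of its pixels are lit
seg = np.zeros((12, 40), dtype="uint8")
seg[:, :30] = 255

total = np.count_nonzero(seg)   # cv2.countNonZero in the post
area = seg.shape[0] * seg.shape[1]
is_on = 1 if total / float(area) > 0.5 else 0

# with all seven decisions collected into a tuple, the digit falls
# straight out of the lookup table (abbreviated to two entries here)
DIGITS_LOOKUP = {(1, 1, 1, 0, 1, 1, 1): 0, (1, 1, 1, 1, 1, 1, 1): 8}
digit = DIGITS_LOOKUP[(1, 1, 1, 1, 1, 1, 1)]
```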

We then draw a bounding box around the digit and display the digit on the output  image.

Finally, our last code block prints the digit to our screen and displays the output image:
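The closing report can be sketched as below. The format string is the one readers quote from the post in the comments; the digit values here are hypothetical stand-ins for the recognized output:

```python
# format the recognized digits as a temperature reading
digits = [3, 4, 5]                             # hypothetical recognized digits
reading = u"{}{}.{} \u00b0C".format(*digits)
print(reading)                                 # e.g. 34.5 °C
```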

Notice how we have been able to correctly recognize the digits on the LCD screen using Python and OpenCV:

Figure 13: Correctly recognizing digits in images with OpenCV and Python.

Summary

In today’s blog post I demonstrated how to utilize OpenCV and Python to recognize digits in images.

This approach is specifically intended for seven-segment displays (i.e., the digit displays you would typically see on a digital alarm clock).

By extracting each of the seven segments and applying basic thresholding and morphological operations we can determine which segments are “on” and which are “off”.

From there, we can look up the on/off segments in a Python dictionary data structure to quickly determine the actual digit — no machine learning required!

As I mentioned at the top of this blog post, applying computer vision to recognizing digits in a thermostat image tends to overcomplicate the problem itself — utilizing a data logging thermometer would be more reliable and require substantially less effort.

However, in the case that (1) you do not have access to a data logging sensor or (2) you simply want to hone and practice your computer vision/OpenCV skills, it’s often helpful to see a solution such as this one demonstrating how to solve the project.

I hope you enjoyed today’s post!

To be notified when future blog posts are published, be sure to enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 11-page Resource Guide on Computer Vision and Image Search Engines, including exclusive techniques that I don’t post on this blog! Sound good? If so, enter your email address and I’ll send you the code immediately!


55 Responses to Recognizing digits with OpenCV and Python

  1. Andrei February 13, 2017 at 11:18 am #

    Hi,

    in Figure 4 the image showing a 2 is missing 🙂
    Perhaps you can add it to complete this good tutorial 😀

    • Adrian Rosebrock February 13, 2017 at 1:32 pm #

      You’re correct Andrei, thank you for pointing this out. I’ll get an updated image uploaded 🙂

  2. James-Hung February 13, 2017 at 11:23 am #

    Do you have the similar implementation in C++ ?

    Thanks in advance,

    regards,

    James-Hung

    • Adrian Rosebrock February 13, 2017 at 1:31 pm #

      Hi James-Hung — I only cover Python implementations on this blog.

    • Tham February 13, 2017 at 4:08 pm #

      I think it is quite easy to convert it to modern c++ implementation. One of the best things of learning c++ is, after you get familiar with it, you will find out you can pick up lot of languages in short time, especially a language with nice syntax and libs like python.

      Thanks for the tutorial, this is a nice solution, especially step 4, I believe I would use machine learning(trained by mnist or other datasets) to recognize the digits rather than this creative, simple solution.

      • Tham February 13, 2017 at 4:13 pm #

        Sorry, I think I did not express my though clearly, what I mean was I do not know there are such creative solution before I study this post, so I would prefer machine learning as character recognition, although ml may be a more robust solutions, it also takes more times and much expensive than this solution.

        • Adrian Rosebrock February 14, 2017 at 1:25 pm #

          Using machine learning to solve this problem is 100% acceptable; however, there are times when a clever use of image processing can achieve just as high accuracy with less effort. Of course, this is a bit of a contrived example and for more robustness machine learning should absolutely be considered, especially if there are changes in reflection, lighting conditions, etc.

  3. Preetinder Singh February 13, 2017 at 11:28 am #

    Interesting. really liked the post. Thanks for sharing. In case the scene illumination changes, the algorithm usually breaks or becomes less accurate. Please suggest all the different computer vision techniques in practice in order to remove or minimize the effects of illumination/brightness/contrast changes of the image for the algorithm to still work correctly OR at least with high accuracy ?

    • Adrian Rosebrock February 13, 2017 at 1:31 pm #

      If you need a more robust solution you should consider using machine learning to both localize the screen followed by recognize the actual digits.

  4. Douglas Jones February 13, 2017 at 11:28 am #

    Adrian,
    A most excellent post and your timing is impeccable! I happen to have a need for just such 7-segment digit recognizer. Leaving the data logging sensor aside (where’s the fun in that) obviously this is just one way of using computer vision to recognize these digits. In your bag of goodies do you happen to have some thoughts on how one would do this WITH machine learning? I am guessing that KNN might be a good approach. Thoughts?

    Thanks!

    • Adrian Rosebrock February 13, 2017 at 1:30 pm #

      Hey Douglas — I’m glad the post was timed well! As for doing this with machine learning, yes, it’s absolutely possible. I demonstrate how to recognize (handwritten) digits inside Practical Python and OpenCV and then discuss more advanced solutions inside the PyImageSearch Gurus course, but a good starting point would be the HOG descriptor + SVM.

      • Douglas Jones February 13, 2017 at 8:07 pm #

        Thanks Adrian! I have the books and associated video and have gone through them quite a lot (seems transatlantic/transpacific flights leave LOTs of reading time!)

        I have not had the chance to try HOG and SVM. Since I am under the gun, so to speak, I will try and get a comparison of the two once converted to C#. I mentioned KNN because it is a lazy learning method and might be a touch faster. I am having to do all this in real time based on 60fps, so speed is always a worry. Especially when a single frame might contain several indicators with varying numbers of digits.

        • Adrian Rosebrock February 14, 2017 at 1:23 pm #

          k-NN is faster to train (since there is no training process) but is slower at test time, depending on how large your dataset is. You can apply approximate nearest neighbors to speed it up, but in the long run I think you’ll find better accuracy with HOG + SVM. Only quantify the digit region of the image via HOG and then pass the feature vector into the SVM.

          • Douglas Jones February 15, 2017 at 2:50 pm #

Thanks. I will give that a try. As I have discovered from this blog’s code, translating to C# EmguCV/OpenCV is not straightforward at all. You have numpy and imutils plus some home grown routines which I do not have in C#. One thing I thought odd is doing the exact same steps up to performing the canny edge detection gave me an image that looked different from yours. I guess I find that odd because at the bottom of the code it is all OpenCV. You would think converting to gray, Gaussian blurring and doing Canny should give me the same image.

            I shall look into the HOG + SVM in your book and will continue to see if I can translate this blog’s code into C#.

  5. sinhue February 13, 2017 at 12:43 pm #

    Hi, Adrian. I read the same post on reddit a few weeks ago, so when I checked my email it was a great surprise for me. I will check your solution. Thanks for sharing!

    • Adrian Rosebrock February 13, 2017 at 1:28 pm #

      I’m glad you saw the same post as well Sinhue! /r/computervision is a great resource for keeping up with computer vision news.

  6. Manisha February 13, 2017 at 10:39 pm #

    Hi Adrian,
    I get an error at print(u”{}{}.{} \u00b0C”.format(*digits))

    “IndexError: tuple index out of range”

    if I comment out print stmt I see bounding box for last digit only

    • Adrian Rosebrock February 15, 2017 at 9:13 am #

      Hi Manisha — make sure you use the “Downloads” section of this tutorial to download the source code + example image. It sounds like you might have copied and pasted code and accidentally introduced an error where not all digits are detected.

  7. Leena February 13, 2017 at 11:17 pm #

    Thanks for really useful post for many application.

  8. Cristobal February 14, 2017 at 6:07 am #

    Hi Adrian! In Figure 4 the number 1 it would be on segments 2 and 5 like in the dictionary. Thanks for sharing!!

  9. sun February 14, 2017 at 11:30 am #

    thank you adrian
    i want to used this code in your project(bubbles sheet omr), to read students numbers.

    regards

    • sun February 14, 2017 at 11:32 am #

      sorry i mean, can i used it in bubbles sheet project to read students numbers.
      regards

      • Adrian Rosebrock February 14, 2017 at 1:20 pm #

        Can you elaborate more on “read students numbers”? The numbers on what? The bubble sheet itself?

        • sun February 14, 2017 at 7:39 pm #

          i mean “student id” in the top of any omr paper like this image:
          https://academic-ict-2010.wikispaces.com/file/view/OMRABCDE.jpg/179931515/307×361/OMRABCDE.jpg

          we can used your project to getting student id by
          print row containt 10 “seven-segment” on the top of the paper. then the student shadded his id before he answering the questions (shadded the bubbles).

          • Adrian Rosebrock February 15, 2017 at 9:04 am #

            If you’re doing OMR and bubble sheet recognition, why not follow the approach detailed here? Or is your goal to validate that what the user bubbles in matches what they wrote? If it’s the latter, one of the best methods to determine these digits would be to train a simple digit detector using machine learning. I demonstrate how to train your own digit detectors inside Practical Python and OpenCV.

  10. Steve February 14, 2017 at 12:37 pm #

    Hi Adrian, very interesting. I have a note which is beside the point of the image-recognition, but may be useful: you have the “1” as represented by two vertical segments on the left, but it may be two vertical segments on the right (take a look at the alarm clock picture on this very page). I imagine it would be simple to add a second entry to your lookup table to account for this. Cheers.

    • Steve February 14, 2017 at 12:38 pm #

      Or rather: your image of the ten digits has it on the left, but the lookup table seems to have it on the right (2, 5). Either way, a second entry would help to make this work across different displays.

      • Adrian Rosebrock February 14, 2017 at 1:19 pm #

        Great point, thanks for sharing Steve!

  11. Shuvam Ghosh February 14, 2017 at 3:35 pm #

    Awesome post.
    After the canny edge detection and countour analysis, we assume that the largest rectangle with four vertices is the LCD. But in fact it is the whole outline of the thermostat(I.e. the output after canny edge detection as shown) and not the LCD. I found this part confusing. Can you please explain me this. The largest rectangle with 4 vertices for me is the thermostat outline not the LCD.

    • Adrian Rosebrock February 15, 2017 at 9:09 am #

      After contour approximation the thermostat box does not have 4 vertices. Looking at the edge map you can also see that the thermostat box does not form a rectangle — there are disconnects along its path. Therefore, it’s not a rectangle and our algorithm does not consider it.

  12. delta February 14, 2017 at 10:33 pm #

    in python3, opencv3, i got this msg:
    object has no attrute ‘reshape’.

  13. Mark February 15, 2017 at 3:08 pm #

    Hello Adrian, excellent post. Visiting your blog is always satisfying for those who work in image processing.

    • Adrian Rosebrock February 16, 2017 at 9:51 am #

      Thank you Mark, I really appreciate that 🙂

  14. Harsh February 15, 2017 at 3:34 pm #

    I am trying to run code under Ubuntu with python 3.5 and opencv 3.0 and getting an import error

    File “recognize_digits.py”, line 5, in
    from imutils.perspective import four_point_transform
    ImportError: No module named imutils.perspective

    • Adrian Rosebrock February 16, 2017 at 9:51 am #

      Make sure you install the imutils library on your system:

      $ pip install --upgrade imutils

      • Harsh February 16, 2017 at 4:04 pm #

        Thanks for your quick reply. I already run upgrade imutils but it still shows the same error. Previously I followed steps from [http://www.pyimagesearch.com/2015/07/20/install-opencv-3-0-and-python-3-4-on-ubuntu/] to setup my environment

        • Adrian Rosebrock February 20, 2017 at 8:04 am #

          If you are using a Python virtual environment for your code, make sure you upgrade imutils there as well. The imutils library published to PyPI is indeed the latest version, so you likely have an old version either in (1) a Python virtual environment or (2) your global system Python install and are accidentally using the older version.

  15. Daly February 27, 2017 at 12:49 pm #

    the code works fine while tested on the image that comes along with it, when I try using other image, error showed, using an image with same type but not the same dimensions the error is ‘NoneType’ object has no attribute ‘reshape’
    and when using an image with same type and dimensions, this is showed “key=lambda b: b[1][i], reverse=reverse))
    ValueError: need more than 0 values to unpack

    • Adrian Rosebrock March 2, 2017 at 7:00 am #

      This blog post was built around a specific example (the thermostat image at the top of the post). It’s unlikely to work out-of-the-box with other images. You will likely need to debug the script and ensure that you can find the LCD screen followed by the digits on the screen.

      • TT May 9, 2017 at 7:49 pm #

        Could you give instructions on how to do digit recognition in a real-time webcam? I’ve met the same problems, and it’s hard to do

        • Adrian Rosebrock May 11, 2017 at 8:50 am #

          The same techniques you apply to a single image can be applied to a video stream as well. Remember that a video stream is just a collection of single images.

  16. Antonios Kats March 6, 2017 at 3:12 am #

    Hi Adrian,
    I would like to ask you if it’s possible instead of cv2.something( ),
    to text straight something( ).
    Is it the namespace command , so to not need to type ( cv2. )every single time ?

    It’s an amazing example. It works perfect. Hope for more in the future 😉
    K.R.
    Antonios

    • Adrian Rosebrock March 6, 2017 at 3:37 pm #

      I DO NOT recommend you do this as namespacing is extremely important to tidy, maintainable code. That said, this will get you what you want:

      from cv2 import *

  17. suhas March 24, 2017 at 7:59 am #

    Can you post a code on Recognizing alphabets with OpenCV and python?

  18. mohsen April 6, 2017 at 12:46 pm #

    hi
    i have one error :

    The function is not implemented. Rebuild the library with Windows, GTK+ 2.x

    • Adrian Rosebrock April 8, 2017 at 12:55 pm #

      This sounds like an error related to the “highgui” module of OpenCV. Re-compile and re-install OpenCV following one of my tutorials.

  19. Ravenjam April 25, 2017 at 2:43 pm #

    Hey man I really like your posts. Can you do one for opencv 3 and python on face recognition (not detection)? I’m having trouble finding a workable example online. Thanks!

  20. hgfav May 27, 2017 at 9:13 pm #

    I get this error while trying to compile : “ImportError: No module named cv2”. Then I try to install this module using pip, it doesn’t exist.

  21. Phuong June 1, 2017 at 1:46 am #

    Hi, Adrian I have problems in using open cv and python. Because it is the identity on the lcd if it is identified on a numeric table is how. When running your program all the things I did not break is due to code I just break it when using open cv with python. What are you using python and opencv? love you
    and you give code because i not dowload guide of you ?

    • Adrian Rosebrock June 4, 2017 at 6:27 am #

      I used OpenCV 3 and Python 2.7 for this blog post. It also works with OpenCV 2.4 and Python 2.7 along with OpenCV 3 and Python 3. If you are getting an error with the code, please share it. Without the error myself and other readers cannot help you.

  22. shruthi June 14, 2017 at 6:22 pm #

    Hi Adrain,

    Thanks for one more awesome tutorial.
    I would like to know whether this can be used for speed limit board recognition. I mean to read numbers on road side speed limit.

    Thanks

    • Adrian Rosebrock June 16, 2017 at 11:25 am #

      Are you referring to the LED segment display boards? Or the actual signs? For the LED segment boards, this approach would likely work. If you want to use this approach actual signs I would train a custom object detector to first detect the sign, then extract the digits, followed by classifying them.

Leave a Reply