In a previous tutorial, we implemented our very first OCR project. We saw that Tesseract worked well on some images but returned total nonsense for other examples. Part of being a successful OCR practitioner is learning that when you see this garbled, nonsensical output from Tesseract, it means some combination of (1) your image pre-processing techniques and (2) your Tesseract OCR options are incorrect.
To learn how to detect and OCR digits with Tesseract and Python, just keep reading.
Detecting and OCR’ing Digits with Tesseract and Python
Tesseract is a tool, like any other software package. Just like a data scientist can’t simply import millions of customer purchase records into Microsoft Excel and expect Excel to recognize purchase patterns automatically, it’s unrealistic to expect Tesseract to figure out what you need to OCR automatically and correctly output it.
Instead, it would help if you learned how to configure Tesseract properly for the task at hand. For example, suppose you are tasked with creating a computer vision application to automatically OCR business cards for their phone numbers.
How would you go about building such a project? Would you try to OCR the entire business card and then use a combination of regular expressions and post-processing pattern recognition to parse out the digits?
Or would you take a step back and examine the Tesseract OCR engine itself — is it possible to tell Tesseract to only OCR digits?
It turns out there is. And that’s what we’ll cover in this tutorial.
In this tutorial, you will:
- Gain hands-on experience OCR’ing digits from input images
- Extend our previous OCR script to handle digit recognition
- Learn how to configure Tesseract to only OCR digits
- Pass in this configuration to Tesseract via the
Configuring your development environment
To follow this guide, you need to have the OpenCV library installed on your system.
Luckily, OpenCV is pip-installable:
$ pip install opencv-contrib-python
If you need help configuring your development environment for OpenCV, I highly recommend that you read my pip install OpenCV guide — it will have you up and running in a matter of minutes.
Having problems configuring your development environment?
All that said, are you:
- Short on time?
- Learning on your employer’s administratively locked system?
- Wanting to skip the hassle of fighting with the command line, package managers, and virtual environments?
- Ready to run the code right now on your Windows, macOS, or Linux system?
Then join PyImageSearch University today!
Gain access to Jupyter Notebooks for this tutorial and other PyImageSearch guides that are pre-configured to run on Google Colab’s ecosystem right in your web browser! No installation required.
And best of all, these Jupyter Notebooks will run on Windows, macOS, and Linux!
Digit Detection and Recognition with Tesseract
In the first part of this tutorial, we’ll review digit detection and recognition, including real-world problems where we may wish to OCR only digits.
From there, we’ll review our project directory structure, and I’ll show you how to perform digit detection and recognition with Tesseract. We’ll wrap up this tutorial with a review of our digit OCR results.
What Is Digit Detection and Recognition?
As the name suggests, digit recognition is the process of OCR’ing and identifying only digits, purposely ignoring other characters. Digit recognition is often applied to real-world OCR projects (a montage of which can be seen in Figure 2), including:
- Extracting information from business cards
- Building an intelligent water monitor reader
- Bank check and credit card OCR
Our goal here is to “cut through the noise” of the non-digit characters. We will instead “laser in” on the digits. Luckily, accomplishing this digit recognition task is relatively easy once we supply the correct parameters to Tesseract.
Let’s review the directory structure for this project:
|-- apple_support.png |-- ocr_digits.py
Our project consists of one testing image (
apple_support.png) and our
ocr_digits.py Python script. The script accepts an image and an optional “digits only” setting and reports OCR results accordingly.
OCR’ing Digits with Tesseract and OpenCV
We are now ready to OCR digits with Tesseract. Open a new file, name it
ocr_digits.py, and insert the following code:
# import the necessary packages import pytesseract import argparse import cv2 # construct the argument parser and parse the arguments ap = argparse.ArgumentParser() ap.add_argument("-i", "--image", required=True, help="path to input image to be OCR'd") ap.add_argument("-d", "--digits", type=int, default=1, help="whether or not *digits only* OCR will be performed") args = vars(ap.parse_args())
As you can see, we’re using the PyTesseract package in conjunction with OpenCV. After our imports are taken care of, we parse two command line arguments:
--image: Path to the image to be OCR’d
--digits: A flag indicating whether or not we should OCR digits only (by
default, the option is set to a
Let’s go ahead and load our image and perform OCR:
# load the input image, convert it from BGR to RGB channel ordering, # and initialize our Tesseract OCR options as an empty string image = cv2.imread(args["image"]) rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) options = "" # check to see if *digit only* OCR should be performed, and if so, # update our Tesseract OCR options if args["digits"] > 0: options = "outputbase digits" # OCR the input image using Tesseract text = pytesseract.image_to_string(rgb, config=options) print(text)
Tesseract requires RGB color channel ordering for performing OCR. Lines 16 and 17 load the input
--image and swap color channels accordingly.
We then establish our Tesseract
options (Lines 18–23). Configuring Tesseract with options allows for more granular control over Tesseract’s methods under the hood to perform OCR.
For now, our
options are either empty (Line 18) or
outputbase digits, indicating that we will only OCR digits on the input image (Lines 22 and 23).
From there, we use the
image_to_string function call while passing our
rgb image and our configuration
options (Line 26). Notice that we’re using the
config parameter and including the digits only setting if the
--digits command line argument Boolean is
Finally, we show the OCR
text results in our terminal (Line 27). Let’s see if those results meet our expectations.
Digit OCR Results
We are now ready to OCR digits with Tesseract.
Open a terminal and execute the following command:
$ python ocr_digits.py --image apple_support.png 1-800-275-2273
As input to our
ocr_digits.py script, we’ve supplied a sample business card-like image that contains the text “Apple Support,” along with the corresponding phone number (Figure 3). Our script can correctly OCR the phone number, displaying it to our terminal while ignoring the “Apple Support” text.
One of the problems of using Tesseract via the command line or with the
image_to_string function is that it becomes quite hard to debug exactly how Tesseract arrived at the final output.
Once we gain some more experience working with the Tesseract OCR engine, we’ll turn our attention to visually debugging and eventually filtering out extraneous characters via confidence/probability scores. For the time being, please pay attention to the options and configurations we’re supplying to Tesseract to accomplish our goals (i.e., digit recognition).
If you instead want to OCR all characters (not just limited to digits), you can set the
--digits command line argument to any value ≤0:
$ python ocr_digits.py --image apple_support.png --digits 0 a Apple Support 1-800-275-2273
Notice how the “Apple Support” text is now included with the phone number in the OCR Output. But what’s up with that “a” in the output? Where is that coming from?
The “a” in the output is Tesseract confusing the leaf at the top of the Apple logo as an alphabet (Figure 4).
What's next? I recommend PyImageSearch University.
30+ total classes • 39h 44m video • Last updated: 12/2021
★★★★★ 4.84 (128 Ratings) • 3,000+ Students Enrolled
I strongly believe that if you had the right teacher you could master computer vision and deep learning.
Do you think learning computer vision and deep learning has to be time-consuming, overwhelming, and complicated? Or has to involve complex mathematics and equations? Or requires a degree in computer science?
That’s not the case.
All you need to master computer vision and deep learning is for someone to explain things to you in simple, intuitive terms. And that’s exactly what I do. My mission is to change education and how complex Artificial Intelligence topics are taught.
If you're serious about learning computer vision, your next stop should be PyImageSearch University, the most comprehensive computer vision, deep learning, and OpenCV course online today. Here you’ll learn how to successfully and confidently apply computer vision to your work, research, and projects. Join me in computer vision mastery.
Inside PyImageSearch University you'll find:
- ✓ 30+ courses on essential computer vision, deep learning, and OpenCV topics
- ✓ 30+ Certificates of Completion
- ✓ 39h 44m on-demand video
- ✓ Brand new courses released every month, ensuring you can keep up with state-of-the-art techniques
- ✓ Pre-configured Jupyter Notebooks in Google Colab
- ✓ Run all code examples in your web browser — works on Windows, macOS, and Linux (no dev environment configuration required!)
- ✓ Access to centralized code repos for all 500+ tutorials on PyImageSearch
- ✓ Easy one-click downloads for code, datasets, pre-trained models, etc.
- ✓ Access on mobile, laptop, desktop, etc.
In this tutorial, you learned how to configure Tesseract and
pytesseract to OCR only digits. We then used our Python script to handle OCR’ing the digits.
You’ll want to pay close attention to the
options we supply to Tesseract. Frequently, being able to apply OCR successfully to a Tesseract project depends on providing the correct set of configurations.
In our next tutorial, we’ll continue exploring Tesseract options by learning how to whitelist and blacklist a custom set of characters.
To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), simply enter your email address in the form below!
Download the Source Code and FREE 17-page Resource Guide
Enter your email address below to get a .zip of the code and a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL!