The first time I ever used the Tesseract optical character recognition (OCR) engine was during my undergraduate years.
I was taking my first course on computer vision. Our professor wanted us to research a challenging computer vision topic for our final project, extend existing research, and then write a formal paper on our work. I had trouble deciding on a project, so I went to see the professor, a Navy researcher who often worked on medical applications of computer vision and machine learning. He advised me to work on automatic prescription pill identification, the process of automatically recognizing prescription pills in an image. I considered the problem for a few moments and then replied:
Couldn’t you just OCR the imprints on the pill to recognize it?
To learn how to conduct OCR on your first project, just keep reading.
Your First OCR Project with Tesseract and Python
I still remember the look on my professor’s face.
He smiled, a small smirk appearing on the left corner of his mouth. Knowing the problems I was going to encounter, he replied with “If only it were that simple. But you’ll find out soon enough.”
I then went home and immediately started playing with the Tesseract library, reading the manual/documentation, and attempting to OCR some example images via the command line. But I found myself struggling. Some images were being OCR’d correctly, while others were returning complete nonsense.
Why was OCR so hard? And why was I struggling so much?
I spent the evening, staying up late into the night, continuing to test Tesseract with various images. For the life of me, I couldn’t discern the pattern between the images that Tesseract could correctly OCR and the ones it would fail on. What black magic was going on here?!
Unfortunately, this is the same feeling I see many computer vision practitioners having when first starting to learn OCR — perhaps you have even felt it yourself:
- You install Tesseract on your machine
- You follow a few basic examples on a tutorial you found via a Google search
- The examples return the correct results
- … but when you apply the same OCR technique to your images, you get incorrect results back
The problem is that these tutorials don’t teach OCR systematically. They’ll show you the how, but they won’t show you the why — that critical piece of information that allows you to discern patterns in OCR problems, allowing you to solve them correctly.
In this tutorial, you’ll be building your very first OCR project. It will serve as the “bare bones” Python script you need to perform OCR. In future posts, we’ll build on what you learn here.
By the end of this tutorial, you’ll be confident in your ability to apply OCR to your projects.
Let’s get started.
In this tutorial, you will:
- Gain hands-on experience using Tesseract to OCR an image
- Learn how to import the pytesseract package into your Python scripts
- Use OpenCV to load an input image from disk
- Pass the image into the Tesseract OCR engine via the pytesseract package
- Display the OCR’d text results on our terminal
Configuring your development environment
To follow this guide, you need to have the OpenCV library installed on your system.
Luckily, OpenCV is pip-installable:
$ pip install opencv-contrib-python
If you need help configuring your development environment for OpenCV, I highly recommend that you read my pip install OpenCV guide — it will have you up and running in a matter of minutes.
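Since the script in this tutorial also imports pytesseract (installable with pip install pytesseract) and relies on the Tesseract executable itself, it can help to confirm that all three pieces are visible before going further. Here is a minimal sketch of such a check; the check_ocr_environment helper is a hypothetical name introduced only for illustration, and it only looks for the packages and the binary without importing or running them:

```python
import importlib.util
import shutil


def check_ocr_environment():
    """Report which pieces of the OCR toolchain can be found."""
    return {
        # the Python packages imported by first_ocr.py
        "cv2": importlib.util.find_spec("cv2") is not None,
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # the Tesseract executable, looked up on the system PATH
        "tesseract binary": shutil.which("tesseract") is not None,
    }


status = check_ocr_environment()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If any entry prints MISSING, install that piece before running the OCR script.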
Getting Started with Tesseract
In the first part of this tutorial, we’ll review our directory structure for this project. From there, we’ll implement a simple Python script that will:
- Load an input image from disk via OpenCV
- OCR the image via Tesseract and pytesseract
- Display the OCR’d text on our screen
We’ll wrap up the tutorial with a discussion of the OCR’d text results.
|-- pyimagesearch_address.png
|-- steve_jobs.png
|-- whole_foods.png
|-- first_ocr.py
Our first project is very straightforward in the way it is organized. Inside the tutorial’s code directory, you’ll find three example PNG images for OCR testing and a single Python script named first_ocr.py.
Let’s dive right into our Python script in the next section.
Basic OCR with Tesseract
Let’s get started with your very first Tesseract OCR project! Open a new file, name it first_ocr.py, and insert the following code:
# import the necessary packages
import pytesseract
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
	help="path to input image to be OCR'd")
args = vars(ap.parse_args())
The first Python import you’ll notice in this script is pytesseract (Python Tesseract), a Python wrapper that ties in directly with the Tesseract OCR application installed on your system. The power of pytesseract is our ability to interface with Tesseract from Python rather than relying on ugly os.system or subprocess calls, as we needed to do before pytesseract ever existed. Thanks to its power and ease of use, we’ll use pytesseract in this and future tutorials!
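To make the contrast concrete, here is a rough sketch of what the manual route can look like: shelling out to the tesseract executable yourself and capturing what it prints. The helper names build_tesseract_command and ocr_via_cli are hypothetical, and the sketch assumes the tesseract binary is on your PATH. It illustrates the do-it-yourself approach and is not a description of how pytesseract works internally:

```python
import subprocess


def build_tesseract_command(image_path):
    # "stdout" asks the tesseract CLI to print the text instead of writing a file
    return ["tesseract", image_path, "stdout"]


def ocr_via_cli(image_path):
    """Run the tesseract executable directly and return its text output."""
    result = subprocess.run(
        build_tesseract_command(image_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits with an error
    )
    return result.stdout


# only the command is built here; ocr_via_cli() needs Tesseract installed
print(build_tesseract_command("steve_jobs.png"))
```

Compare that with the single pytesseract.image_to_string call used in this tutorial and the appeal of the wrapper is clear.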
Our script requires a single command line argument using Python’s argparse interface. By providing the --image argument and image file path value directly in your terminal when you execute this example script, Python will dynamically load an image of your choosing. I’ve provided three example images in the project directory for this tutorial that you can use. I also highly encourage you to try using Tesseract via this Python example script to OCR your images!
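If argparse is new to you, the following sketch shows what the parsed result looks like. It wraps the same parser in a hypothetical parse_args helper that accepts an explicit argument list, so you can experiment in a Python shell without touching the real command line:

```python
import argparse


def parse_args(argv=None):
    # same parser as first_ocr.py; argv=None falls back to the real command line
    ap = argparse.ArgumentParser()
    ap.add_argument("-i", "--image", required=True,
        help="path to input image to be OCR'd")
    return vars(ap.parse_args(argv))


# simulate: python first_ocr.py --image steve_jobs.png
args = parse_args(["--image", "steve_jobs.png"])
print(args["image"])  # steve_jobs.png

# the short flag fills the same dictionary key
print(parse_args(["-i", "whole_foods.png"]))  # {'image': 'whole_foods.png'}
```

Because vars() turns the parsed namespace into a plain dictionary, the rest of the script can read the path with args["image"].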
Now that we’ve handled our imports and lone command line argument, let’s get to the fun part — OCR with Python:
# load the input image and convert it from BGR to RGB channel
# ordering
image = cv2.imread(args["image"])
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# use Tesseract to OCR the image
text = pytesseract.image_to_string(image)
print(text)
Here, Lines 14 and 15 load our input --image from disk and swap color channel ordering. Tesseract expects RGB-format images; however, OpenCV loads images in BGR order. This isn’t a problem because we can fix it using OpenCV’s cv2.cvtColor call. Just be especially careful to know when to use RGB (Red Green Blue) vs. BGR (Blue Green Red).
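To see what the channel swap actually does, here is a conceptual sketch in plain Python. It is an illustration only: real OpenCV images are NumPy arrays and cv2.cvtColor handles the conversion for you, while the bgr_to_rgb helper and the tiny two-pixel "image" below are made up for this example:

```python
def bgr_to_rgb(pixels):
    """Reverse the channel order of every (B, G, R) pixel in a 2D grid."""
    return [[(r, g, b) for (b, g, r) in row] for row in pixels]


# a 1x2 "image" in OpenCV's BGR order: one pure-blue and one pure-red pixel
bgr_image = [[(255, 0, 0), (0, 0, 255)]]
rgb_image = bgr_to_rgb(bgr_image)
print(rgb_image)  # [[(0, 0, 255), (255, 0, 0)]]
```

The pixel values never change; only the position of the blue and red channels is swapped, which is why mixing up the two orderings silently produces the wrong colors instead of an error.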
Remark 1. I’d also like to point out that many times when you see Tesseract examples online, they will use pillow to load an image. Pillow loads images in RGB format, so a conversion step is not required.
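For comparison, a Pillow-based version of the load-and-OCR step might look like the sketch below. The ocr_with_pillow function is a hypothetical helper, and it assumes both pillow and pytesseract are installed:

```python
def ocr_with_pillow(image_path):
    """OCR an image loaded with Pillow instead of OpenCV."""
    # imported inside the function so this sketch can be defined even on a
    # machine where the two packages are not installed yet
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        # convert("RGB") normalizes palette, RGBA, and grayscale files;
        # no BGR-to-RGB swap is involved
        return pytesseract.image_to_string(img.convert("RGB"))
```

You could call it as print(ocr_with_pillow("pyimagesearch_address.png")) from the project directory.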
Finally, Line 18 performs OCR on our input RGB image and returns the results as a string stored in the text variable. Given that text is now a string, we can pass it onto Python’s built-in print function to display it in our terminal (a natural next step would be to draw the text result on a copy of the input --image using OpenCV, and display it on your screen).
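Because the OCR result is an ordinary Python string, any string handling you already know applies to it. As a small sketch, the hypothetical clean_ocr_text helper below splits raw output into tidy lines. The sample string is modeled on the address example in this tutorial, and the trailing form feed is an assumption about what some Tesseract versions append as a page separator:

```python
def clean_ocr_text(text):
    """Split raw OCR output into stripped, non-empty lines."""
    # blank lines and a trailing form feed ("\x0c") are dropped
    return [line.strip() for line in text.splitlines() if line.strip()]


# hypothetical raw output, modeled on the address example
raw = "PyImageSearch\nPO Box 17598 #17900\n\nBaltimore, MD 21297\n\x0c"
print(clean_ocr_text(raw))
# ['PyImageSearch', 'PO Box 17598 #17900', 'Baltimore, MD 21297']
```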
Wait, for real?
Oh yeah, if you didn’t notice, OCR with PyTesseract is as easy as a single function call, provided you’ve loaded the image in proper RGB order. So now, let’s check the results and see if they meet our expectations.
Tesseract OCR Results
Let’s put our newly implemented Tesseract OCR script to the test. Open your terminal, and execute the following command:
$ python first_ocr.py --image pyimagesearch_address.png
PyImageSearch
PO Box 17598 #17900
Baltimore, MD 21297
In Figure 2, you can see our input image, which contains the address for PyImageSearch on a gray, slightly textured background. As the command and terminal output indicate, both Tesseract and pytesseract correctly OCR’d the text.
Let’s try another image, this one of Steve Jobs’ old business card:
$ python first_ocr.py --image steve_jobs.png
Steven P. Jobs
Chairman of the Board
Apple Computer, Inc.
20525 Mariani Avenue, MS: 3K
Cupertino, California 95014
408 973-2121 or 996-1010.
Steve Jobs’ business card in Figure 3 is correctly OCR’d even though the input image poses several difficulties common to OCR’ing scanned documents, including:
- Yellowing of the paper due to age
- Noise on the image, including speckling
- Text that is starting to fade
Despite all these challenges, Tesseract was able to correctly OCR the business card. But that raises the question: is OCR this simple? Do we just open a Python shell, import the pytesseract package, and then call image_to_string on an input image? Unfortunately, OCR isn’t that simple (if it were, this tutorial would be unnecessary). As an example, let’s apply our same first_ocr.py script to a more challenging photo of a Whole Foods receipt:
$ python first_ocr.py --image whole_foods.png
aie WESTPORT CT 06880
yHOLE FOODS MARKE
399 post RD WEST ~ ;
903) 227-6858
BACON LS NP
365 pacon LS N
The Whole Foods grocery store receipt in Figure 4 was not OCR’d correctly using Tesseract. You can see that Tesseract has spit out a bunch of garbled nonsense. OCR isn’t always perfect.
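Judging OCR quality by eye gets tedious quickly, so it can help to put a rough number on it. The sketch below uses difflib from the standard library to compare an OCR’d line with the text you expected. The similarity helper is a hypothetical name, and the "expected" receipt string is my assumption about what that line of the receipt says:

```python
from difflib import SequenceMatcher


def similarity(ocr_text, expected_text):
    """Return a 0.0-1.0 similarity ratio between OCR output and expected text."""
    return SequenceMatcher(None, ocr_text, expected_text).ratio()


# OCR fragment from the receipt run vs. what the line presumably says (assumption)
garbled = similarity("yHOLE FOODS MARKE", "WHOLE FOODS MARKET")
# a business card line, assuming the printed text matches the OCR output exactly
clean = similarity("Chairman of the Board", "Chairman of the Board")
print(f"receipt line: {garbled:.2f}, business card line: {clean:.2f}")
```

A ratio of 1.0 means the two strings are identical, and lower values mean more characters were misread. This is a quick sanity check and not a formal OCR accuracy metric.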
Summary
In this tutorial, you created your very first OCR project using the Tesseract OCR engine, the pytesseract package (used to interact with the Tesseract OCR engine), and the OpenCV library (used to load an input image from disk).
We then applied our basic OCR script to three example images. Our basic OCR script worked for the first two but struggled tremendously for the final one. So what gives? Why was Tesseract able to OCR the first two examples perfectly but then utterly fail on the third image? The secret lies in the image pre-processing steps, along with the underlying Tesseract modes and options.
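To give a flavor of what those two ingredients can look like in code, here is a hedged sketch that combines one simple pre-processing step (grayscale conversion plus Otsu thresholding) with one Tesseract option (a page segmentation mode passed through the config argument). The helper names are hypothetical, and the specific choices are illustrative assumptions, not a recipe from this tutorial. There is no guarantee they fix the receipt example:

```python
def build_tesseract_config(psm=None, oem=None):
    """Assemble a Tesseract option string such as "--psm 6"."""
    parts = []
    if psm is not None:
        parts.append(f"--psm {psm}")  # page segmentation mode
    if oem is not None:
        parts.append(f"--oem {oem}")  # OCR engine mode
    return " ".join(parts)


def ocr_with_preprocessing(image_path, psm=6):
    """Grayscale and Otsu-threshold the image, then OCR it with a chosen PSM."""
    # imported lazily so the config helper above works without OpenCV installed
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"could not read image: {image_path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu's method picks the black/white cutoff automatically
    thresh = cv2.threshold(gray, 0, 255,
        cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    return pytesseract.image_to_string(thresh,
        config=build_tesseract_config(psm=psm))


print(build_tesseract_config(psm=6))  # --psm 6
```

Trying a few different psm values on your own images is a reasonable first experiment when the default settings return nonsense.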
Congrats on completing today’s tutorial, well done!