OpenCV Text Detection (EAST text detector)

In this tutorial you will learn how to use OpenCV to detect text in natural scene images using the EAST text detector.

OpenCV’s EAST text detector is a deep learning model based on a novel architecture and training pattern. It is capable of (1) running in near real-time at 13 FPS on 720p images and (2) obtaining state-of-the-art text detection accuracy.

In the remainder of this tutorial you will learn how to use OpenCV’s EAST detector to automatically detect text in both images and video streams.

To discover how to apply text detection with OpenCV, just keep reading!

The EAST text detector requires OpenCV 3.4.2 or OpenCV 4. If you do not already have OpenCV 3.4.2 or newer installed, please refer to my OpenCV install guides and follow the one for your operating system.
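If you are unsure which version you have, a quick programmatic check can save a confusing error later. This is a minimal sketch: the helper name meets_minimum_version is hypothetical, and in practice you would pass it cv2.__version__.

```python
import re


def meets_minimum_version(version, minimum=(3, 4, 2)):
    """Return True if a version string such as '4.0.0-pre' is >= minimum."""
    # Drop any pre-release suffix, then keep the numeric components
    parts = re.findall(r"\d+", version.split("-")[0])
    numbers = tuple(int(p) for p in parts[:3])
    # Pad short versions such as '4.0' to three components
    numbers += (0,) * (3 - len(numbers))
    return numbers >= minimum


# Typical usage (requires OpenCV to be installed):
#   import cv2
#   if not meets_minimum_version(cv2.__version__):
#       raise SystemExit("EAST needs OpenCV 3.4.2 or newer")
```

Tuples compare element by element, so (3, 4, 10) correctly counts as newer than (3, 4, 2), which a plain string comparison would get wrong.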

In the first part of today’s tutorial, I’ll discuss why detecting text in natural scene images can be so challenging.

From there I’ll briefly discuss the EAST text detector, why we use it, and what makes the algorithm so novel — I’ll also include links to the original paper so you can read up on the details if you are so inclined.

Finally, I’ll provide my Python + OpenCV text detection implementation so you can start applying text detection in your own applications.

Why is natural scene text detection so challenging?

Figure 1: Examples of natural scene images where text detection is challenging due to lighting conditions, image quality, and non-planar objects (Figure 1 of Mancas-Thillou and Gosselin).

Detecting text in constrained, controlled environments can typically be accomplished by using heuristic-based approaches, such as exploiting gradient information or the fact that text is typically grouped into paragraphs and characters appear on a straight line. An example of such a heuristic-based text detector can be seen in my previous blog post on Detecting machine-readable zones in passport images.

Natural scene text detection is different though — and much more challenging.

Due to the proliferation of cheap digital cameras, not to mention the fact that nearly every smartphone now has a camera, we need to be highly concerned with the conditions under which an image was captured, and furthermore, with what assumptions we can and cannot make. I’ve included a summarized version of the natural scene text detection challenges described by Celine Mancas-Thillou and Bernard Gosselin in their excellent 2017 paper, Natural Scene Text Understanding, below:

  • Image/sensor noise: Sensor noise from a handheld camera is typically higher than that of a traditional scanner. Additionally, low-priced cameras will typically interpolate the pixels of raw sensors to produce real colors.
  • Viewing angles: Natural scene text can naturally have viewing angles that are not parallel to the text, making the text harder to recognize.
  • Blurring: Uncontrolled environments tend to have blur, especially if the end user is utilizing a smartphone that does not have some form of stabilization.
  • Lighting conditions: We cannot make any assumptions regarding our lighting conditions in natural scene images. It may be near dark, the flash on the camera may be on, or the sun may be shining brightly, saturating the entire image.
  • Resolution: Not all cameras are created equal — we may be dealing with cameras with sub-par resolution.
  • Non-paper objects: Most (though not all) paper is non-reflective, at least the paper you are likely trying to scan. Text in natural scenes, on the other hand, may be reflective, including logos, signs, etc.
  • Non-planar objects: Consider what happens when you wrap text around a bottle — the text on the surface becomes distorted and deformed. While humans may still be able to easily “detect” and read the text, our algorithms will struggle. We need to be able to handle such use cases.
  • Unknown layout: We cannot use any a priori information to give our algorithms “clues” as to where the text resides.

As we’ll learn, OpenCV’s text detector implementation of EAST is quite robust, capable of localizing text even when it’s blurred, reflective, or partially obscured:

Figure 2: OpenCV’s EAST scene text detector detects text even in blurry and obscured images.

I would suggest reading Mancas-Thillou and Gosselin’s work if you are further interested in the challenges associated with text detection in natural scene images.

The EAST deep learning text detector

Figure 3: The structure of the EAST text detection Fully-Convolutional Network (Figure 3 of Zhou et al.).

With the release of OpenCV 3.4.2 and OpenCV 4, we can now use a deep learning-based text detector called EAST, which is based on Zhou et al.’s 2017 paper, EAST: An Efficient and Accurate Scene Text Detector.

The algorithm is called “EAST” because it’s an Efficient and Accurate Scene Text detection pipeline.

The EAST pipeline is capable of predicting words and lines of text at arbitrary orientations on 720p images, and furthermore, can run at 13 FPS, according to the authors.

Perhaps most importantly, since the deep learning model is end-to-end, it is possible to sidestep computationally expensive sub-algorithms that other text detectors typically apply, including candidate aggregation and word partitioning.

To build and train such a deep learning model, the EAST method utilizes novel, carefully designed loss functions.

For more details on EAST, including architecture design and training methods, be sure to refer to the publication by the authors.

Project structure

To start, be sure to grab the source code + images to today’s post by visiting the “Downloads” section. From there, simply use the tree terminal command to view the project structure:

Notice that I’ve provided three sample pictures in the images/ directory. You may wish to add your own images collected with your smartphone or ones you find online.

We’ll be reviewing two .py files today:

  • : Detects text in static images.
  • : Detects text via your webcam or input video files.

Both scripts make use of the serialized EAST model (frozen_east_text_detection.pb) provided for your convenience in the “Downloads”.

Implementation notes

The text detection implementation I am including today is based on OpenCV’s official C++ example; however, I must admit that I had a bit of trouble when converting it to Python.

To start, there are no Point2f and RotatedRect functions in Python, and because of this, I could not 100% mimic the C++ implementation. The C++ implementation can produce rotated bounding boxes, but unfortunately the one I am sharing with you today cannot.

Secondly, the NMSBoxes function does not return any values for the Python bindings (at least for my OpenCV 4 pre-release install), ultimately resulting in OpenCV throwing an error. The NMSBoxes function may work in OpenCV 3.4.2, but I wasn’t able to exhaustively test it.

I got around this issue by using my own non-maxima suppression implementation in imutils, but again, I don’t believe these two are 100% interchangeable, as it appears NMSBoxes accepts additional parameters.
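One concrete difference between the two is the box format. The imutils non_max_suppression function works with corner-format boxes (startX, startY, endX, endY), whereas cv2.dnn.NMSBoxes expects (x, y, width, height) rectangles together with a score threshold and an NMS threshold. If you want to experiment with NMSBoxes on your own install, a conversion like the following sketch is needed first. The helper name is hypothetical, and the commented call is untested here.

```python
def corners_to_xywh(rects):
    """Convert (startX, startY, endX, endY) boxes to [x, y, w, h] boxes."""
    return [[sx, sy, ex - sx, ey - sy] for (sx, sy, ex, ey) in rects]


# Hypothetical usage with OpenCV's NMS (behavior may vary by OpenCV version):
#   idxs = cv2.dnn.NMSBoxes(corners_to_xywh(rects), confidences, 0.5, 0.3)
```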

Given all that, I’ve tried my best to provide you with the best OpenCV text detection implementation I could, using the working functions and resources I had. If you have any improvements to the method please do feel free to share them in the comments below.

Implementing our text detector with OpenCV

Before we get started, I want to point out that you will need at least OpenCV 3.4.2 (or OpenCV 4) installed on your system to utilize OpenCV’s EAST text detector, so if you haven’t already installed OpenCV 3.4.2 or better on your system, please refer to my OpenCV install guides.

Next, make sure you have imutils installed/upgraded on your system as well:

At this point your system is now configured, so open up  and insert the following code:

To begin, we import our required packages and modules on Lines 2-6. Notably, we import NumPy, OpenCV, and my implementation of non_max_suppression from imutils.object_detection.

We then proceed to parse five command line arguments on Lines 9-20:

  • --image: The path to our input image.
  • --east: The EAST scene text detector model file path.
  • --min-confidence: Probability threshold to determine text. Optional, with default=0.5.
  • --width: Resized image width (must be a multiple of 32). Optional, with default=320.
  • --height: Resized image height (must be a multiple of 32). Optional, with default=320.
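A minimal argparse sketch of that interface follows. The flag names and defaults come from the list above; the help strings are illustrative rather than copied from the downloadable script.

```python
import argparse


def build_parser():
    """Parse the five command line arguments described above (a sketch)."""
    ap = argparse.ArgumentParser(description="EAST text detection on a single image")
    ap.add_argument("--image", type=str, required=True,
                    help="path to input image")
    ap.add_argument("--east", type=str, required=True,
                    help="path to the EAST model, e.g. frozen_east_text_detection.pb")
    ap.add_argument("--min-confidence", type=float, default=0.5,
                    help="minimum probability required to keep a region")
    ap.add_argument("--width", type=int, default=320,
                    help="resized image width (multiple of 32)")
    ap.add_argument("--height", type=int, default=320,
                    help="resized image height (multiple of 32)")
    return ap


# argparse turns "--min-confidence" into the key "min_confidence":
#   args = vars(build_parser().parse_args())
```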

Important: The EAST text detector requires that your input image dimensions be multiples of 32, so if you choose to adjust your --width and --height values, make sure they are multiples of 32!
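A small guard can catch invalid values before they reach the network, or snap an arbitrary size to a valid one. This is a sketch; both helper names are hypothetical.

```python
def check_multiple_of_32(width, height):
    """Raise ValueError unless both dimensions are positive multiples of 32."""
    for name, value in (("width", width), ("height", height)):
        if value <= 0 or value % 32 != 0:
            raise ValueError(f"--{name} must be a positive multiple of 32, got {value}")


def round_to_multiple_of_32(value):
    """Round a dimension to a nearby multiple of 32 (never below 32)."""
    return max(32, int(round(value / 32.0)) * 32)
```

For example, a requested width of 300 would become 288, the closest multiple of 32.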

From there, let’s load our image and resize it:

On Lines 23 and 24, we load and copy our input image.

From there, Lines 30 and 31 determine the ratio of the original image dimensions to the new image dimensions (based on the command line arguments provided for --width and --height).

Then we resize the image, ignoring aspect ratio (Line 34).
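Because width and height are scaled independently, two separate ratios are needed to map boxes back later. A minimal sketch (the function name is hypothetical):

```python
def resize_ratios(orig_w, orig_h, new_w=320, new_h=320):
    """Ratios used to map boxes from the resized image back to the original."""
    r_w = orig_w / float(new_w)
    r_h = orig_h / float(new_h)
    return r_w, r_h


# Typical usage with OpenCV (aspect ratio deliberately ignored):
#   (H, W) = image.shape[:2]
#   (rW, rH) = resize_ratios(W, H, args["width"], args["height"])
#   image = cv2.resize(image, (args["width"], args["height"]))
```

A 640x480 photo resized to 320x320 gives rW = 2.0 and rH = 1.5, so a box edge at x = 100 in the resized image lands at x = 200 in the original.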

In order to perform text detection using OpenCV and the EAST deep learning model, we need to extract the output feature maps of two layers:

We construct a list of layerNames on Lines 40-42:

  1. The first layer is our output sigmoid activation which gives us the probability of a region containing text or not.
  2. The second layer is the output feature map that represents the “geometry” of the image — we’ll be able to use this geometry to derive the bounding box coordinates of the text in the input image.

Let’s load OpenCV’s EAST text detector:

We load the neural network into memory using cv2.dnn.readNet by passing the path to the EAST detector (contained in our command line args dictionary) as a parameter on Line 46.

Then we prepare our image by converting it to a blob on Lines 50 and 51. To read more about this step, refer to Deep learning: How OpenCV’s blobFromImage works.

To predict text we can simply set the blob as input and call net.forward (Lines 53 and 54). These lines are wrapped with timestamps so that we can print the elapsed time on Line 58.

By supplying layerNames as a parameter to net.forward, we are instructing OpenCV to return the two feature maps that we are interested in:

  • The output geometry map used to derive the bounding box coordinates of text in our input images
  • And similarly, the scores map, containing the probability of a given region containing text
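Putting the loading, blob, and forward steps together looks roughly like the sketch below. The two layer names are the ones used in OpenCV's own EAST sample, and the mean values passed to blobFromImage are ImageNet-style RGB means; treat both as assumptions to verify against the downloaded script. The forward_east helper is hypothetical and takes the network and blob as arguments so it can be exercised without OpenCV.

```python
import time

# Output layer names used by OpenCV's EAST sample; assumed here to match
# the frozen_east_text_detection.pb graph from the "Downloads".
LAYER_NAMES = [
    "feature_fusion/Conv_7/Sigmoid",  # scores: probability a region is text
    "feature_fusion/concat_3",        # geometry: box distances plus angle
]


def forward_east(net, blob):
    """Run one forward pass and return (scores, geometry, seconds)."""
    net.setInput(blob)
    start = time.perf_counter()
    scores, geometry = net.forward(LAYER_NAMES)
    elapsed = time.perf_counter() - start
    return scores, geometry, elapsed


# Typical usage (requires OpenCV >= 3.4.2):
#   net = cv2.dnn.readNet(args["east"])
#   blob = cv2.dnn.blobFromImage(image, 1.0, (W, H),
#                                (123.68, 116.78, 103.94), swapRB=True, crop=False)
#   (scores, geometry, secs) = forward_east(net, blob)
#   print("[INFO] text detection took {:.6f} seconds".format(secs))
```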

We’ll need to loop over each of these values, one-by-one:

We start off by grabbing the dimensions of the scores volume (Line 63) and then initializing two lists:

  • rects: Stores the bounding box (x, y)-coordinates for text regions
  • confidences: Stores the probability associated with each of the bounding boxes in rects

We’ll later be applying non-maxima suppression to these regions.

Looping over the rows begins on Line 68.

Lines 72-77 extract our scores and geometry data for the current row, y.

Next, we loop over each of the column indexes for our currently selected row:

For every row, we begin looping over the columns on Line 80.

We need to filter out weak text detections by ignoring areas that do not have sufficiently high probability (Lines 82 and 83).

The EAST text detector naturally reduces volume size as the image passes through the network: our output volume is actually 4x smaller than our input image, so we multiply by four to bring the coordinates back into the coordinate space of the resized input image (the ratios computed earlier then map them onto the original image).

I’ve included how you can extract the angle data on Lines 91-93; however, as I mentioned in the previous section, I wasn’t able to construct a rotated bounding box from it as is performed in the C++ implementation — if you feel like tackling the task, starting with the angle on Line 91 would be your first step.
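As a starting point for that task, here is a sketch of how the four corners of a rotated box can be computed from one score-map cell. It is adapted from the geometry math in OpenCV's C++ EAST sample; that correspondence, and the assumption that the angle is in radians, should be checked before relying on it. The function name is hypothetical.

```python
import math


def rotated_box_corners(x, y, top, right, bottom, left, angle):
    """Corners [tl, tr, br, bl] of the rotated box predicted at cell (x, y).

    Distances are measured in the resized input image; angle is in radians.
    """
    offset_x, offset_y = x * 4.0, y * 4.0  # score map is 4x smaller than input
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    h, w = top + bottom, right + left
    # Corner reached by moving `right` and `bottom` from the cell (bottom-right)
    p_br = (offset_x + cos_a * right + sin_a * bottom,
            offset_y - sin_a * right + cos_a * bottom)
    # Walk from that corner up by h and left by w along the rotated axes
    p_tr = (p_br[0] - sin_a * h, p_br[1] - cos_a * h)
    p_bl = (p_br[0] - cos_a * w, p_br[1] + sin_a * w)
    # The fourth corner completes the rectangle
    p_tl = (p_tr[0] + p_bl[0] - p_br[0], p_tr[1] + p_bl[1] - p_br[1])
    return [p_tl, p_tr, p_br, p_bl]
```

With angle = 0 this reduces to the same axis-aligned box the script draws. The corners could then be scaled by rW and rH, converted to an int32 NumPy array, and drawn with cv2.polylines.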

From there, Lines 97-105 derive the bounding box coordinates for the text area.

We then update our rects and confidences lists, respectively (Lines 109 and 110).
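Assembled into a single function, the row/column loop described above looks roughly like the following sketch. It is modeled on that description (and mirrors the decode_predictions helper used in the video script later), not copied from the downloadable code. The channel order of the geometry volume (distances to the top, right, bottom, and left edges, then the angle) is an assumption based on the EAST design. For a 320x320 input, both maps are 80x80.

```python
import numpy as np


def decode_predictions(scores, geometry, min_confidence=0.5):
    """Turn EAST score/geometry volumes into axis-aligned boxes and scores.

    scores has shape (1, 1, rows, cols); geometry has shape (1, 5, rows, cols).
    """
    (num_rows, num_cols) = scores.shape[2:4]
    rects, confidences = [], []
    for y in range(num_rows):
        scores_data = scores[0, 0, y]
        top, right = geometry[0, 0, y], geometry[0, 1, y]
        bottom, left = geometry[0, 2, y], geometry[0, 3, y]
        angles = geometry[0, 4, y]
        for x in range(num_cols):
            # Ignore cells that are unlikely to contain text
            if scores_data[x] < min_confidence:
                continue
            # Each cell covers a 4x4 patch of the resized input image
            (offset_x, offset_y) = (x * 4.0, y * 4.0)
            cos, sin = np.cos(angles[x]), np.sin(angles[x])
            h = top[x] + bottom[x]
            w = right[x] + left[x]
            # Bottom-right corner, then back off by the box width and height
            end_x = int(offset_x + cos * right[x] + sin * bottom[x])
            end_y = int(offset_y - sin * right[x] + cos * bottom[x])
            start_x, start_y = int(end_x - w), int(end_y - h)
            rects.append((start_x, start_y, end_x, end_y))
            confidences.append(float(scores_data[x]))
    return rects, confidences
```

Note that the angle only shifts the box position here; the box itself stays axis-aligned, which matches the limitation described in the Implementation notes.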

We’re almost finished!

The final step is to apply non-maxima suppression to our bounding boxes to suppress weak overlapping bounding boxes and then display the resulting text predictions:

As I mentioned in the previous section, I could not use the non-maxima suppression in my OpenCV 4 install (cv2.dnn.NMSBoxes) as the Python bindings did not return a value, ultimately causing OpenCV to error out. I wasn’t able to fully test it in OpenCV 3.4.2, so it may work in v3.4.2.

Instead, I have used my non-maxima suppression implementation available in the imutils package (Line 114). The results still look good; however, I wasn’t able to compare my output to the NMSBoxes function to see if they were identical.
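For readers who want to see what a non-maxima suppression step does, here is a generic greedy NMS sketch based on intersection-over-union (IoU). It is not the imutils source, whose overlap measure differs, so its output may not match exactly. The function name nms_iou is hypothetical.

```python
import numpy as np


def nms_iou(boxes, probs, iou_thresh=0.3):
    """Greedy IoU-based NMS over (startX, startY, endX, endY) boxes."""
    boxes = np.asarray(boxes, dtype=float)
    probs = np.asarray(probs, dtype=float)
    if boxes.size == 0:
        return np.empty((0, 4), dtype=int)
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(probs)[::-1]  # most confident first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Intersection of the kept box with every remaining candidate
        xx1 = np.maximum(x1[i], x1[rest])
        yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest])
        yy2 = np.minimum(y2[i], y2[rest])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        union = np.maximum(areas[i] + areas[rest] - inter, 1e-9)
        # Drop candidates that overlap the kept box too much
        order = rest[inter / union <= iou_thresh]
    return boxes[keep].astype(int)
```

Sorting by probability first means that, among heavily overlapping detections, the most confident one survives.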

Lines 117-126 loop over our bounding boxes, scale the coordinates back to the original image dimensions, and draw the output to our orig image. The orig image is displayed until a key is pressed (Lines 129 and 130).
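The scaling step is a per-axis multiplication by the ratios computed earlier. A minimal sketch (the helper name is hypothetical; the OpenCV drawing call is shown in comments):

```python
def scale_box(box, r_w, r_h):
    """Map a (startX, startY, endX, endY) box from the resized image to the original."""
    (start_x, start_y, end_x, end_y) = box
    return (int(start_x * r_w), int(start_y * r_h),
            int(end_x * r_w), int(end_y * r_h))


# Drawing on the original image with OpenCV:
#   for box in boxes:
#       (sX, sY, eX, eY) = scale_box(box, rW, rH)
#       cv2.rectangle(orig, (sX, sY), (eX, eY), (0, 255, 0), 2)
```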

As a final implementation note, I would like to mention that our two nested for loops used to loop over the scores and geometry volumes on Lines 68-110 would be an excellent example of where you could leverage Cython to dramatically speed up your pipeline. I’ve demonstrated the power of Cython in Fast, optimized ‘for’ pixel loops with OpenCV and Python.
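Cython is one option. Another, which needs no compilation step, is to replace the per-cell loop with NumPy array operations. The sketch below takes that alternative route (vectorization, not Cython). It assumes the same (1, 1, rows, cols) score and (1, 5, rows, cols) geometry layouts that the loops walk over and should produce the same axis-aligned boxes. The function name is hypothetical.

```python
import numpy as np


def decode_vectorized(scores, geometry, min_confidence=0.5):
    """Array-based equivalent of the nested decoding loops (no Cython needed)."""
    # Row/column indices of every cell that passes the confidence threshold
    ys, xs = np.nonzero(scores[0, 0] >= min_confidence)
    # Gather the five geometry channels for just those cells
    top, right, bottom, left, angle = (geometry[0, c, ys, xs] for c in range(5))
    cos, sin = np.cos(angle), np.sin(angle)
    offset_x, offset_y = xs * 4.0, ys * 4.0
    h = top + bottom
    w = right + left
    # Same arithmetic as the loop; astype(int) truncates toward zero like int()
    end_x = (offset_x + cos * right + sin * bottom).astype(int)
    end_y = (offset_y - sin * right + cos * bottom).astype(int)
    start_x = (end_x - w).astype(int)
    start_y = (end_y - h).astype(int)
    rects = np.stack([start_x, start_y, end_x, end_y], axis=1)
    return rects, scores[0, 0, ys, xs]
```

The returned rects array and confidence array can be passed straight to a non-maxima suppression step such as non_max_suppression(rects, probs=confidences).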

OpenCV text detection results

Are you ready to apply text detection to images?

Start by grabbing the “Downloads” for this blog post and unzip the files.

From there, you may execute the following command in your terminal (taking note of the two command line arguments):

Your results should look similar to the following image:

Figure 4: Famous basketball player LeBron James’ jersey text is successfully detected with OpenCV and EAST text detection.

Three text regions are identified on LeBron James.

Now let’s try to detect text of a business sign:

Figure 5: Text is easily detected with Python and OpenCV using EAST in this natural scene of a car wash station.

And finally, we’ll try a road sign:

Figure 6: Scene text detection with Python + OpenCV and the EAST text detector successfully detects the text on this Spanish stop sign.

This scene contains a Spanish stop sign. The word “ALTO” is correctly detected by OpenCV and EAST.

As you can tell, EAST is quite accurate and relatively fast, taking approximately 0.14 seconds per image on average.

Text detection in video with OpenCV

Now that we’ve seen how to detect text in images, let’s move on to detecting text in video with OpenCV.

This explanation will be very brief; please refer to the previous section for details as needed.

Open up  and insert the following code:

We begin by importing our packages. We’ll be using VideoStream to access a webcam and FPS to benchmark our frames per second for this script. Everything else is the same as in the previous section.

For convenience, let’s define a new function to decode our predictions. It will be reused for each frame and make our loop cleaner:

On Line 11 we define the decode_predictions function. This function is used to extract:

  1. The bounding box coordinates of a text region
  2. And the probability of a text region detection

This dedicated function will make the code easier to read and manage later on in this script.

Let’s parse our command line arguments:

Our command line arguments are parsed on Lines 69-80:

  • --east: The EAST scene text detector model file path.
  • --video: The path to our input video. Optional; if a video path is provided, then the webcam will not be used.
  • --min-confidence: Probability threshold to determine text. Optional, with default=0.5.
  • --width: Resized image width (must be a multiple of 32). Optional, with default=320.
  • --height: Resized image height (must be a multiple of 32). Optional, with default=320.

The primary change from the image-only script in the previous section (in terms of command line arguments) is that I’ve substituted the --image argument with --video.

Important: The EAST text detector requires that your input image dimensions be multiples of 32, so if you choose to adjust your --width and --height values, ensure they are multiples of 32!

Next, we’ll perform important initializations which mimic the previous script:

The height/width and ratio initializations on Lines 84-86 will allow us to properly scale our bounding boxes later on.

Our output layer names are defined and we load our pre-trained EAST text detector on Lines 91-97.

The following block sets up our video stream and frames per second counter:

Our video stream is set up for either:

  • A webcam (Lines 100-103)
  • Or a video file (Lines 106-107)

From there we initialize our frames per second counter on Line 110 and begin looping over incoming frames:

We begin looping over video/webcam frames on Line 113.

Our frame is resized, maintaining aspect ratio (Line 124). From there, we grab dimensions and compute the scaling ratios (Lines 129-132). We then resize the frame again to the --width and --height values (which must be multiples of 32), this time ignoring aspect ratio since we have stored the ratios for safekeeping (Line 135).
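The two-step resize is easy to get backwards, so here is the dimension arithmetic as a sketch. The 1000-pixel display width is a hypothetical choice, not a value taken from the post. The first step computes the size that an aspect-preserving resize (such as imutils.resize with a width argument) would produce. The second computes the ratios from that frame to the network input size, exactly as in the image script.

```python
def frame_geometry(frame_w, frame_h, target_width=1000, net_w=320, net_h=320):
    """Sizes and ratios for the two-step resize applied to each video frame.

    target_width is a hypothetical display width, not a value from the post.
    """
    # Step 1: aspect-preserving resize to the display width
    disp_w = target_width
    disp_h = int(frame_h * (target_width / float(frame_w)))
    # Step 2: ratios from the displayed frame to the network input size
    r_w, r_h = disp_w / float(net_w), disp_h / float(net_h)
    return (disp_w, disp_h), (r_w, r_h)
```

For a 1920x1080 webcam frame this yields a 1000x562 display frame and ratios of 3.125 and 1.75625.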

Inference and drawing text region bounding boxes take place on the following lines:

In this block we:

  • Detect text regions using EAST by creating a blob and passing it through the network (Lines 139-142)
  • Decode the predictions and apply NMS (Lines 146 and 147). We use the decode_predictions function defined previously in this script and my imutils non_max_suppression convenience function.
  • Loop over bounding boxes and draw them on the frame (Lines 150-159). This involves scaling the boxes by the ratios gathered earlier.

From there we’ll close out the frame processing loop as well as the script itself:

We update our fps counter each iteration of the loop (Line 162) so that timings can be calculated and displayed (Lines 173-175) when we break out of the loop.

We show the output of EAST text detection on Line 165 and handle keypresses (Lines 166-170). If “q” is pressed for “quit”, we break out of the loop and proceed to clean up and release pointers.
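The timing and quit logic is independent of EAST itself. As a sketch, here is a tiny frame-rate counter with the same start/update/stop pattern as imutils.video.FPS (it is not that class's source), plus the usual q-to-quit check in comments. The class name SimpleFPS is hypothetical.

```python
import time


class SimpleFPS:
    """Minimal stand-in for imutils' FPS helper: count frames over wall time."""

    def __init__(self):
        self._start = None
        self._end = None
        self.frames = 0

    def start(self):
        self._start = time.perf_counter()
        return self

    def update(self):
        self.frames += 1

    def stop(self):
        self._end = time.perf_counter()

    def elapsed(self):
        return self._end - self._start

    def fps(self):
        elapsed = self.elapsed()
        return self.frames / elapsed if elapsed > 0 else 0.0


# Loop skeleton (OpenCV calls shown as comments):
#   fps = SimpleFPS().start()
#   while True:
#       ... read frame, detect text, draw boxes ...
#       cv2.imshow("Text Detection", frame)
#       if cv2.waitKey(1) & 0xFF == ord("q"):
#           break
#       fps.update()
#   fps.stop()
#   print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))
```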

Video text detection results

To apply text detection to video with OpenCV, be sure to use the “Downloads” section of this blog post.

From there, open up a terminal and execute the following command (which will fire up your webcam since we aren’t supplying a --video via command line argument):

Our OpenCV text detection video script achieves 7-9 FPS.

This result is not quite as fast as the authors reported (13 FPS); however, we are using Python instead of C++. By optimizing our for loops with Cython, we should be able to increase the speed of our text detection pipeline.


Summary

In today’s blog post, we learned how to use OpenCV’s new EAST text detector to automatically detect the presence of text in natural scene images.

The text detector is not only accurate, but it’s also capable of running in near real-time at approximately 13 FPS on 720p images (as reported by the authors).

In order to provide an implementation of OpenCV’s EAST text detector, I needed to convert OpenCV’s C++ example; however, there were a number of challenges I encountered, such as:

  1. Not being able to use OpenCV’s NMSBoxes for non-maxima suppression and instead having to use my implementation from imutils.
  2. Not being able to compute a true rotated bounding box due to the lack of Python bindings for RotatedRect .

I tried to keep my implementation as close to OpenCV’s as possible, but keep in mind that my version is not 100% identical to the C++ version and that there may be one or two small problems that will need to be resolved over time.

In any case, I hope you enjoyed today’s tutorial on text detection with OpenCV!

To download the source code to this tutorial, and start applying text detection to your own images, just enter your email address in the form below.



133 Responses to OpenCV Text Detection (EAST text detector)

  1. Adam August 20, 2018 at 11:32 am #

    Oh man, great article Adrian. Thanks for sharing with the rest of the world.

    I just have a toy project for text detection. The only caveat that my text might be in English or Arabic, so I will see if this can somehow help me out!


    • Adrian Rosebrock August 20, 2018 at 2:38 pm #

      I haven’t personally tried with non-English words but a PyImageSearch reader on LinkedIn posted an example of correctly detecting Tamil text. It may work for your project as well, be sure to give it a try!

      • Pavlin B August 25, 2018 at 2:30 pm #

        I tried it on mixed Bulgarian/English text, and it works perfectly.

        One question: how do I extract the bounded text?


        • Adrian Rosebrock August 30, 2018 at 9:34 am #

          You can use array slicing:

          roi = image[startY:endY, startX:endX]
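One caveat with plain slicing: decoded boxes can extend slightly past the image border (for example, a negative startX), and negative indices wrap around in NumPy. A hedged sketch that clips the box first; the helper name and the pad parameter are hypothetical:

```python
import numpy as np


def extract_roi(image, box, pad=0):
    """Crop a (startX, startY, endX, endY) box, clipped to the image bounds."""
    (h, w) = image.shape[:2]
    (start_x, start_y, end_x, end_y) = box
    # Optionally grow the box, then clamp every edge inside the image
    start_x, start_y = max(0, start_x - pad), max(0, start_y - pad)
    end_x, end_y = min(w, end_x + pad), min(h, end_y + pad)
    return image[start_y:end_y, start_x:end_x]
```

A little padding is often helpful if the crop is later passed to an OCR engine, since tight boxes can clip the edges of characters.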

  2. Miguel August 20, 2018 at 11:33 am #

    Very nice tutorial. I just didn’t get how, after getting the bounding boxes, to actually get the detected text.

    • Adrian Rosebrock August 20, 2018 at 2:37 pm #

      Be sure to see my reply to FUXIN YU 🙂

  3. Patrick August 20, 2018 at 11:48 am #

    Correction: the EAST paper is from 2017, not 2007. I was really surprised to see a 2007 paper to have a RPN-like CNN structure. 😉

    • Adrian Rosebrock August 20, 2018 at 2:39 pm #

      That was indeed a typo on my part, thank you for pointing it out! I’ve corrected the post.

  4. FUXIN YU August 20, 2018 at 12:35 pm #

    Hi Author,
    Thanks for your post, this is really good material for learning ML and CV.
    One question: how can I get the text content that has been recognized in the box?


    • Adrian Rosebrock August 20, 2018 at 2:36 pm #

      Once you have the ROI of the text area, you could pass it into an algorithm that is dedicated to performing Optical Character Recognition (OCR). I’ll be posting a separate guide that demonstrates how to combine text detection with the text recognition phase, but for the time being you should refer to this guide on Tesseract OCR.

      • Markus Dieterle August 21, 2018 at 8:30 am #

        Hello Adrian,

        Great post! As far as the text extraction goes, I think we should take into consideration what was already written in the “Natural Scene Text Understanding” paper.
        Basically, even if the text areas are properly located, you should do some image processing taking into account variations in lighting, color, hue, saturation, light reflection, etc. Once the extracted pieces of the image have been cleaned up, OCR should work more reliably.
        Though I’m not sure if an additional, well-trained neural network would not be even better; that would offer more options for retraining for different character sets and languages…

      • Gaurav A August 22, 2018 at 4:19 am #

        Hi Adrian,

        First of all, thanks a lot for posting this brilliant article. It helps a lot.
        Also when can we expect to get article on how to combine text detection with text recognition.
        Need it a bit urgently 🙁

        • Adrian Rosebrock August 22, 2018 at 9:21 am #

          I’m honestly not sure, Gaurav. I have some other posts I’m working on and then I’ll be swinging back to text recognition. Likely not for another few weeks/months.

          • Gaurav A August 24, 2018 at 1:29 am #

            Thanks for the update Adrian. But can you guide me some path may be some links/post to refer on how to do text recognition after text detection. It would be really helpful.

      • Joan August 23, 2018 at 6:12 am #

        Hi Adrian,

        If you will be making a guide for OCR this dataset may interest you:
        It contains images of gas counters with all the annotations (coordinates of boxes and digits). I trained a model with that dataset and it performed really well even with different fonts. If you happen to know a similar dataset please tell me, thanks and great post!

        • Adrian Rosebrock August 23, 2018 at 6:49 am #

          Wow, this is a really, really cool dataset — thank you for sharing, Joan! What type of model did you train on the data? I see they have annotations for both segmentation of the meter followed by the detection of the digits.

          • Joan September 2, 2018 at 11:14 am #

            I used HOG to extract the features and passed them to an SVM.

          • Adrian Rosebrock September 5, 2018 at 9:04 am #

            Awesome! I’ll look into this further.

  5. enes polat August 20, 2018 at 3:57 pm #

    hi thanks for your tutorial
    I am using Anaconda3
    how can I import imutils to my Anaconda3

    • Adrian Rosebrock August 21, 2018 at 6:46 am #

      I’m not an Anaconda user but you should be able to pip install it once you’ve created an environment:

      $ pip install imutils

      Additionally, this thread on GitHub documents users who had trouble installing imutils for one reason or another. Be sure to give it a read.

  6. wh August 20, 2018 at 4:30 pm #

    13 FPS on what hardware? RPi? Tegra?

    • Adrian Rosebrock August 21, 2018 at 6:44 am #

      The authors reported 13 FPS on a standard laptop/desktop. The benchmark was not on the Pi.

  7. farshad August 20, 2018 at 11:48 pm #

    Great work again, Adrian, thanks a lot. I recently noticed that OpenCV 3.4.2 supports one of the best and most accurate TensorFlow object detection models: Faster R-CNN Inception v2. In some recent posts on your blog you used Caffe models in OpenCV. Could you please make a post on implementing Faster R-CNN Inception v2 in OpenCV?

    • Adrian Rosebrock August 21, 2018 at 1:45 pm #

      Thank you for the suggestion Farshad, I will try to do a post on Faster R-CNNs.

      • kaisar khatak September 24, 2018 at 1:46 pm #

        Cool post. Does this method also work on vertical text???

  8. Joppu August 21, 2018 at 12:51 am #

    Nice! Couldn’t have read this at a better time. Thanks a lot! Also nice guitar man \m/

    I’ve recently been searching for a good scene text detection/recognition implementation for a little project of mine. I was thinking of somehow using TextBoxes++, but now I can try out EAST.

    • Adrian Rosebrock August 21, 2018 at 6:46 am #

      Awesome! Definitely try EAST and let me know how it goes, Joppu!

  9. Ronrick August 21, 2018 at 3:00 am #

    Hi Adrian. As I tried to run the code, I got the error:
    AttributeError: module ‘cv2.dnn’ has no attribute ‘readNet’

    Checking for the solution online, the function is not available in python using this reference.

    P.S. I have installed the latest version of opencv
    Thoughts on this one?

  10. Deepayan August 21, 2018 at 4:05 am #

    Great post, Adrian. I myself was trying to tweak f-RCNN for text detection on Sanskrit document images, but the results were far from satisfactory. I’ll try this out. Thanks a lot 🙂

    • Adrian Rosebrock August 21, 2018 at 6:48 am #

      I hope it helps with your text detection project, Deepayan! Let me know how it goes.

  11. Deni August 21, 2018 at 5:22 am #

    Another great and up-to-date article :), but the resulting bounding box doesn’t rotate when the text is rotated? Or am I missing something?

    • Adrian Rosebrock August 21, 2018 at 6:49 am #

      Yes. To quote the post:

      “To start, there are no Point2f and RotatedRect functions in Python, and because of this, I could not 100% mimic the C++ implementation. The C++ implementation can produce rotated bounding boxes, but unfortunately the one I am sharing with you today cannot.”

      And secondly:

      “I’ve included how you can extract the angle data on Lines 91-93; however, as I mentioned in the previous section, I wasn’t able to construct a rotated bounding box from it as is performed in the C++ implementation — if you feel like tackling the task, starting with the angle on Line 91 would be your first step.”

      The conclusion also mentions this behavior. Please feel free to work with the code; I’d love to have a rotated bounding box version as well!

  12. Big Adam August 21, 2018 at 5:44 am #

    Hi, Adrian

    Thanks for sharing, the script works well. Could you please explain more about the lines in the decode_predictions function, especially the computation of the bounding box?

  13. Danny August 21, 2018 at 10:45 am #

    Hi Adrian,

    Thank you for sharing this. In addition, could you please let me know whether we can use this EAST text detector to recognize other languages like Spanish, Korean, Mandarin and so on?

    • Adrian Rosebrock August 21, 2018 at 1:45 pm #

      Hey Danny, you should see my reply to Adam, the very first commenter on the post. I haven’t tried with non-English words but a PyImageSearch reader was able to detect Tamil text so I imagine it will work for other texts as well. You should download some images with Spanish, Korean, Mandarin, etc. and give it a try!

  14. Tom August 21, 2018 at 3:01 pm #

    Hi Adrian

    I noticed that the bounding box on rotated text wasn’t quite enclosing all of the text. I’ve calculated a more accurate bounding box by replacing lines 102-109 in with the following


    • Adrian Rosebrock August 22, 2018 at 9:26 am #

      Thank you for sharing, Tom! I’m going to test this out as well and if it works, likely update the blog post 🙂

    • Tobi October 18, 2018 at 3:14 am #

      Hi Tom,

      thanks for sharing your code. I compared it to Adrian’s version and have to say that your coordinates are in fact a bit more precise (at least for my use case –> text detection from scanned PDFs).
      Therefore, thanks a ton.

      Best regards,

  15. Hakan Gultekin August 22, 2018 at 3:37 am #

    Hi Adrian,

    Great work. I got this to work.

    But I have one issue. Your prediction (inference) time is 0.141675 seconds. When I run it, I get 0.413854 seconds.

    I am using a Pascal GPU (p2.xlarge) on AWS. Do I need to configure something else for faster predictions? What are you using to run your code?

    Thanks again.


    • Adrian Rosebrock August 22, 2018 at 9:21 am #

      I was using my iMac to run the code. You should not need any other additional optimizations provided you followed one of my OpenCV install tutorials to install OpenCV.

      • Hakan Gultekin August 22, 2018 at 7:27 pm #

        Ok great Adrian thanks !

  16. Ben August 22, 2018 at 7:27 am #

    where do I find the ‘frozen_east_text_detection.pb’ model ?

    • Adrian Rosebrock August 22, 2018 at 9:18 am #

      You can find the pre-trained model in the “Downloads” section of the blog post. Use the “Downloads” section to download the code along with the text detection model.

  17. Antonio August 22, 2018 at 9:02 am #

    Great article Adrian, incredible! Thanks a lot for your valuable tutorials! I am really looking forward to reading the article about text extraction from ROIs.

    • Adrian Rosebrock August 22, 2018 at 9:17 am #

      Thanks Antonio! I’m so happy you enjoyed the guide. I’m looking forward to writing the text recognition tutorial but it will likely be a few more weeks.

  18. mohamed August 23, 2018 at 5:59 am #

    Hi Adrian
    Wonderful progress as usual
    But I have a question please
    I want to build the model frozen_east_text_detection.pb myself. Are there some guidelines?
    thank you for your effort

    • Adrian Rosebrock August 23, 2018 at 6:09 am #

      For training instructions, you’ll want to refer to the official EAST model repo that was published by the authors of the paper.

      • mohamed August 23, 2018 at 6:29 am #

        I do not know what to say
        Thank you very much

        • Adrian Rosebrock August 23, 2018 at 6:45 am #

          Best of luck training your own model, Mohamed!

          • mohamed August 24, 2018 at 7:26 am #

            Thanks Adrian
            Good luck to you always

  19. Darshil K August 23, 2018 at 8:09 am #


    Thank you for the post and the codes!

    I am using Windows 7 and Python 3.6.
    I have OpenCV 3.2.0 installed on my machine, but I am not able to install OpenCV 3.4.2 or above. Is there any way to install it on my machine, or do I have to install it in a virtual machine?


    • Adrian Rosebrock August 24, 2018 at 8:41 am #

      To be honest, I’m not a Windows user and I do not support Windows here on the PyImageSearch blog. I have OpenCV install tutorials for macOS, Ubuntu, and Raspbian, so if you can use one of those, please do. Otherwise, if you’re a Windows user, you’ll want to refer to the OpenCV documentation.

    • Deiner Zapata September 20, 2018 at 1:33 pm #

      Hi, I am using Windows 7 too, and I ran this code without trouble. Adrian Rosebrock, thanks for your code; this tutorial is awesome.

      More details:
      – Python 3.6.5
      – OpenCV 3.4.2
      – Windows 10

      • Adrian Rosebrock October 8, 2018 at 1:14 pm #

        Thanks Deiner 🙂

  20. Trami August 24, 2018 at 4:41 am #

    Hi Adrian, thank you for your effort. When I run the project, I get the error ‘Unknown layer type Shape in op feature_fusion/Shape in function populateNet’. Also, on my computer ‘net = cv2.dnn.readNet(args["east"])’ has to be replaced by ‘net = cv2.dnn.readNetFromTensorflow(args["east"])’. I have installed OpenCV 3.4.2. Could you tell me how to solve these problems? Thank you so much!!!

    • Adrian Rosebrock August 24, 2018 at 8:30 am #

      Hey Trami — have you tried using the cv2.dnn.readNetFromTensorflow function? Did that resolve the issue?

      • lochana September 24, 2018 at 5:40 am #

        Yes, using cv2.dnn.readNetFromTensorflow still works. If you use the same camera from two Python files that are launched as subprocesses, the camera release function doesn’t work in OpenCV versions above 3.2. When I emailed the OpenCV team, they told me to install OpenCV 4.0-alpha, but I couldn’t find a way to install it in my Anaconda environment. After searching, I found an OpenCV release that contains readNetFromTensorflow and has a working camera release function:

        pip install opencv-python==

        Thank you
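Here is a small sketch of the fallback discussed above. It tries `cv2.dnn.readNet` first and falls back to `cv2.dnn.readNetFromTensorflow` when the installed build lacks it. The function name `load_east_model` is hypothetical, and so is the injectable `dnn` argument, which exists only so the logic can be exercised without OpenCV. This only addresses a missing `readNet`. An importer error such as "Unknown layer type Shape" most likely points at the OpenCV build itself, so the first thing to check is the version Python actually imports.

```python
def load_east_model(model_path, dnn=None):
    """Load a frozen EAST graph, preferring cv2.dnn.readNet when it exists.

    `dnn` defaults to cv2.dnn; it can be swapped for a stand-in object in tests.
    """
    if dnn is None:
        import cv2  # imported lazily so the helper can be tested without OpenCV
        dnn = cv2.dnn
    read_net = getattr(dnn, "readNet", None)
    if read_net is not None:
        return read_net(model_path)
    # Older builds only expose the framework-specific loader.
    return dnn.readNetFromTensorflow(model_path)


# Hypothetical usage with the script's argument parser:
# net = load_east_model(args["east"])
```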

  21. lxc August 26, 2018 at 9:21 pm #

    Hi Adrian,
    Why are the shapes of scores and geometry [1 1 80 80] and [1 5 80 80]?
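Here is where those shapes come from. For a 320×320 input blob, EAST's output maps are a quarter of the input resolution, which gives 80×80. The score map has one channel: the probability that each cell contains text. The geometry map has five channels: distances to the top, right, bottom and left box edges, plus a rotation angle. The sketch below just encodes that arithmetic. The function name and the multiple-of-32 check are illustrative assumptions.

```python
def east_output_shapes(input_width, input_height, stride=4):
    """Expected (N, C, H, W) shapes of EAST's score and geometry outputs for one image.

    Assumes input dimensions are multiples of 32 and output maps at 1/stride resolution.
    """
    if input_width % 32 or input_height % 32:
        raise ValueError("EAST input dimensions should be multiples of 32")
    rows, cols = input_height // stride, input_width // stride
    scores_shape = (1, 1, rows, cols)    # one text probability per cell
    geometry_shape = (1, 5, rows, cols)  # 4 edge distances + 1 rotation angle per cell
    return scores_shape, geometry_shape


print(east_output_shapes(320, 320))
```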

    • lxc August 26, 2018 at 9:25 pm #


  22. Tom August 27, 2018 at 10:52 pm #

    Hi Adrian

    I made a few mods to the code and created a few different NMS implementations that will accept rectangles, rotated rectangles or polygons as input.

    The net of the changes:

    1. Decode the EAST results
    2. Rotate the rectangles
    3. Run the rotated rectangles through NMS (Felzenszwalb, Malisiewicz or FAST)
    4. Draw the NMS-selected rectangles on the original image

    The code repo is here:

    I pushed the README to medium here:


    • Adrian Rosebrock August 28, 2018 at 3:15 pm #

      This is awesome, thank you so much for sharing Tom!

      • Tom August 29, 2018 at 10:53 pm #

        My pleasure — thank you for the great post.

        I split out the NMS-specific code into a PyPI package: nms
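Steps 2 and 3 above fit together as follows. This is a self-contained sketch, not Tom's package. It builds the corners of a rotated rectangle, then measures intersection-over-union between two convex polygons using Sutherland-Hodgman clipping and the shoelace formula. Finally, it runs greedy NMS on the results. All names are hypothetical. IoU is a generic choice of overlap measure; the Felzenszwalb and Malisiewicz variants may measure overlap differently. Within OpenCV, `cv2.boxPoints(((cx, cy), (w, h), angle))` is another way to get the corners.

```python
import math


def rotated_rect_corners(cx, cy, w, h, angle_deg):
    """Four corners of a w x h rectangle centred at (cx, cy), rotated by angle_deg."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    half = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in half]


def _signed_area(poly):
    # Shoelace formula; positive for counter-clockwise vertex order.
    return sum(x1 * y2 - x2 * y1
               for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1])) / 2.0


def _clip(subject, clipper):
    """Sutherland-Hodgman clipping of `subject` against the convex polygon `clipper`."""
    if _signed_area(clipper) < 0:
        clipper = clipper[::-1]  # the inside test below assumes CCW order

    def inside(p, a, b):
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Point where segment p->q crosses the infinite line through a and b.
        den = (p[0] - q[0]) * (a[1] - b[1]) - (p[1] - q[1]) * (a[0] - b[0])
        t = ((p[0] - a[0]) * (a[1] - b[1]) - (p[1] - a[1]) * (a[0] - b[0])) / den
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

    output = list(subject)
    for a, b in zip(clipper, clipper[1:] + clipper[:1]):
        if not output:
            break  # nothing left to clip: the polygons do not overlap
        previous, current_input, output = output[-1], output, []
        for point in current_input:
            if inside(point, a, b):
                if not inside(previous, a, b):
                    output.append(intersect(previous, point, a, b))
                output.append(point)
            elif inside(previous, a, b):
                output.append(intersect(previous, point, a, b))
            previous = point
    return output


def polygon_iou(p, q):
    """Intersection-over-union of two convex polygons."""
    inter = _clip(p, q)
    inter_area = abs(_signed_area(inter)) if len(inter) >= 3 else 0.0
    union = abs(_signed_area(p)) + abs(_signed_area(q)) - inter_area
    return inter_area / union if union > 0 else 0.0


def rotated_nms(polygons, scores, iou_threshold=0.3):
    """Greedy NMS: keep boxes in score order, skipping any that overlap a kept box too much."""
    order = sorted(range(len(polygons)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(polygon_iou(polygons[i], polygons[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

As a sanity check, two 1×1 squares offset by half a width overlap by 0.5, and their union is 1.5, so their IoU is 1/3.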

  23. Sébastien August 28, 2018 at 9:15 am #

    Great article as always Adrian!
    I was wondering something: in your YouTube video, the words “Jaya”, “the” and “Cat” are detected separately by the algorithm. Would it be possible to modify it so that the whole text line “Jaya the Cat” is detected in a single text box?

    • Adrian Rosebrock August 28, 2018 at 3:07 pm #

      Technically yes. For this algorithm you would compute a single bounding box that encloses all of the detected bounding box coordinates. From there you could extract the region as a single text box.
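Adrian's suggestion amounts to taking the minimum of the start coordinates and the maximum of the end coordinates. Here is a minimal sketch. It assumes each box is a `(startX, startY, endX, endY)` tuple already scaled back to the original image. `merge_boxes` is a hypothetical helper.

```python
def merge_boxes(boxes):
    """Return one (startX, startY, endX, endY) box that encloses every input box."""
    if not boxes:
        raise ValueError("need at least one box")
    start_xs, start_ys, end_xs, end_ys = zip(*boxes)
    return (min(start_xs), min(start_ys), max(end_xs), max(end_ys))


# Hypothetical usage, assuming the original image is in `orig`:
# (startX, startY, endX, endY) = merge_boxes(boxes)
# roi = orig[startY:endY, startX:endX]
```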

  24. Sébastien August 29, 2018 at 2:54 am #

    I’m not sure I understand correctly.
    In the Figure 1 of this article, the left image shows two lines: “First Eastern National” and “Bus Times”. How could your method detect that there are indeed _two_ lines with 3 words in the upper one and 2 in the other?

    • Adrian Rosebrock August 30, 2018 at 9:04 am #

      Figure 1 shows examples of images that would be very challenging for text detectors to detect. You could determine two lines based on the bounding boxes supplied by the text detector — one for the first line and a second bounding box for the second line.
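Adrian describes deriving lines from the detector's boxes. One way to do that is to group word boxes whose vertical extents overlap. This heuristic sketch works for roughly horizontal text and assumes `(startX, startY, endX, endY)` boxes. `group_into_lines` and the 0.5 overlap threshold are illustrative choices, not part of the post. Each returned line can then be collapsed into a single box by taking the minimum start and maximum end coordinates.

```python
def group_into_lines(boxes, min_overlap=0.5):
    """Group (startX, startY, endX, endY) boxes into text lines.

    A box joins a line when its vertical overlap with that line is at least
    `min_overlap` of the smaller height; otherwise it starts a new line.
    Lines come back top-to-bottom, boxes within a line left-to-right.
    """
    lines = []  # each entry: [line_top, line_bottom, boxes_in_line]
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        start_y, end_y = box[1], box[3]
        for line in lines:
            overlap = min(end_y, line[1]) - max(start_y, line[0])
            smaller = min(end_y - start_y, line[1] - line[0])
            if smaller > 0 and overlap / smaller >= min_overlap:
                line[0], line[1] = min(line[0], start_y), max(line[1], end_y)
                line[2].append(box)
                break
        else:
            lines.append([start_y, end_y, [box]])
    return [sorted(line[2], key=lambda b: b[0]) for line in lines]
```

With three word boxes at roughly the same height and two more further down, this yields two lines containing three and two boxes. That matches the “First Eastern National” / “Bus Times” layout in Figure 1.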

      • Sébastien September 4, 2018 at 2:13 am #


  25. pankaj sharma August 30, 2018 at 9:07 am #

    Hi Adrian,
    I ran the code and got the error below.
    Please help me solve this problem.

    net = cv2.dnn.readNet(args["east"])
    AttributeError: module 'cv2.dnn' has no attribute 'readNet'

    • Adrian Rosebrock August 30, 2018 at 9:09 am #

      Double-check your OpenCV version. You will need at least OpenCV 3.4.1 to run this script (it sounds like you have an older version).

      • pankaj sharma August 30, 2018 at 9:17 am #

        I have OpenCV version 3.4.1.
        Please give me another suggestion.

        • Adrian Rosebrock August 30, 2018 at 9:45 am #

          Did you install OpenCV with the contrib module enabled? Make sure you are following one of my OpenCV install tutorials.

        • Joan September 2, 2018 at 11:51 am #

          From what I have tried, you need at least OpenCV 3.4.2.

    • Raj September 8, 2018 at 3:20 pm #

      I was getting the same error with opencv-python on Windows. The issue was resolved after upgrading opencv-python to
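This thread mentions both 3.4.1 and 3.4.2 as the minimum, so checking the version that Python actually imports can save time. Here is a minimal sketch. The helper name is hypothetical, and it defaults to 3.4.2, the minimum stated at the top of the post. Suffixes such as "-alpha" or "-dev" are ignored.

```python
import re


def opencv_version_at_least(version_string, minimum=(3, 4, 2)):
    """Return True if a version string such as cv2.__version__ is >= `minimum`."""
    parts = []
    for piece in version_string.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep the leading digits, drop "-alpha" etc.
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (3 - len(parts))
    return tuple(parts) >= tuple(minimum)


# Hypothetical usage:
# import cv2
# if not opencv_version_at_least(cv2.__version__):
#     raise SystemExit("OpenCV 3.4.2+ is required, found " + cv2.__version__)
```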

  26. Sanda September 2, 2018 at 7:18 pm #

    I also want to recognize the detected text. To do that, I plan to crop the image to the maximum ROI that we identified as words, then pass it to Tesseract OCR to recognize the words. Is this method OK for word recognition?

    Thank You

    • Adrian Rosebrock September 5, 2018 at 9:02 am #

      I’ll be covering exactly how to do this process in a future blog post but in the meantime I always recommend experimenting. Your approach is a good one, I recommend you try it and see what types of results you get.
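Here is a sketch of Sanda's approach. It assumes `(startX, startY, endX, endY)` boxes on the original image, and it assumes the `pytesseract` wrapper and Tesseract are installed. Padding each box slightly is a common precaution, because tight boxes can clip the edges of characters. The 5% default and both helper names are hypothetical.

```python
def padded_roi(box, image_width, image_height, padding=0.05):
    """Grow a (startX, startY, endX, endY) box by a fraction of its size, clamped to the image."""
    start_x, start_y, end_x, end_y = box
    pad_x = int((end_x - start_x) * padding)
    pad_y = int((end_y - start_y) * padding)
    return (max(0, start_x - pad_x), max(0, start_y - pad_y),
            min(image_width, end_x + pad_x), min(image_height, end_y + pad_y))


def ocr_roi(image, box, padding=0.05):
    """Crop one detected word from a NumPy image and OCR it (needs pytesseract + Tesseract)."""
    import pytesseract  # optional dependency, imported only when OCR is requested
    h, w = image.shape[:2]
    start_x, start_y, end_x, end_y = padded_roi(box, w, h, padding)
    roi = image[start_y:end_y, start_x:end_x]
    # "--psm 7" asks Tesseract to treat the crop as a single line of text.
    return pytesseract.image_to_string(roi, config="--psm 7")
```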

  27. sidis September 3, 2018 at 2:03 am #

    Hi Adrian
    Is it possible to recognise the text?

    • Adrian Rosebrock September 5, 2018 at 8:59 am #

      Once you’ve detected text in an image you can apply OCR. I’ll be covering the exact process in a future tutorial, stay tuned!

  28. Matheus Cunha September 4, 2018 at 1:57 pm #

    Is there any way to use video text detection with the Raspberry Pi Camera V2?

    • Adrian Rosebrock September 5, 2018 at 8:34 am #

      Yes. Replace Line 102 with vs = VideoStream(usePiCamera=True).start()

  29. Dany September 5, 2018 at 5:27 am #

    Very interesting! Is it possible to convert it into real text with OpenCV, or do I need to use OCR?

    • Adrian Rosebrock September 5, 2018 at 8:28 am #

      Once you’ve detected the text you will need to OCR it. I’ll be demonstrating how to perform such OCR in a future tutorial 🙂

      • Dany September 5, 2018 at 11:31 am #

        Is OCR possible with OpenCV, or do you work with other tools like Tesseract?

        • Adrian Rosebrock September 11, 2018 at 8:59 am #

          OpenCV itself does not include any OCR functionality; OCR is normally handed off to a dedicated OCR library like Tesseract or the Google Vision API.

  30. Prince Bhatia September 10, 2018 at 7:46 am #

    How can I print the probability that an image contains text, for example a 99 percent probability that it has text, or a 0 percent probability?

    • Adrian Rosebrock September 11, 2018 at 8:13 am #

      Are you referring to a specific region of the image having text? Or the image as a whole?
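EAST does not output one probability for the whole image. Instead, it outputs a score map with one probability per cell. The highest cell score is one rough, uncalibrated whole-image indicator. The cells above a threshold answer the per-region version of the question. This sketch assumes `scores` is the `(1, 1, H, W)` array from the forward pass. The function name and the 0.5 threshold are hypothetical.

```python
import numpy as np


def text_probabilities(scores, min_confidence=0.5):
    """Summarise EAST's (1, 1, H, W) score map.

    Returns the highest cell probability (a rough whole-image indicator) and the
    (row, col) indices of cells whose probability is at least `min_confidence`.
    """
    score_map = np.asarray(scores)[0, 0]
    best = float(score_map.max())
    rows, cols = np.nonzero(score_map >= min_confidence)
    return best, list(zip(rows.tolist(), cols.tolist()))


# Hypothetical usage after `(scores, geometry) = net.forward(layerNames)`:
# best, cells = text_probabilities(scores)
# print("highest text probability: {:.2f}%".format(best * 100))
```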

  31. mohamed September 10, 2018 at 9:19 am #

    Hi Adrian
    I apologize for my inaccurate questions

    But I would like to know why the model included in the downloads is less accurate than the model in the repository recommended by the team, here:
    Have you modified something to make it compatible with OpenCV?

    • Adrian Rosebrock September 11, 2018 at 8:11 am #

      The method I’ve used here is a port of the EAST model. As I’ve mentioned in the blog post, the code itself cannot compute the rotated bounding boxes.

  32. Gaurav A September 11, 2018 at 5:37 am #

    Hi Adrian,

    Can you please help me understand how I can break the bounding box into individual characters instead of full words?
    For example, if I have the number 56 0 08,
    I am able to do it using findContours… but it’s not accurate when the digits are very close. Two digits are being considered as one.
    So the results I get are 56 0 and 08, but they should be 5 6 0 0 8.
    Can you please suggest some way to tackle this?

    • Adrian Rosebrock September 11, 2018 at 8:01 am #

      If your image is “clean” enough you can perform simple image processing via thresholding/edge detection and contours to extract the digit. For more complex scenes you may need some sort of semantic segmentation. Stay tuned for next week’s blog post where I’ll be discussing how you can actually OCR the text detected by EAST.
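In the "clean image" case, a vertical projection profile is one simple alternative to contours when digits sit close together. You threshold the word crop, then split wherever a column contains no foreground pixels. This technique is swapped in here and is not the method from the post. It only helps when at least one blank column separates the characters; truly touching digits need something stronger. `split_on_blank_columns` is a hypothetical helper that works on a plain list of rows so the logic is easy to follow. With a NumPy array, the per-column ink test is `binary.any(axis=0)`.

```python
def split_on_blank_columns(binary, min_gap=1):
    """Split a thresholded word image into character column ranges.

    `binary` is a list of rows where ink pixels are non-zero. Runs of at least
    `min_gap` ink-free columns separate characters. Returns (start, end) pairs
    with `end` exclusive, suitable for slicing image[:, start:end].
    """
    width = len(binary[0])
    ink = [any(row[c] for row in binary) for c in range(width)]
    segments, start, blank_run = [], None, 0
    for c, has_ink in enumerate(ink):
        if has_ink:
            if start is None:
                start = c  # a new character begins
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                # The character ended at the first blank column of this gap.
                segments.append((start, c - blank_run + 1))
                start, blank_run = None, 0
    if start is not None:
        segments.append((start, width - blank_run))
    return segments
```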

  33. Hochan September 11, 2018 at 9:35 pm #

    If you are interested in making your own model and importing it into OpenCV, check this link.

  34. Sushil September 12, 2018 at 7:58 am #

    Hello Adrian, your work is really amazing!! I’m having some issues with the final bounding boxes after non-maxima suppression. I get almost all characters before suppression, but in the final result some characters are not covered by the bounding boxes because of the suppression algorithm. So I thought about taking only the outer boxes (implementing my own algorithm), but ‘rects’ has so many x-y coordinates that I’m unable to tell which coordinates belong to one box and which belong to the others. Do you have any suggestion or solution for this?

    • Adrian Rosebrock September 12, 2018 at 1:51 pm #

      The “rects” list is just your set of bounding box coordinates, so I’m not sure what you mean by being unable to tell which coordinates belong to which box. Each entry in “rects” is a unique bounding box.

  35. Tejas Mahajan September 18, 2018 at 2:35 am #


    Which dataset was the weights file used for inference in this blog post trained on?

  36. Rohan September 18, 2018 at 7:09 pm #

    Hey Adrian,

    Do you know if this same EAST algorithm will be able to locate the bounding boxes of handwritten text?


  37. Arindam September 25, 2018 at 11:59 am #

    Hey Adrian,

    The article was really helpful. I was wondering if you could guide me with segregating handwritten text and machine printed text in a picture of a document.

  38. Jonathan Salama September 28, 2018 at 2:39 pm #

    Hello, I was wondering if there is a version that would output the actual text observed. Thanks!

  39. Suresh Doraiswamy September 30, 2018 at 8:57 am #

    When you run Adrian’s code using Python 3.6 and OpenCV 3.4.3 on Windows 10, the line that calls cv2.dnn.readNet may show an error saying that cv2.dnn does not have readNet as a valid function. If so, you can eliminate the error as follows:

    Open the Windows command line and enter pip install opencv-contrib-python

    I tried this and it works.

    • Adrian Rosebrock October 8, 2018 at 10:55 am #

      It sounds like you are using an older version of OpenCV. Can you confirm which version of OpenCV you are using?

  40. Tu October 1, 2018 at 11:07 pm #

    Hi Adrian,

    It’s a great blog post.

    Currently, I’m working on a project that involves detecting objects in technical drawing images (e.g., scanned CAD drawings), so I need to detect lines, numbers, and text in the image.

    I tested your code from this blog post, but the accuracy doesn’t seem good.

    If you have any ideas for improving it, please share them with me!

    Image example here:


  41. Chris October 2, 2018 at 11:53 pm #

    Adrian, how did you freeze the model (convert .ckpt to .pb)?

    • Adrian Rosebrock October 8, 2018 at 10:29 am #

      Are you asking how to convert a TensorFlow model to OpenCV format? If you can clarify I can point you in the right direction.

      • Chris October 27, 2018 at 12:15 pm #

        When training EAST, the created model is in .ckpt format. How do I convert that .ckpt model to .pb so that I can use it in your OpenCV version of EAST?

        • Adrian Rosebrock October 29, 2018 at 1:33 pm #

          Refer to the official OpenCV documentation; they include scripts to convert the model to make it compatible with OpenCV directly.

  42. Dekker October 2, 2018 at 11:59 pm #

    Great article. Adrian
    How do I implement the EAST model?

    • Adrian Rosebrock October 8, 2018 at 10:29 am #

      The model has already been implemented and trained in this post. Do you mean how to train the EAST model from scratch?

  43. vinayak October 4, 2018 at 2:50 am #

    I found the CRNN model a great addition on top of EAST detectors for building a full OCR pipeline. I trained it on custom data and it works well.

    Original paper:

    • Adrian Rosebrock October 8, 2018 at 10:15 am #

      Thanks for sharing, Vinayak!

  44. Ritika October 10, 2018 at 7:53 am #


    Thanks for sharing the frozen model for the EAST text detector.

    I am currently working on a project where I need to use a TensorFlow Lite model in a mobile application. To convert the frozen model to TF Lite, I need to know the names of the input and output tensors. Could you please provide them?


    • Adrian Rosebrock October 12, 2018 at 9:16 am #

      Hey Ritika, I would suggest reaching out to the authors of the EAST paper (linked to in this blog post). They will be able to provide more insight into the model and its layer naming conventions.

  45. Bragg Xu October 17, 2018 at 1:28 am #

    Thanks for sharing. I’m using OpenCV 3.4.1 with Python on a Mac. Is that OK for the version requirement?

    • Adrian Rosebrock October 20, 2018 at 7:57 am #

      Yes, OpenCV 3.4.1 should be sufficient.

  46. Tobi October 18, 2018 at 3:32 am #

    Hi Adrian,

    thanks so much for this post and for this whole website in general. I’m really falling in love with computer vision and will try to learn more. So far, I have two particular questions about your code, or to be more precise, about the math behind it.
    My questions refer to the first part of your post (text detection in a single image).

    1. You wrote in one of your comments (code line 87):
    “compute the offset factor as our resulting feature maps will be 4x smaller than the input image”
    Where did you get this information, and why is that the case?

    2. Can you explain in a bit more detail how the formula in lines 102/103 works (endX, endY)?
    I know that we can use the sine and cosine functions to find the coordinates, but I don’t know exactly how this works. I couldn’t find any good explanations for this on the web. Perhaps you have a good resource?

    Thanks in advance.

    Best regards,

    • Adrian Rosebrock October 20, 2018 at 7:45 am #

      Take a look at the EAST publication that I linked to in the post. You also might want to look at the architecture visualization and see how the volume size changes as data passes through the network. As for your second question, I think you’re asking where to learn trigonometry? Let me know if I understood your question correctly.

      • Tobi November 2, 2018 at 5:26 am #

        Hi Adrian,

        thanks for your answer, I will check the paper.
        Regarding my second question: yes, it’s about learning trigonometry. I already checked some resources where I learned (refreshed) a bit about cosine and sine, but I couldn’t transfer this knowledge to the formula you used. Maybe you have some better resources?

        Best regards,
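Here is a worked sketch of the kind of geometry decoding Tobi asks about. It follows the EAST output layout. For each cell, the network predicts distances to the top, right, bottom and left edges, plus a rotation angle in radians. The cells form a grid at 1/4 of the input resolution, which is where the factor of 4 comes from. The sketch rests on those assumptions, uses a hypothetical function name, and may differ in detail from lines 102/103 of the script. First, the cell's position is mapped back to the input image. The bottom-right corner is then reached by moving `d_right` along the box's rotated x-axis and `d_bottom` along its rotated y-axis, which is where the cosine and sine terms come from.

```python
import math


def decode_cell(col, row, d_top, d_right, d_bottom, d_left, angle, stride=4.0):
    """Turn one EAST geometry cell into an axis-aligned (startX, startY, endX, endY) box."""
    # Each output cell covers a stride x stride patch of the input image.
    offset_x, offset_y = col * stride, row * stride
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    h = d_top + d_bottom
    w = d_right + d_left
    # Rotate the (d_right, d_bottom) step by the predicted angle to reach the
    # bottom-right corner.
    end_x = offset_x + cos_a * d_right + sin_a * d_bottom
    end_y = offset_y - sin_a * d_right + cos_a * d_bottom
    # Step back by the full width/height; this ignores rotation for the other
    # corner, which is why the resulting box is axis-aligned.
    return (int(end_x - w), int(end_y - h), int(end_x), int(end_y))
```

With `angle = 0`, the cosine is 1 and the sine is 0, so `endX = offsetX + d_right` and `endY = offsetY + d_bottom`. Take the cell at column 10, row 5, which maps to offset (40, 20). With distances 8, 30, 6 and 12 (top, right, bottom, left), the box is (28, 12, 70, 26).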

  47. Saurav October 20, 2018 at 12:24 pm #


    Thanks so much for posting this and sharing your knowledge. I love reading your posts. This code works very well.

    I was wondering if there is any way to detect blocks one line at a time.

  48. Xiaodan October 20, 2018 at 5:42 pm #

    Thanks for posting! Great article. One question, could I use EAST text detector to only detect digits?

    • Adrian Rosebrock October 22, 2018 at 8:09 am #

      EAST doesn’t provide you with any context of what the text actually contains, only that text exists somewhere in an image. Therefore, no, you cannot instruct EAST to detect digits. Instead, you would want to perform text recognition and then use Tesseract to return only digits.
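One simple way to keep only digits is to post-filter Tesseract's output string. Here is a minimal sketch with a hypothetical helper name. Another route is Tesseract's `tessedit_char_whitelist` variable, passed via pytesseract as a config such as `-c tessedit_char_whitelist=0123456789`. However, support for it has varied between Tesseract versions.

```python
import re


def digits_only(ocr_text):
    """Return the runs of digits found in an OCR result string."""
    return re.findall(r"\d+", ocr_text)


print(digits_only("Q7: 12 + 3O"))  # the trailing letter O is dropped
```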

      • Xiaodan October 22, 2018 at 8:50 pm #

        Could I replace the training data (presumably English text training data) with digit (math formula training data) and train the same architecture? My purpose is to build an app that can detect then recognize and grade math worksheet problems from photos.

        • Adrian Rosebrock October 29, 2018 at 2:16 pm #

          Presumably yes but you’ll also want to refer to the official EAST GitHub repo that I linked to inside the post.

  49. ali October 24, 2018 at 12:15 am #

    Hi, Adrian
    Thanks for sharing. I have a problem: when I run the code on my Pi with a webcam, my Pi suddenly restarts.
    Please help me solve this problem 🙁

    • Adrian Rosebrock October 29, 2018 at 2:04 pm #

      It sounds like your Pi may be overheating and restarting, or there is some sort of physical issue with your Raspberry Pi. Can you try with a different Pi?

  50. Atul Mahajan October 26, 2018 at 8:23 am #

    Thanks, Adrian, for sharing such great info.

    I want to read the detected text from live video. For this, I thought of first separating out the frames in which text is detected and then applying OCR to those frames to read the text. But I observed that identifying those frames is a very slow and time-consuming process.

    Could you please suggest a fast solution for reading text from live video?

    • Adrian Rosebrock October 29, 2018 at 1:41 pm #

      You would want to push the computation and forward pass of the network to your GPU but unfortunately that’s non-trivial with OpenCV and CUDA right now. I imagine that will be possible in the near future.

  51. Amul Mittal October 29, 2018 at 8:25 am #

    Great Work Brother.
    You are doing awesome job.
    Can you please provide C++ code as well? I am unable to understand the Python code, and I can’t find any tutorial on scene text detection in C++.
    Please help…

  52. Wim van de Brug October 30, 2018 at 6:09 am #

    Hi Adrian,

    Thanks for this great post. I have set up an environment using Python 3.7.1 and OpenCV (from your pip install opencv post). The script runs like a charm, but rather slowly:
    [INFO] loading EAST text detector…
    [INFO] text detection took 0.569462 seconds

    I run this on a Microsoft Surface Pro 4 with Windows 10, in the most minimal virtual env required for this script. Why is it so much slower on Windows 10 compared to your benchmark?

    Thanks in advance for your reply.


    • Adrian Rosebrock November 2, 2018 at 8:25 am #

      Hey Wim — I’m not sure why the code would be so much slower on a Surface Pro. I’m not personally familiar with the hardware.
