Android Question OpenCV and OCR.

Eme Fibonacci · Oct 30, 2017

Hi, has anyone tested text extraction (OCR) with the OpenCV library? Are the results good?

JordiCP · Oct 30, 2017

In my opinion, OpenCV by itself isn't enough, since text extraction needs both spatial and syntax awareness (you would need to add a lot of high-level logic). OpenCV's role here would rather be that of a 'companion' utility for Google Vision or Tesseract (there are some posts/libs by DonManfred and Johan), used to clean, tune or otherwise prepare the image before processing it. That said, both Tesseract and Google Vision already incorporate many low-level routines that perform some of these steps automatically.
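
As an illustration of the kind of cleanup step meant above, here is a minimal sketch of Otsu binarization, which picks a global threshold that best separates dark ink from light paper. In OpenCV's Python bindings this is a single call, `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`; the version below is written in plain Python only so it runs without OpenCV installed. It assumes an 8-bit grayscale image stored as a list of rows; the function names are hypothetical.

```python
from typing import List

def otsu_threshold(pixels: List[int]) -> int:
    """Return the Otsu threshold for a flat list of 8-bit grayscale values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg, weight_bg = 0.0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]           # pixels with value <= t (background class)
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg  # pixels with value > t (foreground class)
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Otsu maximizes the between-class variance.
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t

def binarize(image: List[List[int]]) -> List[List[int]]:
    """Map a grayscale image to 0 (ink) / 255 (paper), like THRESH_BINARY."""
    t = otsu_threshold([p for row in image for p in row])
    return [[255 if p > t else 0 for p in row] for row in image]
```

On a real ticket photo, uneven lighting can defeat a single global threshold; OpenCV's `cv2.adaptiveThreshold` is the usual alternative in that case.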

A different case would be a tuned algorithm for recognizing single characters (or sequences of them). Here the low-level processing power of OpenCV would make sense.
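
One typical low-level step in such a tuned recognizer is splitting a binarized line of text into individual characters. A minimal sketch, assuming the image is already binarized (0 = ink, 255 = paper) and that characters do not touch each other, uses a vertical projection profile: every run of columns containing ink is treated as one character. In OpenCV, `cv2.findContours` followed by `cv2.boundingRect` is a common alternative. The helper names here are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image, one list per row

def segment_columns(binary: Image, ink: int = 0, min_width: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges (end exclusive) that contain ink."""
    if not binary:
        return []
    width = len(binary[0])
    # A column belongs to a character if any pixel in it is ink.
    has_ink = [any(row[x] == ink for row in binary) for x in range(width)]
    spans, start = [], None
    for x, inked in enumerate(has_ink):
        if inked and start is None:
            start = x
        elif not inked and start is not None:
            if x - start >= min_width:
                spans.append((start, x))
            start = None
    if start is not None and width - start >= min_width:
        spans.append((start, width))
    return spans

def crop(binary: Image, span: Tuple[int, int]) -> Image:
    """Cut the columns of one character out of the line image."""
    start, end = span
    return [row[start:end] for row in binary]
```

`min_width` is there to discard specks of noise that survive binarization; its right value depends on the image resolution.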

Eme Fibonacci · Oct 30, 2017

JordiCP said:
A different case would be a tuned algorithm for single char (or sequences of) recognition. Here the low level processing power of OpenCV would make sense.

I'm thinking about this. I have small tickets, always with the same font and size, and I am not able to extract the digits. With OCR libs the results vary greatly, from good to poor. I had incredible results using Google Cloud Vision, but it is a paid service.

Can you recognize symbols (numbers) using your OpenCV library?

Thank you.

tigrot · Oct 30, 2017

Don't rely on OCR for numbers if you don't have a way to check the result. With sentences you can check against a vocabulary; with numbers you have nothing. In my opinion, a check digit is necessary.
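
To make the check-digit idea concrete, here is a sketch using the Luhn (mod 10) scheme, which is widely used for payment card numbers. The post does not say which scheme should be used, so Luhn is only an illustrative choice; it has the useful property of detecting any single-digit substitution, which is exactly the kind of error OCR produces.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if the digit string (check digit last) passes the Luhn check."""
    if not number.isdigit():
        return False
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:      # every second digit from the right is doubled
            d *= 2
            if d > 9:
                d -= 9      # same as adding the two digits of the product
        total += d
    return total % 10 == 0

def luhn_check_digit(payload: str) -> str:
    """Compute the digit to append to `payload` so the result is Luhn-valid."""
    if not payload.isdigit():
        raise ValueError("payload must be a non-empty digit string")
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:      # these positions get doubled once the check digit is appended
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)
```

Usage: if the OCR result of a ticket number fails `luhn_is_valid`, the reading can be rejected or re-tried instead of being silently accepted. This only works if the check digit is printed on the ticket in the first place.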

tigrot · Oct 30, 2017

15 years ago we introduced a bar code to check a number on Amex applications; Amex is still using the same algorithm to produce it.

JordiCP · Oct 30, 2017

Eme Fibonacci said:
I'm thinking about this. I have small tickets, always with the same font and size, and I am not able to extract the digits. With OCR libs the results vary greatly, from good to poor. I had incredible results using Google Cloud Vision, but it is a paid service.

Can you recognize symbols (numbers) using your OpenCV library?

Whatever solution you use, you should take advantage of the 'known info' of this specific case.

If the tickets are always of a given color, you can pre-process the image to increase the contrast of the numbers.
You can even get a bounding box (the ticket contour) within which the algorithm will search.
If the number of digits is fixed, you could even get a box for the position of each one.
If the font is known, it should be easier.
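
The "fixed number of digits" point can be sketched as follows: once the ticket's bounding box is known (for example from OpenCV contour detection), it can be divided into one cell per digit. This assumes the digits are evenly spaced across the box, which is a hypothetical simplification; `digit_cells` is an illustrative helper, not an existing API.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), same convention as cv2.boundingRect

def digit_cells(box: Box, n_digits: int, margin: int = 0) -> List[Box]:
    """Split a bounding box into n_digits equal-width cells, optionally shrunk by `margin`."""
    if n_digits <= 0:
        raise ValueError("n_digits must be positive")
    x, y, w, h = box
    cells = []
    for i in range(n_digits):
        # Integer division spreads any leftover pixels across the cells.
        left = x + (i * w) // n_digits
        right = x + ((i + 1) * w) // n_digits
        cells.append((left + margin, y + margin,
                      max(0, right - left - 2 * margin), max(0, h - 2 * margin)))
    return cells
```

A small margin helps keep the edges of neighbouring digits out of each cell; each cell can then be cropped and recognized on its own.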

With OpenCV, using the above info and selecting picture regions, you could use a "feature detector -> descriptor extractor -> matcher" pipeline and build your own OCR on top of it (by comparing the matching results against the 10 possible digits and establishing some kind of score). As you can imagine, it will require some work.
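
The scoring part of that idea can be sketched like this. Note that it swaps the feature pipeline (in OpenCV's Python bindings, something like `cv2.ORB_create()` with `detectAndCompute` and a `cv2.BFMatcher`) for a much simpler pixel-agreement template score, so it only illustrates "compare against the 10 possible digits and establish a score". It assumes each digit cell is already binarized and resized to the template size, and that the templates come from a clean reference ticket; all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Grid = List[List[int]]  # binary cell: 0 = ink, 255 = paper

def similarity(cell: Grid, template: Grid) -> float:
    """Fraction of pixels on which the cell and the template agree."""
    total = agree = 0
    for row_c, row_t in zip(cell, template):
        for a, b in zip(row_c, row_t):
            total += 1
            agree += a == b
    return agree / total if total else 0.0

def classify_digit(cell: Grid, templates: Dict[str, Grid],
                   min_margin: float = 0.05) -> Tuple[Optional[str], float]:
    """Return (best digit, score), or (None, score) when the best match is ambiguous."""
    scores = sorted(((similarity(cell, t), d) for d, t in templates.items()), reverse=True)
    best_score, best_digit = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0.0
    # Reject readings where the top two candidates are too close to call.
    if best_score - runner_up < min_margin:
        return None, best_score
    return best_digit, best_score
```

Returning `None` for ambiguous cells, instead of always picking the top score, gives the caller a chance to retry or flag the ticket, in the same spirit as the check-digit advice earlier in the thread.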

On the other hand, this same info "could" allow you to improve the results you are getting now with Tesseract, if you are using its command API (I don't know whether Google Vision has similar options): you could prepare the image based on that info, tell Tesseract that you are only looking for digits, with no dictionary, and even prepare a font file with those specific digits. I have never performed that last step myself, but I know from other users that it is possible.
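
For reference, "only digits, no dictionary" maps to a few Tesseract settings. The thread is about B4A, where Tesseract is reached through a wrapper library, so the sketch below instead drives the Tesseract command-line tool from Python, just to show which settings are involved. `--psm 7` treats the image as a single text line, `tessedit_char_whitelist` restricts the output characters, and `load_system_dawg`/`load_freq_dawg` disable the word lists. Depending on the Tesseract version, the whitelist may be ignored by the LSTM engine, so check it against your build. The function names are hypothetical.

```python
import subprocess
from typing import List

def tesseract_digits_cmd(image_path: str, psm: int = 7) -> List[str]:
    """Build a tesseract CLI command that reads digits only, without dictionaries."""
    return [
        "tesseract", image_path, "stdout",          # print the result instead of writing a file
        "--psm", str(psm),                          # 7 = treat the image as a single text line
        "-c", "tessedit_char_whitelist=0123456789",  # only allow digits in the output
        "-c", "load_system_dawg=0",                  # do not load the main word list
        "-c", "load_freq_dawg=0",                    # do not load the frequent-words list
    ]

def run_tesseract_digits(image_path: str) -> str:
    """Run tesseract (must be installed and on PATH) and return the recognized text."""
    result = subprocess.run(tesseract_digits_cmd(image_path),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Feeding it the binarized, cropped digit region rather than the raw photo is where the "known info" pays off.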
