This is a difficult task.
Understatement of the week ;-/
I have a friend who was doing OCR of the name and number off a credit-card-like card to register people arriving for appointments at a business, but he used Windows and some library (Tesseract?). The raw accuracy was 80-90%, but 1/ we had a good idea of the names (and sometimes even the numbers) likely to be scanned on a given day, and 2/ the number had a check digit, so when he took the OCR output and applied those filters to clean up stuff like "is that a D or an O?", it got to like 98%. The other 2% get handled manually, but they are almost all people who aren't on the list, and thus manual intervention is required anyway, so... no worries.
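For what it's worth, that kind of post-OCR filtering can be sketched in a few lines of Python. This is only a rough illustration, not what my friend actually ran: I'm assuming the check digit is a Luhn check (I don't know which scheme his cards used), and `luhn_ok`, `clean_name`, and the 0.8 cutoff are names and values made up for the example.

```python
import difflib


def luhn_ok(number: str) -> bool:
    """Return True if a digit string passes the Luhn check (assumed scheme)."""
    if not number.isdigit():
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def clean_name(ocr_name, expected_names, cutoff=0.8):
    """Snap an OCR'd name to the closest expected name, or None if nothing is close."""
    matches = difflib.get_close_matches(ocr_name, expected_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Anything where `clean_name` returns `None` or `luhn_ok` returns `False` goes on the manual pile, which is where the people who aren't on the list end up anyway.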
I tried your sample on some online OCR sites. It wasn't pretty. Best result was:
ScotchRow 0403065692( /83
..ti, ...A I ■ Uwt diaby....t414440.n osier. v." taw, 44.'114 .6.44,vor.4 ca Aordotri
14 4.1141L% ro4,11., mats Yes u• mple • IVINerorianrws, to.. WU. IX QV.
rearm a nailialualwrotionegyou..1.04.0 rainaigewirinewbiliamonovit.ty 430 tad
ateckyaobalanat.Dat924. 4ret,50C, • moterdoinwiDonralt MTN Con.c-t(rwr cheAr, 00
tram yourNIT/4 nutmeg 70201008 wits.
Nurnhe 2924951999 111111111111111111111
from https://www.onlineocr.net, which'd be great if it was the bottom number you were after. It was all downhill from there with three other online OCR sites.
But, on the positive side, if the card colors and layout are consistent, then you should be able to:
- use the yellow background to detect the four edges of the ticket
- straighten up and measure the ticket
- calculate the position and size of the number you need to OCR
- it looks like a fixed-pitch font, so you can probably even pick out the digits by position and the white space surrounding them
- you know they're digits, so if your OCR routine comes back with e.g. the letters O, I, Z, S, G or B, you can be pretty sure they should be 0, 1, 2, 5, 6 or 8.
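The last two steps are easy to sketch in plain Python. Treat this as a rough illustration only: `to_digits` and `digit_cells` are made-up names, the letter-to-digit table is just the one from the list above, and the box coordinates are whatever you measured off the straightened ticket (I haven't measured yours).

```python
# Letters that OCR commonly returns in place of digits, per the list above.
CONFUSIONS = str.maketrans("OIZSGB", "012568")


def to_digits(ocr_text: str) -> str:
    """Upper-case the text, drop spaces, and map look-alike letters to digits."""
    return ocr_text.upper().replace(" ", "").translate(CONFUSIONS)


def digit_cells(x0, y0, width, height, n_digits):
    """Split a fixed-pitch number box into one (left, top, right, bottom) box per digit."""
    pitch = width / n_digits
    return [
        (round(x0 + i * pitch), y0, round(x0 + (i + 1) * pitch), y0 + height)
        for i in range(n_digits)
    ]
```

So `to_digits("O4O3O65B92")` comes back as `"0403065892"`, and `digit_cells` gives you one crop rectangle per digit to feed to whatever recognizer you end up using. Anything still non-numeric after `to_digits` is a good candidate for a rescan.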
And if you're writing your own OCR routine (to discern between 10 digits - how hard could that be??? ;-) and you have heaps of spare time (don't we all??? ;-) then it'd be interesting to incorporate some machine-learning of the digits out in the wild, so that the OCR gets better the more it gets used.
What could possibly go wrong?!?!
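If you do go down the learn-as-you-go road, about the simplest thing I can think of is a nearest-mean classifier that keeps a running average image per digit and updates it whenever a human (or the check digit) confirms a reading. A toy sketch, assuming each digit has already been cropped and scaled to a fixed-length list of pixel values; `DigitLearner` is a hypothetical name, not from any library, and a real one would want far more than this:

```python
class DigitLearner:
    """Nearest-mean digit classifier that updates as confirmed samples arrive."""

    def __init__(self):
        self.sums = {}    # digit -> per-pixel running sum
        self.counts = {}  # digit -> number of confirmed samples seen

    def learn(self, pixels, digit):
        """Fold one confirmed sample into the running average for that digit."""
        if digit not in self.sums:
            self.sums[digit] = [0.0] * len(pixels)
            self.counts[digit] = 0
        self.sums[digit] = [s + p for s, p in zip(self.sums[digit], pixels)]
        self.counts[digit] += 1

    def predict(self, pixels):
        """Return the digit whose average image is closest, or None if untrained."""
        best, best_dist = None, float("inf")
        for digit, sums in self.sums.items():
            n = self.counts[digit]
            dist = sum((p - s / n) ** 2 for p, s in zip(pixels, sums))
            if dist < best_dist:
                best, best_dist = digit, dist
        return best
```

Every time a scan passes the check digit, you call `learn` on each cropped digit with the value it turned out to be, so the averages drift toward how your tickets really look.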