I've been thinking about this. I have small tickets that always use the same font and size, but I am not able to extract the digits reliably. With OCR libraries the results vary greatly, from good to poor. I got excellent results with Google Cloud Vision, but it is a paid service.
Can you recognize symbols (digits) using the OpenCV library?
Whatever solution you use, you should take advantage of the 'known info' of this specific case.
- If the tickets are always of a given color, you can pre-process the image to increase the contrast between the background and the numbers.
- You can even get a bounding box (the ticket contour) that limits where the algorithm will search.
- If the number of digits is fixed, you could even get a box for the position of each one.
- If the font is known, recognition should be easier.
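To make the points above concrete, here is a minimal, library-free sketch: binarize a grayscale image, take the bounding box of the dark pixels, and split that box into one slot per digit. The function names, the threshold of 128, and the "dark digits on a light ticket" assumption are all mine, not from any library. In OpenCV the same steps would typically be done with `cv2.threshold`, `cv2.findContours` and `cv2.boundingRect`.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Return a mask with 1 for dark pixels (assumed ink) and 0 for light ones."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def bounding_box(mask: List[List[int]]) -> Optional[Box]:
    """Smallest box that contains every 1 in the mask, or None if there is none."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return (cols[0], rows[0], cols[-1] - cols[0] + 1, rows[-1] - rows[0] + 1)


def split_into_digit_boxes(box: Box, n_digits: int) -> List[Box]:
    """Cut the box into n_digits side-by-side slots (assumes evenly spaced digits)."""
    if n_digits <= 0:
        raise ValueError("n_digits must be positive")
    x, y, w, h = box
    boxes = []
    for i in range(n_digits):
        left = x + (i * w) // n_digits
        right = x + ((i + 1) * w) // n_digits
        boxes.append((left, y, right - left, h))
    return boxes
```

Equal-width slots only make sense if the digits are evenly spaced, which is plausible for a fixed font and size but should be checked against real tickets.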
With OpenCV, using the above info and selecting picture regions, you could use a "feature detector -> descriptor extractor -> matcher" pipeline and build your OCR on top of it (by comparing the matching results against the 10 possible digits and establishing some kind of score). As you can imagine, it will require some work.
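The "score against the 10 possible digits" idea can be sketched without any library. Note that this swaps the feature/descriptor matcher for a much simpler measure, the fraction of pixels that agree with each reference glyph, and the 3x5 templates below are a made-up toy font, not your ticket font. With real images, `cv2.matchTemplate`, or ORB descriptors with a brute-force matcher, could play the role of `score`.

```python
from typing import Dict, Tuple

Glyph = Tuple[str, ...]  # rows of '0'/'1' characters, all glyphs the same size

# Hypothetical 3x5 reference glyphs; replace with crops of the real font.
TEMPLATES: Dict[str, Glyph] = {
    "0": ("111", "101", "101", "101", "111"),
    "1": ("010", "110", "010", "010", "111"),
    "2": ("111", "001", "111", "100", "111"),
    "3": ("111", "001", "111", "001", "111"),
    "4": ("101", "101", "111", "001", "001"),
    "5": ("111", "100", "111", "001", "111"),
    "6": ("111", "100", "111", "101", "111"),
    "7": ("111", "001", "001", "001", "001"),
    "8": ("111", "101", "111", "101", "111"),
    "9": ("111", "101", "111", "001", "111"),
}


def score(a: Glyph, b: Glyph) -> float:
    """Fraction of pixels on which two equally sized glyphs agree (0.0 to 1.0)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("glyphs must have the same shape")
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def classify(glyph: Glyph) -> Tuple[str, float]:
    """Return the best-matching digit and its score."""
    best = max(TEMPLATES, key=lambda d: score(glyph, TEMPLATES[d]))
    return best, score(glyph, TEMPLATES[best])
```

For example, `classify(("111", "101", "111", "101", "110"))`, an "8" with one pixel flipped, still picks "8" with a score just under 1.0. Rejecting results whose best score is too low would be a natural next step.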
On the other hand, this same info "could" allow you to improve the results that you are getting now with Tesseract, if you are using it through its command-line interface (I don't know whether Google Vision offers similar options): you could prepare the image based on that info, tell Tesseract that you are only looking for digits, disable the dictionary, and even prepare a font file with those specific digits. I have never performed that last step myself, but I know from other users that it is possible.
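As a rough illustration of the Tesseract options, the sketch below only builds the command line; it does not run Tesseract. The helper name is mine. The flags (`--psm 7` for a single text line, `tessedit_char_whitelist`, and `load_system_dawg` / `load_freq_dawg` to switch off the dictionaries) are standard Tesseract options as far as I know, but whether the whitelist is honored depends on the Tesseract version and engine, so treat this as an assumption to verify on your install.

```python
from typing import List


def build_tesseract_command(image_path: str, lang: str = "eng") -> List[str]:
    """Build a digits-only Tesseract invocation (hypothetical helper, not an API)."""
    return [
        "tesseract", image_path, "stdout",           # print the result to stdout
        "-l", lang,                                  # could name a custom traineddata
        "--psm", "7",                                # treat the image as one text line
        "-c", "tessedit_char_whitelist=0123456789",  # digits only
        "-c", "load_system_dawg=0",                  # no main dictionary
        "-c", "load_freq_dawg=0",                    # no frequent-words dictionary
    ]
```

You could pass the returned list to `subprocess.run(cmd, capture_output=True, text=True)` and read the digits from `stdout`. If you do train a font-specific model, its name would go in `lang`.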