Android Example OCR OFFLINE - Tesseract

I'm working on a project that needs offline OCR. I've made some progress and decided to share it here to get some feedback.
I searched this forum and Google for a while and found options for online OCR (NJDUDE's lib or Erel's example). In the same project I also need to manipulate some images, and I found DrewG's example of using inline code with JavaCV/OpenCV. That prompted me to test Tesseract OCR the same way (inline code).
I downloaded the library from this link: https://repo1.maven.org/maven2/org/bytedeco/javacpp-presets/1.0/javacpp-presets-1.0-bin.zip
More details about this can be found here.

I unzipped it and copied the files I needed to my additional libraries folder.
The files I used: javacpp.jar, tesseract-android-arm.jar, leptonica-android-arm.jar, tesseract.jar, leptonica.jar

I coded the basic example from the bytedeco page, with some changes so that it takes the file path of an image saved somewhere on the phone and returns the recognized text.

Note: I needed to download the tessdata files to my phone. I tried to bundle them with my app, but they were too big and I got an error when deploying the app to the phone (I need to look at this more carefully). The files for many languages can be found here or on the Google project page. I downloaded this one for my test example.
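
Since the language files live outside the app, it can help to confirm they are in place before calling Init. The following is only a sketch in plain Java: the class and method names are hypothetical, and it assumes the Tesseract 3.x layout in which Init receives the parent folder of a `tessdata` directory containing `<lang>.traineddata`. Adjust the path logic if your Tesseract version expects the tessdata folder itself.

```java
import java.io.File;

// Hypothetical helper, not part of Tesseract or the bytedeco presets.
final class TessdataCheck {
    /**
     * Returns true if parentDir/tessdata/lang.traineddata exists
     * and is a readable regular file.
     * Assumption: Init() is given parentDir, not the tessdata folder.
     */
    static boolean hasLanguage(File parentDir, String lang) {
        File trained = new File(new File(parentDir, "tessdata"), lang + ".traineddata");
        return trained.isFile() && trained.canRead();
    }
}
```

Calling something like `TessdataCheck.hasLanguage(new File(trainFileDir), "eng")` before Init would make a missing or misplaced file easier to tell apart from other initialization failures.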

Here is the code I used. My test phone is an S4.

Hope it helps.
 

Attachments

  • tessTest.zip
    11.9 KB

joilts

Member
Licensed User
Longtime User
Thank you for your availability and for the time you are spending on me. I tried your routine and it works fine for converting the image to black and white. Now that I understand how it works, I can add other commands inside the routine. In fact, I added
IplConvKernel mat = cvCreateStructuringElementEx(5, 5, 2, 2, CV_SHAPE_RECT);
cvErode(grayImg, grayImg, mat, 1);
which gives me blacker letters and numbers, but I noticed that the tessdata OCR does not recognize numbers well, and sometimes not letters either.
Thank you, you're great.
Now you have to train your Tesseract and create your own font file to use in your app.
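
For readers wondering why the cvErode call quoted above makes the characters blacker: grayscale erosion replaces each pixel with the minimum of its neighborhood, so dark strokes on a light background grow thicker. The sketch below shows that idea in plain Java on a 2D array. It is not OpenCV's implementation; the class name is hypothetical, and it assumes an odd square kernel with values in the range 0 to 255.

```java
// Hypothetical illustration of grayscale erosion (a local minimum filter).
final class ErodeSketch {
    /**
     * Erodes a grayscale image with a k x k rectangular kernel (k odd).
     * Each output pixel is the minimum of the input pixels under the
     * kernel; positions outside the image are ignored.
     */
    static int[][] erode(int[][] gray, int k) {
        int h = gray.length;
        int w = gray[0].length;
        int r = k / 2;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int min = 255;
                for (int dy = -r; dy <= r; dy++) {
                    for (int dx = -r; dx <= r; dx++) {
                        int yy = y + dy;
                        int xx = x + dx;
                        if (yy >= 0 && yy < h && xx >= 0 && xx < w) {
                            min = Math.min(min, gray[yy][xx]);
                        }
                    }
                }
                out[y][x] = min;
            }
        }
        return out;
    }
}
```

A larger kernel thickens strokes more, which can also merge neighboring characters or close the holes in digits such as 6, 8 and 9, so it is worth comparing the OCR output for a few kernel sizes.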
 

roberto64

Active Member
Licensed User
Longtime User
Now you have to train your Tesseract and create your own font file to use in your app.
Hi, I used VietOCR.NET to understand what kind of format Tesseract OCR can recognize for both letters and numbers. I cropped the image and tried to run OCR on it. Is this what you mean? Or do I also need to write a small routine like this:
var srcImage1 = System.Drawing.Image.FromFile(@"D:\Image\font_english.jpg");
var newWidth1 = (int)(srcImage1.Width * 2);
var newHeight1 = (int)(srcImage1.Height * 2);

var image = new Bitmap(srcImage1, new Size(newWidth1, newHeight1));
var ocr = new Tesseract();

ocr.Init(@"D:\OCRTEST\tessdata\", "eng", false);
ocr.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-0123456789 '?.; =, ()");
var result = ocr.DoOCR(image, Rectangle.Empty);
foreach (Word word in result)
{
    Response.Write(word.Text + "");
}
Greetings
 

DonManfred

Expert
Licensed User
Longtime User

roberto64

Active Member
Licensed User
Longtime User
Hi, I don't understand why passing a Bitmap image to the function, as shown below, gives an error. I would like to pass the Bitmap directly instead of saving the image to a file first and then reading it back.
Regards
B4X:
Dim path As Bitmap
Dim retStr As String = ""
retStr = NativeMe.RunMethod("getText", Array(path, File.DirRootExternal))

#AdditionalJar: javacpp-presets-bin\javacpp
#AdditionalJar: javacpp-presets-bin\tesseract-android-arm
#AdditionalJar: javacpp-presets-bin\leptonica-android-arm
#AdditionalJar: javacpp-presets-bin\tesseract
#AdditionalJar: javacpp-presets-bin\leptonica

'public static String getText(String path, String filename, String extension, String TrainFileDir)
#If JAVA
import org.bytedeco.javacpp.*;
import static org.bytedeco.javacpp.lept.*;
import static org.bytedeco.javacpp.tesseract.*;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.graphics.Matrix;


private TessBaseAPI tessBaseAPI;
//public static String getText(String path, String filename, String extension, String TrainFileDir)
public static String getText(Bitmap path, String TrainFileDir) {
    BA.Log("" + "Here - getText() ");
    BytePointer outText;
    TessBaseAPI api = new TessBaseAPI();
    BA.Log("" + "Before Init ");
    int retCode = api.Init(TrainFileDir, "eng");
    BA.Log("directori =" + TrainFileDir);
    BA.Log("RETCODE =" + retCode);
    if (retCode != 0) {
        return("Could not initialize tesseract.");
    }
  
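    // This line does not compile: pixRead() expects a file path (String),
    // and "Bitmap" is a class name, not the "path" argument.
    PIX image = pixRead(Bitmap);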
    BA.Log("" + "File Open");
    api.SetImage(image);
    BA.Log("" + "Before get Text");
    outText = api.GetUTF8Text();
    // Copy the text into a Java String before freeing the native buffer.
    String result = outText.getString();
    api.End();
    outText.deallocate();
    pixDestroy(image);
    return result;

}
#End If
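
One possible way to avoid the temporary file, sketched here as an assumption rather than a tested answer: the Tesseract C++ API has a SetImage overload that takes raw pixel data plus width, height, bytes per pixel and bytes per line, and the bytedeco preset is assumed to expose it. An Android Bitmap can hand over its pixels as packed ARGB ints through Bitmap.getPixels, so the missing piece is a conversion to one gray byte per pixel. The helper below is hypothetical and written in plain Java so it can be checked on its own.

```java
// Hypothetical helper: packed ARGB ints -> 8-bit grayscale bytes.
final class ArgbToGray {
    /**
     * Converts packed ARGB pixels (the layout produced by
     * android.graphics.Bitmap.getPixels) to one luminance byte per pixel.
     * The alpha channel is ignored.
     */
    static byte[] toGray(int[] argb) {
        byte[] gray = new byte[argb.length];
        for (int i = 0; i < argb.length; i++) {
            int p = argb[i];
            int r = (p >> 16) & 0xFF;
            int g = (p >> 8) & 0xFF;
            int b = p & 0xFF;
            // Integer approximation of the BT.601 luma weights
            // (0.299, 0.587, 0.114).
            gray[i] = (byte) ((r * 299 + g * 587 + b * 114) / 1000);
        }
        return gray;
    }
}
```

Inside getText, the idea would be to fill an int array with `bmp.getPixels(pixels, 0, w, 0, 0, w, h)`, convert it with `toGray`, and then call something like `api.SetImage(gray, w, h, 1, w)` in place of pixRead and `SetImage(image)`. Here `bmp`, `w` and `h` are placeholder names, and the exact SetImage overloads should be checked against the preset version in use.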
 