B4J Tutorial How to OCR image text

jkhazraji · Sunday at 8:49 PM

1. Download the following .jar files:
lept4j-1.19.0.jar
slf4j-api-1.7.32.jar
jai-imageio-core-1.4.0.jar
jna-5.14.0.jar
tess4j-5.10.0.jar
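Place them in your additional libraries folder (addLibs or whatever you use).
Refer to each of them with #AdditionalJar, e.g. #AdditionalJar: tess4j-5.10.0
2. Download Tesseract-OCR and copy its tessdata folder into the project's Objects folder, so the path is Objects\tessdata.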
3. Add the following inline Java code (inside an #If JAVA ... #End If block):

Java:

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import net.sourceforge.tess4j.*;

public static void OCRImage(String imgPath) {
    ITesseract tesseract = new Tesseract();
    // Path to the tessdata folder, relative to the working directory (Objects\tessdata)
    tesseract.setDatapath("tessdata");
    try {
        // Load the image file; ImageIO.read returns null if the format is not supported
        BufferedImage image = ImageIO.read(new File(imgPath));
        if (image == null) {
            System.out.println("Unsupported or unreadable image: " + imgPath);
            return;
        }

        // Perform OCR on the image
        String text = tesseract.doOCR(image);

        // Print the extracted text
        System.out.println("Extracted Text: " + text);
    } catch (TesseractException | IOException e) {
        e.printStackTrace();
    }
}

4. Call it from B4J:

B4X:

(Me).As(JavaObject).RunMethod("OCRImage",Array(File.Combine(File.DirApp,"b4j.png")))

N.B. As written, this recognizes English text only, because Tesseract uses its English data (eng.traineddata) unless another language is set.
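Before calling OCRImage, it can help to confirm that steps 1 and 2 are in place. Below is a minimal stand-alone sketch, assuming the jars sit in a folder you pass as the first argument ("addLibs" by default here) and that the relative "tessdata" path is resolved against the working directory, which is assumed to be the Objects folder when the app is launched from the B4J IDE. OcrSetupCheck and its methods are hypothetical helpers, not part of Tess4J.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical setup checker; uses only the JDK, not Tess4J.
class OcrSetupCheck {

    // Jar names from step 1.
    static final List<String> REQUIRED_JARS = List.of(
            "lept4j-1.19.0.jar",
            "slf4j-api-1.7.32.jar",
            "jai-imageio-core-1.4.0.jar",
            "jna-5.14.0.jar",
            "tess4j-5.10.0.jar");

    // Returns the required jar names that are not present as files in libDir.
    static List<String> missingJars(Path libDir) {
        List<String> missing = new ArrayList<>();
        for (String jar : REQUIRED_JARS) {
            if (!Files.isRegularFile(libDir.resolve(jar))) {
                missing.add(jar);
            }
        }
        return missing;
    }

    // A relative datapath such as "tessdata" is resolved against the current
    // working directory (assumed to be Objects when run from the IDE).
    static Path resolveDatapath(String datapath) {
        return Paths.get(datapath).toAbsolutePath().normalize();
    }

    // True if <datapath>/<lang>.traineddata exists, e.g. eng.traineddata.
    static boolean hasTraineddata(Path datapath, String lang) {
        return Files.isRegularFile(datapath.resolve(lang + ".traineddata"));
    }

    public static void main(String[] args) {
        Path libDir = Paths.get(args.length > 0 ? args[0] : "addLibs");
        Path tessdata = resolveDatapath("tessdata");
        System.out.println("Missing jars: " + missingJars(libDir));
        System.out.println("tessdata resolves to: " + tessdata);
        System.out.println("eng.traineddata present: " + hasTraineddata(tessdata, "eng"));
    }
}
```

Run it from the Objects folder with Java 11 or later (for example java OcrSetupCheck.java followed by the path of your libraries folder) and fix anything it reports before calling OCRImage.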

Magma · 2025-12-03T13:11:55+0000


Is there anything for Greek?

jkhazraji · 2025-12-03T14:34:02+0000

Magma said:
Is there anything for Greek?

Add the following two lines to the inline Java code, just before the try { block:

Java:

// Set the language to Greek ("ell" is the ISO 639-3 code for Greek);
// use "ell+eng" instead for mixed Greek and English text
tesseract.setLanguage("ell");

// Optional: fully automatic page segmentation, which can help on pages with several text blocks
tesseract.setPageSegMode(ITessAPI.TessPageSegMode.PSM_AUTO);

Then download the ell.traineddata file and put it in the tessdata folder; Tesseract needs it to recognize Greek characters.
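For mixed text, Tesseract accepts several languages joined with "+", and each one needs its own .traineddata file in tessdata. Here is a small sketch that builds such a language string and lists the files still missing, assuming the result is what you would pass to tesseract.setLanguage(...). OcrLanguages, languageSpec and missingTraineddata are hypothetical names used only for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; uses only the JDK, not Tess4J.
class OcrLanguages {

    // Tesseract combines languages with '+', e.g. "ell+eng" for mixed Greek/English text.
    static String languageSpec(String... codes) {
        if (codes.length == 0) {
            throw new IllegalArgumentException("at least one language code is required");
        }
        return String.join("+", codes);
    }

    // Each code in the spec needs <code>.traineddata in the tessdata folder;
    // returns the file names that are not there yet.
    static List<String> missingTraineddata(Path tessdata, String spec) {
        List<String> missing = new ArrayList<>();
        for (String code : spec.split("\\+")) {
            String file = code + ".traineddata";
            if (!Files.isRegularFile(tessdata.resolve(file))) {
                missing.add(file);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String spec = languageSpec("ell", "eng");
        System.out.println(spec); // prints ell+eng
        System.out.println("Missing: " + missingTraineddata(Path.of("tessdata"), spec));
    }
}
```

If the list comes back empty, the same string ("ell+eng") can be used in the setLanguage call above.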

jkhazraji · 2025-12-03T15:40:47+0000

Magma said:
Is there anything for Greek?
