B4J Tutorial How to OCR image text

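1. Download the following .jar files:
lept4j-1.19.0.jar
slf4j-api-1.7.32.jar
jai-imageio-core-1.4.0.jar
jna-5.14.0.jar
tess4j-5.10.0.jar
Place them in your additional libraries folder (addLibs, or whatever you named it).
Reference each one with an #AdditionalJar attribute, for example #AdditionalJar: tess4j-5.10.0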

2. Download Tesseract-OCR and place its tessdata folder in the Objects folder (Objects\tessdata).
3. Use the following inline Java code (in B4J, inline Java goes between #If JAVA and #End If):
Java:
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import net.sourceforge.tess4j.*;

public static void OCRImage(String imgPath) {
    ITesseract tesseract = new Tesseract();
    // Path to the tessdata folder, relative to the working directory (Objects\tessdata)
    tesseract.setDatapath("tessdata");
    try {
        // Load the image file; ImageIO.read returns null for unsupported formats
        BufferedImage image = ImageIO.read(new File(imgPath));
        if (image == null) {
            System.out.println("Could not decode image: " + imgPath);
            return;
        }

        // Perform OCR on the image
        String text = tesseract.doOCR(image);

        // Print the extracted text
        System.out.println("Extracted Text: " + text);
    } catch (TesseractException | IOException e) {
        e.printStackTrace();
    }
}
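Because setDatapath("tessdata") is a relative path, it is resolved against the JVM working directory, which the comment above suggests is the Objects folder. If OCR fails with a data-path error, a quick check like the following may help. This is only a sketch using the standard library; TessdataLocator and findTessdata are hypothetical names, not part of Tess4J.

```java
import java.io.File;

public class TessdataLocator {
    // Hypothetical helper (not part of Tess4J): resolves <baseDir>/tessdata
    // and returns its absolute path, or null if the folder does not exist.
    public static String findTessdata(String baseDir) {
        File dir = new File(baseDir, "tessdata");
        return dir.isDirectory() ? dir.getAbsolutePath() : null;
    }

    public static void main(String[] args) {
        // user.dir is the JVM working directory, which a relative
        // data path such as "tessdata" is resolved against.
        String path = findTessdata(System.getProperty("user.dir"));
        System.out.println(path != null
                ? "tessdata found at " + path
                : "tessdata folder not found in the working directory");
    }
}
```

If the folder is found, the returned absolute path could be passed to setDatapath instead of the relative one.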
4. Call it from B4J:
B4X:
(Me).As(JavaObject).RunMethod("OCRImage",Array(File.Combine(File.DirApp,"b4j.png")))

N.B. With the default settings this recognizes English text only.
 

Magma

Is there anything for Greek?
 

jkhazraji

Add the following two lines to the inline Java code, before the try { block:
Java:
    // Set the language to Greek; use "ell+eng" for mixed Greek and English text
    tesseract.setLanguage("ell");  // "ell" is the ISO 639-3 code for Greek

    // Optional: automatic page segmentation, which may give better results with Greek
    tesseract.setPageSegMode(ITessAPI.TessPageSegMode.PSM_AUTO);
Then download the ell.traineddata file for Greek character recognition and place it in the tessdata folder.
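For mixed-language text, Tesseract accepts several language codes joined with "+", and each code needs its own .traineddata file in the tessdata folder. The sketch below builds such a string and reports which data files are missing. It uses only the standard library; LanguageSpec, join and missing are hypothetical names introduced for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class LanguageSpec {
    // Joins Tesseract language codes with "+", e.g. "ell" and "eng" -> "ell+eng".
    public static String join(String... codes) {
        return String.join("+", codes);
    }

    // Returns the codes in spec whose <code>.traineddata file
    // is not present in the given tessdata folder.
    public static List<String> missing(String dataPath, String spec) {
        List<String> result = new ArrayList<>();
        for (String code : spec.split("\\+")) {
            if (!new File(dataPath, code + ".traineddata").isFile()) {
                result.add(code);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String spec = join("ell", "eng");
        System.out.println("Language spec: " + spec);
        System.out.println("Missing data files for: " + missing("tessdata", spec));
    }
}
```

The resulting string could then be passed to tesseract.setLanguage in place of "ell" once no data files are reported missing.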
 