B4J Tutorial How to OCR image text

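1. Download the following .jar files:
lept4j-1.19.0.jar
slf4j-api-1.7.32.jar
jai-imageio-core-1.4.0.jar
jna-5.14.0.jar
tess4j-5.10.0.jar
Place them in your additional libraries folder (AddLibs or whatever you use).
Reference each one with #AdditionalJar (file name without the .jar extension), e.g. #AdditionalJar: tess4j-5.10.0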

2. Download Tesseract-OCR and place its tessdata folder (containing eng.traineddata) in the project's Objects folder, i.e. Objects\tessdata.
3. Use the following inline Java code:
Java:
#If JAVA
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import net.sourceforge.tess4j.*; // Tesseract, ITesseract, TesseractException, ITessAPI

public static void OCRImage(String imgPath) {
    ITesseract tesseract = new Tesseract();
    // Path to the tessdata folder (Objects\tessdata)
    tesseract.setDatapath("tessdata");
    try {
        // Load the image file
        BufferedImage image = ImageIO.read(new File(imgPath));
        if (image == null) {
            // ImageIO.read returns null (no exception) when no reader supports the file
            System.out.println("Unsupported or unreadable image: " + imgPath);
            return;
        }

        // Perform OCR on the image
        String text = tesseract.doOCR(image);

        // Print the extracted text
        System.out.println("Extracted Text: " + text);
    } catch (TesseractException | IOException e) {
        e.printStackTrace();
    }
}
#End If
4. Call it from B4J:
B4X:
(Me).As(JavaObject).RunMethod("OCRImage",Array(File.Combine(File.DirApp,"b4j.png")))
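If the call fails at runtime, steps 1 and 2 are the usual suspects. Below is a minimal, stdlib-only Java sketch (not part of the original tutorial; OcrSetupCheck, checkSetup and the chosen class names are assumptions for illustration) that checks whether a few of the jars from step 1 are on the classpath and whether the tessdata folder and eng.traineddata can be found. It assumes, as the setDatapath comment implies, that a relative path such as "tessdata" resolves against the working directory, which should be the Objects folder when the app runs from the IDE.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class OcrSetupCheck {

    // One well-known class per jar (assumed entry points; extend for the other jars as needed).
    static final String[][] JARS = {
        {"tess4j", "net.sourceforge.tess4j.Tesseract"},
        {"jna", "com.sun.jna.Native"},
        {"slf4j-api", "org.slf4j.LoggerFactory"},
    };

    // True if the class can be loaded (without initializing it).
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, OcrSetupCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Collects readable setup problems; an empty list means nothing obvious is missing.
    static List<String> checkSetup(String tessdataPath) {
        List<String> problems = new ArrayList<>();
        for (String[] jar : JARS) {
            if (!isOnClasspath(jar[1])) {
                problems.add(jar[0] + " jar not on the classpath");
            }
        }
        // A relative path such as "tessdata" resolves against the working directory.
        File dir = new File(tessdataPath).getAbsoluteFile();
        if (!dir.isDirectory()) {
            problems.add("tessdata folder not found: " + dir);
        } else if (!new File(dir, "eng.traineddata").isFile()) {
            problems.add("eng.traineddata missing in " + dir);
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = checkSetup("tessdata");
        if (problems.isEmpty()) {
            System.out.println("OCR setup looks complete");
        } else {
            problems.forEach(p -> System.out.println("Problem: " + p));
        }
    }
}
```

Printing the absolute path of the tessdata folder is deliberate: it shows exactly where the relative "tessdata" was looked up, which is the most common source of "data not found" errors.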

N.B. Out of the box this recognizes English text only (Tesseract's default language is eng).
 

Magma

Is there anything for Greek?
 

jkhazraji

Add the following two lines to the inline Java code, before the try { block:
Java:
    // Greek only; use "ell+eng" for mixed Greek and English text
    tesseract.setLanguage("ell");  // "ell" is the ISO 639-3 code for Greek

    // Optional: fully automatic page segmentation (not Greek-specific)
    tesseract.setPageSegMode(ITessAPI.TessPageSegMode.PSM_AUTO);
Then download the ell.traineddata file and put it in the same tessdata folder as eng.traineddata so Tesseract can recognize Greek characters.
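Tesseract needs one .traineddata file for every language in the language string, and "+" joins several languages (as in "ell+eng"). The stdlib-only Java sketch below (TrainedDataCheck, requiredFiles and missingFiles are hypothetical names, not tess4j API) lists which files a given language string needs and which of them are missing, assuming the folder checked is the same one passed to setDatapath.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class TrainedDataCheck {

    // Splits a Tesseract language string such as "ell" or "ell+eng"
    // into the traineddata file names it requires.
    static List<String> requiredFiles(String languages) {
        List<String> files = new ArrayList<>();
        for (String lang : languages.split("\\+")) {
            String code = lang.trim();
            if (!code.isEmpty()) {
                files.add(code + ".traineddata");
            }
        }
        return files;
    }

    // Returns the required traineddata files that are not present in tessdataDir.
    static List<String> missingFiles(File tessdataDir, String languages) {
        List<String> missing = new ArrayList<>();
        for (String name : requiredFiles(languages)) {
            if (!new File(tessdataDir, name).isFile()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String languages = "ell+eng"; // the same string you would pass to setLanguage
        List<String> missing = missingFiles(new File("tessdata"), languages);
        if (missing.isEmpty()) {
            System.out.println("All traineddata files present for " + languages);
        } else {
            System.out.println("Missing in tessdata: " + missing);
        }
    }
}
```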
 

jkhazraji

[Screenshot attached: 1764776442592.png]