B4J Question How to extract images from PDF.

T201016

Active Member
Licensed User
Longtime User
Hi,
I'm not sure whether I modified the sample project correctly.
I get the following error when compiling:


Any tip about what is wrong in the code is welcome.

Example:
'Non-UI application (console / server application)
#Region Project Attributes
    #CommandLineArgs:
    #MergeLibraries: True
#End Region

    #AdditionalJar: pdfbox-app-2.0.26
'    download https://www.apache.org/dyn/closer.lua/pdfbox/2.0.26/pdfbox-app-2.0.26.jar

Sub Process_Globals
    Private jo As JavaObject   
End Sub

Sub AppStart (Args() As String)
    
    jo = Me
    
'    Dim pathPDF As String = "D:\\TMP\\UE.pdf"
    
'    jo.RunMethod("SaveImagesInPdf", Array As Object(pathPDF))
    jo.RunMethod("SaveImagesInPdf", Null)

End Sub

#if java
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.contentstream.PDFStreamEngine;

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.List;

import javax.imageio.ImageIO;

/**
 * This is an example on how to extract images from pdf.
 */
public void class SaveImagesInPdf extends PDFStreamEngine
{
    /**
     * Default constructor.
     *
     * @throws IOException If there is an error loading text stripper properties.
     */
    public SaveImagesInPdf() throws IOException
    {
    }

    public int imageNumber = 1;

    /**
     * @param args The command line arguments.
     *
     * @throws IOException If there is an error parsing the document.
     */
    public void main( String[] args ) throws IOException
    {
        PDDocument document = null;
        String fileName = "D:\\TMP\\UE.pdf";
        try
        {
            document = PDDocument.load( new File(fileName) );
            SaveImagesInPdf printer = new SaveImagesInPdf();
            int pageNum = 0;
            for( PDPage page : document.getPages() )
            {
                pageNum++;
                System.out.println( "Processing page: " + pageNum );
                printer.processPage(page);
            }
        }
        finally
        {
            if( document != null )
            {
                document.close();
            }
        }
    }

    /**
     * @param operator The operation to perform.
     * @param operands The list of arguments.
     *
     * @throws IOException If there is an error processing the operation.
     */
    @Override
    protected void processOperator( Operator operator, List<COSBase> operands) throws IOException
    {
        String operation = operator.getName();
        if( "Do".equals(operation) )
        {
            COSName objectName = (COSName) operands.get( 0 );
            PDXObject xobject = getResources().getXObject( objectName );
            if( xobject instanceof PDImageXObject)
            {
                PDImageXObject image = (PDImageXObject)xobject;

                // save the image to a local file
                BufferedImage bImage = image.getImage();
                ImageIO.write(bImage,"PNG",new File("image_"+imageNumber+".png"));
                System.out.println("Image saved.");
                imageNumber++;

            }
            else if(xobject instanceof PDFormXObject)
            {
                PDFormXObject form = (PDFormXObject)xobject;
                showForm(form);
            }
        }
        else
        {
            super.processOperator(operator, operands);
        }
    }

}
#End If
 

Attachments

  • Save Imagess.zip
    1.9 KB · Views: 83
  • UE.pdf
    189.7 KB · Views: 102

stevel05

Expert
Licensed User
Longtime User
I haven't looked at your code yet, but the first thing that strikes me is that the identifier the compiler is complaining about has a capital P. It should probably be lowercase public. I can't see that in the code you've listed, but it's worth checking whether it's there.
 

stevel05

Expert
Licensed User
Longtime User
I've got it compiling and loading a static class, but there is a lot that needs sorting out to get it to work. It's built as an app so you'd need to replace the main procedure for a start. Where did you find the example?

I've only changed the first line so far to:
B4X:
public static class SaveImagesInPdf extends PDFStreamEngine
{

And initialized it as a static class:

B4X:
    jo.InitializeStatic("b4j.example.main.SaveImagesInPdf")

but, as I say, there will be quite a bit to change to have a hope of getting it to work.
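
As a rough illustration of the change above (a sketch only, not the project's actual code): void is a return type, so "public void class ..." is not a valid declaration, and the name b4j.example.main.SaveImagesInPdf suggests that the inline Java class ends up nested inside the module's class. Declaring a nested class static lets it be created without an instance of the enclosing class. The names NestedClassDemo, Worker and createByName below are hypothetical stand-ins, and only the JDK is used.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in for the generated module class; names are illustrative only.
class NestedClassDemo {

    // "public void class Worker" would not compile: void is a return type, not a class modifier.
    // "static" makes the nested class independent of any NestedClassDemo instance.
    public static class Worker {
        public int imageNumber = 1;

        // Returns the next output file name and advances the counter.
        public String nextFileName() {
            return "image_" + (imageNumber++) + ".png";
        }
    }

    // Looks the class up by its binary name and calls its public no-arg constructor.
    // For a non-static inner class this lookup would fail, because its constructor
    // takes the enclosing instance as a hidden parameter.
    public static Object createByName(String binaryName) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(binaryName);
        Constructor<?> ctor = cls.getConstructor();
        return ctor.newInstance();
    }

    public static void main(String[] args) throws Exception {
        Worker w = (Worker) createByName(Worker.class.getName());
        System.out.println(w.nextFileName());
        System.out.println(w.nextFileName());
    }
}
```

Making the class static only gets it to compile and load. The main method of the original example would still need to be replaced by something callable from B4J.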
 

stevel05

Expert
Licensed User
Longtime User
Sometimes it's really not worth getting down to Java, here is a solution using JavaObject:

B4X:
Sub AppStart (Args() As String)
 
    Dim Source As String = "D:\UE.pdf"
    Dim Destination As String = "D:\"

    Dim F As JavaObject
    F.InitializeNewInstance("java.io.File",Array(Source))

    Dim Document As JavaObject
    Document.InitializeStatic("org.apache.pdfbox.pdmodel.PDDocument")
 
    Dim Doc As JavaObject = Document.RunMethod("load",Array(F))
    Dim PageTree As JavaObject = Doc.RunMethodJO("getDocumentCatalog",Null).RunMethod("getPages",Null)
 
    Dim TotalImages As Int = 1
 
    Dim Iterator As JavaObject = PageTree.RunMethod("iterator",Null)
 
    Do While Iterator.RunMethod("hasNext",Null)
        Dim Page As JavaObject = Iterator.RunMethod("next",Null)
        Dim Resources As JavaObject = Page.RunMethod("getResources",Null)
        Dim Iterable As JavaObject = Resources.RunMethod("getXObjectNames",Null)
        Dim ResIterator As JavaObject = Iterable.RunMethod("iterator",Null)

        Do While ResIterator.RunMethod("hasNext",Null)
            Dim Name As Object = ResIterator.RunMethod("next",Null)

            If Resources.RunMethod("isImageXObject",Array(Name)) Then
                Dim OutputPath As String = File.Combine(Destination,$"Image${TotalImages}.png"$)

                Dim PDXObject As JavaObject = Resources.RunMethod("getXObject",Array(Name))
                Dim BufferedImage As JavaObject = PDXObject.RunMethod("getImage",Null)
                Dim FOS As JavaObject
                FOS.InitializeNewInstance("java.io.FileOutputStream",Array(OutputPath))

                Dim ImageIO As JavaObject
                ImageIO.InitializeStatic("javax.imageio.ImageIO")
                ImageIO.RunMethod("write",Array(BufferedImage,"png",FOS))
                FOS.RunMethod("close",Null) 'close the stream so the file handle is released
            
                TotalImages = TotalImages + 1
            End If
        Loop
    
    Loop

    Doc.RunMethod("close",Null)

End Sub

I used pdfbox-app-2.0.27 as that's what I had downloaded.
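
For readers who prefer to see the write step at the end of the inner loop in plain Java, here is a minimal sketch using only the JDK. It assumes the image is already available as a BufferedImage (in the B4X code it comes from getImage); the class name PngWriterDemo and the method savePng are hypothetical. The try-with-resources block closes the stream even if the write fails.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.imageio.ImageIO;

class PngWriterDemo {

    // Writes the image as PNG and closes the stream even if writing fails.
    // ImageIO.write returns false when no suitable writer is found.
    static boolean savePng(BufferedImage image, File target) throws IOException {
        try (OutputStream out = new FileOutputStream(target)) {
            return ImageIO.write(image, "png", out);
        }
    }

    public static void main(String[] args) throws IOException {
        // Blank stand-in for an image extracted from a PDF.
        BufferedImage image = new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB);
        File target = File.createTempFile("Image1", ".png");
        boolean written = savePng(image, target);
        System.out.println("written=" + written + " to " + target.getAbsolutePath());
    }
}
```

Checking the boolean result is worthwhile because ImageIO.write does not throw when the format name is unknown; it simply returns false.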




 
Solution

T201016

Active Member
Licensed User
Longtime User
Sometimes it's really not worth getting down to Java, here is a solution using JavaObject:

Hello @stevel05, and thank you very much for taking the time.
As for where the example came from, I found the code on this page: PDFBox Tutorial

I will try your corrected code soon. There is good reason to reach for JavaObject here; I admit that Java coding does not always come easily to me.
I see that you used pdfbox-app-2.0.27. In my project I will try version 3.0.4, which apparently has no known vulnerabilities so far.
I wish you a pleasant day.
 

stevel05

Expert
Licensed User
Longtime User
I used 2.0.27 because you had 2.0.26 in your app. Some things have changed in version 3 and this code will not work as is.
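
One documented difference between the two lines is how a document is opened: PDFBox 2.x uses the static PDDocument.load(File), while 3.x moves loading to org.apache.pdfbox.Loader.loadPDF(File). The sketch below is only an illustration of that one change, not a full port of the code above. It uses reflection (much as JavaObject does) so it compiles without PDFBox on the classpath; the class name PdfBoxLoadDemo and its methods are hypothetical.

```java
import java.io.File;
import java.lang.reflect.Method;

class PdfBoxLoadDemo {

    // True if the named class can be found on the classpath.
    static boolean hasClass(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Returns {class name, static method name} for the PDFBox version found,
    // or null when PDFBox is not on the classpath.
    static String[] loadEntryPoint() {
        if (hasClass("org.apache.pdfbox.Loader")) {
            return new String[] {"org.apache.pdfbox.Loader", "loadPDF"}; // 3.x
        }
        if (hasClass("org.apache.pdfbox.pdmodel.PDDocument")) {
            return new String[] {"org.apache.pdfbox.pdmodel.PDDocument", "load"}; // 2.x
        }
        return null;
    }

    // Opens the document reflectively; the caller is responsible for closing the result.
    static Object open(File pdf) throws ReflectiveOperationException {
        String[] entry = loadEntryPoint();
        if (entry == null) {
            throw new ClassNotFoundException("PDFBox not found on the classpath");
        }
        Method load = Class.forName(entry[0]).getMethod(entry[1], File.class);
        return load.invoke(null, pdf);
    }

    public static void main(String[] args) {
        String[] entry = loadEntryPoint();
        System.out.println(entry == null ? "PDFBox not found" : entry[0] + "." + entry[1]);
    }
}
```

In the B4X code the equivalent adjustment would be to initialize the static JavaObject with the Loader class and call loadPDF instead of load. Other calls may also need checking against the version 3 migration notes.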
 

T201016

Active Member
Licensed User
Longtime User
I used 2.0.27 because you had 2.0.26 in your app. Some things have changed in version 3 and this code will not work as is.
I just wanted to mention that a lot of things have been changed in these versions. It's sometimes hard to know which one to use.
I will gladly use the proposed V3 version.
 

stevel05

Expert
Licensed User
Longtime User
I just wanted to mention that a lot of things have been changed in these versions. It's sometimes hard to know which one to use.
Yes, V3 is quite a bit different, but generally use the one that works, unless it was upgraded because of security issues.
 

T201016

Active Member
Licensed User
Longtime User
Yes, V3 is quite a bit different, but generally use the one that works, unless it was upgraded because of security issues.
That is mainly why I changed to version 3: it seems to be more up to date with respect to security problems. I read an article on this subject somewhere; if I find it again, I will post a link for anyone curious.
 