B4A Library jSoup HTML Parser

Martin Larsen

First, thanks TheJinJ for making this library available! It works fine so far, although the syntax is very different from the original Java jsoup library.

It would be great if this lib could be used this way:

I second that! I have used jsoup in several Java-based Android projects, and I really like its chainable, jQuery-like syntax.

I would very much like to have something like that implemented for this wrapper library.

Erel, if you read this: Is it possible to use method chaining in B4A?
 

Martin Larsen

Do you mean that it is possible to make a library wrapper in B4A that uses chaining if the Java library is written correctly?
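For reference, in jsoup itself `Jsoup.connect(url)` returns a `Connection`, and setters such as `userAgent(...)` and `timeout(...)` return that same `Connection`, which is what makes `Jsoup.connect(url).userAgent("Mozilla").timeout(3000).get()` possible. Below is a minimal plain-Java sketch of that pattern. The class `FluentRequest` and its `describe` method are hypothetical names invented for illustration, not the jsoup API. It only shows what "written correctly" means on the Java side (each setter returns `this`); it does not answer whether B4A code can consume such chains through a wrapper.

```java
// Hypothetical illustration of method chaining (a "fluent interface").
// FluentRequest and describe() are invented names that imitate the style of
// jsoup's Connection; they are NOT part of the jsoup API.
public class FluentRequest {
    private final String url;
    private String userAgent = "";
    private int timeoutMillis = 0;

    private FluentRequest(String url) {
        this.url = url;
    }

    // Static entry point, similar in spirit to Jsoup.connect(url).
    public static FluentRequest connect(String url) {
        return new FluentRequest(url);
    }

    // Each setter stores its value and returns `this`,
    // so the next call in the chain has an object to act on.
    public FluentRequest userAgent(String agent) {
        this.userAgent = agent;
        return this;
    }

    public FluentRequest timeout(int millis) {
        this.timeoutMillis = millis;
        return this;
    }

    // Terminal call: returns a result instead of `this`, which ends the chain.
    public String describe() {
        return url + " | UA=" + userAgent + " | timeout=" + timeoutMillis + "ms";
    }

    public static void main(String[] args) {
        String summary = FluentRequest.connect("http://example.com")
                .userAgent("Mozilla")
                .timeout(3000)
                .describe();
        System.out.println(summary);
    }
}
```

Running `main` prints `http://example.com | UA=Mozilla | timeout=3000ms`. A wrapper whose methods all return `void` (or a new unrelated object) cannot be chained this way, which is presumably why the chaining question depends on how the Java side is written.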
 

BowTieNeck

The latest version of jsoup is 1.8.3. I'm getting an error because the current wrapper expects version 1.8.1, and I couldn't find anywhere to download that older version of jsoup. Would it be possible to make the wrapper simply pick up whichever version is in the libraries folder?
Thanks,
Chris

Edit:
I've changed your XML file so it now depends on jsoup-1.8.3, and that works OK. However, it's not really a long-term solution.
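For anyone hitting the same error: the workaround described above amounts to changing the dependency entry in the wrapper's XML file. A minimal sketch, assuming jSoup.xml lists its dependency in the `<dependsOn>` element that B4A library XML files normally use (check your own copy; only this one line should need to change), with the value being the jar's file name without the `.jar` extension:

```xml
<!-- In jSoup.xml: point the dependency at the jsoup jar that is actually
     present in the additional libraries folder (file name without ".jar"). -->
<dependsOn>jsoup-1.8.3</dependsOn>
```

The edit has to be repeated whenever the jsoup jar is replaced with a newer version.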
 

mr23

Update: after a reboot of the PC it now works. Go figure.

I downloaded 1.8.1 and the B4A example from the first post, and placed jSoup.jar, jSoup.xml and jsoup-1.8.1.jar into an additional libraries folder.
Using B4A v4.3, simply compiling the project fails on line 56 with "missing parameter(s)":

B4X:
Log(js.connectXtra(url, "Mozilla", 0))

IntelliSense shows a number of additional required parameters for connectXtra.

With that line commented out, compilation gets as far as line 66:

B4X:
DOM1 = js.getElementsByTag(local_html, "a", "")

Here IntelliSense shows only two parameters for getElementsByTag.
Have I made a mistake, or is the B4A sample out of date with the supplied library files?

I wanted to try this because JTidy has no tolerance for unrecognized tags or malformed HTML (I haven't dug in yet). For example, JTidy doesn't work with 'http://google.com' or with 'https://www.b4x.com/android/forum/forums/share-your-creations.33/page-1?order=view_count'.

Update: I found this enhancement that may help, but I need to wrap it before I can test it: https://github.com/nanndoj/jtidy

-Chris
 

Martin Larsen

How do you work with a document that js has read from a file, as in your example:

B4X:
js.parse_InputStream(File.OpenInput(File.DirAssets, "test.html"), "UTF-8", url)

How do you, for example, select an element:

B4X:
js.getElementByID(local_html, "name")

These methods work on a local HTML string, as in the snippet above. What if you need to select the element from the file that was just read?

PS: I know you can of course read the local HTML with File.ReadString, but since the parse_InputStream method (and likewise the connect() method) exists, there must be a way to work with what they return.
 

Rusty

I could not get your sample code to compile.
It looks like many parameters are missing when using the latest jsoup.jar.
Is there an updated sample anywhere?
Thanks
Rusty
 
