B4A Library JTidy library - Convert HTML pages to XML

This library wraps the JTidy open source project.
It is supported by B4A and B4J.

It allows you to convert an HTML page to an XHTML page. XHTML is well-formed XML, so it can be parsed with an XML parser.

This approach is much more robust than trying to parse HTML with regular expressions.

Usage is simple:
B4X:
Sub Process_Globals
   Dim sax As SaxParser
   Dim tid As Tidy
End Sub

Sub Activity_Create(FirstTime As Boolean)
   tid.Initialize
   'Parse the HTML page and write the resulting XHTML document to 1.xml.
   tid.Parse(File.OpenInput(File.DirAssets, "index.html"), File.DirRootExternal, "1.xml")
   sax.Initialize
   sax.Parse(File.OpenInput(File.DirRootExternal, "1.xml"), "sax")
End Sub
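The example above passes "sax" as the event prefix but does not show the event handlers. A minimal sketch of what they could look like, using the standard XmlSax events StartElement and EndElement; the choice of the "a" and "title" elements and the "href" attribute is only an illustration:

```b4x
'Raised when the parser reaches an opening tag.
Sub sax_StartElement (Uri As String, Name As String, Attributes As Attributes)
   If Name = "a" Then
      'Illustrative only: log the href attribute of every link.
      'Unprefixed attributes have no namespace, hence the empty Uri.
      Log("Link: " & Attributes.GetValue2("", "href"))
   End If
End Sub

'Raised when the parser reaches a closing tag. Text holds the element's text.
Sub sax_EndElement (Uri As String, Name As String, Text As StringBuilder)
   If Name = "title" Then
      Log("Title: " & Text.ToString)
   End If
End Sub
```

Put these subs in the same module that calls sax.Parse, and adjust the element and attribute names to whatever you need to extract from the page.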

Tip: By default JTidy does not write any output if it encounters an error. You can see the errors in the unfiltered logs.

You can force it to always produce output with:
B4X:
tid.Initialize
'Requires the JavaObject library.
'Access the underlying JTidy object and enable forced output.
Dim jo As JavaObject = tid
jo.GetFieldJO("tidy").RunMethod("setForceOutput", Array(True))

If parsing the generated document is very slow, follow this post: https://www.b4x.com/android/forum/t...seems-very-slow-long-delay.91627/#post-578641
 

Attachments

  • JTidy.zip (245 KB)

Jehoschua

Thanks a lot for this library!

It would be great if we could use the Tidy.Parse() result without writing a file,
so that we could pass the result stream directly to Sax.Parse(), which accepts an input stream. :)

Thanks a lot,
kind regards,
Thomas
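A possible workaround is to call JTidy directly through JavaObject and keep the output in memory. This is a minimal sketch, assuming that the wrapper exposes the underlying org.w3c.tidy.Tidy object in a field named "tidy" (as the setForceOutput tip above does) and that JTidy's Tidy.parse(InputStream, OutputStream) overload is used. ParseWithoutFile is a hypothetical sub name, and tid and sax are the globals from the first example:

```b4x
'Hypothetical helper: converts index.html and parses the result without a temporary file.
Sub ParseWithoutFile
   tid.Initialize
   'Assumption: "tidy" is the wrapped org.w3c.tidy.Tidy instance.
   Dim jo As JavaObject = tid
   Dim jtidy As JavaObject = jo.GetFieldJO("tidy")
   'Ask JTidy explicitly for XHTML output, since the wrapper's Parse is bypassed.
   jtidy.RunMethod("setXHTML", Array(True))
   'Collect the converted document in memory instead of in 1.xml.
   Dim xmlOut As OutputStream
   xmlOut.InitializeToBytesArray(0)
   Dim htmlIn As InputStream = File.OpenInput(File.DirAssets, "index.html")
   'Tidy.parse(InputStream, OutputStream) writes the cleaned document to xmlOut.
   jtidy.RunMethod("parse", Array(htmlIn, xmlOut))
   htmlIn.Close
   'Wrap the generated bytes in an InputStream and hand it to the SAX parser.
   Dim data() As Byte = xmlOut.ToBytesArray
   Dim xmlIn As InputStream
   xmlIn.InitializeFromBytesArray(data, 0, data.Length)
   sax.Initialize
   sax.Parse(xmlIn, "sax")
End Sub
```

If the wrapper applies other settings only inside its own Parse method, the output of this direct call may differ slightly from the file-based version.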
 
Top