B4J Question JTidy Unknown HTML5 Tags [SOLVED]

Mashiane

Expert
Licensed User
Longtime User
Hi there

JTidy works to some extent for my scenario, but it has trouble parsing HTML5: it reports elements such as nav and footer, and attributes such as data-site, as 'unknown'.

I use it to convert HTML content to XML and then run the result through xml2map to draw the structure on a tree, but none of the HTML5 elements are identified.

Is there another method I can use to convert my html5 content to xml besides using jtidy to xmlElements?

I have yet to explore jsoup. Can it do this any better?

Can someone advise please? Thanks.
 

Mashiane

There seems to be a way of adding more tags to JTidy. I found something here: https://stackoverflow.com/questions/8976637/how-to-add-new-tags-to-jtidy

How can I do this inside B4J? I assume by using #If JAVA.

B4X:
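import java.util.Properties;
import org.w3c.tidy.Tidy;

Properties oProps = new Properties();
// Space-separated list of elements JTidy should treat as known block-level tags.
oProps.setProperty("new-blocklevel-tags", "header hgroup article footer nav");

Tidy tidy = new Tidy();
tidy.setConfigurationFromProps(oProps);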
 

Mashiane

Solution:

Copy the attached file (jtidytags.txt) to your File.DirAssets folder and load it as the JTidy configuration, based on this answer:

https://stackoverflow.com/questions/8976637/how-to-add-new-tags-to-jtidy

B4X:
tid.Initialize
'ensure output is produced even when JTidy reports errors
Dim jo As JavaObject = tid
jo.GetFieldJO("tidy").RunMethod("setForceOutput", Array(True))
'copy the tag definitions out of the assets so that JTidy can open them by file path
File.Copy(File.DirAssets, "jtidytags.txt", File.DirTemp, "jtidytags.txt")
Dim fn As String = File.Combine(File.DirTemp, "jtidytags.txt")
jo.GetFieldJO("tidy").RunMethod("setConfigurationFromFile", Array(fn))

'parse the HTML page and create a new XML document
tid.Parse(File.OpenInput(File.DirTemp, "temp.html"), File.DirApp, "temp.xml")
 

Attachments

  • jtidytags.txt
    330 bytes
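
The contents of jtidytags.txt are not shown in the thread. As a rough sketch only, a file of this kind could be generated with plain java.util.Properties, on the assumption that JTidy reads its configuration file in the standard Java properties format. The class name JTidyTagsFile and the new-inline-tags values are hypothetical. The new-blocklevel-tags list is the one from the Stack Overflow snippet quoted earlier.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper: writes a JTidy-style configuration file that declares
// HTML5 elements as known tags. Not the attachment from the thread.
public class JTidyTagsFile {

    // Option names follow JTidy's configuration keys; the values are
    // space-separated tag names. The inline list is illustrative only.
    static Properties html5TagOptions() {
        Properties props = new Properties();
        props.setProperty("new-blocklevel-tags", "header hgroup article footer nav");
        props.setProperty("new-inline-tags", "mark time");
        return props;
    }

    // Stores the options as key=value lines in properties format.
    static void write(Path target) throws IOException {
        try (Writer out = Files.newBufferedWriter(target, StandardCharsets.ISO_8859_1)) {
            html5TagOptions().store(out, "HTML5 tag declarations for JTidy (illustrative)");
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Paths.get(System.getProperty("java.io.tmpdir"), "jtidytags.txt");
        write(target);
        System.out.println("Wrote " + target);
    }
}
```

The resulting file can then be placed in the project's Files folder and passed to setConfigurationFromFile as in the B4X code above. Any further HTML5 elements that JTidy still reports as unknown would need to be appended to the relevant list.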