So, last night, I came across this post:
Http Download a CSV or Json File for Android OS Versions | B4X Programming Forum
So the poster goes to a web page with the Android versions, sees a new one, and then edits a csv.txt file by hand so that his cute little program can display a list of Android versions. All very simple.
So, a quick google for android versions. We get this page:
Android version history - Wikipedia
.... and so on. Not a large table.
So, I thought, why don't I pull that table out of the web page DOM? I have ALWAYS considered a web page a DOM, and much the same goes for what we call XAML, or so-called "zammel".
As a general rule - these pages tend to be created with developer tools - not hand coded anymore. As a result, they are NOTHING more than a simple set of start tags and end tags (in effect an xml cube - nothing more, nothing less). As a result, I can take, say, a 10 year old version of the MSXML library - point it to that page, and BOOM! - I have my xml cube all nice and ready.
So, Ok, this should be "easy" - after all we are talking about the web - that supposed "fad" ? ;-)
Ok, so I looked around for DOM libraries for B4A - most posts pushed towards using jTidy.
So, I have this:
B4X:
Sub ScrapePage2
    ProgressBar1.Visible = True
    Sleep(0) ' show progress bar
    Dim j As HttpJob
    j.Initialize("j", Me)
    j.Download("https://en.wikipedia.org/wiki/Android_version_history")
    Wait For (j) JobDone(j As HttpJob)
    If j.Success = False Then
        j.Release
        ProgressBar1.Visible = False
        Return
    End If
    ' let jTidy convert the html page to xml and write it to a file
    Dim MyTidy As Tidy
    MyTidy.Initialize
    Log("convert and write as xml")
    MyTidy.Parse(j.GetInputStream, File.DirInternal, "web1.xml")
    j.Release
    Log("done - writing xml file")
    Dim strXML As String = File.ReadString(File.DirInternal, "web1.xml")
    Log("got xml - length = " & strXML.Length)
    If strXML.Length = 0 Then
        Return
    End If
    ' turn the xml into a Map and list the top level keys
    Dim x As Xml2Map
    x.Initialize
    Dim MyDom As Map
    Log("parsing")
    MyDom = x.Parse(strXML)
    Log("done parse")
    For Each skey As String In MyDom.Keys
        Log(skey)
    Next
End Sub
But jTidy shows this error:
line 3,600 column 1 - Error: <nav> is not recognized!
Huh? Nav? Ok, count to 10 - don't make a rant!!! (the community here is too kind for a rant!!!).
But, gee??? "<nav>"? That's everywhere!!!
This VERY web page I am reading looks to have a bootstrap menu - and has this:
B4X:
<nav class="p-nav">
    <div class="p-nav-inner">
        <a class="p-nav-menuTrigger" data-xf-click="off-canvas" data-menu=".js-headerOffCanvasMenu" role="button" tabindex="0">
            <i aria-hidden="true"></i>
            <span class="p-nav-menuText">Menu</span>
        </a>
        .....
So, my view is that jTidy should not care or bother about the kinds of tags it supports, but ONLY care that it has a <start></start> (start + end tag). So this is about the closest you will see to a rant by me!
Why should jTidy even care or bother about some start/end tag it doesn't know about?
Now this "idea" only started since I thought why not answer that poster's question - post a few lines of code that grabs that "table" from that page, and off we go.
Helping that poster aside? I'm having difficulty parsing out that web page. I don't think I should be (but then again - that's my camp and my shortcoming here).
And to be fair - while the technology to do these things is mature right now - it is still a coding task, and I should give this problem more respect than I am!!!
Now I did give SaxParser a go at chewing on the page. No surprise - it ate the whole page without a problem.
So, this bit does get me the elements (but posts here suggest not to do this, and besides, it doesn't really give me much of an object model to work with).
B4X:
' Dim MySax As SaxParser moved to gbl def
MySax.Initialize
MySax.Parse(j.GetInputStream,"MySax")
And then this:
Sub MySax_EndElement(Uri As String, Divname As String, strText As Object)
    If Divname = "td" Or Divname = "table" Then
        txtTableRaw.Text = txtTableRaw.Text & CRLF & strText
        Log(strText)
    End If
End Sub
The above - just picks apart the page - just works - no surprise.
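If the SaxParser route is kept, the same events could group the cell text into rows instead of appending everything to one text box. What follows is only a rough sketch under a few assumptions: the names Rows, CurrentRow and ParseTableRows are hypothetical, the event signatures use the usual XmlSax form (Name As String, Text As StringBuilder) rather than the parameter names above, and the page has to keep parsing as well-formed XML. Text inside nested tags (links in a cell, for example) may not show up in the cell's Text, and nested tables are not handled.

```b4x
Sub Process_Globals
    Private MySax As SaxParser
    Private Rows As List        ' hypothetical: one List of cell texts per <tr>
    Private CurrentRow As List  ' hypothetical: cells of the row being read
End Sub

' Hypothetical helper: parse the stream and return the collected rows.
Sub ParseTableRows (In As InputStream) As List
    Rows.Initialize
    MySax.Initialize
    MySax.Parse(In, "MySax")
    Return Rows
End Sub

Sub MySax_StartElement (Uri As String, Name As String, Attributes As Attributes)
    If Name = "tr" Then
        CurrentRow.Initialize   ' start a fresh row
    End If
End Sub

Sub MySax_EndElement (Uri As String, Name As String, Text As StringBuilder)
    Select Name
        Case "td", "th"
            ' keep the cell text only when we are inside a row
            If CurrentRow.IsInitialized Then CurrentRow.Add(Text.ToString.Trim)
        Case "tr"
            ' row finished: keep it if it had any cells
            If CurrentRow.IsInitialized And CurrentRow.Size > 0 Then Rows.Add(CurrentRow)
    End Select
End Sub
```

Called as ParseTableRows(j.GetInputStream), this would hand back every row of every table on the page, so picking out the one table that is wanted (by its column count, say) would still be left to the caller.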
My libraries manager shows jTidy 1.1 - perhaps a newer version is floating around?
I can't say this is high priority and holding me back right now.
But I will say that this ability to grab a DOM from a web page? Yes, that task will come up sooner or later in my Android travels.
Suggestions here - what road to take for this?
Regards,
Albert D. Kallal
Edmonton, Alberta Canada