Android Question Need Help - Parsing HTML Page

joschi

Member
Licensed User
Longtime User
Hello!

I need your help.

I would like to parse a page from the Internet. The page lists notifications/alarms.

I would like to search it for a specific string. If this string is on the page, I would like to extract the line that contains it and display the alarm in a message box.

I found this on the forum:

http://www.b4x.com/android/forum/threads/save-webview-html-file.9400/#post-56406

I have adapted it as follows:

B4X:
Sub Globals
    'These global variables will be redeclared each time the activity is created.
    'These variables can only be accessed from this module.
   
    Dim WebViewExtras1 As WebViewExtras
    Dim WebView1 As WebView
   
End Sub

Sub Activity_Create(FirstTime As Boolean)
    Activity.LoadLayout("layoutMain")
   
    '    add the B4A javascript interface to the WebView
    WebViewExtras1.addJavascriptInterface(WebView1, "B4A")
   
    '    now load a web page
    WebView1.LoadUrl("http://www.lfv-tirol.at/service/aktuelle-alarmierungen.html")
End Sub

Sub Activity_Resume

End Sub

Sub Activity_Pause (UserClosed As Boolean)

End Sub

Sub WebView1_PageFinished (Url As String)
    '    Now that the web page has loaded we can get the page content as a String
   
    '    see the documentation http://www.b4x.com/forum/additional-libraries-classes-official-updates/12453-webviewextras.html#post70053 for details of the second parameter callUIThread
   
    Dim Javascript As String
    Javascript="B4A.CallSub('ProcessHTML', false, document.documentElement.outerHTML)"
   

    'Log("PageFinished: "&Javascript)
    WebViewExtras1.executeJavascript(WebView1, Javascript)
End Sub

Sub ProcessHTML(Html As String)
   
    Dim objMatch As Matcher
    Dim arrLinks As String
    arrLinks = ""
    ' Match expression to HTML
 
    objMatch = Regex.Matcher("Lienz", Html)
    ' Loop through matches and add <1> to List
    Do While objMatch.find()
     
        Log(objMatch.Group(1)) 
       
      Loop
   
   
    'Msgbox2(strMatch,"Letzter Alarm","Ok","Cancel","No",Null)
   
   
End Sub

The page loads in the WebView. With the Matcher I check whether the string is there, but I can't extract the matching line with the Matcher.

I hope somebody can help me.

best regards

Aljoscha
 

sorex

Expert
Licensed User
Longtime User
What's the reason that you use a WebView and JavaScript injection?

It would be easier to just download the content and parse the response string.
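
A minimal sketch of that approach, assuming the HttpUtils2 (or OkHttpUtils2) library is referenced in the project. The sub name DownloadAlarmPage and the job name "AlarmPage" are placeholders; the URL is the one from the first post.

B4X:
' Hypothetical helper: call it from Activity_Create or a button click.
Sub DownloadAlarmPage
    Dim job As HttpJob
    job.Initialize("AlarmPage", Me)
    job.Download("http://www.lfv-tirol.at/service/aktuelle-alarmierungen.html")
End Sub

' Raised by HttpUtils2 when the download has finished or failed.
Sub JobDone (Job As HttpJob)
    If Job.Success Then
        ' The whole page as one string, ready to be searched with Regex.Matcher
        Dim Html As String = Job.GetString
        Log("Downloaded " & Html.Length & " characters")
    Else
        Log("Download failed: " & Job.ErrorMessage)
    End If
    Job.Release
End Sub

No WebView or JavaScript interface is needed this way; the page never has to be displayed.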
 

joschi

Member
Licensed User
Longtime User
Hi, there is no particular reason; the thread I linked was simply the first one I found.

Can I download the HTML file and parse it offline?
 

sorex

Expert
Licensed User
Longtime User
Well, you need to be online to download it first ;)

Once it's pulled in you can do whatever you want with it.
 

joschi

Member
Licensed User
Longtime User
If I understand correctly:

1. Download the HTML file
2. Convert the HTML to XHTML
3. Parse the XHTML file with an XML parser

Aljoscha
 

warwound

Expert
Licensed User
Longtime User
joschi said:
If I understand correctly:

1. Download the HTML file
2. Convert the HTML to XHTML
3. Parse the XHTML file with an XML parser

Aljoscha

Almost. The downloaded document may be HTML or it may be XHTML, so steps 2 and 3 become:

2. Convert the HTML/XHTML to XML.

3. Parse the XML with an XML parser.

Otherwise yes those are the steps we're suggesting.

Martin.
 

sorex

Expert
Licensed User
Longtime User
It's a regular HTML file, so I don't see why you need to convert it to XML. It's malformed HTML as well, so I don't know if the XML parser would work right.

All he needs is a regex like "<li class=""rssfeed_item"">(.*?)Lienz</a>" to grab his Lienz alarms.
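
A rough sketch of how that pattern could be dropped into ProcessHTML from the first post (or used on any downloaded string). The class name rssfeed_item comes from the pattern above and is not verified against the live page here. Group(1) only works because this pattern contains a capturing group, (.*?). A plain pattern like "Lienz" has no group, so Group(1) fails there and Match would have to be read instead.

B4X:
Sub ProcessHTML(Html As String)
    Dim objMatch As Matcher
    ' Doubled quotes inside a B4A string literal stand for one literal quote
    objMatch = Regex.Matcher("<li class=""rssfeed_item"">(.*?)Lienz</a>", Html)
    Do While objMatch.Find
        ' Group(1) is the text captured by (.*?), i.e. everything between
        ' the opening <li> tag and the word Lienz
        Log(objMatch.Group(1) & "Lienz")
    Loop
End Sub

By default "." does not match line breaks, so this only finds items where the <li> tag and "Lienz</a>" sit on the same line of the source.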
 