Android Tutorial XML Parsing with the XmlSax library

It is simpler to parse XML with the Xml2Map class: https://www.b4x.com/android/forum/threads/b4x-xml2map-simple-way-to-parse-xml-documents.74848/

The XmlSax library provides an XML Sax parser.
This parser sequentially reads the stream and raises events at the beginning and end of each element.
The developer is responsible for doing something useful with those events.

There are two events:
B4X:
StartElement (Uri As String, Name As String, Attributes As Attributes)
EndElement (Uri As String, Name As String, Text As StringBuilder)
The StartElement is raised when an element begins. This event includes the element's attributes list.
EndElement is raised when an element ends. This event includes the element's text.
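
To see the order in which these events arrive, one option is simply to log each of them. This is a minimal sketch that assumes the parser is started with the "Parser" events prefix, as it is later in this tutorial:
B4X:
Sub Parser_StartElement (Uri As String, Name As String, Attributes As Attributes)
    Log("Start: " & Name)
    'List every attribute of the element that just opened.
    For i = 0 To Attributes.Size - 1
        Log("  " & Attributes.GetName(i) & " = " & Attributes.GetValue(i))
    Next
End Sub

Sub Parser_EndElement (Uri As String, Name As String, Text As StringBuilder)
    'Text holds the character data collected for the element that just closed.
    Log("End: " & Name & " text: " & Text.ToString)
End Sub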

In this example we will parse the forum RSS feed. RSS is formatted using XML.
A simplified example of this RSS is:
B4X:
<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="2.0">
    <channel>
        <title>Basic4ppc  / Basic4android - Android programming</title>
        <link>http://www.b4x.com/forum</link>
        <description>Basic4android - android programming and development</description>
        <ttl>60</ttl>
        <image>
            <url>http://www.b4x.com/forum/images/misc/rss.jpg</url>
            <title>Basic4ppc  / Basic4android - Android programming</title>
            <link>http://www.b4x.com/forum</link>
        </image>
        <item>
            <title>Phone library was updated - V1.10</title>
            <link>http://www.b4x.com/forum/additional-libraries-official-updates/6859-phone-library-updated-v1-10-a.html</link>
            <pubDate>Sun, 12 Dec 2010 09:27:38 GMT</pubDate>
            <guid isPermaLink="true">http://www.b4x.com/forum/additional-libraries-official-updates/6859-phone-library-updated-v1-10-a.html</guid>
        </item>
        ...MORE ITEMS HERE
    </channel>
</rss>
The first line is part of the XML protocol and is ignored.
On the second line the StartElement event will be raised with "Name = rss" and the attributes will include the "version" field.
The EndElement of the rss element will only be called on the last line: </rss>.
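
As a small sketch of reading that attribute (the log text is only illustrative), StartElement can fetch the "version" value with GetValue2, passing an empty string as the namespace URI:
B4X:
Sub Parser_StartElement (Uri As String, Name As String, Attributes As Attributes)
    If Name = "rss" Then
        Log("RSS version: " & Attributes.GetValue2("", "version"))
    End If
End Sub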

We will populate a list view with all the items parsed from an offline file. When the user presses an item, we will open the browser with the relevant link.
Every item represents a forum thread.

xmlsax_1.png


For each item we are interested in two values: the title and the link.
The SaxParser object includes a handy list that holds the names of all the current parent elements.
This is useful as it will help us find the "correct" 'title' and 'link' elements. The correct elements are the ones under the 'item' element.

The parsing code in this case is pretty simple:
B4X:
Sub Parser_StartElement (Uri As String, Name As String, Attributes As Attributes)

End Sub
Sub Parser_EndElement (Uri As String, Name As String, Text As StringBuilder)
    If parser.Parents.IndexOf("item") > -1 Then
        If Name = "title" Then
            Title = Text.ToString
        Else If Name = "link" Then
            Link = Text.ToString
        End If
    End If
    If Name = "item" Then
        ListView1.AddSingleLine2(Title, Link) 'add the title as the text and the link as the value
    End If
End Sub
Title and Link are global variables.
We are only using EndElement events in this program.
First we check if we are inside an 'item' element. If this is the case we check the actual element name and save it if it is 'title' or 'link'.

If the current element is 'item' it means that we are done parsing an item.
So we add the data collected to the list view.
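
Note that Parents.IndexOf("item") matches an element nested at any depth below an 'item'. If a stricter test is ever needed, one possible variation checks only the direct parent. This is a sketch that assumes Parents is an ordinary List whose last entry is the immediate parent; IsDirectChildOf is a hypothetical helper name, not part of the library:
B4X:
Sub IsDirectChildOf (ParentName As String) As Boolean
    'The list is empty while the root element is being handled.
    If parser.Parents.Size = 0 Then Return False
    Dim direct As String
    direct = parser.Parents.Get(parser.Parents.Size - 1)
    Return direct = ParentName
End Sub
In Parser_EndElement it could replace the IndexOf test, for example: If IsDirectChildOf("item") Then ...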

We are using ListView.AddSingleLine2. This method receives two values. The first is the item text and the second is the value that is returned when the user clicks on this item. In this case we are storing the link as the return value.

Later we will use it to open the browser:
B4X:
Sub ListView1_ItemClick (Position As Int, Value As Object)
    StartActivity(PhoneIntents1.OpenBrowser(Value)) 'open the browser with the link
End Sub
The code that initiates the parsing is:
B4X:
    Dim in As InputStream
    in = File.OpenInput(File.DirAssets, "rss.xml") 'This file was added with the file manager.
    parser.Parse(in, "Parser") '"Parser" is the events subs prefix.
    in.Close
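
The snippets in this post rely on a few declarations that are not shown. A minimal sketch of what they might look like follows. The variable names match the code in this post, but their placement in Process_Globals versus Globals and the way ListView1 is added to the activity are assumptions, not necessarily what the attached project does:
B4X:
Sub Process_Globals
    Dim parser As SaxParser
    Dim PhoneIntents1 As PhoneIntents
End Sub

Sub Globals
    Dim ListView1 As ListView
    Dim Title, Link As String
End Sub

Sub Activity_Create(FirstTime As Boolean)
    If FirstTime Then
        parser.Initialize
    End If
    ListView1.Initialize("ListView1") '"ListView1" is the events prefix used by ListView1_ItemClick
    Activity.AddView(ListView1, 0, 0, 100%x, 100%y)
    'The code that opens rss.xml and calls parser.Parse goes here.
End Sub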
 

Attachments

  • XmlSax.zip
    10 KB · Views: 6,260

Erel

B4X founder
Staff member
Licensed User
Longtime User
I agree that it is not easy to start with SAX parsing. However, after writing a parser or two it becomes pretty easy. Note that there is no magic, and the code will be very similar to a parser written in Java.
B4X:
Sub Process_Globals
    Dim parser As SaxParser
    Type Car(name As String, weight As String, hp As String)
    Dim myCars As List
End Sub

Sub Globals
End Sub

Sub Activity_Create(FirstTime As Boolean)
    If FirstTime Then
        parser.Initialize
        Dim in As InputStream
        in = File.OpenInput(File.DirAssets, "test.txt")
        myCars.Initialize
        parser.Parse(in, "cars")
        in.Close
    End If
    Dim c As Car
    c = myCars.Get(2)
    Log(c.name & " " & c.weight & " " & c.hp)
End Sub
Sub cars_StartElement (Uri As String, Name As String, Attributes As Attributes)
    If Name = "car" Then
        Dim c As Car
        c.Initialize
        myCars.Add(c)
    End If
End Sub
Sub cars_EndElement (Uri As String, Name As String, Text As StringBuilder)
    Dim c As Car
    c = myCars.Get(myCars.Size - 1) 'Get a reference to the last car
    Select Name
        Case "name"
            c.name = Text.ToString
        Case "weight"
            c.weight = Text.ToString
        Case "hp"
            c.hp = Text.ToString
    End Select
End Sub
 

JogiDroid

Member
Licensed User
Longtime User
Well, you made it look easy, not as confusing as I had it in my mind. Thanks, that was helpful in putting me on the right track!
 

mistermentality

Active Member
Licensed User
Longtime User
I have what may be a dumb question but here goes.

In the example at the start of this thread the code of the xml parsed includes:
B4X:
<item>
            <title>Phone library was updated - V1.10</title>
            <link>http://www.b4x.com/forum/additional-libraries-official-updates/6859-phone-library-updated-v1-10-a.html</link>
            <pubDate>Sun, 12 Dec 2010 09:27:38 GMT</pubDate>
            <guid isPermaLink="true">http://www.b4x.com/forum/additional-libraries-official-updates/6859-phone-library-updated-v1-10-a.html</guid>
        </item>

I have been trying to process XML which typically looks like this:
B4X:
 </lst>
- <result name="response" numFound="37" start="0">
- <doc>
  <str name="description">A collection of theatrical trailers for the Basil Rathbone-Nigel Bruce "Sherlock Holmes" films.</str> 
  <str name="identifier">SherlockHolmesTrailers</str> 
  <str name="mediatype">movies</str> 
  <str name="title">Sherlock Holmes Trailers</str> 
  </doc>
- <doc>
  <str name="description">Based on the Sir Authur Conan Doyle story "The Dancing Men", Sherlock Holmes and Dr. Watson are placed in WWII europe to help protect a scientist and his invention from the Nazis. Basil Rathbone .... Sherlock Holmes Nigel Bruce .... Dr. John H. Watson Lionel Atwill .... Professor Moriarty Kaaren Verne .... Charlotte Eberli William Post Jr. .... Dr. Franz Tobel Dennis Hoey .... Inspector Lestrade Holmes Herbert .... Sir Reginald Bailey Mary Gordon .... Mrs. Hudson</str> 
  <str name="identifier">secret_weapon</str> 
  <str name="mediatype">movies</str> 
  <str name="title">Sherlock Holmes and the Secret Weapon</str> 
  </doc>

The problem I am having is that I cannot use the code example given, because Name is reported as "str" instead of "str name". I found this by adding a message box to show the current value of Name.

I can successfully parse the XML and get Names such as "doc", but I cannot get the values for the title or mediatype, for example, because Name is always just "str".

Is this because, unlike the example that used "item", the XML here has a space in "str name"? If so, would a simple solution be to replace all occurrences of "str name=" so that just the name (e.g. "title") was left?

I don't know much about the details of XML, but I have seen example XML files with "string name" rather than "str name". Would this be the issue?

It is not my XML; it is returned from the Internet Archive site at archive.org. It seems to be valid XML, so I am wondering why it is causing me this issue.

Thanks.

Dave
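
In this XML, "str" is the element name and name="title" is an attribute of that element, which is why Name is always "str". One possible approach is to remember the attribute in StartElement and use it in EndElement. This is an untested sketch: currentField, docTitle and docMediaType are hypothetical global String variables, and "Parser" is the assumed events prefix:
B4X:
Sub Parser_StartElement (Uri As String, Name As String, Attributes As Attributes)
    If Name = "str" Then
        'Remember which field this <str> element holds.
        currentField = Attributes.GetValue2("", "name")
    End If
End Sub

Sub Parser_EndElement (Uri As String, Name As String, Text As StringBuilder)
    If Name = "str" Then
        Select currentField
            Case "title"
                docTitle = Text.ToString
            Case "mediatype"
                docMediaType = Text.ToString
        End Select
    End If
End Sub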
 

jscoulter

Member
Licensed User
Longtime User
I have been playing around with the SAX parser, but it doesn't seem to like XML as complex as the sample below. I have tried a number of different things without really getting anywhere, and wondered if anyone had any ideas on how to consume this XML.

I am half thinking of writing a web service that gets the data on demand and feeds it back to my app as a JSON object to simplify the data structure. I also thought of having the web service just display a web page, but I would rather parse the XML directly in Basic4android.

Here is the XML. I pasted it in from IE, so the "-" marks indicate the sub-nodes. P.S. This is just ONE record :)

- <event publicID="smi:geonet.org.nz/event/3499339g">
<type>earthquake</type>
<preferredOriginID>smi:geonet.org.nz/ori/490832/GROPE</preferredOriginID>
<preferredMagnitudeID>smi:geonet.org.nz/mag/490832/GROPE/ML</preferredMagnitudeID>
- <origin publicID="smi:geonet.org.nz/ori/490832/GROPE">
<type>hypocenter</type>
<evaluationMode>manual</evaluationMode>
<evaluationStatus>confirmed</evaluationStatus>
<referenceSystemID>smi:ogc.def.crs/EPSG/4272</referenceSystemID>
- <time>
<value>2011-04-19T01:27:20.030Z</value>
<uncertainty>0.68037</uncertainty>
</time>
- <latitude>
<value>-37.59959</value>
<uncertainty>0.06618</uncertainty>
</latitude>
- <longitude>
<value>176.18645</value>
<uncertainty>0.11012</uncertainty>
</longitude>
- <depth>
<value>188.1572</value>
<uncertainty>7.9337</uncertainty>
</depth>
<depthType>from location</depthType>
- <quality>
<azimuthalGap>250</azimuthalGap>
<minimumDistance>109.53</minimumDistance>
<maximumDistance>251.13</maximumDistance>
<usedPhaseCount>14</usedPhaseCount>
<usedStationCount>10</usedStationCount>
<standardError>0.30418</standardError>
</quality>
</origin>
- <magnitude publicID="smi:geonet.org.nz/mag/490832/GROPE/ML">
- <mag>
<value>3.421</value>
<uncertainty>0.341</uncertainty>
</mag>
<type>ML</type>
<stationCount>6</stationCount>
<originID>smi:geonet.org.nz/ori/490832/GROPE</originID>
</magnitude>
</event>
 

jscoulter

Member
Licensed User
Longtime User
Sure, see attached. It's an XML file in a zip because it's 47 KB and the uploader for this site only allows 19 or something.

Jeremy
 

Attachments

  • data.zip
    4.5 KB · Views: 654

Erel

B4X founder
Staff member
Licensed User
Longtime User
Here is an example which parses several fields of your XML. It shouldn't be difficult to parse the other fields. It works by checking the parents list to find the current branch.
B4X:
'Activity module
Sub Process_Globals
    Type Event (eventId As String _
                ,eventType As String _
                ,originId As String _
                ,magnitudeId As String _
                ,descriptionText As String _
                ,descriptionType As String _
                ,originType As String _
                ,originTimeValue As String _
                ,originTimeUncertainty As Float _
                ,originLatitudeValue As Float _
                ,originLatitudeUncertainty As Float _
                ,originAzimuthalGap As Float _
                ,magnitudeType As String _
                ,magnitudeValue As Float _
                ,magnitudeUncertainty As Float)
    Dim parser As SaxParser
    Dim ListOfEvents As List
    Dim currentEvent As Event
End Sub

Sub Globals

End Sub

Sub Activity_Create(FirstTime As Boolean)
    If FirstTime Then
        parser.Initialize
    End If
    ListOfEvents.Initialize
    Dim in As InputStream
    in = File.OpenInput(File.DirAssets, "data.xml")
    parser.Parse(in, "parser")
    in.Close
End Sub

Sub parser_StartElement (Uri As String, Name As String, Attributes As Attributes)
    If name = "event" Then
        Dim currentEvent As Event
        currentEvent.Initialize
        currentEvent.eventId = Attributes.GetValue2("", "publicID")
        ListOfEvents.Add(currentEvent)
    End If
End Sub

Sub parser_EndElement (Uri As String, Name As String, Text As StringBuilder)
    If parser.Parents.IndexOf("origin") > -1 Then
        Select Name
            Case "type"
                currentEvent.originType = Text.ToString
            Case "azimuthalGap"
                currentEvent.originAzimuthalGap = Text.ToString
        End Select
        If parser.Parents.IndexOf("latitude") > - 1 Then
            Select Name
                Case "value"
                    currentEvent.originLatitudeValue = Text.ToString
                Case "uncertainty"
                    currentEvent.originLatitudeUncertainty = Text.ToString
            End Select
        Else If parser.Parents.IndexOf("time") > -1 Then
            Select Name
                Case "value"
                    currentEvent.originTimeValue = Text.ToString
                Case "uncertainty"
                    currentEvent.originTimeUncertainty = Text.ToString
            End Select
        End If
    Else If parser.Parents.IndexOf("magnitude") > -1 Then
        Select Name
            Case "value"
                currentEvent.magnitudeValue = Text.ToString
            Case "uncertainty"
                currentEvent.magnitudeUncertainty = Text.ToString
            Case "type"
                currentEvent.magnitudeType = Text.ToString
        End Select
    Else 'main event nodes
        Select Name
            Case "type"
                currentEvent.eventType = Text.ToString
            Case "preferredOriginID"
                currentEvent.originId = Text.ToString
            Case "preferredMagnitudeID"
                currentEvent.magnitudeId = Text.ToString
        End Select
    End If
End Sub
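
After Parse returns, ListOfEvents holds one Event per <event> element. A short sketch for checking the result follows; LogEvents is a hypothetical helper, not part of the original example:
B4X:
Sub LogEvents
    For i = 0 To ListOfEvents.Size - 1
        Dim e As Event
        e = ListOfEvents.Get(i)
        Log(e.eventId & ": " & e.eventType & ", magnitude " & e.magnitudeValue & " " & e.magnitudeType)
    Next
End Sub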
 

jscoulter

Member
Licensed User
Longtime User
Oh... thanks for that, Erel. I wasn't expecting you to "do" anything.
I was getting carried away and having lists within lists :( Your way is a LOT simpler than my mental code, that's for sure :)

So thanks again. Jeremy
 

Inman

Well-Known Member
Licensed User
Longtime User
The xmlsax library was working fine for me on this feed, until today. I now get "ExpatParser$ParseException". The reason is the 2nd item (currently) in the feed, titled - "LG Optimus Pro and Optimus Net go official, €180/€200 respectively". And the error is caused by the Euro(€) symbol.

Will you please take a look?
 

Inman

Well-Known Member
Licensed User
Longtime User
Thanks for the tip. Took me a while to figure out. I originally tried to put the TextReader value back into the inputstream :) Finally I realised Parser has a Parser.Parse2() which accepts TextReader value itself. Works great now.
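
A sketch of that approach, assuming the feed has been saved to a file and that its real encoding is known. The file name "feed.xml" and the "Windows-1252" encoding are placeholders, not details taken from this thread:
B4X:
Dim in As InputStream
in = File.OpenInput(File.DirAssets, "feed.xml") 'placeholder file name
Dim reader As TextReader
reader.Initialize2(in, "Windows-1252") 'placeholder encoding; use the feed's actual encoding
parser.Parse2(reader, "Parser")
reader.Close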
 

MikieK

Member
Licensed User
Longtime User
Help

Hi, I am trying to make a music player that uses the phone's shared music preferences.
My code so far (minus the actual music playing) is:
B4X:
'Activity module
Sub Process_Globals
   Dim parser As SaxParser
End Sub

Sub Globals
Dim seekpos As Long
Dim curpos As Int 
Dim shufflemode As Int 
Dim repeatmode As Int
Dim history As String
Dim queue As String
Dim historyflag As Boolean:historyflag = False
Dim queueflag As Boolean:queueflag = False
End Sub

Sub Activity_Create(FirstTime As Boolean)
   If FirstTime Then
      parser.Initialize
   End If
   'parse the xml file
   Dim in As TextReader 
   in.Initialize(File.openinput("", "data/data/com.android.music/shared_prefs/Music.xml"))
   parser.Parse2(in, "Parser")
   in.Close
   Msgbox(history & CRLF & queue,"")
End Sub 

Sub Parser_StartElement (Uri As String, Name As String, Attributes As Attributes)
   For i = 0 To Attributes.Size -1
      Select Attributes.GetValue(i) 
         Case "seekpos"  
            For i = 0 To Attributes.Size -1
               If Attributes.Getname(i) ="value" Then 
                  seekpos = Attributes.GetValue(i)
                  Exit
               End If
            Next
         Case "curpos"
            For i = 0 To Attributes.Size -1
               If Attributes.Getname(i) ="value" Then 
                  curpos = Attributes.GetValue(i)
                  Exit
               End If
            Next
         Case "shufflemode"
            For i = 0 To Attributes.Size -1
               If Attributes.Getname(i) ="value" Then 
                  shufflemode = Attributes.GetValue(i)
                  Exit
               End If
            Next
         Case "repeatmode"
            For i = 0 To Attributes.Size -1
               If Attributes.Getname(i) ="value" Then 
                  repeatmode = Attributes.GetValue(i)
                  Exit
               End If
            Next
         Case "history"
            historyflag = True
         Case "queue"
            queueflag = True
      End Select
   Next
End Sub

Sub Parser_EndElement (Uri As String, Name As String, Text As StringBuilder)
   If historyflag Then
      history = text.ToString
      historyflag = False
   End If
   If queueflag Then
      queue = text.ToString
      queueflag = False
   End If   
End Sub

the Music.xml file looks like this:
B4X:
<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<map>
    <string name="history">95;d01;dd;</string>
    <string name="queue">6c;121;7c1;06;</string>
    <int name="curpos" value="597"/>
    <int name="cardid" value="808663139"/>
    <int name="shufflemode" value="1"/>
    <long name="seekpos" value="71524"/>
    <int name="repeatmode" value="0"/>
</map>

The issue I'm getting is that instead of getting "6c;121;7c1;06;" in queue or "95;d01;dd;" in history, nothing is being returned. Any ideas?

Forgive my probably not very good coding; I'm a newbie.
 

MikieK

Member
Licensed User
Longtime User
Thanks

Hi Erel,
Firstly, I want to say thank you for such a quick reply.
B4X:
Sub Parser_StartElement (Uri As String, Name As String, Attributes As Attributes)
    Select Attributes.GetValue2("", "name")
        Case "seekpos"
            seekpos = Attributes.GetValue2("", "value")
        Case "curpos"
            curpos = Attributes.GetValue2("", "value")
        Case "shufflemode"
            shufflemode = Attributes.GetValue2("", "value")
        Case "repeatmode"
            repeatmode = Attributes.GetValue2("", "value")
        Case "history"
            historyflag = True
        Case "queue"
            queueflag = True
    End Select
End Sub
Works a treat!
However, the issue doesn't really concern that. I have already played around with the debugger. I put a message box in the Parser_EndElement sub:
B4X:
If historyflag Then
msgbox("","")
      history = text.ToString
      historyflag = False
   End If
I pressed pause when this was shown. After I pressed OK on the phone, the debugger showed that the text variable was empty, the uri variable was empty and the name variable contained "string".
I'm expecting that the string being returned will be very long (i.e. referring to 4 GB of music files). Could this be an issue?
 

MikieK

Member
Licensed User
Longtime User
My project

Firstly, you may or may not have the Music.xml file I'm referring to (I haven't made allowances for that yet), but here is my project and the Music.xml file from my phone.
I have just realised that there is no text in the history node, but I get the same problem for the queue node.
 

Attachments

  • XMLparsertest.zip
    10 KB · Views: 618
  • Music.zip
    1.4 KB · Views: 531