Android Tutorial XML Parsing with the XmlSax library

It is simpler to parse XML with the Xml2Map class: https://www.b4x.com/android/forum/threads/b4x-xml2map-simple-way-to-parse-xml-documents.74848/

The XmlSax library provides an XML SAX parser.
This parser reads the stream sequentially and raises an event at the beginning and at the end of each element.
The developer is responsible for doing something useful with those events.

There are two events:
B4X:
StartElement (Uri As String, Name As String, Attributes As Attributes)
EndElement (Uri As String, Name As String, Text As StringBuilder)
StartElement is raised when an element begins. This event includes the element's attribute list.
EndElement is raised when an element ends. This event includes the element's text.

In this example we will parse the forum RSS feed. RSS is formatted using XML.
A simplified example of this RSS is:
B4X:
<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="2.0">
    <channel>
        <title>Basic4ppc  / Basic4android - Android programming</title>
        <link>http://www.b4x.com/forum</link>
        <description>Basic4android - android programming and development</description>
        <ttl>60</ttl>
        <image>
            <url>http://www.b4x.com/forum/images/misc/rss.jpg</url>
            <title>Basic4ppc  / Basic4android - Android programming</title>
            <link>http://www.b4x.com/forum</link>
        </image>
        <item>
            <title>Phone library was updated - V1.10</title>
            <link>http://www.b4x.com/forum/additional-libraries-official-updates/6859-phone-library-updated-v1-10-a.html</link>
            <pubDate>Sun, 12 Dec 2010 09:27:38 GMT</pubDate>
            <guid isPermaLink="true">http://www.b4x.com/forum/additional-libraries-official-updates/6859-phone-library-updated-v1-10-a.html</guid>
        </item>
        ...MORE ITEMS HERE
    </channel>
</rss>
The first line is the XML declaration. It does not raise an event.
On the second line the StartElement event is raised with Name = "rss", and the attributes include the "version" field.
The EndElement event of the rss element is raised only on the last line: </rss>.
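
To see the order in which the events arrive for the sample above, a minimal tracing sketch can log every event. It assumes the events prefix is "Parser", as in the rest of this tutorial; the log format is only illustrative.
B4X:
Sub Parser_StartElement (Uri As String, Name As String, Attributes As Attributes)
    'Raised at every opening tag, for example <rss version="2.0">.
    Log("Start: " & Name & " (" & Attributes.Size & " attributes)")
End Sub

Sub Parser_EndElement (Uri As String, Name As String, Text As StringBuilder)
    'Raised at every closing tag. Text holds the element's character data.
    Log("End: " & Name & " -> " & Text.ToString.Trim)
End Sub
For the sample feed, the start of rss should be the first event logged and the end of rss the last one.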

We will populate a list view with all the items parsed from an offline file. When the user presses an item, we open the browser with the relevant link.
Every item represents a forum thread.

xmlsax_1.png


For each item we are interested in two values: the title and the link.
The SaxParser object includes a handy list, Parents, that holds the names of all the current parent elements.
This is useful as it helps us find the "correct" 'title' and 'link' elements, which are the ones under the 'item' element. The 'image' element, for example, also contains 'title' and 'link' elements that we want to skip.

The parsing code in this case is pretty simple:
B4X:
Sub Parser_StartElement (Uri As String, Name As String, Attributes As Attributes)

End Sub
Sub Parser_EndElement (Uri As String, Name As String, Text As StringBuilder)
    If parser.Parents.IndexOf("item") > -1 Then
        If Name = "title" Then
            Title = Text.ToString
        Else If Name = "link" Then
            Link = Text.ToString
        End If
    End If
    If Name = "item" Then
        ListView1.AddSingleLine2(Title, Link) 'add the title as the text and the link as the value
    End If
End Sub
Title and Link are global variables.
We are only using the EndElement event in this program.
First we check whether we are inside an 'item' element. If so, we check the element name and save its text if it is 'title' or 'link'.

If the current element is 'item', we are done parsing an item,
so we add the collected data to the list view.

We are using ListView.AddSingleLine2. This method receives two values. The first is the item text and the second is the value that is returned when the user clicks on the item. In this case we store the link as the return value.

Later we will use it to open the browser:
B4X:
Sub ListView1_ItemClick (Position As Int, Value As Object)
    StartActivity(PhoneIntents1.OpenBrowser(Value)) 'open the browser with the link
End Sub
The code that initiates the parsing is:
B4X:
    Dim in As InputStream
    in = File.OpenInput(File.DirAssets, "rss.xml") 'This file was added with the file manager.
    parser.Parse(in, "Parser") '"Parser" is the events subs prefix.
    in.Close
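
The snippets above leave out the declarations and the parser initialization. A minimal sketch of how they might fit together in the activity module follows. Adding the ListView in code so that it fills the activity is an assumption, as is the variable name xmlStream; the attached project may be organized differently. PhoneIntents comes from the Phone library.
B4X:
Sub Globals
    Dim parser As SaxParser
    Dim ListView1 As ListView
    Dim PhoneIntents1 As PhoneIntents
    Dim Title, Link As String 'filled in by Parser_EndElement
End Sub

Sub Activity_Create (FirstTime As Boolean)
    parser.Initialize 'must be called before Parse
    ListView1.Initialize("ListView1") '"ListView1" is the prefix of ListView1_ItemClick
    Activity.AddView(ListView1, 0, 0, 100%x, 100%y)
    Dim xmlStream As InputStream = File.OpenInput(File.DirAssets, "rss.xml")
    parser.Parse(xmlStream, "Parser") 'raises Parser_StartElement / Parser_EndElement
    xmlStream.Close
End Sub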
 

Attachments

  • XmlSax.zip

bluedude

Well-Known Member
Hi,

I'm reading results from a hosted JSON file and want to parse them. Below is the code I use, but it does not work. xml is the string that contains the XML. I'm getting a NullPointerException.

Dim In As InputStream
Dim data() As Byte = xml.GetBytes("UTF8")
In.InitializeFromBytesArray(data, 0, data.Length)
xmlParser.Parse(In, "parseXML")
 

paolofi

Member
Hi, I have a problem parsing an XML stream from my server.
If I parse the stream from the URL, the Text StringBuilder parameter of Sub Parser_EndElement already contains the entire stream on the first call. If I copy and paste the XML stream into a text file and read it with File.OpenInput, the parser works well.
I use HttpUtils2 to connect. I checked the XML stream and I believe it is valid.

Do you have any suggestions?

Thanks in advance.

Paolo


B4X:
Sub JobDone (job As HttpJob)

Dim In As InputStream

If job.Success = True Then
        'this doesn't work well
        In = job.GetInputStream
        parserCustomer.Parse(In, "Parser")
     
        'this works!
        In = File.OpenInput(File.DirAssets, "testxml.xml")

        In.Close    
End If

job.Release

End Sub

Sub Parser_StartElement (Uri As String, Name As String, Attributes As Attributes)
End Sub

Sub Parser_EndElement (Uri As String, Name As String, Text As StringBuilder)

Dim Result
Dim FieldsList

'Text contains the full stream, not parsed.

If parserCustomer.Parents.IndexOf("string") > -1 Then
    If Name = "Result" Then
        Result = Text.ToString
    Else If Name = "FieldsList" Then
        FieldsList = Text.ToString
    End If
End If

End Sub

The testxml.xml file (captured from the server stream):
B4X:
<string xmlns="http://MyDomain.com/">
<Result>OK|</Result><FieldsList>|counter|id|</FieldsList><Record>|24749|1204|</Record><Record>|40852|8136|</Record><Record>|41192|2748|</Record><Record>|41194|4516|</Record><Record>|41195|7772|</Record><Record>|41196|8915|</Record><Record>|41197|4343|</Record><Record>|41198|9999|</Record><Record>|41199|9999|</Record><Record>|41200|815|</Record><Record>|41201|9154|</Record>
</string>
 

paolofi

Member
Quote: What is the output of Log(Job.GetString)?

I used this:
B4X:
Sub JobDone (job As HttpJob)

Log("PROC:JobDone")

If job.Success = True Then
    Log("Job.GetString = " & job.GetString)
    Log("job.GetString End") 
End If

job.Release

End Sub

and the result was:
B4X:
PROC:JobDone
PROC:JobDone
Job.GetString = <?xml version="1.0" encoding="utf-8"?>
<string xmlns="http://MyDomain.com/">&lt;Result&gt;OK|&lt;/Result&gt;&lt;FieldsList&gt;|counter|id|&lt;/FieldsList&gt;&lt;Record&gt;|24749|1204|&lt;/Record&gt;&lt;Record&gt;|40852|8136|&lt;/Record&gt;&lt;Record&gt;|41192|2748|&lt;/Record&gt;&lt;Record&gt;|41194|4516|&lt;/Record&gt;&lt;Record&gt;|41195|7772|&lt;/Record&gt;&lt;Record&gt;|41196|8915|&lt;/Record&gt;&lt;Record&gt;|41197|4343|&lt;/Record&gt;</string>
Job.GetString = <?xml version="1.0" encoding="utf-8"?>
<string xmlns="http://MyDomain.com/">&lt;Result&gt;OK|&lt;/Result&gt;&lt;FieldsList&gt;|counter|id|&lt;/FieldsList&gt;&lt;Record&gt;|24749|1204|&lt;/Record&gt;&lt;Record&gt;|40852|8136|&lt;/Record&gt;&lt;Record&gt;|41192|2748|&lt;/Record&gt;&lt;Record&gt;|41194|4516|&lt;/Record&gt;&lt;Record&gt;|41195|7772|&lt;/Record&gt;&lt;Record&gt;|41196|8915|&lt;/Record&gt;&lt;Record&gt;|41197|4343|&lt;/Record&gt;</string>
Job.GetString End
Job.GetString End

The log output is repeated twice, although when debugging I see the code pass through it only once. I don't know if this is correct.

This is the code that runs when GetString is called:
B4X:
'Returns the response as a string encoded with UTF8.
Public Sub GetString As String
    Return GetString2("UTF8")
End Sub

'Returns the response as a string.
Public Sub GetString2(Encoding As String) As String
    Dim tr As TextReader
    tr.Initialize2(File.OpenInput(HttpUtils2Service.TempFolder, taskId), Encoding)
    Dim res As String
    res = tr.ReadAll
    tr.Close
    Return res
End Sub

Does this help?

Thanks.
 

Erel

B4X founder
Staff member

Ignore the duplicate logs (http://www.b4x.com/android/forum/threads/bridge-log-duplicates-output.21656/#post-162648).

HttpUtils2 returns the string exactly as it comes from the server.
The server is escaping the inner element nodes for some reason, so the parser treats them as a single text string.

You can replace &lt; with < and &gt; with > before you parse the string.

paolofi

Member

Thanks for your reply.
I'm now investigating the string escaping. Meanwhile, this solves my problem:

B4X:
Dim s As String
Dim In As InputStream

If job.Success = True Then
        s=job.GetString
        s=s.Replace("&lt;","<")
        s=s.Replace("&gt;",">")
     
        In=StringToInputStream(s)
            
        parserCustomer.Parse(In, "Parser")
End If

Sub StringToInputStream (s As String) As InputStream
   Dim In As InputStream
   Dim data() As Byte = s.GetBytes("UTF8")
   In.InitializeFromBytesArray(data, 0, data.Length)
   Return In
End Sub

Many thanks Erel.
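
An alternative to replacing the entities by hand, sketched here under assumptions and not taken from the thread: a SAX parser decodes &lt; and &gt; in element text, so the Text of the outer 'string' element should already contain the inner markup. That markup has several top-level elements, so the sketch wraps it in a hypothetical <root> element before parsing it a second time. The names ParseEscapedResponse, Outer and InnerXml are illustrative; StringToInputStream is the helper from the post above.
B4X:
'InnerXml is assumed to be a global String variable.
Sub ParseEscapedResponse (job As HttpJob)
    InnerXml = ""
    Dim outerParser As SaxParser
    outerParser.Initialize
    Dim In As InputStream = job.GetInputStream
    'First pass: only the outer <string> element is seen as markup.
    outerParser.Parse(In, "Outer")
    In.Close
    If InnerXml <> "" Then
        'Second pass: wrap the fragment so that it has a single root element.
        parserCustomer.Parse(StringToInputStream("<root>" & InnerXml & "</root>"), "Parser")
    End If
End Sub

Sub Outer_StartElement (Uri As String, Name As String, Attributes As Attributes)
End Sub

Sub Outer_EndElement (Uri As String, Name As String, Text As StringBuilder)
    'Text has already been entity-decoded by the parser.
    If Name = "string" Then InnerXml = Text.ToString
End Sub
With this approach the Parents check in Parser_EndElement would have to look for "root" instead of "string".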
 

BedDweller

Member
Hello, this is not really a B4A question, but it concerns this lib. I'm currently making two programs that both read an XML file. I have used this lib in B4A and it works like a charm. The other program is made in Visual Studio, and I was wondering if there is a lib for VS that works like this one. I've done a lot of searching, but most are either DOM-based or more complex. Thanks.
 

holdemadvantage

Active Member
Hi, maybe this is a stupid question...

I use this lib to parse an online XML file. Everything works, but I cannot understand how to check whether all elements of my XML have been parsed. Is there a way to do it?

To explain my problem:
During parsing I fill a list with the XML elements. At the end of parsing I have to perform another action once this list contains all the XML elements. How can I check whether parsing has finished?
 

almontgreen

Active Member
I am just starting with B4J and trying to parse XML that contains this line:

<media:image url="http://www.6a3d.com/folder/images/gallery/Untitled-1.jpg" type="image/jpeg" height="337" width="450"/>

I only want the link. I don't want anything else, such as the height. When I use the code:

B4X:
Sub Parser_EndElement (Uri As String, Name As String, Text As StringBuilder, Attributes As Attributes)
    If parser.Parents.IndexOf("item") > -1 Then
         If Name = "media:image" Then
             Log(Text.ToString)
         end if
'more stuff...
It is blank. I don't understand how to use Attributes.GetValue. Could you please provide an example? I'm very confused. Thanks!
 

almontgreen

Active Member
Quote: There is no such event as the one you wrote. The Attributes parameter is only available in Parser_StartElement.

Thanks. I tried both Parser_StartElement and Parser_EndElement, even with the correct parameters, and so far I can't figure out how to extract just the url.
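
A minimal sketch of reading only the url attribute in StartElement. Attributes.GetValue2 takes a namespace Uri and an attribute name. Passing an empty Uri for the unprefixed url attribute is an assumption here, and so is whether the parser reports the element as "image" or as "media:image", which is why both are checked. Logging Name once will show which form is actually used.
B4X:
Sub Parser_StartElement (Uri As String, Name As String, Attributes As Attributes)
    'Attributes are only available here, not in EndElement.
    If Name = "image" Or Name = "media:image" Then
        Dim url As String = Attributes.GetValue2("", "url")
        Log(url)
    End If
End Sub
The element is self-closing, so it has no text. This is why Text.ToString is blank in EndElement.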
 

omidaghakhani1368

Well-Known Member
I can detect the end of SAX parsing.
I check the element name in EndElement against "channel", and when it matches I know that parsing has finished.
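
Another way to know that parsing has finished, assuming (as in the tutorial code) that Parse is called directly and returns only after the whole stream has been read, is to put the follow-up code right after the Parse call. ParseFeed, Items and xmlStream are hypothetical names; the file name is taken from the tutorial.
B4X:
Sub ParseFeed
    Items.Initialize 'Items is assumed to be a global List filled in Parser_EndElement
    Dim xmlStream As InputStream = File.OpenInput(File.DirAssets, "rss.xml")
    parser.Parse(xmlStream, "Parser")
    xmlStream.Close
    'Execution reaches this line only after the last EndElement event was raised.
    Log("Parsing finished, items collected: " & Items.Size)
End Sub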
 