Android Tutorial XML Parsing with the XmlSax library

It is simpler to parse XML with the Xml2Map class: https://www.b4x.com/android/forum/threads/b4x-xml2map-simple-way-to-parse-xml-documents.74848/

The XmlSax library provides an XML Sax parser.
This parser sequentially reads the stream and raises events at the beginning and end of each element.
It is up to the developer to do something useful with those events.

There are two events:
B4X:
StartElement (Uri As String, Name As String, Attributes As Attributes)
EndElement (Uri As String, Name As String, Text As StringBuilder)
StartElement is raised when an element begins. This event includes the element's list of attributes.
EndElement is raised when an element ends. This event includes the element's text.

In this example we will parse the forum RSS feed. RSS is formatted using XML.
A simplified example of this RSS is:
B4X:
<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="2.0">
    <channel>
        <title>Basic4ppc  / Basic4android - Android programming</title>
        <link>http://www.b4x.com/forum</link>
        <description>Basic4android - android programming and development</description>
        <ttl>60</ttl>
        <image>
            <url>http://www.b4x.com/forum/images/misc/rss.jpg</url>
            <title>Basic4ppc  / Basic4android - Android programming</title>
            <link>http://www.b4x.com/forum</link>
        </image>
        <item>
            <title>Phone library was updated - V1.10</title>
            <link>http://www.b4x.com/forum/additional-libraries-official-updates/6859-phone-library-updated-v1-10-a.html</link>
            <pubDate>Sun, 12 Dec 2010 09:27:38 GMT</pubDate>
            <guid isPermaLink="true">http://www.b4x.com/forum/additional-libraries-official-updates/6859-phone-library-updated-v1-10-a.html</guid>
        </item>
        ...MORE ITEMS HERE
    </channel>
</rss>
The first line is the XML declaration. It does not raise an element event.
On the second line the StartElement event is raised with Name = "rss", and the attributes include the "version" field.
The EndElement event of the rss element is only raised on the last line: </rss>.
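
As a minimal sketch of how such an attribute could be read (assuming the events prefix is "Parser", as in the parsing code later in this tutorial, and that the document does not use namespaces), the StartElement sub might log the version like this:

```b4x
Sub Parser_StartElement (Uri As String, Name As String, Attributes As Attributes)
    If Name = "rss" Then
        'GetValue2 looks up an attribute by namespace Uri and name.
        'An empty Uri is passed on the assumption that no namespaces are used.
        Log("rss version: " & Attributes.GetValue2("", "version"))
    End If
End Sub
```

This would take the place of the empty Parser_StartElement sub shown further down. The example program itself does not need it.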

We will populate a list view with all the items parsed from an offline file. When the user presses an item, we open the browser with the relevant link.
Every item represents a forum thread.

xmlsax_1.png


For each item we are interested in two values: the title and the link.
The SaxParser object includes a handy list (Parents) that holds the names of all the parent elements of the current element.
This is useful as it helps us find the "correct" 'title' and 'link' elements. The correct elements are the ones under the 'item' element.

The parsing code in this case is pretty simple:
B4X:
Sub Parser_StartElement (Uri As String, Name As String, Attributes As Attributes)

End Sub
Sub Parser_EndElement (Uri As String, Name As String, Text As StringBuilder)
    If parser.Parents.IndexOf("item") > -1 Then
        If Name = "title" Then
            Title = Text.ToString
        Else If Name = "link" Then
            Link = Text.ToString
        End If
    End If
    If Name = "item" Then
        ListView1.AddSingleLine2(Title, Link) 'add the title as the text and the link as the value
    End If
End Sub
Title and Link are global variables.
We are only using EndElement events in this program.
First we check if we are inside an 'item' element. If this is the case, we check the actual element name and save its text if the element is 'title' or 'link'.

If the current element is 'item' it means that we are done parsing an item.
So we add the data collected to the list view.

We are using ListView.AddSingleLine2. This method receives two values. The first is the item text and the second is the value that is returned when the user clicks on this item. In this case we are storing the link as the return value.

Later we will use it to open the browser:
B4X:
Sub ListView1_ItemClick (Position As Int, Value As Object)
    StartActivity(PhoneIntents1.OpenBrowser(Value)) 'open the browser with the link
End Sub
The code that starts the parsing is:
B4X:
    Dim in As InputStream
    in = File.OpenInput(File.DirAssets, "rss.xml") 'This file was added with the file manager.
    parser.Parse(in, "Parser") '"Parser" is the events subs prefix.
    in.Close
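
For context, here is a sketch of how the pieces above might fit together in an activity. The placement of the declarations, and adding the ListView in code rather than with a layout file, are assumptions and are not taken from the attached project. The parser must be initialized once before Parse is called.

```b4x
Sub Process_Globals
    Dim parser As SaxParser            'requires the XmlSax library
    Dim PhoneIntents1 As PhoneIntents  'requires the Phone library
End Sub

Sub Globals
    Dim ListView1 As ListView
    Dim Title, Link As String          'filled by Parser_EndElement
End Sub

Sub Activity_Create (FirstTime As Boolean)
    If FirstTime Then parser.Initialize
    'Assumption: the ListView is added in code and fills the activity.
    ListView1.Initialize("ListView1")
    Activity.AddView(ListView1, 0, 0, 100%x, 100%y)
    Dim in As InputStream
    in = File.OpenInput(File.DirAssets, "rss.xml")
    parser.Parse(in, "Parser")         'raises Parser_StartElement / Parser_EndElement
    in.Close
End Sub
```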
 

Attachments

  • XmlSax.zip
    10 KB · Views: 6,236

PaulR

I am parsing a few XML files with no problems, but now when I try another one it constantly fails. When it had non-English characters in it, I explicitly had Notepad++ encode it in UTF-8, but no joy.

It turned out that Firefox didn't recognize it as a valid XML file either. Now I have stripped it of all non-English characters and it loads into Firefox and Serna okay, but it still isn't parsed by SaxParser.

I have attached a tiny test project (also containing the XML file) for anyone to look at. Hopefully there will be something silly I have overlooked there, but I am doing it fine with other xml files so I do not have a clue what is happening.

Thanks for any help!
 

Attachments

  • XmlSaxTest.zip
    6.1 KB · Views: 597

PaulR

Hi, thanks for the reply. It turns out that I had tried that and it didn't work... because I was editing the original XML file of the same name, not that one. :eek:

Thanks again for the pointer!
 

gkumar

XML startelement problem

I am getting an XML response, and I was able to parse it and display the items in a ListView. I then replaced the ListView with a ScrollView and now add 5 views per row (panel) at runtime in the Xml_StartElement event. After adding about 140 panels (each with 5 views, 2 images, a background image, etc.), it stops without displaying all of the panels.

Later I found that if I comment out the part that adds the background image to the panel, it works fine. Adding the background image takes a lot of time, and I have around 600 elements. One approach I am considering is to add all the rows first and then add the background images in a separate method.
Is there any other solution I could consider?
 

grant1842

Xml Url Help

I am new to B4A and I am trying out this code, but I cannot find where in the code to point to a FeedBurner XML page.

The example I am trying to point to is
http://feeds.feedburner.com/yourfeedhere.
So how can I get my FeedBurner link to download the new XML file every 3 minutes or so?


Thanks for your help.
 

MickFinn

Read in XML String into parser

The problem I'm having is that I need to pass an XML string. I have downloaded an HTML page and extracted XML that is stored in a text box on the page. The XML is now stored in a string variable.

How can I then pass this XML string into the SaxParser? Can I somehow convert it to an InputStream? HttpUtils is not an option as the XML is embedded in the HTML file.
 

elitistnot

How to set up an InputStream with a URL

Parse2 expects a TextReader not a string.
Instead of saving the result to a string, just pass the InputStream directly to the XML parser.

I get how to set up an InputStream with a file, but how do you set it up with a URL (to an RSS/XML feed, for example)?
 

Nyptop

I am trying to parse the following XML but I am having some problems because the fields I need are attributes inside each tag. Any ideas?

So for example, how could I access the Adresa field?

<main>
<markers>
<marker ID="99" Ime="Šumadija Palas" Adresa="Turgenjevljeva 5, 11000 Beograd, Čukarica " Telefon="+381 11 3555465 " Email="" Website="" Tag="pogledam, bioskop, projekcija, film" lat="44.783730" lng="20.417770" distance="1.1837991203864726" type="result-good"/>
<marker ID="91" Ime="Sava Centar" Adresa="Milentija Popovića 9, Belgrade, Serbia, Novi Beograd" Telefon="011/2206-735 " Email="" Website="www.savacentar.com " Tag="pogledam, bioskop, projekcija, koncert, film, manifestacija" lat="44.809399" lng="20.432180" distance="1.8923497683301873" type="result-good"/>
</markers>
<options currTag="bioskop biosko biosk" pagination="" resNo="2" duz="1"/>
</main>
 

NJDude

You will have to do something like this:
B4X:
Sub Parser_StartElement(Uri As String, Name As String, Attributes As Attributes)

    Msgbox("Uri=" & Uri & CRLF & "Name=" & Name & CRLF & "Attrib=" & Attributes.GetValue(2), "")

End Sub
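
If the position of the attribute is not guaranteed, a possible variation is to restrict the check to the marker elements and look the attribute up by name instead of by index. This is only a sketch, assuming the XML does not use namespaces (hence the empty Uri) and that logging the value is enough for a first test:

```b4x
Sub Parser_StartElement (Uri As String, Name As String, Attributes As Attributes)
    'Only the <marker> elements carry the Adresa attribute.
    If Name = "marker" Then
        Dim Adresa As String = Attributes.GetValue2("", "Adresa")
        Log("Adresa=" & Adresa)
    End If
End Sub
```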
 

sorex

I'm wondering if this library is still under development, with any additional features planned?

It also appears to be a strictly top-down processing library.

Imagine an xml file like this...

<images>
<image name="name1" url="image1.jpg"/>
<image name="name1" url="image2.jpg"/>
<image name="name1" url="image3.jpg"/>
</images>

<thumbs>
<thumb name="name1" url="image1.jpg"/>
<thumb name="name1" url="image2.jpg"/>
<thumb name="name1" url="image3.jpg"/>
</thumbs>


1. If you want to display the thumbs first, are you forced to "pass" over or process the images first?

2. Random access seems impossible without copying everything to another array first?

Code like this (Flash-based, though) could do it without the copy:

node=myxml.childNodes[1].childNodes
randomurl=node[int(Math.random()*node.length)]

3. As shown in the example above, I have the flexibility to get the number of nodes (images or thumbs), which makes it easy to display a counter, randomize, or loop through all of them or just the first 3 without processing the whole document.

4. Those if/then and select/case examples are nice, but with this XML file how would you distinguish between the name attribute of <image> and <thumb>?

You'll need extra checks to get this solved.


So is there any hope that any of these features will end up in the library one day?
 

Erel

B4X founder
Staff member
This library implements a SAX parser. This is how SAX parsers work. SAX parsers have better performance, and once you get used to their flow they are quite simple to work with.

You are correct that in this case you need to store the images and then the thumbs. Only then should you start processing whatever needs to be processed.
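
A rough sketch of that approach: collect the urls into two lists while parsing, then use the lists for counting or random access. The list names and the ParseGallery helper are hypothetical. The sketch also assumes the XML is well formed (a single root element, quoted attribute values) and does not use namespaces.

```b4x
Sub Process_Globals
    Dim parser As SaxParser
    Dim Images, Thumbs As List   'hypothetical names
End Sub

'Hypothetical helper: parses the stream and fills both lists.
Sub ParseGallery (in As InputStream)
    parser.Initialize
    Images.Initialize
    Thumbs.Initialize
    parser.Parse(in, "Parser")
    in.Close
    'After parsing, counting and random access are available.
    Log(Images.Size & " images, " & Thumbs.Size & " thumbs")
    If Thumbs.Size > 0 Then Log(Thumbs.Get(Rnd(0, Thumbs.Size)))
End Sub

Sub Parser_StartElement (Uri As String, Name As String, Attributes As Attributes)
    'The element name tells <image> and <thumb> apart.
    If Name = "image" Then
        Images.Add(Attributes.GetValue2("", "url"))
    Else If Name = "thumb" Then
        Thumbs.Add(Attributes.GetValue2("", "url"))
    End If
End Sub
```

With this in place the thumbs can be shown first even though the images appear earlier in the file.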
 

pixelpop

NJDude:

Based on your response to Nyptop, how would you suggest I parse this response in order to assign ArrivalDate to a variable:

<?xml version="1.0" encoding="UTF-8"?>
<FlightHistoryGetRecordsResponse xmlns="http://pathfinder-xml/FlightHistoryService.xsd">
<FlightHistory DepartureAirportTimeZoneOffset="-8" ArrivalAirportTimeZoneOffset="-5" ArrivalDate="2013-01-29T22:15:00.000" ArrivalGate="D15" ArrivalTerminal="N">
<Airline AirlineCode="AA" IATACode="AA" ICAOCode="AAL" Name="American Airlines"/><Origin AirportCode="LAS" FAACode="LAS" IATACode="LAS" ICAOCode="KLAS" Name="McCarran International Airport"/><Destination AirportCode="MIA" FAACode="MIA" IATACode="MIA" ICAOCode="KMIA" Name="Miami International Airport"/></FlightHistory></FlightHistoryGetRecordsResponse>

When I run the Msgbox example, it returns "FlightHistoryGetRecordsResponse" as the Name, but never gets to the FlightHistory tag. ArrivalDate is the third (index 2) value in the FlightHistory tag. Thanks!
 