Android Question: How to properly handle encoding while using the XmlSax library

jcredk

Member
Licensed User
Longtime User
Reposted from the other thread, as Erel asked me to move it:

Hi all,
This library is great. I am using it inside a kind of RSS reader/concentrator I am building.

Although I read every page of that thread, I still have encoding issues with some feeds only.

I reused the sample from the beginning of that thread so you can see the behavior I get.
Basically, the application downloads the RSS feed of a French newspaper. The text contains French accented letters (éèà ...), which are not handled properly.

The attachment below contains the test application, closely based on the sample in the top post. Accented letters are replaced with a question mark inside a diamond (cf. screenshot).

I tried many things with an intermediate stream to handle the encoding, but without success.
Any help from those who have already solved this issue would be greatly appreciated!

Thanks
 

Attachments

  • Screenshot_2014-03-18-09-39-01.png (165.2 KB)
  • XmlSaxTest.zip (1.1 KB)

Erel

B4X founder
Staff member
Licensed User
Longtime User
The problem is here:
B4X:
Sub JobDone(job As HttpJob)

   Dim out As TextWriter
   out.Initialize(File.OpenOutput(File.DirDefaultExternal, "test.txt",False))
   out.Write(job.GetString())
   out.Close
   'parse the xml file   
   Dim In As InputStream
   In = File.OpenInput(File.DirDefaultExternal, "test.txt")
   parser.Parse(In, "Parser")
   In.Close

End Sub

Job.GetString assumes that the encoding used is UTF8. In this case the correct encoding is iso-8859-1 (you can see it in the page source).
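
To make the mismatch concrete, here is a minimal sketch (not part of the attached project) that encodes a few accented letters as ISO-8859-1 and then decodes the same bytes with both charsets. The sub name ShowEncodingMismatch is hypothetical; it only uses the core GetBytes and BytesToString calls.

```b4x
' Sketch: what happens when ISO-8859-1 bytes are decoded as UTF-8.
Sub ShowEncodingMismatch
   Dim original As String = "éèà"
   ' Encode the text the way an ISO-8859-1 feed sends it (one byte per letter).
   Dim data() As Byte = original.GetBytes("ISO-8859-1")
   ' Wrong charset: these bytes are not valid UTF-8 sequences, so the decoder
   ' substitutes U+FFFD, which is drawn as a question mark inside a diamond.
   Log(BytesToString(data, 0, data.Length, "UTF8"))
   ' Matching charset: the accented letters come back intact.
   Log(BytesToString(data, 0, data.Length, "ISO-8859-1"))
End Sub
```

The damage happens at decoding time, so it cannot be repaired afterwards by converting the resulting string. The bytes have to be decoded with the right charset in the first place.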

Note that there is no need to save the string to a file and then read it from a file.

B4X:
Sub JobDone(job As HttpJob)
   If job.Success Then
      Dim tr As TextReader
      tr.Initialize2(job.GetInputStream, "iso-8859-1")
      parser.Parse2(tr, "Parser")
      tr.Close
   Else
      Log("Error: " & job.ErrorMessage)
   End If
   job.Release
End Sub
 

jcredk

Thanks Erel,

The modified sample works fine and I see the point. Writing to a file was my attempt at debugging, but it may have made things worse :)

Nevertheless, as I said at the top of the post, my app reads and concentrates dozens of RSS feeds.
Some feeds have "<?xml version="1.0" encoding="iso-8859-1"?>", some have "<?xml version='1.0' encoding='UTF-8'?>", and others use yet other encodings.
What I would like is a way to "capture" this encoding before calling "tr.Initialize2", to make the function independent of the XML source encoding.

Do you think there is a nice way to do this?

Thanks in advance
 

Erel

The encoding should be sent in the Http response. You can get it by using HttpUtils2 code modules (instead of the library).

It will be available inside hc_ResponseSuccess sub.
Add these two lines to save the encoding:
B4X:
Dim job As HttpJob = TaskIdToJob.Get(TaskId)
job.Tag = Response.ContentEncoding

Make sure to handle the case where no encoding header was sent (use UTF8 by default).
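
A hedged sketch of the fallback logic follows. Note that in HTTP the Content-Encoding header normally describes compression (gzip, for example), while the character set is usually carried as a charset parameter of the Content-Type header, so this sketch parses a Content-Type value instead. CharsetFromContentType is a hypothetical helper name, and how the header value is obtained inside hc_ResponseSuccess (for example through a ContentType property on the response) is an assumption to check against the HttpUtils2 module source.

```b4x
' Hypothetical helper: extracts the charset from a Content-Type header value
' such as "text/xml; charset=ISO-8859-1".
' Returns "UTF8" when the value carries no charset parameter.
' Assumes contentType is a valid (non-null) string.
Sub CharsetFromContentType(contentType As String) As String
   ' In B4X string literals a doubled quote ("") stands for one quote character.
   Dim m As Matcher = Regex.Matcher2("charset\s*=\s*""?([^"";\s]+)", Regex.CASE_INSENSITIVE, contentType)
   If m.Find Then Return m.Group(1)
   Return "UTF8"
End Sub

Sub TestCharsetFromContentType
   Log(CharsetFromContentType("text/xml; charset=ISO-8859-1")) ' logs ISO-8859-1
   Log(CharsetFromContentType("text/xml"))                     ' logs UTF8
End Sub
```

The result can be stored in job.Tag and later passed as the second argument of tr.Initialize2 in JobDone.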
 

jcredk

I will run that test, for sure!
Just one basic question, Erel: where can I download the "HttpUtils2 code modules (instead of the library)", please?
Thanks in advance :)
 

jcredk

Sorry, it's me again ... :(
I tried your suggestion above, but what I implemented still fails.

There is a java.lang.NullPointerException on the line that reads Response.ContentEncoding.
I attached the new test project with my changes (the code modules, plus a button to switch between two RSS feeds with different encodings).
It does produce a result, because I added a try/catch that forces UTF-8 in the catch; otherwise it crashes every time.

Any advice please?
Thanks,
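
One alternative that does not depend on the response headers at all, sketched here under assumptions, is to read the encoding from the XML declaration itself, as asked earlier in the thread. This is a different technique from the header-based one suggested above. EncodingFromXmlDeclaration is a hypothetical helper name, the 200-byte look-ahead is an arbitrary choice, and the sketch only covers ASCII-compatible encodings (it does not handle UTF-16 feeds or byte order marks).

```b4x
' Hypothetical helper: reads the encoding named in the XML declaration.
' data holds the raw, undecoded bytes of the feed. Falls back to "UTF8".
Sub EncodingFromXmlDeclaration(data() As Byte) As String
   ' The declaration is ASCII-only, so decoding the first bytes as
   ' ISO-8859-1 is safe for ASCII-compatible encodings.
   Dim head As String = BytesToString(data, 0, Min(data.Length, 200), "ISO-8859-1")
   ' In B4X string literals a doubled quote ("") stands for one quote character.
   Dim m As Matcher = Regex.Matcher2("<\?xml[^>]*encoding\s*=\s*[""']([^""']+)[""']", Regex.CASE_INSENSITIVE, head)
   If m.Find Then Return m.Group(1)
   Return "UTF8"
End Sub

Sub JobDone(job As HttpJob)
   If job.Success Then
      ' Read the whole response as raw bytes before any decoding happens.
      Dim data() As Byte = Bit.InputStreamToBytes(job.GetInputStream)
      Dim In As InputStream
      In.InitializeFromBytesArray(data, 0, data.Length)
      Dim tr As TextReader
      tr.Initialize2(In, EncodingFromXmlDeclaration(data))
      parser.Parse2(tr, "Parser")
      tr.Close
   Else
      Log("Error: " & job.ErrorMessage)
   End If
   job.Release
End Sub
```

Because the whole response is buffered in memory first, this suits RSS-sized documents better than very large downloads.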
 

Attachments

  • XmlSaxTest2.zip (8.5 KB)