Apostrophe as square - UTF-8 encoding / SQLite


I'm sure someone else will come across this problem so I thought it a good idea to ask for some help.

Basically I'm loading an XML file into a SQLite database and rendering the data as listviews. The issue is with character encoding: apostrophes are being rendered as squares within the app.

I'm aware of the likely cause, i.e. how the characters have been encoded, but I have checked and verified the following:

  • the XML is encoded as UTF-8
  • SQLite is, as I understand it, UTF-8 by default
  • I've tried replacing the apostrophe character with HTML markup etc., but this doesn't make any difference

I'm assuming the problem is either with the SQLite database having the wrong character encoding set, or with the XML it starts off with having the wrong encoding, but both are UTF-8 as far as I can tell.

Can anyone clarify what I need to do to fix this? Please excuse my ignorance and hope you can help!
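
One way to test the "both are UTF-8" assumption is to check the raw bytes instead of trusting the declared encoding. Below is a minimal sketch in Python, used only as a neutral illustration and not as app code. It assumes the apostrophe in question is the typographic one (U+2019), which windows-1252 stores as the single byte 0x92; is_valid_utf8 is a hypothetical helper name.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same text, "don’t", saved in two different encodings.
as_utf8 = "don\u2019t".encode("utf-8")      # apostrophe becomes 3 bytes: E2 80 99
as_cp1252 = "don\u2019t".encode("cp1252")   # apostrophe becomes 1 byte: 92

print(is_valid_utf8(as_utf8))    # True
print(is_valid_utf8(as_cp1252))  # False: a lone 0x92 is not valid UTF-8
```

If the real XML file fails a check like this, the encoding declaration and the actual bytes disagree, which would explain the squares even though everything is labelled UTF-8.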


I am loading an HTML file into a WebView. The HTML is created by saving a Word document in .htm format and then adding it to the app's assets (File.DirAssets).
All web browsers on the PC display the file perfectly, but the WebView in B4A shows black diamonds with a question mark inside them throughout the document.
Is there an encoding setting or requirement to handle this?

<p class=MsoNormal><b style='mso-bidi-font-weight:normal'>Title:<span
style='mso-spacerun:yes'>  </span></b>This is a test html sentence, 
this text is being displayed within a webview</p>
Word adds the markup <span style='mso-spacerun:yes'> </span></b>, and this seems to be the source of the black diamonds. In the Word document there are two blank spaces at that point.
Either way, when I view the HTML document in Firefox, for example, it looks fine even with the embedded span. In the Android WebView it shows the diamonds.
Any advice is greatly appreciated.
I have created an example HTML file that shows the black diamond with the embedded question mark (see attached screen image).
It loads properly with LoadUrl, but not with LoadHtml (after the file is read into memory).
This does not work:
Dim HTML As String = File.GetText(File.DirAssets, "test.htm")
Loading the same file with LoadUrl does work (folder stands in for my actual location on the Android device, basically File.DirRootExternal and another folder).

Thanks in advance for your help.
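
A plausible explanation for the diamonds, sketched in Python purely as an illustration (it is not B4A code). It rests on two assumptions: that File.GetText decodes the file as UTF-8, and that Word stored the spaces inside the mso-spacerun span as windows-1252 non-breaking spaces (byte 0xA0). The sample string is made up for the demonstration.

```python
# Bytes as Word might write them with charset=windows-1252 (an assumption):
# the run of spaces after "Title:" is stored as non-breaking spaces, byte 0xA0.
raw = "Title:\u00a0\u00a0This is a test".encode("cp1252")

# Decoding with the wrong charset: each stray 0xA0 becomes U+FFFD,
# the replacement character usually drawn as a black diamond with "?".
wrong = raw.decode("utf-8", errors="replace")

# Decoding with the charset the bytes were actually written in.
right = raw.decode("cp1252")

print(wrong.count("\ufffd"))                        # 2
print(right == "Title:\u00a0\u00a0This is a test")  # True
```

Under these assumptions the text is already damaged by the time it reaches LoadHtml, while LoadUrl lets the WebView read the file and its charset declaration itself.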


I tried an example with the paragraph you gave inserted, and it works for me. Can you give it a try?


Thanks for the response.
However, the HTML file you tested with does not contain the critical items.
Try it with the HTML file I attached to the previous post.
Maybe you can cut and paste this. Sorry for the inconvenience, but the forum rejects the file as an invalid attachment.

<html xmlns:v="urn:schemas-microsoft-com:vml"

<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 14">
<meta name=Originator content="Microsoft Word 14">
<link rel=File-List href="APGAR_Consent%202012-2013%20new_files/filelist.xml">
<link rel=Edit-Time-Data
<title>General Information About This Study and the Researchers</title>


<body lang=EN-US style='tab-interval:.5in'>

<div class=WordSection1>
<p class=MsoNormal align=center style='text-align:center;tab-stops:1.0in'><i
style='mso-bidi-font-style:normal'><span style='font-size:16.0pt;color:blue'><o:p>&nbsp;</o:p></span></i></p>

<p class=MsoNormal><b style='mso-bidi-font-weight:normal'>Study Title:<span
style='mso-spacerun:yes'>  </span></b>test test test, a
randomized controlled trial</p>

<p class=MsoNormal><i style='mso-bidi-font-style:normal'><o:p>&nbsp;</o:p></i></p>

<p class=MsoNormal><b style='mso-bidi-font-weight:normal'>More test stuff<o:p></o:p></b></p>

<p class=MsoNormal><b style='mso-bidi-font-weight:normal'><o:p>&nbsp;</o:p></b></p>

<p class=MsoNormal style='tab-stops:right 5.5in'><b style='mso-bidi-font-weight:
normal'>Name of Principal Investigator on this Study:<span
style='mso-spacerun:yes'>  </span></b>NAME </p>

<p class=MsoNormal style='tab-stops:right 5.5in'><o:p>&nbsp;</o:p></p>

<p class=MsoNormal style='margin-bottom:6.0pt'><b style='mso-bidi-font-weight:
normal'>MORE test stuff: <span style='mso-spacerun:yes'>   </span><a
name=facilityname><span style='mso-spacerun:yes'>  </span></a><span
style='mso-spacerun:yes'>    </span><o:p></o:p></b></p>

<p class=MsoNormal style='margin-bottom:12.0pt'>Please sign and date to show
that you have read all of the above guidelines.<span style='mso-spacerun:yes'> 
</span>Please do not sign unless you have read this entire consent form.<span
style='mso-spacerun:yes'>  </span>If you do not want to sign, you don’t have
to, but if you don’t you cannot participate in this research study. </p>

<p class=MsoNormal style='margin-top:0in;margin-right:-40.3pt;margin-bottom:
12.0pt;margin-left:0in;tab-stops:112.5pt 5.5in'><b style='mso-bidi-font-weight:
normal'><u><span style='font-size:14.0pt'>Signature for study participation<o:p></o:p></span></u></b></p>

<p class=MsoNormal style='margin-right:-40.5pt;line-height:13.0pt;mso-line-height-rule:
exactly;tab-stops:112.5pt 5.5in'><a name=surveydate>______________</a><span
style='mso-spacerun:yes'>  </span><span style='mso-tab-count:1'>                                      </span><a
style='mso-spacerun:yes'>    </span></p>

<p class=MsoNormal style='margin-right:-49.5pt;line-height:13.0pt;mso-line-height-rule:
exactly;tab-stops:112.5pt 5.5in 5.75in'>(Date / Time)<span style='mso-tab-count:
1'>               </span>(Printed Name of Participant) – Respondent #<a
name=respondent><span style='mso-tab-count:1'>                     </span></a></p>



I think I don't have to.

I see a meta tag declaring windows-1252 where it should declare utf-8. Line 7.
I replaced windows-1252 with utf-8.
No change.

My line of the html looks like this:
<meta http-equiv=Content-Type content="text/html; charset=utf-8">

I appreciate your help. Any advice on the above?
Here's a corrected version.


Thanks so much for all your observations on the UTF-8 issue.
I have discovered that with Word 2010 you can change how the DOC is saved to HTML and specify the encoding (as well as eliminate some of the bloat created by Word).

When using Save As:
  • select Save as type: Web Page, Filtered (*.htm; *.html)
  • click Tools (next to the Save button)
  • click Web Options
  • click Encoding
  • under Save this document as, pick Unicode (UTF-8)
After saving, the encoding appears as:
<meta http-equiv=Content-Type content="text/html; charset=utf-8">
This saves the document with the proper encoding.
After doing this, my issue with LoadHtml and LoadUrl goes away.
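
For anyone who cannot re-save from Word, the same effect can probably be had by transcoding the existing file. This is a rough sketch in Python, not a tested tool. It assumes the file really is windows-1252 and that the declaration reads exactly charset=windows-1252; to_utf8_html is a hypothetical helper name. It also suggests why editing only the meta tag earlier in the thread changed nothing: the label changed but the bytes did not.

```python
def to_utf8_html(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Transcode HTML bytes to UTF-8 and update the declared charset to match."""
    text = raw.decode(source_encoding)
    text = text.replace("charset=windows-1252", "charset=utf-8")
    return text.encode("utf-8")


# Made-up sample standing in for the Word export.
original = (
    '<meta http-equiv=Content-Type content="text/html; charset=windows-1252">'
    "<p>you don\u2019t have to</p>"
).encode("cp1252")

converted = to_utf8_html(original)

print(b"charset=utf-8" in converted)                                        # True
print(converted.decode("utf-8").endswith("<p>you don\u2019t have to</p>"))  # True
```

To apply it to a real file, read the bytes with open(path, "rb") and write the result back in binary mode, keeping a copy of the original.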

I don't have much experience with MS Office, otherwise I would have advised you better and earlier.
Usually, I create my rather basic php and html with jEdit.
I don't have this kind of experience either.
My clients insist upon Microsoft Office products, so I am learning.
Again, thanks for your input.
