Android Question loadCSV adding "ZERO WIDTH NO-BREAK SPACE" with the unicode 65279 to the beginning of the string

kris A

Member
Licensed User
Longtime User
i have spend hours looking for the problem. I have this code that

loadcsv:
            Dim strSQL As String = "INSERT into tbldice (category,question) VALUES "
            For Each myrow() As String In su.LoadCSV (Dir,FileName,";")
                strSQL = strSQL & "("
                For Each strData As String In myrow
                    strSQL = strSQL & "'" & strData & "',"
                Next
                strSQL = strSQL.SubString2(0,strSQL.Length-1)
                strSQL = strSQL & "),"
            Next
            strSQL = strSQL.SubString2(0,strSQL.Length-1) & ";"
       
            Main.sql.execnonquery(strSQL)

then when i did a "SELECT DISTINCT category from tbldice" i found out that loadcsv added a strange and undisplayed unicode at the beginning of the data. When i copied and pasted the log to dbfiddle i found out its adding "ZERO WIDTH NO-BREAK SPACE" with the unicode 65279 to the beginning of the string.

you can take a look at the testdb and run the "SELECT DISTINCT category from tbltest", you will notice that category "aa" is duplicated because of the unseen unicode.

can somebody explain whats going on?
 

Attachments

  • test.zip
    341 bytes · Views: 91

emexes

Expert
Licensed User
loadcsv added a strange and undisplayed unicode at the beginning of the data.

I think you will find that it is already present in the file, and is not being added by loadcsv.

i found out its ... "ZERO WIDTH NO-BREAK SPACE" with the unicode 65279

I am surprised that it wasn't made clear that unicode 65279 (0xFEFF) is also known as:

 
Upvote 0

kris A

Member
Licensed User
Longtime User
the file is generated by excel and i cant find the extra unicode character in the file. please look at the csv file attached. I am copying it from my gdrive to my avd to test it. Not sure if that has anything to do with the extra character Infront but i doubt it
 

Attachments

  • tbltest.zip
    149 bytes · Views: 105
Upvote 0

emexes

Expert
Licensed User
cant find the extra unicode character in the file. please look at the csv file attached

With pleasure; it is a nice change to receive sufficient data to diagnose a problem ? :

CMD:
W:\temp> \util\dump tbltest.csv

0000: EF BB BF 61 61 3B 74 31 0D 0A 61 61 3B 74 32 0D   ■■■aa;t1claa;t2c    0: 239 187 191  97  97  59 116  49  13  10  97  97  59 116  50  13
0010: 0A 44 69 6E 67 20 31 3B 64 31 0D 0A 44 69 6E 67   lDing 1;d1clDing   16:  10  68 105 110 103  32  49  59 100  49  13  10  68 105 110 103
0020: 20 31 3B 64 32 0D 0A                               1;d2cl            32:  32  49  59 100  50  13  10

W:\temp>

From the aforelinked Wikipedia page:

The UTF-8 representation of the BOM is the (hexadecimal) byte sequence EF BB BF
 
Last edited:
Upvote 0

emexes

Expert
Licensed User
I just discovered that you can see the three bytes at the start of the file using:

CMD:
W:\temp> type tbltest.csv
´╗┐aa;t1
aa;t2
Ding 1;d1
Ding 1;d2

W:\temp>
 
Upvote 0

kris A

Member
Licensed User
Longtime User
oh wow thanks that's a revelation. Excel actually added those characters. how do i make sure excel does not do that?
 
Upvote 0

emexes

Expert
Licensed User
Excel actually added those characters. how do i make sure excel does not do that?

That is a good question. I don't have Excel, only OpenOffice Calc, which doesn't show any option about whether to include BOM or not.

1709617693418.png


I saved anyway, and the resulting csv file did not have BOM.
 
Upvote 0

emexes

Expert
Licensed User
how do i make sure excel does not do that?

Two immediately-obvious and (relatively) simple solutions are:

1/ su.LoadCSV into a Map variable, and then for the first field of the first row, do a .Replace(Chr(0xFEFF), "") on it
2/ load the file into a String variable, delete the first character if it is Chr(0xFEFF), write it back to a temporary file, then su.LoadCSV from that

with the bonus that solving the issue at your end means that your users don't need to do anything special when exporting data to your app.
 
Last edited:
Upvote 0

kris A

Member
Licensed User
Longtime User
Lol I am on the arse-end side of the world, and presumably you are not, so from your perspective I am the nightshift support person. Things are pretty quiet here. ?
oh you are from aussie, the land down under as i gather from your profile. I am in your country's next door neighbor, Indonesia. You are night support for a company i presume. since you have the label expert for b4x, that means you have been using the language for a while. Thank you so much
 
Upvote 0

emexes

Expert
Licensed User
I am in your country's next door neighbor, Indonesia.

In that case then we are in the same day/night phase.

The bit about nightshift is a joke for people in continents other than Australia or Asia - they are often surprised to be getting reasonably good support from the forum in the middle of the night their time eg 3am.

Which island are you on? We visited Lombok just before covid became popular, it was very nice.

I didn't dare drive, though ? which is pretty funny because in a previous life I was a driving instructor (here in Australia).
 
Upvote 0

kris A

Member
Licensed User
Longtime User
i am in the capital. Jakarta. Lombok or labuan bajo is really nice. lots of aussies here. dont drive. taxis and ride hails are plenty and affordable.
btw you are full time support for anywhere software(b4x) or for another company not related to b4x?
 
Upvote 0
Top