Android Code Snippet [B4X] Reading a UTF8 file that might have a BOM

UTF8 text files might start with a BOM (byte order mark) character. You can check for it with a decent text editor such as Notepad++.
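If you would rather check from code than from an editor, here is a minimal sketch. It assumes a B4X version where File.ReadBytes is available, and the Sub name FileHasUTF8BOM is made up for this example. The UTF8 BOM is stored on disk as the three bytes EF BB BF, which decode to the single character U+FEFF.

```b4x
' Hypothetical helper: returns True if the file starts with the UTF8 BOM bytes EF BB BF.
Sub FileHasUTF8BOM (Dir As String, Name As String) As Boolean
    Dim b() As Byte = File.ReadBytes(Dir, Name)
    If b.Length < 3 Then Return False
    ' Bytes are signed in B4X, so mask each one to 0..255 before comparing.
    If Bit.And(b(0), 0xFF) <> 0xEF Then Return False
    If Bit.And(b(1), 0xFF) <> 0xBB Then Return False
    Return Bit.And(b(2), 0xFF) = 0xBF
End Sub
```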

Always prefer to work with text files encoded with UTF8 without BOM.

This code reads a UTF8 text file and removes the BOM character if needed:
B4X:
Sub ReadUTF8FileMaybeWithBOM (Dir As String, Name As String) As String
   Dim s As String = File.ReadString(Dir, Name)
   If s.StartsWith(Chr(0xFEFF)) Then
       s = s.SubString(1)
   End If
   Return s
End Sub
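A short usage sketch for the Sub above. The file name "notes.txt" is only a placeholder for illustration.

```b4x
' "notes.txt" is a hypothetical file name; use any UTF8 text file in your project.
Dim text As String = ReadUTF8FileMaybeWithBOM(File.DirAssets, "notes.txt")
Log(text.Length)
```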
 

swChef

Sharing this here as an aid to others; it is directly related to the OP's content.
Today I had a case where a string that did not come from reading a local file was prefixed with the BOM (a remote app had read a file and transferred its content to a 'local' app). So I split Erel's solution into two Subs and used both of them in a few places. Oddly, I found one case where a string was prefixed with two marks, in case you were wondering why there is a counter. The logging helps when many strings are processed: it identifies the source so that it can eventually be cleaned up.
B4X:
Sub ReadUTF8FileMaybeWithBOM (Dir As String, Name As String) As String
    Return RemovePossibleBOM(File.ReadString(Dir, Name), File.Combine(Dir,Name))
End Sub

' This allows removing a BOM from strings from other sources. Counts the number of occurrences and if any found logs the count with the provided reference string.
Private Sub RemovePossibleBOM(s As String, sRefPath As String) As String
    Dim iCount As Int = 0
    Do While s.StartsWith(Chr(0xFEFF))
        s = s.SubString(1)
        iCount = iCount + 1
    Loop
    If iCount>0 Then Log($"removed FEFF BOM ${iCount} times for ${sRefPath}"$)
    Return s
End Sub
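As an illustration of the double-mark case, here is a small sketch that calls RemovePossibleBOM on a string built in code. The sample text and the reference label are made up for the example.

```b4x
' Two BOM characters in front of the text, as in the odd case described above.
Dim sample As String = Chr(0xFEFF) & Chr(0xFEFF) & "hello"
Dim clean As String = RemovePossibleBOM(sample, "sample string")
' Logs: removed FEFF BOM 2 times for sample string
Log(clean) ' hello
```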
 

emexes

How about a friendly "I see your mallet and raise you one sledgehammer"?

B4X:
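' Note: Replace removes every U+FEFF in the string, not only the leading ones.
Private Sub RemovePossibleBOM(s As String, sRefPath As String) As String
    Dim SansBoms As String = s.Replace(Chr(0xFEFF), "")
    ' Each removed mark is one character, so the length difference is the count.
    Dim iCount As Int = s.Length - SansBoms.Length
    If iCount > 0 Then Log($"removed FEFF BOM ${iCount} times for ${sRefPath}"$)
    Return SansBoms
End Sub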

I haven't actually tested it, though, so if you want to find any obvious mistakes and make me look like a drunken programmer, your odds are good.
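One behavioral difference between the two versions is worth seeing side by side. The sketch below uses a made-up sample string with one mark at the start and one in the middle; the loop strips only the leading mark, while Replace strips both.

```b4x
' U+FEFF at the start and again in the middle of the text (6 characters in total).
Dim sample As String = Chr(0xFEFF) & "ab" & Chr(0xFEFF) & "cd"

' Leading-only removal, as in the Do While version: the inner mark survives.
Dim leadingOnly As String = sample
Do While leadingOnly.StartsWith(Chr(0xFEFF))
    leadingOnly = leadingOnly.SubString(1)
Loop
Log(leadingOnly.Length) ' 5

' Replace removes every occurrence, wherever it appears.
Dim everywhere As String = sample.Replace(Chr(0xFEFF), "")
Log(everywhere.Length) ' 4
```

Which behavior is preferable depends on the data: if a U+FEFF inside the text is meaningful to you, the loop is the safer choice.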
 