Android Question LoadCSV delimiter - bug?

TyneBridges

Member
Licensed User
Longtime User
I use LoadCSV to load my data into a SQLite database. My application is a diary and has a long text field containing characters such as single and double quotes and slashes. I've set | (the vertical bar character) as my delimiter because it doesn't occur in the data.
B4X:
Table = SU.LoadCSV(File.DirRootExternal, G.TFName, "|")

I'm finding that, if my text field entries start with a double quote (the word "Lockdown"), the quotation mark is taken as a delimiter and the database field is truncated after that word. This hasn't been a problem with quote marks further on in the text - the text is correctly read and ends at the | character - but is at the start. Is this expected behaviour? I can't think of a way round it other than not using that character in my file, which is problematic for me.

I'm still using B4A 5.02 and have avoided upgrading in case this brings new problems, but obviously will if this issue has been fixed. Any suggestions would be gratefully received.

Thanks.
 

emexes

Expert
Licensed User
This almost does the job, except that it falls over on two of the lines. I'm having a closer look at the input file now.
B4X:
Dim WholeFile As String = File.ReadString("c:\b4j", "SQLiteExport.txt")

WholeFile = WholeFile.Replace(Chr(10) & Chr(10), Chr(13))

Dim Lines() As String = Regex.Split("[\n]", WholeFile)

For I = 0 To Lines.Length - 1
    Log(I & " = " & Lines(I))
    
    Dim Fields() As String = Regex.Split("\|", Lines(I))
    For J = 0 To Fields.Length - 1
        Log("  " & J & " = [" & Fields(J).Replace(Chr(13), Chr(10)) & "]")
    Next
    
    'add Fields() to database here
Next
 
Upvote 0

TyneBridges

Member
Licensed User
Longtime User
Thanks for your help, Emexes. I tried your earlier code but couldn't get the output to load into my MySQL database - I think the problem was that the original version used a list and yours didn't. It was giving me a value for Lines.length of 1 although it seemed to be reading several records.
 
Upvote 0

emexes

Expert
Licensed User
This assembles the data to a list of string arrays, same as your original Table = SU.LoadCSV(File.DirRootExternal, G.TFName, "|")

In fact, we should just be writing a matching version of LoadCSV but call it LoadCsvWithoutBonusQuoteProcessing(...) ?
B4X:
Dim WholeFile As String = File.ReadString("c:\b4j", "SQLiteExport.txt")

WholeFile = WholeFile.Replace(Chr(10) & Chr(10), Chr(13))

Dim Lines() As String = Regex.Split("[\n]", WholeFile)

Dim ListOfStringArrays As List
ListOfStringArrays.Initialize

For I = 0 To Lines.Length - 1
    Log(I & " = " & Lines(I))
    Dim ArrayOfValues() As String = Regex.Split("\|", Lines(I))
    ListOfStringArrays.Add( ArrayOfValues )
Next
    
'ListOfStringArrays should be ready to add to database
'EXCEPT for those two pesky exceptions to the double-LF escaping

For Each SA() As String In ListOfStringArrays
    Log("*** START OF RECORD ***")
    For Each S As String In SA
        Log(S)
    Next
    Log("*** END OF RECORD ***")
Next
 
Upvote 0

emexes

Expert
Licensed User
In the long text fields of your test data, have users actually typed in two line-breaks so as to leave a blank line between paragraphs?

It does not seem logical that the export process would only double up line-breaks within a value, but not at the end of a value.
 
Upvote 0

emexes

Expert
Licensed User
Righto, this seems to work. But only for lines containing 10 values ie 9 "|" vertical bars
B4X:
Dim WholeFile As String = Chr(10) & File.ReadString("c:\b4j", "SQLiteExport.txt")

'bit of a dog's breakfast, thanks to unquoted LFs in data indistinguishable from end-of-line LFs
Dim SearchPattern As String = "(\d\d\d\d\-[^|]*)\|([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)\n"
'four digit year, dash, nine bars, LF

Dim ListOfStringArrays As List
ListOfStringArrays.Initialize

Dim m As Matcher = Regex.Matcher(SearchPattern, WholeFile)
Do While m.Find
    Dim SA(m.GroupCount)
    For I = 1 To m.GroupCount
        SA(I - 1) = m.Group(I)
    Next
    ListOfStringArrays.Add(SA)
Loop
    
For Each SA() As String In ListOfStringArrays
    Log("*** START OF RECORD ***")
    For Each S As String In SA
        Log("[" & S & "]")
    Next
    Log("*** END OF RECORD ***")
Next
 
Upvote 0

emexes

Expert
Licensed User
In summary, what turned this from a seemingly-easy job to a PITA job is that there is no reliable way of telling whether a LF is part of a value or marks the end of a line, unless: you know how many values are on a line, and the last value does not contain a LF.
 
Upvote 0

TyneBridges

Member
Licensed User
Longtime User
Thanks again, Emexes. Everything is working now. I'm grateful for your time and expertise.

Initially my RETURN characters had all gone in the text after import into the app. I'd forgotten my original workaround for the problem, which was to replace all returns in the main text field with a random string that never occurred in the text ('XzQ'). I reinstated this into the export instruction and that fixed the issue. After importing, the Android app again replaces all instances of that string with actual returns.
 
Upvote 0

emexes

Expert
Licensed User
expertise, lol

nevergiveup.jpg
 
Upvote 0
Top