B4J Question Split text file after end of line (EOL)

David Meier

Active Member
Licensed User
Longtime User
I have a B4J application that has to split a text into lines. The input files use different control characters at the line end:

a) 0x0D 0x0A (Chr(13) followed by Chr(10))
b) only 0x0A (Chr(10))

Now I do something like this:
B4X:
array = Regex.Split(CRLF,myText)

Unfortunately this leads to wrong results, because files with 0x0D 0x0A line endings keep the 0x0D at the end of each line.
Does anybody know of a good solution to this?

Regards David
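
For anyone who wants to reproduce the symptom: in B4X the CRLF constant is just Chr(10), so splitting on it leaves the Chr(13) of a 0x0D 0x0A ending attached to the line before it. A minimal sketch, where the sample string and the Sub name are made up for illustration:

```b4x
'Hypothetical demo Sub: logs the length and last character code of each part
Sub ShowStrayCR
    'First line ends with CR+LF, second line ends with LF only
    Dim myText As String = "one" & Chr(13) & Chr(10) & "two" & Chr(10) & "three"
    Dim parts() As String = Regex.Split(CRLF, myText)   'CRLF is Chr(10) in B4X
    For Each part As String In parts
        Dim lastCode As Int = Asc(part.CharAt(part.Length - 1))
        Log(part.Length & " chars, last char code " & lastCode)
    Next
End Sub
```

The first part should report a last character code of 13, which is the leftover 0x0D.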
 

David Meier

If the source of myText is a file then File.ReadList accepts different EOL characters.

B4X:
    Dim fileLines As List = File.ReadList(sourceDir, "filename.txt")
Thanks.
In my case the text comes from a mobile app and is passed through a server socket and AsyncStreams. Originally the text comes from a QR code that is scanned and then passed on to a desktop app. Apparently the text behind the QR codes can have different line endings.
But thanks for your solution.
 

emexes

Expert
Licensed User
This post twigged memory of another way of handling the CR/LF/CRLF morass... try:
B4X:
'Replace returns a new string, so the result has to be assigned back
myText = myText.Replace(Chr(13) & Chr(10), Chr(13))    'convert CR+LF line endings to be just CR
myText = myText.Replace(Chr(10), Chr(13))    'convert LF line endings to be CR

'at this point all line endings should now just be CR aka Chr(13) aka "\r"

array = Regex.Split("\r", myText)
 

David Meier

I was just about to reply to your solutions, and I had exactly this Replace solution in mind, inspired by your first post. I think this is the way to do it.
Thx a lot my friend!
David
 

David Meier

You were quick, very quick, and I thank you for that! And yes, splitting worked with the first suggestion.
Except that an unnecessary special character came on the next line when the combination 0D 0A was present. BTW, I tried the Replace solution and it seems to work fine. I need to do some testing though.
My thinking: texts will most likely have either 0A or 0D 0A as line ends (see Wikipedia). So I implemented:
B4X:
myText = myText.Replace(Chr(13), "")
where Chr(13) is equal to 0x0D.
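
Removing Chr(13) only normalizes the text; the split itself still has to happen on Chr(10). A minimal sketch of the whole step, assuming the text is already held in a String (the Sub name SplitLinesNoCR is made up for illustration):

```b4x
'Hypothetical helper: strips every CR, then splits on LF.
'Assumes line ends are either 0A or 0D 0A. A lone 0D would simply
'vanish and glue two lines together.
Sub SplitLinesNoCR (myText As String) As String()
    myText = myText.Replace(Chr(13), "")
    Return Regex.Split("\n", myText)
End Sub
```

It would be called as Dim lines() As String = SplitLinesNoCR(myText).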

Thx again and have a nice Sunday
David
 

emexes

Except that an unnecessary special character came on the next line when the combination 0D 0A was present.
Lol. That'd be because Einstein here thought "\n" but typed "\c". Fixed up, this should work:
B4X:
array = Regex.Split("(?:\r\n?|\r?\n)", myText)   'might even work without the "?:"
and if you want the equivalent of your approach, handling 0A or 0D 0A but not 0D alone, then this should work:
B4X:
array = Regex.Split("(?:\r?\n)", myText)   'optional CR followed by LF; might even work without the "?:"



edit: I understand you'll probably stick with the .Replace, but I've fixed my stuff-up just in case anyone else reading this in the future likes the regex-does-it-all method

edit: and this'd be even simpler (if it works):
B4X:
array = Regex.Split("(?:\r\n|\n|\r)", myText)   'might even work without the "?:" and/or the "("..")"
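
To check a pattern like this against all three line-ending styles, a small test Sub can help. This is only a sketch: the sample string and the Sub name are invented, and the group is dropped on the assumption that nothing else follows the alternation.

```b4x
'Hypothetical test Sub: splits a sample with mixed line endings and logs each piece
Sub TestSplit
    'Sample text mixing CR+LF, LF and CR line endings (made up for this test)
    Dim myText As String = "one" & Chr(13) & Chr(10) & "two" & Chr(10) & "three" & Chr(13) & "four"
    '\r\n is listed first so a CR+LF pair is consumed as one separator, not two
    Dim lines() As String = Regex.Split("\r\n|\n|\r", myText)
    For Each s As String In lines
        Log("[" & s & "]")
    Next
End Sub
```

The brackets in the log make any leftover control character at the start or end of a line easy to spot. Two consecutive line endings produce an empty string between them.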
 

David Meier


Wow, incredible. This is a good brush-up for my rusty regex knowledge. I will keep this "Sunday lesson" in mind. I really appreciate your help!
 