B4J Question Split text file after end of line (EOL)

David Meier · Mar 20, 2021

I have a B4J application that has to split a text into lines. The input files use different control characters at the end of each line:

a) 0x0D 0x0A (Chr(13) followed by Chr(10))
b) only 0x0A (Chr(10))

Now I do something like this:

B4X:

array = Regex.Split(CRLF,myText)

Unfortunately this gives wrong results, because files with 0x0D 0x0A line endings keep the 0x0D at the end of each line.
Does anybody know of a good solution to this?

Regards David

emexes · Mar 20, 2021

Give this untested thought a burl:

B4X:

array = Regex.Split("(\c\r?|\c?\r)", myText)

William Lancee · Mar 20, 2021

If the source of myText is a file then File.ReadList accepts different EOL characters.

B4X:

    Dim fileLines as List = File.ReadList(sourceDir, "filename.txt")

David Meier · Mar 21, 2021

William Lancee said:
If the source of myText is a file then File.ReadList accepts different EOL characters.

B4X:

Dim fileLines as List = File.ReadList(sourceDir, "filename.txt")

Thanks
In my case the text comes from a mobile app and is passed through a server socket and async streams. Originally the text comes from a QR code that is scanned and then passed on to a desktop app. Apparently the text behind the QR codes can have different line endings.
But thanks for your solution.

emexes · Mar 21, 2021

emexes said:
B4X:

array = Regex.Split("(\c\r?|\c?\r)", myText)

Any luck with this? If not, try:

B4X:

array = Regex.Split("(?:\c\r?|\c?\r)", myText)

emexes · Mar 21, 2021

This post twigged memory of another way of handling the CR/LF/CRLF morass... try:

B4X:

myText = myText.Replace(Chr(13) & Chr(10), Chr(13))    'convert CR+LF line endings to be just CR
myText = myText.Replace(Chr(10), Chr(13))    'convert LF line endings to be CR

'at this point all line endings should now just be CR aka Chr(13) aka "\r"

array = Regex.Split("\r", myText)
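The normalise-then-split idea above could be wrapped in a small helper. This is only a minimal sketch: the Sub name SplitLinesByReplace is made up for illustration and is not from the thread. Note that String.Replace returns a new string rather than changing the original in place, so the sketch assigns each result back.

```b4x
'Hypothetical helper (name not from the thread):
'normalise every line ending to CR, then split on CR.
Sub SplitLinesByReplace(Text As String) As String()
    'Replace returns a new string, so assign the result back
    Text = Text.Replace(Chr(13) & Chr(10), Chr(13))   'CR+LF -> CR
    Text = Text.Replace(Chr(10), Chr(13))             'lone LF -> CR
    Return Regex.Split("\r", Text)                    'regex \r matches Chr(13)
End Sub

'Example use with mixed line endings
Sub DemoSplitLinesByReplace
    Dim sample As String = "one" & Chr(13) & Chr(10) & "two" & Chr(10) & "three"
    Dim lines() As String = SplitLinesByReplace(sample)
    For Each l As String In lines
        Log(l)
    Next
End Sub
```

The order of the two Replace calls matters: converting CR+LF first avoids turning each CR+LF pair into two CRs, which would produce an empty line between every pair of real lines.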

David Meier · Mar 21, 2021

emexes said:

This post twigged memory of another way of handling the CR/LF/CRLF morass... try:

B4X:

myText = myText.Replace(Chr(13) & Chr(10), Chr(13))    'convert CR+LF line endings to be just CR
myText = myText.Replace(Chr(10), Chr(13))    'convert LF line endings to be CR

'at this point all line endings should now just be CR aka Chr(13) aka "\r"

array = Regex.Split("\r",myText)

I was just about to reply to your suggestions, and I had exactly this replace solution in mind, inspired by your first post. I think this is the way to do it.
Thx a lot my friend!
David

emexes · Mar 21, 2021

David Meier said:
I had exactly this replace solution on my mind

Great minds think alike, mine's just a bit slower.

Did the earlier regex split expression(s) work too?

David Meier · Mar 21, 2021

You were quick, very quick and I thank you for that! And yes, splitting worked for the first suggestion.
Except that an unnecessary special character came on the next line when the combination 0D 0A was present. BTW I tried the replace solution and it seems to work fine. I need to do some testing though.
What I think: the texts will most probably only ever use 0A or 0D 0A as line endings (see Wikipedia). So I implemented:

B4X:

myText = myText.Replace(Chr(13), "")

where Chr(13) is 0x0D.
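Put together with the split, the strip-the-CR approach might look like the sketch below. The Sub name SplitLinesStripCR is hypothetical, and the sketch assumes the input only ever uses LF or CR+LF. In B4X the CRLF constant is Chr(10), so splitting on CRLF after removing the CRs should behave the same as splitting on "\n".

```b4x
'Hypothetical helper (name not from the thread):
'remove every CR, then split on LF.
'Assumption: the text uses only LF or CR+LF line endings.
Sub SplitLinesStripCR(Text As String) As String()
    Text = Text.Replace(Chr(13), "")   'drop all 0x0D characters
    Return Regex.Split("\n", Text)     'regex \n matches Chr(10)
End Sub
```

The trade-off is that a text using CR alone as the line ending would lose all of its line breaks and come back as a single line.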

Thx again and have a nice Sunday
David

emexes · Mar 21, 2021

David Meier said:
Except that an unnecessary special character came on the next line when the combination 0D 0A was present.

Lol. That'd be because Einstein here thought "\n" but typed "\c". Fixed up, this should work:

B4X:

array = Regex.Split("(?:\r\n?|\r?\n)", myText)   'might even work without the "?:"

and if you want it to do the equivalent of your handling 0A or 0D 0A but not 0D alone, then this should work:

B4X:

array = Regex.Split("(?:\r?\n)", myText)   'might even work without the "?:"

edit: I understand you'll probably stick with the .Replace, but I've fixed my stuff-up just in case anyone else reading this in the future likes the regex-does-it-all method

edit: and this'd be even simpler (if it works):

B4X:

array = Regex.Split("(?:\r\n|\n|\r)", myText)   'might even work without the "?:" and/or the "("..")"
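To check any of these patterns against the three line-ending styles, a small test Sub along these lines could help. It is a sketch only: TestEolPattern and its variable names are invented for illustration.

```b4x
'Hypothetical test helper (name not from the thread):
'splits three two-line samples (CR+LF, LF, CR) with the given pattern.
Sub TestEolPattern(Pattern As String)
    Dim crChar As String = Chr(13)
    Dim lfChar As String = Chr(10)
    Dim samples() As String = Array As String( _
        "a" & crChar & lfChar & "b", _
        "a" & lfChar & "b", _
        "a" & crChar & "b")
    For Each s As String In samples
        Dim parts() As String = Regex.Split(Pattern, s)
        'A clean split gives 2 parts and a first part of length 1;
        'a first part of length 2 means a stray CR was left behind.
        Log(parts.Length & " parts, first part length " & parts(0).Length)
    Next
End Sub
```

Calling it as TestEolPattern("(?:\r\n|\n|\r)") should report 2 parts with a first part of length 1 for every sample.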

David Meier · Mar 21, 2021

emexes said:
Lol. That'd be because Einstein here thought "\n" but typed "\c". Fixed up, this should work:

B4X:

array = Regex.Split("(?:\r\n?|\r?\n)", myText) 'might even work without the "?:"

and if you want it to do the equivalent of your handling 0A or 0D 0A but not 0D alone, then this should work:

B4X:

array = Regex.Split("(?:\r?\n)", myText) 'might even work without the "?:"

edit: I understand you'll probably stick with the .Replace, but I've fixed my stuff-up just in case anyone else reading this in the future likes the regex-does-it-all method

edit: and this'd be even simpler (if it works):

B4X:

array = Regex.Split("(?:\r\n|\n|\r)", myText) 'might even work without the "?:" and/or the "("..")"

Wow, incredible. This is a good brush-up for my rusty regex knowledge. I will keep this "Sunday lesson" in mind. I really appreciate your help!
