Android Question Number of lines in a text file

Sergey_New

Well-Known Member
Licensed User
Longtime User
I need to read a text file.
For the progress bar to work, I need to know the number of lines (scount).
B4X:
    Try
        Dim rd As TextReader
        rd.Initialize(File.OpenInput(Starter.myFolder, FileName))
        scount=Regex.Split(CRLF,rd.ReadAll).Length
        rd.Close
    Catch
        Log(LastException)
    End Try
If the number of lines is less than 90,000, everything works, but if there are, for example, 1,500,000 lines, the application crashes on line #4 (the Regex.Split call).

What restrictions might there be?
 

Sergey_New

Well-Known Member
Licensed User
Longtime User
Any format?
No, the file has a strict format. Look at the file's contents: each line contains two or three groups of characters.
These strings need to be converted into a special database.
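Because each line holds two or three space-separated groups, splitting with a limit of 3 keeps any spaces inside the last group intact. A minimal plain-Java sketch (B4A compiles to Java); the class name `GedLine` is hypothetical, it assumes single-space separators as in the `0 HEAD` and `0 TRLR` lines that appear in the logs of this thread, and `1 NAME John /Smith/` is only an illustrative value.

```java
import java.util.Arrays;

/** Hypothetical helper: splits one line into its two or three groups. */
final class GedLine {
    private GedLine() {}

    /**
     * Splits on single spaces with a limit of 3, so the third group keeps
     * any spaces it contains (assumes single-space separators).
     */
    static String[] split(String line) {
        return line.split(" ", 3);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("0 HEAD")));              // [0, HEAD]
        System.out.println(Arrays.toString(split("1 NAME John /Smith/"))); // [1, NAME, John /Smith/]
    }
}
```

Each resulting array maps naturally onto one database row (level, tag or cross-reference, optional value).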
 

emexes

Expert
Licensed User
Needed in B4A

If the "read whole file at once" approach is too much for Android, then perhaps try reading it using .ReadLine and estimating the progress based on the file size and the cumulative length of strings read up to now (including 2 byte line terminators).
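This estimate can be sketched in plain Java: progress is the number of bytes consumed so far divided by the file size, so no line count is needed up front. It is a minimal sketch, assuming UTF-8 text and a fixed terminator size per line (2 for CRLF, 1 for LF); `ProgressReader` and `readWithProgress` are hypothetical names. Because a final line may lack its terminator, the percentage is clamped at 100.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.IntConsumer;

/** Hypothetical sketch: read line by line, estimate progress from bytes consumed. */
final class ProgressReader {
    private ProgressReader() {}

    /**
     * Reads every line and calls onPercent (0..100) each time the percentage
     * changes. Assumes UTF-8 text and terminatorBytes per line (2 for CRLF,
     * 1 for LF). Returns the number of lines read; the stream is closed.
     */
    static long readWithProgress(InputStream in, long totalBytes,
                                 int terminatorBytes, IntConsumer onPercent) {
        long consumed = 0;
        long lines = 0;
        int lastPercent = -1;
        try (BufferedReader rd = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = rd.readLine()) != null) {
                lines++;
                // Process the line here, then account for its size.
                consumed += line.getBytes(StandardCharsets.UTF_8).length + terminatorBytes;
                // Clamped because a final line without a terminator overshoots.
                int percent = totalBytes <= 0 ? 100
                        : (int) Math.min(100, consumed * 100 / totalBytes);
                if (percent != lastPercent) { // update the UI only on change
                    lastPercent = percent;
                    onPercent.accept(percent);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path p = Paths.get(args[0]);
        long n = readWithProgress(Files.newInputStream(p), Files.size(p), 2,
                pct -> System.out.println(pct + "%"));
        System.out.println(n + " lines");
    }
}
```

Reporting only when the percentage changes keeps the number of progress-bar updates small (at most 101) even for millions of lines.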
 

akshita8

New Member

B4X:
Dim rd As TextReader
Try
    rd.Initialize(File.OpenInput(Starter.myFolder, FileName))
    Do While True
        Dim line As String = rd.ReadLine
        If line = Null Then Exit
        ' Process the line or update progress here
    Loop
    rd.Close
Catch
    Log(LastException)
End Try
 

Sergey_New

Well-Known Member
Licensed User
Longtime User
reading it using .ReadLine
I've already tried disabling the progress bar. As I wrote above, the number of lines can be determined. But neither ReadLine nor ReadList finishes reading the file.
 

RB Smissaert

Well-Known Member
Licensed User
Longtime User
The file is large; you can download it from the link.
The number of lines in this file is determined without problems:
B4X:
    Dim lst As List
    lst.Initialize
    lst=File.ReadList(Starter.myFolder, FileName)
    Log(lst.Size)
But reading the lines and adding them, for example, to a List causes a memory overflow.
I have downloaded the file and got the number of lines fine and fast.
Made a small alteration to my posted code to make it produce the same number of lines as the code with TextReader as posted by Daestrum:

B4X:
'btEndOfLineByte will usually be 10 (LF)
'Requires the RandomAccessFile library
Sub GetTextFileLineCount(strFolder As String, strFile As String, btEndOfLineByte As Byte) As Int

    Dim RAF As RandomAccessFile
    Dim iChunkSize As Int = 100000 'could make smaller or larger
    Dim arrBytes(iChunkSize) As Byte 'allocated once and reused for every chunk
    Dim i As Int
    Dim iBytes As Int
    Dim lPosition As Long
    Dim iLines As Int
    Dim btLastByte As Byte = btEndOfLineByte 'so an empty file counts as 0 lines
    
    RAF.Initialize(strFolder, strFile, True)

    Do While lPosition < RAF.Size
        
        iBytes = RAF.ReadBytes(arrBytes, 0, iChunkSize, lPosition)
        
        For i = 0 To iBytes - 1
            If arrBytes(i) = btEndOfLineByte Then
                iLines = iLines + 1
            End If
        Next
        
        If iBytes > 0 Then btLastByte = arrBytes(iBytes - 1)
        lPosition = lPosition + iBytes
    
    Loop
    
    RAF.Close
    
    'A last line without a trailing end-of-line byte still counts as a line
    If btLastByte = btEndOfLineByte Then
        Return iLines
    Else
        Return iLines + 1
    End If
    
End Sub

This takes about 120 ms on a Samsung S23 and gives me 1826551 lines.
Using the code with TextReader, it takes about 300 ms and gives the same number of lines.
No memory problems noted with either method.
So, all in all, I can see no problem getting the line count and using it for your progress monitor.
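For comparison, the same chunked byte-scanning approach in plain Java: it counts end-of-line bytes in fixed-size chunks without decoding text or keeping any line in memory. `LineCounter` is a hypothetical name; like the B4X routine, it counts a final line that has no trailing terminator, and it treats an empty file as 0 lines.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical Java counterpart of the chunked byte-scanning line count. */
final class LineCounter {
    private LineCounter() {}

    /**
     * Scans raw bytes in chunks of chunkSize (> 0) and counts eol bytes
     * (usually 10, LF). A last line without a trailing eol still counts;
     * an empty stream has 0 lines. The stream is closed when done.
     */
    static long countLines(InputStream in, byte eol, int chunkSize) {
        byte[] buf = new byte[chunkSize]; // allocated once, reused per chunk
        long lines = 0;
        boolean anyBytes = false;
        byte last = 0;
        try (InputStream s = in) {
            int n;
            while ((n = s.read(buf, 0, buf.length)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buf[i] == eol) lines++;
                }
                if (n > 0) {
                    anyBytes = true;
                    last = buf[n - 1];
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!anyBytes) return 0;
        return last == eol ? lines : lines + 1;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines(Files.newInputStream(Paths.get(args[0])), (byte) 10, 100_000));
    }
}
```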

RBS
 

drgottjr

Expert
Licensed User
Longtime User
If the number of lines in the file is 1826562, your code gives the value results(1)=6852389.
Why is that?
Sorry, the output from wc is lines, words and bytes, so:
MsgboxAsync( results(0) & " lines, " & results(1) & " words, " & results(2) & " bytes", "FYI:")
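`wc` prints lines, words and bytes, so `results(1)` is the word count, which is naturally several times the line count when every line holds two or more space-separated groups. A simplified plain-Java sketch of the same three counts; the `WordCount` class is hypothetical, and unlike real `wc` (which is locale-aware) it treats only ASCII whitespace bytes as word separators.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of wc-style counting: lines, words and bytes. */
final class WordCount {
    private WordCount() {}

    /**
     * Returns {lines, words, bytes}: lines = number of LF bytes,
     * words = runs of non-whitespace bytes, bytes = total length.
     * Space, tab, LF, CR, VT and FF are treated as whitespace.
     */
    static long[] count(byte[] data) {
        long lines = 0;
        long words = 0;
        boolean inWord = false;
        for (byte b : data) {
            if (b == '\n') lines++;
            boolean space = b == ' ' || b == '\t' || b == '\n'
                    || b == '\r' || b == 0x0B || b == '\f';
            if (space) {
                inWord = false;
            } else if (!inWord) {
                inWord = true; // first byte of a new word
                words++;
            }
        }
        return new long[] {lines, words, data.length};
    }

    public static void main(String[] args) {
        byte[] sample = "0 HEAD\n1 SOUR X\n".getBytes(StandardCharsets.UTF_8);
        long[] r = count(sample);
        System.out.println(r[0] + " lines, " + r[1] + " words, " + r[2] + " bytes");
    }
}
```

Run as-is, it prints `2 lines, 5 words, 16 bytes` for the two-line sample.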
 

drgottjr

Expert
Licensed User
Longtime User
Technically, you can't. The system tells you when you've exceeded available memory. It is possible to query available memory, compare that to the size of the file and decide what to do, but you still might crash. You can only use try/catch, and even then I'm not sure it can be caught: it's an error, not an exception. Give it a try. Go back to your old ReadAll to provoke a crash, but wrap it in a try/catch, like I did. If you end up in the Catch portion, then it works, and you have a graceful way out. Since wc gives you the file size in advance, that can help. Anything > 2 GB is likely to crash, but available memory varies depending on garbage collection and other factors.

I suggested wc because it may handle things differently than Java's runtime environment; on systems not running in some "environment", the situation would be different. My guess was that wc is running outside of Java's virtual machine. I didn't use a giant file to test. You've got one, so test it.
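The memory check and the try/catch can be sketched in plain Java. `Runtime.maxMemory()`, `totalMemory()` and `freeMemory()` give the heap headroom, and `OutOfMemoryError` (an `Error`, not an `Exception`) can be caught, although recovery is not guaranteed. The `MemoryCheck` class and the overhead factor of 6 are hypothetical assumptions: Java strings use up to 2 bytes per char plus per-object overhead, so an in-memory copy needs several times the file size. `Files.readAllBytes` is documented to throw `OutOfMemoryError` when the array cannot be allocated, for example for a file larger than 2 GB.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch: compare file size with heap headroom, catch OOM. */
final class MemoryCheck {
    private MemoryCheck() {}

    /** Heap the JVM could still use: maximum heap minus what is in use now. */
    static long availableHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    /** Rough guess only; overheadFactor is an assumption to tune. */
    static boolean probablyFits(long fileBytes, long availableBytes, double overheadFactor) {
        return fileBytes * overheadFactor <= availableBytes;
    }

    /** Reads the whole file, or returns null if the allocation fails. */
    static String readAllOrNull(Path p) {
        try {
            return new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
        } catch (OutOfMemoryError e) {
            // An Error, not an Exception: catchable, but recovery is not guaranteed.
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Paths.get(args[0]);
        if (!probablyFits(Files.size(p), availableHeapBytes(), 6.0)) { // 6.0 is an assumed factor
            System.out.println("Too big to load at once; read it line by line instead.");
            return;
        }
        String text = readAllOrNull(p);
        System.out.println(text == null ? "Out of memory" : "Loaded " + text.length() + " chars");
    }
}
```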
 

emexes

Expert
Licensed User
But how can you determine if the file size exceeds the device's capabilities?

Is your reading of data from the .ged file solved, for large files where the data is too much to hold in memory all at once?

Is there a known largest .ged file size that you must be able to handle? Can you set an upper limit, e.g. 2 GB (max Int) or 4 GB (max FAT32 file size)?

I can think of ways to effectively handle more records in memory, but they all have limits that are (presumably) smaller than if files were used instead of memory.

Also, do you need *all* of the .ged fields, or maybe just some, e.g. names and birthdates to display a selection listview?
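If only a few fields are needed, they can be picked out while streaming, so memory holds one small object per person instead of every line. A plain-Java sketch, assuming the standard GEDCOM layout (`0 @xref@ INDI` starts an individual, `1 NAME` holds the name, and a `2 DATE` under `1 BIRT` holds the birth date); `NameIndex` and `Person` are hypothetical names, and the sample data in `main` is illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: keep only the name and birth date of each INDI record. */
final class NameIndex {
    private NameIndex() {}

    /** Minimal projection of one individual; birthDate is null if absent. */
    static final class Person {
        final String name;
        final String birthDate;

        Person(String name, String birthDate) {
            this.name = name;
            this.birthDate = birthDate;
        }
    }

    static List<Person> extract(BufferedReader rd) {
        List<Person> out = new ArrayList<>();
        boolean inIndi = false; // inside a "0 @xref@ INDI" record
        boolean inBirt = false; // inside its "1 BIRT" structure
        String name = null;
        String birth = null;
        try {
            String line;
            while ((line = rd.readLine()) != null) {
                String[] g = line.trim().split(" ", 3); // level, tag or xref, rest
                if (g[0].equals("0")) {
                    // A level-0 line ends the previous record.
                    if (inIndi) out.add(new Person(name, birth));
                    inIndi = g.length == 3 && g[2].equals("INDI");
                    inBirt = false;
                    name = null;
                    birth = null;
                } else if (inIndi && g[0].equals("1")) {
                    inBirt = g.length >= 2 && g[1].equals("BIRT");
                    if (name == null && g.length == 3 && g[1].equals("NAME")) name = g[2];
                } else if (inBirt && g[0].equals("2") && g.length == 3 && g[1].equals("DATE")) {
                    birth = g[2];
                }
            }
            if (inIndi) out.add(new Person(name, birth));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "0 HEAD\n0 @I1@ INDI\n1 NAME John /Smith/\n1 BIRT\n2 DATE 1 JAN 1900\n0 TRLR\n";
        for (Person p : extract(new BufferedReader(new StringReader(sample)))) {
            System.out.println(p.name + " | " + p.birthDate);
        }
    }
}
```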
 

DonManfred

Expert
Licensed User
Longtime User
You get an OOM exception.

With this kind of text file and this amount of data, you are better off creating a class that parses the text file and writes the results to a database. Don't worry about a scrollbar or the exact count of lines. It isn't worth the effort to hold it ALL in memory, especially with such big files... No user wants to see a list with millions of entries.

- Parse the file line by line. You know when a new dataset begins; write the previous one to the database and carry on parsing. You can regularly update an info display showing how many datasets you have finished so far...
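The record-by-record approach can be sketched in plain Java: a new record starts at every level-0 line (like the `0 HEAD` and `0 TRLR` lines in this file), each finished record is handed to a sink, and only one record is in memory at a time. `RecordStreamer` and the sink are hypothetical; in a real app the sink would insert the record into the database (ideally inside a transaction) and the progress callback would update the UI.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongConsumer;

/** Hypothetical sketch: stream records to a sink instead of holding the file. */
final class RecordStreamer {
    private RecordStreamer() {}

    /**
     * Groups lines into records (a record starts at each line beginning with
     * "0 "), passes every finished record to sink, calls onProgress with the
     * number of records written every progressEvery (> 0) records and once
     * at the end. Returns the total number of records.
     */
    static long stream(BufferedReader rd, Consumer<List<String>> sink,
                       int progressEvery, LongConsumer onProgress) {
        List<String> current = new ArrayList<>();
        long done = 0;
        try {
            String line;
            while ((line = rd.readLine()) != null) {
                if (line.startsWith("0 ") && !current.isEmpty()) {
                    sink.accept(current); // e.g. insert into the database
                    current = new ArrayList<>();
                    if (++done % progressEvery == 0) onProgress.accept(done);
                }
                current.add(line);
            }
            if (!current.isEmpty()) { // flush the last record
                sink.accept(current);
                done++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        onProgress.accept(done);
        return done;
    }

    public static void main(String[] args) {
        String sample = "0 HEAD\n1 CHAR UTF-8\n0 @I1@ INDI\n1 NAME A\n0 TRLR\n";
        long n = stream(new BufferedReader(new StringReader(sample)),
                record -> System.out.println("write: " + record),
                2, done -> System.out.println(done + " records done"));
        System.out.println(n + " records");
    }
}
```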
 

Sergey_New

Well-Known Member
Licensed User
Longtime User
There is no such thing.
Here's my code:
B4X:
Sub ReadFile(FileName As String)
    Dim lst As List
    lst.Initialize
    Try
        Dim rd As TextReader
        rd.Initialize(File.OpenInput(Starter.myFolder, FileName))
        Dim strLines As String= rd.ReadLine
        Do While strLines <> Null
            lst.Add(strLines)
            strLines = rd.ReadLine
        Loop
    Catch
        Log(LastException)
    End Try
End Sub
No messages are logged.
Please test it with the file I linked above.
 

Daestrum

Expert
Licensed User
Longtime User
There seems to be information missing here, a 38MB file shouldn't cause a memory problem.

Are you making multiple copies of the data?
In your code in post #36 you don't close the TextReader after you have read the file; this will consume memory. (This could just be an oversight in the code you posted, though.)
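In B4X the reader is closed by calling rd.Close explicitly, for example after the loop and again in the Catch block. In Java, try-with-resources does this automatically, even when reading throws. A minimal sketch with a hypothetical `SafeRead` class:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical sketch: the reader is always closed, even on failure. */
final class SafeRead {
    private SafeRead() {}

    /** Counts lines; try-with-resources closes the reader (and source) on exit. */
    static long countLines(Reader source) {
        try (BufferedReader rd = new BufferedReader(source)) {
            long n = 0;
            while (rd.readLine() != null) n++;
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines(Files.newBufferedReader(Paths.get(args[0]))));
    }
}
```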
 

DonManfred

Expert
Licensed User
Longtime User
Have you checked that the code works?
Yes.

B4X:
Private Sub B4XPage_Created (Root1 As B4XView)
    Root = Root1
    Root.LoadLayout("MainPage")
    Wait For (File.CopyAsync(File.DirAssets, "royal.ged", File.DirInternal, "royal.ged")) Complete (Success As Boolean)
    Log("Copy Success: " & Success)
    Dim l As List
    l.Initialize
    Dim rd As TextReader
    'Try
    rd.Initialize(File.OpenInput(File.DirInternal, "royal.ged"))
    Do While True
        Dim line As String = rd.ReadLine
        If line = Null Then Exit
        l.Add(line)
        ' Process the line or update progress here
    Loop
    rd.Close
    Log($"File read into list with ${l.Size} lines..."$)
End Sub

B4X:
Logger connected to:  samsung SM-G973F
--------- beginning of main
*** Service (starter) Create ***
** Service (starter) Start **
** Activity (main) Create (first time) **
Call B4XPages.GetManager.LogEvents = True to enable logging B4XPages events.
** Activity (main) Resume **
Copy Success: true
File read into list with 1826551 lines...


Same amount of lines shown in Notepad++...
 

klaus

Expert
Licensed User
Longtime User
I tested the code from post #36 with the file from post #14 on my Samsung Galaxy S10.
I copied the file into File.DirAssets for the test.
The code from post #36 works OK in Release mode, but not in Debug mode.

The test code:

B4X:
Sub ReadFile
    Dim lst As List
    lst.Initialize
    Try
        Log("Begin")
        Dim rd As TextReader
        rd.Initialize(File.OpenInput(FolderName, FileName))
        Dim strLines As String= rd.ReadLine
        Do While strLines <> Null
            lst.Add(strLines)
            strLines = rd.ReadLine
        Loop
        Log(lst.Size)
        Log(lst.Get(0))
        Log(lst.Get(lst.Size - 1))
    Catch
        Log(LastException)
    End Try
End Sub

The logs in Release mode (the number of lines and the content of the last line are OK):
*** Service (starter) Create ***
** Service (starter) Start **
** Activity (main) Create (first time) **
Begin
1826551
0 HEAD
0 TRLR
** Activity (main) Resume **

And the logs in Debug mode:
** Activity (main) Pause, UserClosed = false **
** Activity (main) Resume **
*** Service (starter) Create ***
** Service (starter) Start **
** Activity (main) Create (first time) **
Begin
(ErrnoException) android.system.ErrnoException: open failed: ENOENT (No such file or directory)
** Activity (main) Resume **

I suppose the file is too big for the debugger.
 