Android Question: Load data into memory

lelelor

Active Member
Licensed User
Hi, I need to load the contents of a .txt file into vectors. The fields are already separated, but I can't figure out how to load them into memory as vectors I can search. Right now I work directly on the file, but because it is large, it takes a long time. With VB6 on a PC you don't notice, but on Android, unfortunately, you ...
 

klaus

Expert
Licensed User
Longtime User
Sorry, but I do not understand your problem.
Could you please explain in more detail what exactly you want to achieve?
By vectors, do you mean arrays of values, or something else?
What is the content of the text file?
And what do you want to extract, and into what kind of variable or object?
 

lelelor

Active Member
Licensed User
Sorry. Since it is a big file, I would like to load the file's records into arrays in memory. With VB6 I use ReDim. I tried to do the same in B4A, but it gives me an error.
Here is the code:

Dim Reader As TextReader
If File.exists(File.DirRootExternal & "/chiocciola/", "") = False Then File.MakeDir(File.DirRootExternal & "/chiocciola/", "")
Reader.Initialize(File.openinput(File.DirRootExternal & "/chiocciola/", "anaiper.txt"))

Dim lstRighe As List = File.ReadList(File.DirRootExternal & "/chiocciola/", "anaiper.txt")
Dim numarticoli As Int

'numarticoli=lstRighe.size

Do While articoli <> Null
    ' Log(articoli)
    articoli = Reader.ReadLine
    ' Not valid B4X: there are no VB6-style "1 To n" bounds.
    ' B4X arrays are zero-based and declared as Dim ArtCod(n) As String.
    ' A Dim inside the loop also creates new, empty arrays on every pass.
    Dim ArtCod(1 To numarticoli) As String
    Dim ArtDes(1 To numarticoli) As String
    Dim ArtPre(1 To numarticoli) As String

    Dim ssplit() As String = Regex.Split(",", articoli)

    numarticoli = numarticoli + 1

    ArtCod(numarticoli) = ssplit(0)
    ArtDes(numarticoli) = ssplit(1)
    ArtPre(numarticoli) = ssplit(2)

Loop
Reader.Close
 

lelelor

Active Member
Licensed User
The file contains thousands of records: items from a warehouse. With VB6 I can manage them well, but I am new to B4A and haven't studied enough yet (you never study enough). Excuse me.
Now I'll study the suggestions.
 

Albert Kallal

Active Member
Licensed User
lelelor said:
The file contains thousands of records: items from a warehouse. With VB6 I can manage them well, but I am new to B4A and haven't studied enough yet (you never study enough). Excuse me.
Now I'll study the suggestions.

Well, how big is big? That's relative. Also, did you try the code in release mode as opposed to debug mode? You can often coax debug mode into running fast by doing a clean project and then trying again. If you edit the code, do a clean project again. Or, as noted, just run the code in release mode.

Take a quick read of this recent thread: we had full table scans against 1 million rows completing in less than 2 seconds. That's faster than what I was seeing on my desktop!

https://www.b4x.com/android/forum/threads/import-or-read-db-with-1-000-000-and-more-records.128487/

So you can't really assume desktop memory and processing power on a handheld device. It's the same kind of difference as between a heavy transport truck and a small two-seat airplane: they just are not the same thing.

But give your code a try in release mode. In most cases you should actually see near-PC speeds. Perhaps not what we have on a really fast desktop, but you can get rather great performance here. Often it just means taking a somewhat different approach than the very generous memory and processing of a desktop allows. So one has to treat the phone like a light, small airplane as opposed to a heavy transport truck (your desktop).

Often, looking at the problem in a different way can lead to a better solution. So instead of that long read of the data, perhaps you could store the data in a database. You could then use indexing against that data, which is much faster than a sequential search through an array, for example. This assumes you can use indexed retrieval of items, as opposed to a big loop over a large array.
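A minimal sketch of the indexed-lookup idea. It is written in Python with the standard sqlite3 module rather than B4X; on B4A you would go through its SQL library, which is not shown here. The table name `articles`, the three text columns, and the sample rows are assumptions based on the code,description,price layout discussed in this thread. The PRIMARY KEY on `code` gives SQLite an index to search, so a lookup does not scan every row.

```python
import sqlite3


def build_db(rows):
    """Load (code, description, price) rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    # PRIMARY KEY on code creates an index, so lookups need not scan every row.
    conn.execute(
        "CREATE TABLE articles (code TEXT PRIMARY KEY, description TEXT, price TEXT)"
    )
    conn.executemany("INSERT INTO articles VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def find_article(conn, code):
    """Return the (code, description, price) tuple for code, or None."""
    cur = conn.execute(
        "SELECT code, description, price FROM articles WHERE code = ?", (code,)
    )
    return cur.fetchone()


if __name__ == "__main__":
    # Hypothetical sample rows, not taken from the attached file.
    db = build_db([
        ("1001", "Hammer", "9.50"),
        ("1002", "Screwdriver", "4.20"),
        ("4029811405299", "Drill bit", "3.10"),
    ])
    print(find_article(db, "4029811405299"))  # ('4029811405299', 'Drill bit', '3.10')
    print(find_article(db, "9999"))           # None
```

The one-time cost is loading the data into the database. After that, each lookup is an index search instead of a loop.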

Of course, it depends on what you really need to do with the data once your code gets its hands on it. But take a look at the thread above.

In fact, I am seeing faster speeds with SQLite on my phone than I am on my desktop!!!

It's not clear that pulling the whole data set into memory makes sense. On a phone, the old idea of pulling everything into memory for more speed can often make things slower, since memory is a limited resource on the phone. In desktop land, sure, we often use that trick to gain speed, but it has limits, and often a different and better approach is warranted.

It's hard to guess here, because it really depends on what you mean by lots of rows. To some that means 10,000 rows; to others it might mean 500,000 rows. Without question, the smaller the dataset, the more leeway you have in how you read it. So, just like in VB6, you can open a text file and read it line by line, or you can use read() and read the WHOLE file in one shot (and then process the rows). Which way makes more sense will depend, just like it does on the desktop.
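The two reading styles, sketched in Python for illustration (not B4X). The code,description,price file layout and the function names are assumptions. The point is the difference between streaming one line at a time, where memory use stays small, and loading the whole file first, then processing the rows in memory.

```python
import os
import tempfile


def find_line_by_line(path, code):
    """Stream the file one line at a time; only the current line is in memory."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\r\n").split(",")
            if fields[0] == code:
                return fields
    return None


def find_whole_file(path, code):
    """Read the whole file in one shot, then process the rows in memory."""
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    for line in lines:
        fields = line.split(",")
        if fields[0] == code:
            return fields
    return None


if __name__ == "__main__":
    # Hypothetical two-line sample file in a temporary folder.
    sample = os.path.join(tempfile.mkdtemp(), "sample.txt")
    with open(sample, "w", encoding="utf-8") as f:
        f.write("1001,Hammer,9.50\n1002,Screwdriver,4.20\n")
    print(find_line_by_line(sample, "1002"))  # ['1002', 'Screwdriver', '4.20']
    print(find_whole_file(sample, "1002"))    # ['1002', 'Screwdriver', '4.20']
    os.remove(sample)
```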

Regards,
Albert D. Kallal
Edmonton, Alberta Canada
 

lelelor

Active Member
Licensed User
Hello, below is my code. With a file of 70 items it finds the item; with 155,000 it stops and does not give a result. I know I'm doing something wrong, but I don't know where. Can you help me? Thank you.
</>
Sub bottone2_click

    Dim articoli As String

    Dim a As Int
    Dim Reader As TextReader
    If File.exists(File.DirRootExternal & "/chiocciola/", "") = False Then File.MakeDir(File.DirRootExternal & "/chiocciola/", "")
    Reader.Initialize(File.openinput(File.DirRootExternal & "/chiocciola/", "anaiper.txt"))

    ' File.ReadList loads the whole file just to count the lines; the TextReader then reads it a second time
    Dim lstRighe As List = File.ReadList(File.DirRootExternal & "/chiocciola/", "anaiper.txt")
    Dim numarticoli As Int
    Dim ArtCod As String '(1 To numarticoli) As String
    Dim pippo   ' unused

    numarticoli = lstRighe.Size
    Log(numarticoli)

    ' The loop test runs before ReadLine, so the Null returned at end of file still reaches Regex.Split
    Do While articoli <> Null
        articoli = Reader.ReadLine

        ' Regex.Split drops trailing empty fields, so ssplit(1) or ssplit(2) can be out of range
        Dim ssplit() As String = Regex.Split(",", articoli)

        ArtCod = ssplit(0)

        If ArtCod = t_codice.Text Then

            l_codice.Text = ssplit(0)
            l_descrizione.Text = ssplit(1)
            l_prezzo.Text = ssplit(2)
            Exit

        End If

    Loop
    Reader.Close
End Sub
<\>
 

Mahares

Expert
Licensed User
Longtime User
lelelor said:
with 155000 it stops and does not give the result.
You were given all the necessary advice in post #4, and you did not follow any of it. By the way, for code tags, see post #4 again for how to post code properly.
As @emexes suggests, your best bet is to post at least a small portion of the file that is giving you trouble, as a text file, so we can see how it is built.
 

Peter Simpson

Expert
Licensed User
Longtime User
As Erel said, you should use code tags when posting your code...


lelelor

Active Member
Licensed User
I apologize for posting the code incorrectly. I am new to this system and have to learn. I create the .txt file myself, the way I want, so any "," inside the fields are removed automatically. I'm not searching for code 155000; I meant 155,000 items. With 70 items the system works; with 10,000 it doesn't. Again, I am new to this system (15 days, and I can't work on it every day). Thank you all for the advice and for trying to help. Here is the .txt file.
Thank you
 

Attachments

  • anaiper10000.txt (327.3 KB)

emexes

Expert
Licensed User
It helps that the file is sorted - it means you can abandon the search as soon as you reach a product code greater than the code you're looking for.
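A Python sketch of that early exit. It assumes the lines are sorted by their first field compared as plain strings. If the file were sorted numerically instead, codes of different lengths would compare differently ("999" > "1000" as strings), so the comparison has to match the file's actual sort order. The function name is hypothetical.

```python
def find_sorted_linear(lines, code):
    """Linear scan over lines sorted (as strings) by their first field.

    `lines` can be a list of strings or an open text file. Once a code
    greater than the target appears, the target cannot occur later in a
    sorted file, so the scan stops there.
    """
    for line in lines:
        fields = line.rstrip("\r\n").split(",")
        if fields[0] == code:
            return fields
        if fields[0] > code:
            return None  # already past the place where code would be
    return None


if __name__ == "__main__":
    data = ["1001,Hammer,9.50", "1003,Nails,1.20", "1005,Saw,12.00"]
    print(find_sorted_linear(data, "1003"))  # ['1003', 'Nails', '1.20']
    print(find_sorted_linear(data, "1004"))  # None (stops at 1005)
```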

There should be no problem with doing a linear search, other than it taking a while. If speed is an issue, I'd just construct a file of 32-bit pointers to the start of each line of the file (this only needs to be done once, on first access to a modified file) and then do a binary search. A binary search needs about log2(n) probes (about 13 for 10,000 records, about 17 for 144,000), and each probe reads one pointer and one line. That makes around 26 disk accesses per lookup for your 10,000-record file, or 34 for your 144,000-record file. Or half as many if you keep the line-start pointers in memory, given that 144,000 records x 4 bytes/pointer = 576 kB, i.e. < 1 MB.
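A Python sketch of the pointer-file idea, for illustration only. It records the byte offset of each line, stores the offsets as 4-byte little-endian integers (the "32-bit pointers"), and binary-searches the data file by seeking to those offsets. This version keeps the offsets in memory during the search, which is the "half as many accesses" variant. The helper names, the sample file, and comparing codes as strings are assumptions.

```python
import os
import struct
import tempfile


def build_line_index(path):
    """Return the byte offset at which each line of the file starts."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        for line in f:          # binary mode: each line keeps its b"\n"
            offsets.append(pos)
            pos += len(line)
    return offsets


def save_index(offsets, index_path):
    """Write the offsets as 4-byte little-endian unsigned ints (32-bit pointers)."""
    with open(index_path, "wb") as f:
        f.write(struct.pack(f"<{len(offsets)}I", *offsets))


def load_index(index_path):
    """Read back the offsets written by save_index."""
    with open(index_path, "rb") as f:
        data = f.read()
    return list(struct.unpack(f"<{len(data) // 4}I", data))


def find_with_index(path, offsets, code):
    """Binary search over a file whose lines are sorted by their first field."""
    lo, hi = 0, len(offsets) - 1
    with open(path, "rb") as f:
        while lo <= hi:
            mid = (lo + hi) // 2
            f.seek(offsets[mid])                    # jump straight to the line
            line = f.readline().decode("utf-8").rstrip("\r\n")
            fields = line.split(",")
            if fields[0] == code:
                return fields
            if fields[0] < code:
                lo = mid + 1                        # code is further down
            else:
                hi = mid - 1                        # code is further up
    return None


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    data_path = os.path.join(folder, "sample.txt")  # hypothetical sample file
    with open(data_path, "w", encoding="utf-8", newline="") as f:
        f.write("1001,Hammer,9.50\n1003,Nails,1.20\n1005,Saw,12.00\n")
    save_index(build_line_index(data_path), data_path + ".idx")
    offsets = load_index(data_path + ".idx")
    print(find_with_index(data_path, offsets, "1003"))  # ['1003', 'Nails', '1.20']
```

The index only has to be rebuilt when the data file changes, as emexes notes.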
 

emexes

Expert
Licensed User
Lol, here's a different solution, if you don't mind what other people think of your programming.

Load the entire file into a single string. Should be no problem.

Add a line terminator to the start of the string.

To search for product code 4029811405299, search the whole file/string for:

line terminator & "4029811405299" & ","

and then continue the search to find the line terminator at the end of that line (assuming the product code is in the file/string; perhaps it isn't).
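The single-string search, sketched in Python as an illustration. Prepending a line terminator means every line, including the first, starts with one. The trailing "," stops a short code from matching the start of a longer one (for example "100" inside "1001"). The function name is hypothetical. It also tolerates \r\n line endings by stripping a trailing \r.

```python
def find_in_text(text, code):
    """Find the line whose first field is `code` in the whole file contents."""
    haystack = "\n" + text                    # so the first line also starts with "\n"
    start = haystack.find("\n" + code + ",")
    if start == -1:
        return None                           # the code is not in the file
    start += 1                                # skip the leading line terminator
    end = haystack.find("\n", start)          # line terminator at the end of that line
    if end == -1:
        end = len(haystack)                   # last line without a trailing newline
    return haystack[start:end].rstrip("\r").split(",")


if __name__ == "__main__":
    contents = "1001,Hammer,9.50\n4029811405299,Drill bit,3.10\n"
    print(find_in_text(contents, "4029811405299"))  # ['4029811405299', 'Drill bit', '3.10']
    print(find_in_text(contents, "100"))            # None
```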
 

lelelor

Active Member
Licensed User
emexes said:
It helps that the file is sorted - it means you can abandon the search as soon as you reach a product code greater than the code you're looking for.

There should be no problem with doing a linear search, other than it taking a while. If speed is an issue, then I'd just construct a file of 32-bit pointers to the lines of the file (only needs to be done once, on first access to a modified file) and then just do a binary search, = around 26 disk accesses per lookup for your 10000 record file, or 34 disk access per lookup for your 144000 record file (or half as many if you keep the line-start-pointer file in memory, given that 144000 records x 4 bytes/pointer = 576 kB ie < 1 MB)

Thanks for the encouragement. I'm learning, and that's okay; that's what I want to get to. The file is sorted, but I don't know how to create the pointers. I also don't know how to say: go to the middle of the file, and if the code is bigger, go to the middle between here and the end; if it is smaller, go halfway between here and the start... I don't know if I'm being clear.
 

William Lancee

Well-Known Member
Licensed User
Longtime User
Be careful with Regex.Split: it ignores trailing empty fields, so some lines end up with fewer fields than expected.

It is better to use StringUtils.LoadCSV or CSVParser, as @Erel recommended.
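The behaviour described here matches Java's String.split with its default limit, which drops trailing empty strings ("1002,,".split(",") gives just ["1002"]). As a language-neutral illustration, here is a Python sketch using the standard csv module, which keeps empty fields and handles quoted commas. Padding short rows to three fields is an extra safeguard assumed here, not something from the thread.

```python
import csv
import io

EXPECTED_FIELDS = 3  # code, description, price (assumed layout)


def parse_rows(text):
    """Parse comma-separated text, keeping empty fields.

    csv.reader keeps trailing empty fields and handles quoted commas.
    Rows that are still short are padded with "" so row[2] is always safe.
    """
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue                                  # skip blank lines
        row += [""] * (EXPECTED_FIELDS - len(row))    # no-op if already long enough
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(parse_rows("1002,,\n"))  # [['1002', '', '']]
```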
 

Mahares

Expert
Licensed User
Longtime User
lelelor said:
I'm learning and that's okay, that's what I want to get to
Maybe later, when you feel comfortable with the forum, you'll find your file is well suited to an SQLite database, as Peter suggested, because of the large number of records. However, if you want to keep using it as a text file, I recommend parsing it with StringUtils LoadCSV. You have several records with empty fields, and those are easier to parse with StringUtils than with Regex. Here is a snippet, but I can post an entire project if you want it and are not overwhelmed. Even if there is no interest now, you get the point that there is sometimes more than one way.
Enter the number or text you are looking for (from the first column) in the t_codice EditText and click the button. The other EditText boxes are filled in automatically with the corresponding data:
B4X:
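Private su As StringUtils   'in Sub Globals (requires the StringUtils library)
Private MyList As List      'in Sub Globals
MyList = su.LoadCSV(File.DirAssets, "anaiper10000.txt", ",")   'e.g. in Activity_Create; add the file to the Files folder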
B4X:
Private Sub Button1_Click
    l_codice.Text=""
    l_descrizione.Text=""
    l_prezzo.Text=""
    Dim t1 As Long =DateTime.Now
    For i=0 To MyList.Size-1
        Dim row() As String= MyList.Get(i)
        If t_codice.Text= row(0) Then
            l_codice.Text =row(0)
            l_descrizione.Text =row(1)
            l_prezzo.Text = row(2)
            Exit
        End If
    Next
    If i = MyList.Size Then
        Log("not found")
    Else
        Dim t2 As Long =DateTime.Now        
        Log($"Rec no:${i+1} ${CRLF} Took: ${t2-t1} millisec"$)
    End If
End Sub
 

Attachments

  • record.png (9.3 KB)

Albert Kallal

Active Member
Licensed User
lelelor said:
Thanks for the encouragement, I'm learning and that's okay, that's what I want to get to, the file is sorted, but I don't know how to create the pointer, how to say go to the middle of the file and if bigger go to the middle between this and the last, or smaller, you go halfway between this and the first ... I don't know I can be clear

OK, let's have some fun, shall we?
First, I can raw-read and loop-search the 10,000 rows in 0.04 seconds (brute force). Also, here are the load times for your sample 10,000-row file:

[Screenshot: test2r1.png, load times for the 10,000-row sample]

If you look closely, the read time for 10,000 rows was 0.3 seconds (about 1/3 of a second). So 100,000 rows? Probably 3 seconds. 400,000 rows? About 12, by my guess.
That is not too bad. As noted, if you're doing some kind of barcode or lookup, I would still consider a database. But let's keep this really simple.

Next up: search/scan against the 10,000 rows. Let's look for the last row (brute force). And we get this:

[Screenshot: test2r.png, brute-force search for the last row]


Now, that speed REALLY did surprise me: 10,000 rows in a loop, time = 0.05 seconds. So 100,000 rows should take about 0.5 seconds, and 300,000 rows about 1.5.

Now, someone suggested a "pointer" search. Man, talk about putting candy in front of me! What I decided to do was to SORT the list. Then I can use an old-style binary chop search. They are EASY to write. If you look at the first image, note how I converted the read list into something a bit friendlier to sorting. Better yet, you can now use a "name" in your code to reference the 3 columns. Again, that sort speed IS STUNNING!!!

Now, with a binary chop search, the search time is LESS than 1/1000th of a second!!! And it will stay that way for 400,000 rows, since halving a sorted list of 400,000 rows needs at most 19 probes.
I simply could not refrain from writing that chop search. It's really simple: you have a sorted list/array, you check in the middle, and then chop in half based on the search value. On 400,000 rows, this SIMPLE search code will run without any delay.
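A Python sketch of the same steps: give each row named fields, sort by the code, then chop-search. Albert's project uses its own type with a PNum field. The `Article` class and its field names here are hypothetical. The loop stops when the search window is empty (lower > upper), which is the usual termination test for "not found".

```python
from dataclasses import dataclass


@dataclass
class Article:
    """One row with named columns (hypothetical field names)."""
    code: str
    description: str
    price: str


def load_sorted(rows):
    """Turn raw [code, description, price] rows into records sorted by code."""
    articles = [Article(*row[:3]) for row in rows]
    articles.sort(key=lambda a: a.code)
    return articles


def chop_search(articles, code):
    """Binary ("chop") search on articles sorted by code; returns an index or -1.

    The window articles[lower..upper] always holds the only place code could
    be; each step discards the half that cannot contain it.
    """
    lower, upper = 0, len(articles) - 1
    while lower <= upper:
        half = (lower + upper) // 2
        current = articles[half].code
        if code > current:
            lower = half + 1      # search the upper part
        elif code < current:
            upper = half - 1      # search the lower part
        else:
            return half
    return -1                     # window is empty: not in the list


if __name__ == "__main__":
    arts = load_sorted([["1005", "Saw", "12.00"], ["1001", "Hammer", "9.50"]])
    i = chop_search(arts, "1005")
    print(i, arts[i].description)   # 1 Saw
```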

I have attached a working zip project of the above. What is really nice? Well, you were VERY smart and kind to provide a sample file, so it's included in this project. To try your own data, delete the sample text file (from within B4A).

For those wondering about that chop search: it's a classic computer algorithm, one that I have not had to write for more than 20 years. In fact, I think the last time I did this was on an Apple II (in Pascal!!!).

Here is the search routine. It just assumes the list is in order. Given that assumption, you chop the list in half down the middle, going left or right until you get your answer. It is STUPID fast on this 10,000-row list.

The code (snip) for that search is this:
B4X:
Sub cmdByLine_Click

    ' ... (declarations and setup snipped; this loop expects
    '      Lower = 0, Upper = MyList.Size - 1, bolFound = False)

    Do While Lower <= Upper
        Half = (Lower + Upper) / 2      ' middle of the remaining window

        OneRow = MyList.Get(Half)

        If txtSearch.Text > OneRow.PNum Then
            Lower = Half + 1            ' the code can only be in the upper part
        Else If txtSearch.Text < OneRow.PNum Then
            Upper = Half - 1            ' the code can only be in the lower part
        Else
            bolFound = True
            intFound = Half
            Exit
        End If
    Loop
    ' If the loop ends with Lower > Upper, the code is not in the list (bolFound is still False)

End Sub
I did snip out a bit - the routine is in the attached sample.

(It's been so long that I can't really remember the correct way to terminate! But the above seems to work just fine.)
(Just oh so much fun. I remember this algorithm from my computer science book at university!!
However, I don't think my termination code is 100%. I think I still have that book! (Algorithms))

Now, I have attached the B4A project here. You should be able to open it in B4A, and it should just work. I avoided any libraries that are not part of B4A, and I tried to write the code in VB/VBA style to make it easier for you to learn.

Give the attached sample project a try. Then, from B4A, remove the 10,000-row sample and bump it up to 100,000 rows. Try that, and then try the larger 400k table. This way, you can work your way up to that final large table.

However, right now?

Performance with 10,000 rows, or 100,000 rows? Not an issue.

And me? Well, instead of opening a Christmas shopping list for 10 people in Notepad,
I will fire up a database!!!

For me, every solution is a database, since that's the hammer I carry! I do think that with a 400k list a database would be better, since the list does NOT have to be pre-read and pre-loaded into memory, and overall that will be a better choice. But let's push B4A and see how big your list can get before we damage the little reactor.

Anyway, do give the attached sample a try. If it works, then try a 100k text file in the project (remove the 10k one).

So our read times are less than 1 second. As noted, I would try 100k rows.
If that speed is OK, then go with reading the file.

If not, then we start to consider a database. But the project above is ready as-is for you to try, and it should work. Have fun; I REALLY had a giggle writing this out!!

So you have options! I think the choices we have here CAN deal with this problem. It's just not 100% clear which road takes the least effort, but B4A as a tool is MOST certainly up to this task.

Regards,
Albert D. Kallal
Edmonton, Alberta Canada
 

Attachments

  • ReadText.zip (101.8 KB)

emexes

Expert
Licensed User
Albert Kallal said:
Then you just chop the list in half down the middle, left or right until you get your answer - it is STUPID fast on this 10,000 list.
While you're on a roll, I've got two words for you: interpolation search.

(also one word - hashing - but given that the original data was sorted, plus that it might later be advantageous to be able to search using an incomplete product key, binary search seemed the better choice)
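For the curious, a Python sketch of interpolation search. It assumes numeric product codes (for example 4029811405299 converted to an integer) in a sorted list. Instead of probing the middle, it estimates the position from the value. That helps when codes are spread fairly evenly, and it can degrade toward a linear scan when they are not. The hashing alternative is shown as a plain dictionary. A dictionary supports only exact-match lookups, which is why binary search fits better if partial keys are needed later. Both function names are hypothetical.

```python
def interpolation_search(keys, target):
    """Search a sorted list of integers; returns an index or -1."""
    lo, hi = 0, len(keys) - 1
    while lo <= hi and keys[lo] <= target <= keys[hi]:
        if keys[hi] == keys[lo]:                  # avoid dividing by zero
            return lo if keys[lo] == target else -1
        # Estimate the position from where target sits between keys[lo] and keys[hi].
        pos = lo + (target - keys[lo]) * (hi - lo) // (keys[hi] - keys[lo])
        if keys[pos] == target:
            return pos
        if keys[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1


def build_hash_index(rows):
    """Hashing alternative: a dict keyed by the exact code (no prefix search)."""
    return {row[0]: row for row in rows}


if __name__ == "__main__":
    codes = [10, 20, 30, 40, 50, 60]
    print(interpolation_search(codes, 40))  # 3
    print(interpolation_search(codes, 35))  # -1
```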
 

Albert Kallal

Active Member
Licensed User
emexes said:
While you're on a roll, I've got two words for you: interpolation search.

(also one word - hashing - but given that the original data was sorted, plus that it might later be advantageous to be able to search using an incomplete product key, binary search seemed the better choice)

Funny you mention that: I worked for quite a few years on a system whose underlying technology was hash coding for nearly everything!

And .NET collections such as Dictionary are hashed for performance (but we developers don't worry about such things anymore!!).

Even in this case, I used a sort that's built into the string utilities. It was VERY fast (4/1000ths of a second for 10,000 records!!!).

I was only able to use that cute chop search (binary search) because I had/could sort the data.

As noted, the last time I did something like this was a long time ago in a galaxy far away.

I found this post really enjoyable and fun. It is REALLY rare these days one even has to worry about these things.

Today, sorting, searching, etc.?
Almost all data-aware objects and systems you use have these things built in.

We just don't (have to) write such code anymore. Even way back then, from the moment I started using PC database systems (FoxPro, Access, dBase, etc.), I never needed to write such a routine until now!! And even in this case, in B4A I would adopt SQLite, and again I would not be writing such code!

But, it was oh so fun to see the routine chop out the search!
The debug log looks like this for the 10,000 rows:

[Screenshot: debug log of the chop search on the 10,000 rows]


Most lists and objects we use today have find and sorting built in (even B4A has a super-fast sort).

My view, of course, is a database. That would eliminate the import time and import code and reduce memory requirements
(and still run with better speed).

My excitement here was due to not having done this for years (about 30 years now!!!).
I recall doing this in Pascal on the Apple, and also for some code on a mainframe system (code still in use 30 years later).
So, yes, this post made me feel like a kid again! - we in general just don't get to write these simple little fun things anymore!

But watching my little phone go chop chop away at this list? Wow, that really is a computer in my hand!! - Gee, what a hoot!

R
Albert
 

lelelor

Active Member
Licensed User
Thanks, Albert. Since I program as a hobby, I didn't have a lot of time, but I tried and tried to learn the structure of the routine. With the 10k items it's OK; unfortunately, with around 160k it stops. I know I'm doing something wrong, but I don't know where... I would like to attach the 160k file, but it's too big (900 KB). How can I send it? Thank you very much.
 