I have a huge CSV file (~14 MB).
It has one column only.
What is the fastest way to search for a value in the file?
There will be at most one match, as all items are unique.
I just need to know whether the value I search for is in the CSV or not.
No need to load anything.
It depends on how frequently you plan to do a search. If you will be doing it many times then sort the list - after that a simple binary search might be quick enough, with much lower overhead than a database. Depends also on your resource priorities.
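For illustration, a minimal Java sketch of the sort-then-binary-search idea (the file name values.csv and the 8-digit search value are placeholders, not from the thread):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;

public class SortedLookup {
    public static void main(String[] args) throws IOException {
        // Read the one-column CSV; values.csv is a hypothetical path.
        List<String> lines = Files.readAllLines(Paths.get("values.csv"));
        Collections.sort(lines);  // one-time O(n log n) cost, reusable across many lookups
        // binarySearch returns a non-negative index if and only if the value is present.
        boolean found = Collections.binarySearch(lines, "12345678") >= 0;
        System.out.println(found);
    }
}
```

The sort cost is paid once per file download; each subsequent lookup is O(log n).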
Thought so, but importing the CSV into a DB might take time.
The CSV might change every 24 or 48 hours, and then I will need to download it and load it again.
So is this really the most efficient and fastest way?
The file will be around 6 MB, which will only take a few seconds on a fairly modern Android device to parse line by line and store in a map. Each number becomes a key. With the map structure overhead you're probably looking at about 25 MB in memory, so again plenty of room on most devices. Then check for the existence of the value with data.containsKey(...).
I don't know how quickly containsKey() would perform, but it's an easy test.
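A rough Java sketch of that map approach, assuming one value per line (the file name and lookup value are placeholders; for a pure existence check a HashSet would do the same job):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class MapLookup {
    public static void main(String[] args) throws IOException {
        Map<String, Boolean> data = new HashMap<>();
        // Parse the file line by line; each number becomes a key.
        try (BufferedReader reader = new BufferedReader(new FileReader("values.csv"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                data.put(line, Boolean.TRUE);
            }
        }
        // Average-case O(1) membership test.
        System.out.println(data.containsKey("12345678"));
    }
}
```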
Seems the file is ~14 MB.
The content is NOT just 8 digits per line - it is 20 characters.
Each line is a single string with three elements separated by |, so there are two more fields after the key within one string.
The rest is as I said.
So how do I find one element (the first 8 digits of the line) in the fastest and most efficient way?
The most efficient way would be for YOU to post a sample of this file and the expected result.
- NO ONE HERE knows what you want. And posting only a minimum of information is a really inefficient way to ask for help!
- A database search is REALLY FAST, so knowing whether it is worth putting the data into a database is mandatory for a good answer. If the CSV is static, then it is 100% worth it.
If it changes, then YOU have to decide whether it is worth parsing the CSV and splitting each line up into a database entry.
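As a hedged sketch of that database route in Java (it assumes the org.xerial sqlite-jdbc driver on the classpath; the file names, table layout, and search value are illustrative only):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class DbLookup {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:values.db")) {
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS items (item TEXT PRIMARY KEY)");
            }
            conn.setAutoCommit(false);  // batch all inserts into one transaction
            try (PreparedStatement ins = conn.prepareStatement("INSERT OR IGNORE INTO items VALUES (?)");
                 BufferedReader reader = new BufferedReader(new FileReader("values.csv"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    ins.setString(1, line.split("\\|")[0]);  // first field is the 8-digit key
                    ins.executeUpdate();
                }
            }
            conn.commit();
            conn.setAutoCommit(true);
            // The PRIMARY KEY index makes this lookup fast regardless of table size.
            try (PreparedStatement q = conn.prepareStatement("SELECT 1 FROM items WHERE item = ?")) {
                q.setString(1, "12345678");
                try (ResultSet rs = q.executeQuery()) {
                    System.out.println(rs.next());
                }
            }
        }
    }
}
```

Wrapping the inserts in a single transaction is what keeps the import quick; the PRIMARY KEY provides the index that makes every later lookup cheap.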
When you post a sample CSV with a header and one sample row, it's like you're trying to make it as difficult as possible to help you. One could argue that it borders on being rude.
Did you want to see all 600K lines? ~14 MB?
Why rude?
All lines are the same in content and size.
I really don't see the point of uploading that entire junk.
This is the sample data I have in the CSV.
And by the way - the entire data has some confidentiality issues...
I would have expected at least a couple of hundred lines for a sample. Posting what you did only explains the formatting, as if we were too stupid to understand it from your description.
But now that you say that the numbers are confidential I can see why it's not possible to share them.
No, of course I'm not expecting others to do the work for me.
I'm currently in research mode due to the file size.
With a normal file size I have a solution - I just load it into a table (grid).
But that monster is too big, so I'm looking for ideas on how to handle it.
Looping line by line seems a bit mad.
Looping line by line is NOT a slow method (see the sketch after this list).
As written multiple times, it depends on:
- how often will the CSV be searched?
- how often is the CSV changed (to decide whether it is worth parsing the CSV and writing it to a database)? Again: a database search is a really fast task, no matter whether the database contains 100 or 100 million records.
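A minimal Java sketch of that plain sequential scan, assuming the line format described above (8-digit key, then |-separated fields; the path and key are placeholders):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class StreamScan {
    // Streams the file and returns true as soon as a line starting
    // with "<key>|" is found, without holding the whole file in memory.
    static boolean contains(String path, String key) throws IOException {
        String prefix = key + "|";
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(prefix)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(contains("values.csv", "12345678"));
    }
}
```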