Android Question search in huge CSV file

Zeev Goldstein

Well-Known Member
Licensed User
Longtime User
hi

i have a huge csv file (~14 MB)
it has one column only
what is the fastest way to search for a value in the file?
there will be only one matching element as all items are unique
i just need to know if the value i search is in the csv or not
no need to load anything

any idea?
a short sample will be appreciated

thanks
 

teddybear

Well-Known Member
Licensed User
How many rows are there in the csv file? And how big is each row?
 

DonManfred

Expert
Licensed User
Longtime User
Probably better to parse the CSV and put it into a database. Searching a database is probably much faster than going over all lines in the csv.
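The database idea can be sketched as follows. This is a Python illustration of the approach (in B4A you would do the same thing with the SQL library); the file, table, and column names here are hypothetical:

```python
import csv
import sqlite3

def load_csv_into_db(csv_path, db_path):
    # Import a one-column CSV into an SQLite table.
    # PRIMARY KEY gives us an index for free and skips duplicates.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS items (value TEXT PRIMARY KEY)")
    with open(csv_path, newline="") as f:
        conn.executemany(
            "INSERT OR IGNORE INTO items (value) VALUES (?)",
            ((row[0],) for row in csv.reader(f) if row),
        )
    conn.commit()
    return conn

def exists(conn, value):
    # The index on value makes this a fast lookup, not a full scan.
    cur = conn.execute("SELECT 1 FROM items WHERE value = ? LIMIT 1", (value,))
    return cur.fetchone() is not None
```

Because the column is the primary key, SQLite maintains an index automatically, so each lookup stays fast even with hundreds of thousands of rows; the one-time cost is the import.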
 

Brian Dean

Well-Known Member
Licensed User
Longtime User
It depends on how frequently you plan to do a search. If you will be doing it many times then sort the list - after that a simple binary search might be quick enough, with much lower overhead than a database. Depends also on your resource priorities.
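A minimal sketch of the sort-then-binary-search idea, in Python for illustration (B4A would use a sorted List plus a hand-written binary search):

```python
from bisect import bisect_left

def contains(sorted_values, target):
    # Binary search on an already-sorted list: O(log n) per lookup,
    # with no database or import overhead.
    i = bisect_left(sorted_values, target)
    return i < len(sorted_values) and sorted_values[i] == target
```

The sort is a one-time O(n log n) cost after loading the file; every search after that touches only ~20 elements for a 600K-row list.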
 

Zeev Goldstein

Well-Known Member
Licensed User
Longtime User
Probably better to parse the CSV and put it into a database. Searching a database is probably much faster than going over all lines in the csv.
thought so
but importing the CSV into DB might take time
the CSV might change every 24 or 48 hours and then i will need to download it and load it again
so is this really the most efficient and fastest way?
 

Zeev Goldstein

Well-Known Member
Licensed User
Longtime User
It depends on how frequently you plan to do a search. If you will be doing it many times then sort the list - after that a simple binary search might be quick enough, with much lower overhead than a database. Depends also on your resource priorities.
i can't predict the search times per day
it may be a few times one day and then none for another day

each time, or at least on the first run each day, i think i will need to download the csv again as it will have updated
 

John Naylor

Active Member
Licensed User
Longtime User
The file will be around 6 MB, which will only take a few seconds on a fairly modern Android device to parse line by line and store in a map. Each number becomes a key. With the map structure overhead you're probably looking at about 25 MB in memory, so again plenty of room on most devices. Then check for the existence of the value with data.ContainsKey(...)

I don't know how quickly ContainsKey would perform, but it's an easy test
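The map idea above can be sketched in Python; a set plays the role of the B4A Map's keys here, and the file name is hypothetical:

```python
def load_keys(csv_path):
    # One key per line; a set gives O(1) average-time membership tests,
    # analogous to Map.ContainsKey in B4A.
    with open(csv_path) as f:
        return {line.strip() for line in f if line.strip()}
```

Loading is a single linear pass; after that, every lookup is a constant-time hash probe regardless of how many keys were loaded.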
 

Zeev Goldstein

Well-Known Member
Licensed User
Longtime User
here's an update on my post

seems the file is ~14 MB
the content is NOT just 8 digits per line - it is 20 characters
each line is a string with 3 elements separated by | characters, so it ends up as 3 fields within one string
the rest is as said

so how do i find one element (the first 8 digits of the line) in the fastest & most efficient way?

a sample will be highly appreciated

thanks
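Given the pipe-separated format just described, one sketch is to keep only the first field of each line as the lookup key (Python for illustration; the file name and separator are taken from the description above):

```python
def load_first_fields(csv_path, sep="|"):
    # Each line looks like "12345678|...|...".
    # split(sep, 1) cuts off everything after the first separator,
    # so only the 8-digit key is kept in memory.
    with open(csv_path) as f:
        return {line.split(sep, 1)[0].strip() for line in f if line.strip()}
```

Storing only the first field keeps the in-memory set far smaller than the full 20-character lines, and membership checks stay O(1).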
 

Zeev Goldstein

Well-Known Member
Licensed User
Longtime User
The file will be around 6 MB, which will only take a few seconds on a fairly modern Android device to parse line by line and store in a map. Each number becomes a key. With the map structure overhead you're probably looking at about 25 MB in memory, so again plenty of room on most devices. Then check for the existence of the value with data.ContainsKey(...)

I don't know how quickly ContainsKey would perform, but it's an easy test
thanks
will it be possible for you to share a sample to test it?
thanx
thanks
will it be possible for you to share a sample to test it?
thanx
 

DonManfred

Expert
Licensed User
Longtime User
The most efficient way would be if YOU post a sample of this file and the expected result.

- NO ONE HERE knows what you want. And posting only a minimum of information is a really inefficient way to ask for help!
- A database search is REALLY FAST, so knowing whether it is worth putting the data in a database is mandatory for a good answer. If the CSV is static, then it is 100% worth it.
If it changes, then YOU have to decide if it is worth parsing the CSV and splitting each line up into a database entry.
 

Zeev Goldstein

Well-Known Member
Licensed User
Longtime User
The most efficient way would be if YOU post a sample of this file and the expected result.
NO ONE HERE knows what you want. And posting only a minimum of information is a really inefficient way to ask for help!
ok

here is the sample csv
it contains ~600000 lines similar to the one in it
each is unique
 

Attachments

  • SampleCSV.zip
    236 bytes

Sandman

Expert
Licensed User
Longtime User
When you post a sample csv with a header and one sample row, it's like you're trying to make it as difficult as possible for anyone to help you. One could argue that it borders on rude.
 

Zeev Goldstein

Well-Known Member
Licensed User
Longtime User
When you post a sample csv with a header and one sample row, it's like you're trying to make it as difficult as possible for anyone to help you. One could argue that it borders on rude.
did you want to see all 600K lines? ~14 MB?
why rude?
all lines are the same in content and size
i really don't see the point of uploading all that junk
this is the sample data i have in the csv
and by the way - the entire data has some confidentiality issues...
 

Sandman

Expert
Licensed User
Longtime User
I would have expected at least a couple of hundred lines for a sample. Posting what you did only explains the formatting, as if we were too stupid to understand it from your description.

But now that you say that the numbers are confidential I can see why it's not possible to share them.
 

Zeev Goldstein

Well-Known Member
Licensed User
Longtime User
I would have expected at least a couple of hundred lines for a sample. Posting what you did only explains the formatting, as if we were too stupid to understand it from your description.

But now that you say that the numbers are confidential I can see why it's not possible to share them.
thanx
the data indeed is problematic
the line in the file is real data, the header is not
but the entire file is as i said
 

Zeev Goldstein

Well-Known Member
Licensed User
Longtime User
It is a good start. What have you tried by yourself? Do not expect others to do the coding for you.
no, of course not expecting others to do the work for me
i'm currently in research mode due to the file size
on normal file size i have a solution - i just load it into table (grid)
but that monster is too big so i'm looking for ideas how to handle it
looping line by line seems a bit mad
 

DonManfred

Expert
Licensed User
Longtime User
looping line by line seems a bit mad
Looping line by line is NOT a slow method.
As written multiple times, it depends on:
- how often will the csv be searched?
- how often is the CSV changed (to decide whether it is worth parsing the csv and writing it to a database)? Again: a database search is a really fast task, no matter whether the database contains 100 or 100 million records.
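For completeness, a sketch of the line-by-line scan itself (Python for illustration): it streams the file, so memory use stays constant regardless of file size, at the cost of reading up to the whole file on each search:

```python
def scan_for_value(csv_path, target, sep="|"):
    # Stream the file one line at a time and compare the first
    # pipe-separated field against the target; stop at first match.
    with open(csv_path) as f:
        for line in f:
            if line.split(sep, 1)[0].strip() == target:
                return True
    return False
```

For a ~14 MB file searched only occasionally, this no-preprocessing approach can be perfectly adequate; the map, sorted-list, and database approaches above only pay off when searches are frequent relative to downloads.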
 