I have a huge CSV file (~14 MB).
It has one column only.
What is the fastest way to search for a value in the file?
There will be at most one match, as all items are unique.
I just need to know whether the value I search for is in the CSV or not.
No need to load anything.
It depends on how frequently you plan to do a search. If you will be doing it many times then sort the list - after that a simple binary search might be quick enough, with much lower overhead than a database. Depends also on your resource priorities.
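For illustration, a minimal Java sketch of the sort-then-binary-search idea (the file name values.csv and the 8-digit search value are placeholders, not from the thread):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;

public class SortedLookup {
    public static void main(String[] args) throws IOException {
        // Read the one-column CSV; values.csv is a hypothetical path.
        List<String> lines = Files.readAllLines(Paths.get("values.csv"));
        Collections.sort(lines);  // one-time O(n log n) cost, reusable across many lookups
        // binarySearch returns a non-negative index if and only if the value is present.
        boolean found = Collections.binarySearch(lines, "12345678") >= 0;
        System.out.println(found);
    }
}
```

The sort cost is paid once per file download; each subsequent lookup is O(log n).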
Thought so, but importing the CSV into a DB might take time.
The CSV might change every 24 or 48 hours, and then I will need to download it and load it again.
So is this really the most efficient and fastest way?
The file will be around 6 MB, which will only take a few seconds on a fairly modern Android device to parse line by line and store in a map. Each number becomes a key. With the map structure overhead you're probably looking at about 25 MB in memory, so again plenty of room on most devices. Then check for the existence of the value with data.containsKey(...).
I don't know how quickly containsKey() would perform, but it's an easy test.
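A rough Java sketch of that map approach, assuming one value per line (the file name and lookup value are placeholders; for a pure existence check a HashSet would do the same job):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class MapLookup {
    public static void main(String[] args) throws IOException {
        Map<String, Boolean> data = new HashMap<>();
        // Parse the file line by line; each number becomes a key.
        try (BufferedReader reader = new BufferedReader(new FileReader("values.csv"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                data.put(line, Boolean.TRUE);
            }
        }
        // Average-case O(1) membership test.
        System.out.println(data.containsKey("12345678"));
    }
}
```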
Seems the file is ~14 MB.
The content is NOT just 8 digits per line - it is 20 characters.
Each line is a single string with three elements separated by |, so there are two more fields after the key within one string.
The rest is as I said.
So how do I find one element (the first 8 digits of the line) in the fastest and most efficient way?
The most efficient way would be for YOU to post a sample of this file and the expected result.
- NO ONE HERE knows what you want. And posting only a minimum of information is a really inefficient way to ask for help!
- A database search is REALLY FAST, so knowing whether it is worth putting the data into a database is mandatory for a good answer. If the CSV is static, then it is 100% worth it.
If it changes, then YOU have to decide whether it is worth parsing the CSV and splitting each line up into a database entry.
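As a hedged sketch of that database route in Java (it assumes the org.xerial sqlite-jdbc driver on the classpath; the file names, table layout, and search value are illustrative only):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class DbLookup {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:values.db")) {
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS items (item TEXT PRIMARY KEY)");
            }
            conn.setAutoCommit(false);  // batch all inserts into one transaction
            try (PreparedStatement ins = conn.prepareStatement("INSERT OR IGNORE INTO items VALUES (?)");
                 BufferedReader reader = new BufferedReader(new FileReader("values.csv"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    ins.setString(1, line.split("\\|")[0]);  // first field is the 8-digit key
                    ins.executeUpdate();
                }
            }
            conn.commit();
            conn.setAutoCommit(true);
            // The PRIMARY KEY index makes this lookup fast regardless of table size.
            try (PreparedStatement q = conn.prepareStatement("SELECT 1 FROM items WHERE item = ?")) {
                q.setString(1, "12345678");
                try (ResultSet rs = q.executeQuery()) {
                    System.out.println(rs.next());
                }
            }
        }
    }
}
```

Wrapping the inserts in a single transaction is what keeps the import quick; the PRIMARY KEY provides the index that makes every later lookup cheap.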
When you post a sample CSV with a header and one sample row, it's like you're trying to make it as difficult as possible to help you. One could argue that it borders on being rude.
Did you want to see all 600K lines? ~14 MB?
Why rude?
All lines are the same in content and size.
I really don't see the point of uploading that entire junk.
This is the sample data I have in the CSV.
And by the way - the entire data has some confidentiality issues...
I would have expected at least a couple of hundred lines for a sample. Posting what you did only explains the formatting, as if we were too stupid to understand it from your description.
But now that you say that the numbers are confidential I can see why it's not possible to share them.
No, of course I'm not expecting others to do the work for me.
I'm currently in research mode due to the file size.
With a normal file size I have a solution - I just load it into a table (grid).
But that monster is too big, so I'm looking for ideas on how to handle it.
Looping line by line seems a bit mad.
Looping line by line is NOT a slow method (see the sketch after this list).
As written multiple times, it depends on:
- how often will the CSV be searched?
- how often is the CSV changed (to decide whether it is worth parsing the CSV and writing it to a database)? Again: a database search is a really fast task, no matter whether the database contains 100 or 100 million records.
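A minimal Java sketch of that plain sequential scan, assuming the line format described above (8-digit key, then |-separated fields; the path and key are placeholders):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class StreamScan {
    // Streams the file and returns true as soon as a line starting
    // with "<key>|" is found, without holding the whole file in memory.
    static boolean contains(String path, String key) throws IOException {
        String prefix = key + "|";
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(prefix)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(contains("values.csv", "12345678"));
    }
}
```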