B4J Question: handling a huge quantity of files

peacemaker

Expert
Licensed User
Longtime User
Hi all,

As I understand it, if a system generates a huge quantity of files during operation, that is a very bad situation for absolutely any operating system, correct? Because to work with the files you have to list them all first before iterating, and with a huge quantity the app will just hang while waiting for the listing result...

If so, has anyone already tried to solve this and written some code to store and retrieve files from a huge file store?
 

DonManfred

Expert
Licensed User
Longtime User
What are you doing that generating a list can "hang" your app?

How long does it take to list the files? How many files are there to list?
Is there any mechanism to delete the old, already-parsed ones? If not: why not?
How much data is inside the files?
 

peacemaker

An NN-controlling app receives .jpg frames at 2-3 FPS, 24/7, and stores them for weeks.
That makes a really long list for a single folder, changing every second.
And I also need the ability to manually list, show, compare, and delete them...
 

DonManfred

I would use a server step/routine in between. You can run a cron job on Linux, for example, to list all files and write the list to a file.
Your app then just reads that file when it wants to do something with the files.

On what operating system are they stored?
Do you store everything in a single folder, or grouped by week/day/month/whatever?
 

peacemaker

Ubuntu host; my Java app uses an API to receive the frames.
Right now everything is stored in a single folder, and I have found that normal (fast) operation is only possible with maybe 7,000-20,000 files per folder at most, or thereabouts...
A RAM disk is already used as a buffer, but after all the processing the files must be kept in long-term storage, with listing/checking/comparing/deleting...

So, some storage system is needed for millions of files...
 

peacemaker

I guess that for such storage a class should be programmed, with:
1) folder structure created on the fly, based on hours (or maybe a smaller interval)
2) calculation of each new file's name/path
3) full file list generated in the background
4) file search within that list
5) deletion of files and folders that are outdated (the storage limit)
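Points (1) and (2) could be computed roughly like this (a minimal Java sketch, since B4J runs on the JVM; the IMG_ prefix, UTC zone, and hour-level folders are illustrative assumptions, not anything from the thread):

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class FramePaths {
    // one subfolder per hour: yyyy/MM/dd/HH
    private static final DateTimeFormatter DIR_FMT =
        DateTimeFormatter.ofPattern("yyyy/MM/dd/HH").withZone(ZoneOffset.UTC);
    // filename carries the timestamp down to milliseconds
    private static final DateTimeFormatter FILE_FMT =
        DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss_SSS").withZone(ZoneOffset.UTC);

    // e.g. root=/storage, t=2023-09-18T20:00:01.250Z
    //  -> /storage/2023/09/18/20/IMG_20230918_200001_250.jpg
    public static Path pathFor(Path root, Instant t) {
        return root.resolve(DIR_FMT.format(t))
                   .resolve("IMG_" + FILE_FMT.format(t) + ".jpg");
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2023-09-18T20:00:01.250Z");
        System.out.println(pathFor(Paths.get("/storage"), t));
    }
}
```

Because the path is derived purely from the timestamp, the writer never has to list a folder to decide where the next frame goes.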
 

Magma

Expert
Licensed User
Longtime User
Better to have folders like year/month/day.

As @DonManfred said, also recording what has already been "parsed" is a nice tactic (and very sensible, so you don't loop over the same files).

About (5), you know better what to delete and what not to...

PS: What is an NN-controlling app?
 

Magma

NN... I see... but I can't get the idea (saving jpegs of what?)... OK, that is something you know the reason for...

1) @peacemaker ... hmm... are you using a custom method (code, or an app created by you) to save the jpg files (screenshots), or ffmpeg, or an app that saves to the folder you need?

2) Do the files have the date and hour in their filenames?
 

peacemaker

1) The jpg files are received from a 3rd-party API. I have to store them myself according to the task.
And again: a huge quantity of files is bad for any folder in any OS, if I'm right...

We can just save tons of files into a single folder, but it's impossible to list them all _quickly_ whenever you want... due to the quantity and the time it takes to list them...

2) Sure, even down to the milliseconds.
 

Magma

Generally it is bad to have hundreds of thousands of files on a system, even spread across different folders... but it is something manageable.

1) If you are using an API to receive them, it is easy to save them at the same time into different folders like task/year/month/day - then there is no need to list anything.

2) That is better... so they have a standard name like IMG_YYYYMMDDHHmmssnn... and you can filter the listing with a pattern like IMG_YYYYMMDDHH*.*

3) If you are using an API... and the jpegs are very small... maybe consider saving them in SQLite as BLOBs (every day in a different DB)...
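The pattern filtering in (2) can be done from Java (and therefore from B4J via inline Java) without building the full name list first; a minimal sketch, where the folder layout and name format are assumptions from this thread:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class GlobList {
    // e.g. listHour(dir, "2023091820") -> all frames from 20:00-20:59
    public static List<Path> listHour(Path dir, String yyyymmddhh) throws IOException {
        List<Path> hits = new ArrayList<>();
        // the glob filter is applied while iterating, so matches are
        // collected without first materialising one huge name array
        try (DirectoryStream<Path> ds =
                 Files.newDirectoryStream(dir, "IMG_" + yyyymmddhh + "*")) {
            for (Path p : ds) hits.add(p);
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("frames");
        Files.createFile(dir.resolve("IMG_2023091820000100.jpg"));
        Files.createFile(dir.resolve("IMG_2023091821000100.jpg"));
        System.out.println(listHour(dir, "2023091820").size()); // prints 1
    }
}
```

Note the directory entries are still scanned once, so this helps most when combined with the per-hour folders above, keeping each scan small.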
 

peacemaker

2) Filtering how? I do not understand how to do it without getting the full list of names.
3) 200 KB per .jpg × 10 million = ... better not to try... a 2 TB+ DB.

So, what is needed is a continuously working file system on top of the OS file system: changing 2-5 times per second without freezing the host app, storing 10 million files for a fixed time interval, and deleting outdated ones in the background...
 

Magma

1) What about the first option?

2) Well, I'm sure you already know it... at the command line / terminal.
For example, on Linux you can list only the files of a specific hour, minute, or second:
(on Linux/Ubuntu/Debian) ls IMG_20230918200001*.*
(on Windows) dir IMG_20230918200001*.*
I'm sure there is also a way to do that in B4J or Java...
This will list all images taken on 2023-09-18 at 20:00, 1st second... or you can do it for the 1st minute, or for a whole hour...

3) Sure, saving to a DB is not a good option... maybe save only the filename and whether it was parsed (true/false).
 

DonManfred

Yes, but even an "hour" folder may contain too many files.
No. Let the "server app" store all filenames in a database: only the filename, together with some fields to search them by...

Then do the search in the database and fetch only the files which are needed. That should be FAST even after YEARS of storing.
You just need to use a good indexing scheme in your folder structure: year/month/day/hour, where hour can be just hours 0-23, or you can go even further down to the minute:
year/month/day/hour(0-23)/minute(0-59)

There can be millions of files without any problem. Searching for anything by scanning the filenames on disc, without a database, may be an intensive task...
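Plain-JDK Java has no built-in SQLite driver (in B4J this would go through the jSQL library), so as a standard-library-only illustration of the idea here - an index on the capture timestamp that answers range queries without scanning any folder - a hypothetical in-memory stand-in:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class FrameIndex {
    // epoch millis of the frame -> relative path on disc;
    // TreeMap stands in for an indexed SQLite table here
    private final NavigableMap<Long, String> byTime = new TreeMap<>();

    public void add(long epochMillis, String path) {
        byTime.put(epochMillis, path);
    }

    // all frames captured in [from, to), found via the sorted index,
    // without listing a single directory
    public NavigableMap<Long, String> between(long from, long to) {
        return byTime.subMap(from, true, to, false);
    }
}
```

With SQLite, the equivalent would be along the lines of `SELECT path FROM frames WHERE ts >= ? AND ts < ?` over an index on `ts`; the folder layout then only has to keep per-folder file counts reasonable, not make searching possible.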
 

peacemaker

No. Let the "serverapp" store all filenames in Database.
I meant that an hour's worth of files is too many for a single folder, if it is listed with Files.List(Async) in the hope of getting the result fast.


can be millions of files without any problem
But if an SQLite DB is used instead of Files.List(Async), maybe "day" folders are enough, rather than "hour"?
 

DonManfred

maybe is it enough just "day" folders, not "hour" ?
I cannot answer this. YOU need to count them. You have the knowledge of what is coming in and how fast the number of files is rising.
If I remember correctly, Linux has no problem with large directories (in terms of the number of files). On Windows it can be problematic.

So better split them by year/month/day and, if you want, hour.
On every change you can run a batch job that writes the list of available files to a txt file for each folder.
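That "write the list to a txt file" step might look like this in Java (a sketch; the manifest name files.txt is an assumption):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FolderManifest {
    // dump one folder's file names into files.txt inside that folder,
    // so readers can load the cheap manifest instead of re-listing
    public static Path writeManifest(Path folder) throws IOException {
        List<String> names;
        try (Stream<Path> s = Files.list(folder)) {
            names = s.map(p -> p.getFileName().toString())
                     .filter(n -> !n.equals("files.txt")) // skip the manifest itself
                     .sorted()
                     .collect(Collectors.toList());
        }
        return Files.write(folder.resolve("files.txt"), names);
    }
}
```

Run per hour-folder, each manifest stays small, and the consuming app only ever reads one small text file.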
 

peacemaker

Interesting - is it possible to get only a partial file list, not the whole one?

Also, while the app was being stopped and restarted, 1 or 2 of the oldest files were not deleted, but their DB records were, so those files are stuck there forever...
 

Magma

Interesting - is it possible to get only a partial file list, not the whole one?

Also, while the app was being stopped and restarted, 1 or 2 of the oldest files were not deleted, but their DB records were, so those files are stuck there forever...
On Debian, this will show only the files from the last 24 hours:

find /yourfolder/ -ctime -1 -ls
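A Java-side equivalent of that sweep, which would also catch the orphaned files whose DB records are gone (a hypothetical sketch; note it keys on the last-modified time, whereas find's -ctime uses the inode change time - equivalent for write-once frames):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;
import java.util.stream.Stream;

public class RetentionSweep {
    // delete every regular file under root older than maxAge,
    // regardless of whether the DB still knows about it
    public static long deleteOlderThan(Path root, Duration maxAge) throws IOException {
        FileTime cutoff = FileTime.from(Instant.now().minus(maxAge));
        long deleted = 0;
        try (Stream<Path> s = Files.walk(root)) {
            for (Path p : (Iterable<Path>) s::iterator) {
                if (Files.isRegularFile(p)
                        && Files.getLastModifiedTime(p).compareTo(cutoff) < 0) {
                    Files.delete(p);
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```

Run in the background (or from cron), this makes the file system itself the source of truth for retention, so a crash between a DB delete and a file delete can no longer leave frames behind forever.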
 