B4J Question: handling a huge quantity of files

peacemaker

Expert
Licensed User
Longtime User
Hi all,

As I understand it, if a system generates a huge quantity of files during operation, that is a very bad situation for absolutely any operating system, correct? Because to work with the files you have to list them all first before iterating, and with a huge quantity the app will just hang while waiting for the listing result...

If so, has anyone already tried to solve this and written some code to store and retrieve files from a huge file store?
 

DonManfred

Expert
Licensed User
Longtime User
What are you doing that generating a list can "hang" your app?

How long does it take to list the files? How many files are there to list?
Is there any mechanism to delete the old, already-parsed ones? If not: why not?
How much data is inside the files?
 

peacemaker

An NN-controlling app receives .jpg frames at 2-3 FPS, 24/7, and stores them for weeks.
That makes a really long list for a single folder, changing every second.
And I also need the ability to manually list, show, compare, and delete them...
 

DonManfred

I would use a server step/routine in between. You can run a cron job on Linux, for example, to list all files and write the list to a file.
Your app then just reads that file when it wants to do something with the files.

On what operating system are they stored?
Do you store everything in a single folder, or grouped by week/day/month/whatever?
 

peacemaker

Ubuntu host; my Java app uses an API to receive the frames.
Right now everything is stored in a single folder, and I have found that normal (fast) operation is only possible with maybe 7,000-20,000 files per folder at most, or thereabouts...
A RAM disk is already used as a buffer, but after all the processing the files must be kept in long-term storage, with listing/checking/comparing/deleting...

So, some storage system is needed for millions of files...
 

peacemaker

I guess that for such storage a class should be programmed, with:
1) folder structure created on the fly, based on hours (or maybe a smaller interval)
2) calculation of each new file's name/path
3) full file list generated in the background
4) file search within that list
5) deletion of files and folders that are outdated (the storage limit)
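Points (1) and (2) could be computed roughly like this (a minimal Java sketch, since B4J runs on the JVM; the IMG_ prefix, UTC zone, and hour-level folders are illustrative assumptions, not anything from the thread):

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class FramePaths {
    // one subfolder per hour: yyyy/MM/dd/HH
    private static final DateTimeFormatter DIR_FMT =
        DateTimeFormatter.ofPattern("yyyy/MM/dd/HH").withZone(ZoneOffset.UTC);
    // filename carries the timestamp down to milliseconds
    private static final DateTimeFormatter FILE_FMT =
        DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss_SSS").withZone(ZoneOffset.UTC);

    // e.g. root=/storage, t=2023-09-18T20:00:01.250Z
    //  -> /storage/2023/09/18/20/IMG_20230918_200001_250.jpg
    public static Path pathFor(Path root, Instant t) {
        return root.resolve(DIR_FMT.format(t))
                   .resolve("IMG_" + FILE_FMT.format(t) + ".jpg");
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2023-09-18T20:00:01.250Z");
        System.out.println(pathFor(Paths.get("/storage"), t));
    }
}
```

Because the path is derived purely from the timestamp, the writer never has to list a folder to decide where the next frame goes.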
 

Magma

Expert
Licensed User
Longtime User
Better to have folders like year/month/day.

As @DonManfred said, also recording what has already been "parsed" is a nice tactic (and very sensible, so you don't loop over the same files).

About (5), you know better what to delete and what not to...

PS: What is an NN-controlling app?
 

Magma

NN... I see... but I can't get the idea (saving jpegs of what?)... OK, that is something you know the reason for...

1) @peacemaker ... hmm... are you using a custom method (code, or an app created by you) to save the jpg files (screenshots), or ffmpeg, or an app that saves to the folder you need?

2) Do the files have the date and hour in their filenames?
 

peacemaker

1) The jpg files are received from a 3rd-party API. I have to store them myself according to the task.
And again: a huge quantity of files is bad for any folder in any OS, if I'm right...

We can just save tons of files into a single folder, but it's impossible to list them all _quickly_ whenever you want... due to the quantity and the time it takes to list them...

2) Sure, even down to the milliseconds.
 

Magma

Generally it is bad to have hundreds of thousands of files on a system, even spread across different folders... but it is something manageable.

1) If you are using an API to receive them, it is easy to save them at the same time into different folders like task/year/month/day - then there is no need to list anything.

2) That is better... so they have a standard name like IMG_YYYYMMDDHHmmssnn... and you can filter the listing with a pattern like IMG_YYYYMMDDHH*.*

3) If you are using an API... and the jpegs are very small... maybe consider saving them in SQLite as BLOBs (every day in a different DB)...
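The pattern filtering in (2) can be done from Java (and therefore from B4J via inline Java) without building the full name list first; a minimal sketch, where the folder layout and name format are assumptions from this thread:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class GlobList {
    // e.g. listHour(dir, "2023091820") -> all frames from 20:00-20:59
    public static List<Path> listHour(Path dir, String yyyymmddhh) throws IOException {
        List<Path> hits = new ArrayList<>();
        // the glob filter is applied while iterating, so matches are
        // collected without first materialising one huge name array
        try (DirectoryStream<Path> ds =
                 Files.newDirectoryStream(dir, "IMG_" + yyyymmddhh + "*")) {
            for (Path p : ds) hits.add(p);
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("frames");
        Files.createFile(dir.resolve("IMG_2023091820000100.jpg"));
        Files.createFile(dir.resolve("IMG_2023091821000100.jpg"));
        System.out.println(listHour(dir, "2023091820").size()); // prints 1
    }
}
```

Note the directory entries are still scanned once, so this helps most when combined with the per-hour folders above, keeping each scan small.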
 

peacemaker

2) Filtering how? I do not understand how to do it without getting the full list of names.
3) 200 KB per .jpg × 10 million = ... better not to try... a 2 TB+ DB.

So, what is needed is a continuously working file system on top of the OS file system: changing 2-5 times per second without freezing the host app, storing 10 million files for a fixed time interval, and deleting outdated ones in the background...
 

Magma

1) What about the first option?

2) Well, I'm sure you already know it... at the command line / terminal.
For example, on Linux you can list only the files of a specific hour, minute, or second:
(on Linux/Ubuntu/Debian) ls IMG_20230918200001*.*
(on Windows) dir IMG_20230918200001*.*
I'm sure there is also a way to do that in B4J or Java...
This will list all images taken on 2023-09-18 at 20:00, 1st second... or you can do it for the 1st minute, or for a whole hour...

3) Sure, saving to a DB is not a good option... maybe save only the filename and whether it was parsed (true/false).
 

DonManfred

Yes, but even an "hour" folder may contain too many files.
No. Let the "server app" store all filenames in a database: only the filename, together with some fields to search them by...

Then do the search in the database and fetch only the files which are needed. That should be FAST even after YEARS of storing.
You just need to use a good indexing scheme in your folder structure: year/month/day/hour, where hour can be just hours 0-23, or you can go even further down to the minute:
year/month/day/hour(0-23)/minute(0-59)

There can be millions of files without any problem. Searching for anything by scanning the filenames on disc, without a database, may be an intensive task...
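Plain-JDK Java has no built-in SQLite driver (in B4J this would go through the jSQL library), so as a standard-library-only illustration of the idea here - an index on the capture timestamp that answers range queries without scanning any folder - a hypothetical in-memory stand-in:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class FrameIndex {
    // epoch millis of the frame -> relative path on disc;
    // TreeMap stands in for an indexed SQLite table here
    private final NavigableMap<Long, String> byTime = new TreeMap<>();

    public void add(long epochMillis, String path) {
        byTime.put(epochMillis, path);
    }

    // all frames captured in [from, to), found via the sorted index,
    // without listing a single directory
    public NavigableMap<Long, String> between(long from, long to) {
        return byTime.subMap(from, true, to, false);
    }
}
```

With SQLite, the equivalent would be along the lines of `SELECT path FROM frames WHERE ts >= ? AND ts < ?` over an index on `ts`; the folder layout then only has to keep per-folder file counts reasonable, not make searching possible.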
 

peacemaker

No. Let the "serverapp" store all filenames in Database.
I meant that an hour's worth of files is too many for a single folder, if it is listed with Files.List(Async) in the hope of getting the result fast.


can be millions of files without any problem
But if an SQLite DB is used instead of Files.List(Async), maybe "day" folders are enough, rather than "hour"?
 

DonManfred

maybe is it enough just "day" folders, not "hour" ?
I cannot answer this. YOU need to count them. You have the knowledge of what is coming in and how fast the number of files is rising.
If I remember correctly, Linux has no problem with large directories (in terms of the number of files). On Windows it can be problematic.

So better split them by year/month/day and, if you want, hour.
On every change you can run a batch job that writes the list of available files to a txt file for each folder.
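That "write the list to a txt file" step might look like this in Java (a sketch; the manifest name files.txt is an assumption):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FolderManifest {
    // dump one folder's file names into files.txt inside that folder,
    // so readers can load the cheap manifest instead of re-listing
    public static Path writeManifest(Path folder) throws IOException {
        List<String> names;
        try (Stream<Path> s = Files.list(folder)) {
            names = s.map(p -> p.getFileName().toString())
                     .filter(n -> !n.equals("files.txt")) // skip the manifest itself
                     .sorted()
                     .collect(Collectors.toList());
        }
        return Files.write(folder.resolve("files.txt"), names);
    }
}
```

Run per hour-folder, each manifest stays small, and the consuming app only ever reads one small text file.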
 

peacemaker

Interesting - is it possible to get only a partial file list, not the whole one?

Also, while the app was being stopped and restarted, 1 or 2 of the oldest files were not deleted, but their DB records were, so those files are stuck there forever...
 

Magma

Interesting - is it possible to get only a partial file list, not the whole one?

Also, while the app was being stopped and restarted, 1 or 2 of the oldest files were not deleted, but their DB records were, so those files are stuck there forever...
On Debian, this will show only the files from the last 24 hours:

find /yourfolder/ -ctime -1 -ls
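A Java-side equivalent of that sweep, which would also catch the orphaned files whose DB records are gone (a hypothetical sketch; note it keys on the last-modified time, whereas find's -ctime uses the inode change time - equivalent for write-once frames):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;
import java.util.stream.Stream;

public class RetentionSweep {
    // delete every regular file under root older than maxAge,
    // regardless of whether the DB still knows about it
    public static long deleteOlderThan(Path root, Duration maxAge) throws IOException {
        FileTime cutoff = FileTime.from(Instant.now().minus(maxAge));
        long deleted = 0;
        try (Stream<Path> s = Files.walk(root)) {
            for (Path p : (Iterable<Path>) s::iterator) {
                if (Files.isRegularFile(p)
                        && Files.getLastModifiedTime(p).compareTo(cutoff) < 0) {
                    Files.delete(p);
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```

Run in the background (or from cron), this makes the file system itself the source of truth for retention, so a crash between a DB delete and a file delete can no longer leave frames behind forever.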
 