Now that we have seen how file retrieval works, we can take a look at the other cluster operations.

Storing data

In principle, storing data could be implemented just like retrieving files: the server determines which node should take the data, connects to it, and sends the PORT and STOR commands.
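As a rough illustration of the two commands involved, here is a minimal sketch that builds the command strings the cluster server would send to the chosen node. The PORT argument format (four address bytes and two port bytes, comma separated) comes from the FTP specification; the function names and the example address are assumptions made up for this sketch, not part of the design.

```python
import ipaddress


def port_command(host, port):
    """Build an RFC 959 PORT command for an IPv4 host and a TCP port."""
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    octets = str(ipaddress.IPv4Address(host)).split(".")
    # The port travels as two decimal bytes: high byte first, then low byte.
    return "PORT " + ",".join(octets + [str(port >> 8), str(port & 0xFF)])


def store_commands(data_host, data_port, path):
    """Commands the cluster server would send to the node, in order.

    data_host/data_port name the endpoint the node should connect to
    for the data transfer (hypothetical values in the example below).
    """
    return [port_command(data_host, data_port), f"STOR {path}"]


if __name__ == "__main__":
    for line in store_commands("192.168.1.5", 5001, "/download/backup.tar"):
        print(line)
```

Port 5001 is encoded as 19,137 because 19 * 256 + 137 = 5001.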

OK, but what if the designated node is not up and running? If this happens for a download, the download can be done later. Choosing a different node ad hoc for an upload can be a problem, though. Suppose you have five nodes with 20 GB of total FTP space each, but each of them has only half a gigabyte of free space. Now you want to store some backup data with a size of 900 MB. And (as usual in these situations) your brand-new 60 GB cluster node has decided to take some downtime.
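The scenario can be played through in a few lines. This is only a sketch: the Node record, the pick_node function, and the "most free space wins" policy are assumptions made for illustration, since the text does not settle on a placement strategy.

```python
from dataclasses import dataclass

MB = 10**6
GB = 10**9


@dataclass
class Node:
    name: str
    free_bytes: int
    up: bool


def pick_node(nodes, size):
    """Return the running node with the most free space that can hold
    `size` bytes, or None if no running node has enough room."""
    candidates = [n for n in nodes if n.up and n.free_bytes >= size]
    return max(candidates, key=lambda n: n.free_bytes, default=None)


if __name__ == "__main__":
    # Five nearly full nodes plus the new 60 GB node, which is down.
    nodes = [Node(f"node{i}", GB // 2, True) for i in range(1, 6)]
    nodes.append(Node("node6", 60 * GB, False))
    print(pick_node(nodes, 900 * MB))  # no running node fits the upload
```

With the new node down, no candidate remains for the 900 MB upload, which is exactly the situation described above.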

And now that we are already talking about storage: how do we distribute our files in the cluster? Do we need to store certain files twice for safety reasons? Interesting questions. I don't have good answers to them yet. OK, so here's the first iteration: the cluster server takes the upload first. The files are moved to one or more nodes later.

Uploaded files could be marked for moving by creating the link file when the upload is complete. This way it is relatively simple to determine which files should be moved, and it leaves room for storing permanent local files (real files without a link file). But restarting a broken upload is difficult (if possible at all) if the partial upload is stored somewhere in the cluster. So perhaps there is a better way to solve this problem. But all in all, the cluster server should be able to deal with both kinds of files in its file space.

Directory structure

To keep things simple for the first prototype, the cluster server creates and maintains the directory structure in its own file space. That is, if the cluster file space has a /download directory, the cluster server has a download directory in the top-level directory of its FTP space.
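The mapping from a cluster path to the server's own disk could look roughly like this. The root directory /srv/ftp and the function name are hypothetical; the normalization step is an extra precaution added in this sketch so that a path containing ".." cannot leave the FTP space.

```python
import posixpath


def local_path(root, cluster_path):
    """Map an absolute cluster path such as /download to a path below root."""
    # Normalizing an absolute path drops any ".." that would climb above "/".
    normalized = posixpath.normpath("/" + cluster_path.lstrip("/"))
    parts = [p for p in normalized.split("/") if p]
    return posixpath.join(root, *parts)


if __name__ == "__main__":
    print(local_path("/srv/ftp", "/download"))
```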

Listings

Another important task is producing directory listings. We have two requirements for that. The first is that we want to keep only pointers to files living on a cluster node. The second is that, at least sometimes, the server holds the real files (after an upload).

This works as follows: the server stores two different types of files in its directory tree: normal files and link files. Link files always have a trailing # in their names. This way the cluster server can easily decide whether a file is a link or a real file. Furthermore, if the link is for a file named 00-readme.txt, the link is named 00-readme.txt#. With this scheme the server can determine whether it has the link, the real file, or both on its own disk.
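The naming scheme is simple enough to sketch directly. The function below takes the names found in one directory and reports, per file, whether the server has the link, the real file, or both; the function name and the state labels are made up for this example. Note that under this scheme a real file whose own name ends in # would be mistaken for a link.

```python
LINK_SUFFIX = "#"


def classify(names):
    """Map each base file name to 'link', 'real', or 'both'."""
    present = set(names)
    states = {}
    for name in present:
        # Strip the trailing '#' to get the name the client sees.
        base = name[:-1] if name.endswith(LINK_SUFFIX) else name
        has_real = base in present
        has_link = base + LINK_SUFFIX in present
        if has_real and has_link:
            states[base] = "both"
        elif has_link:
            states[base] = "link"
        else:
            states[base] = "real"
    return states


if __name__ == "__main__":
    entries = ["00-readme.txt#", "backup.tar", "backup.tar#", "notes.txt"]
    for base, state in sorted(classify(entries).items()):
        print(base, state)
```

Combined with the upload marking described earlier, "both" would be a finished upload waiting for the file mover, "real" a permanent local file, and "link" a file that lives on a node.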

To supply the file information itself, the link file has to store more than just the file's location. To generate the listing, the server should also find the file's size and last modification date in the link file.
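One possible shape for such a link file is sketched below. The JSON encoding, the field names, the location syntax, and the fixed permission and owner columns in the listing line are all assumptions for illustration; the text only requires that location, size, and modification date be stored.

```python
import json
import time


def make_link(location, size, mtime):
    """Serialize the content of a link file (hypothetical JSON format)."""
    return json.dumps({"location": location, "size": size, "mtime": mtime})


def listing_line(name, link_text):
    """Build one listing line from a link file, without asking the node."""
    info = json.loads(link_text)
    stamp = time.strftime("%Y-%m-%d %H:%M", time.gmtime(info["mtime"]))
    # Permissions and owner are placeholders; the link file does not store them.
    return f"-rw-r--r-- 1 ftp ftp {info['size']:>12} {stamp} {name}"


if __name__ == "__main__":
    link = make_link("node3:/pub/00-readme.txt", 1234, 0)
    print(listing_line("00-readme.txt", link))
```

The client sees the name without the trailing #, so the listing looks the same whether the file is local or on a node.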

Deleting files

With the current concept the cluster server needs at least one additional program: the file mover, which takes files from the cluster server to one of our nodes. The cluster uses the same approach for deletion. If a file is deleted (and the cluster server has only the link), the link file is moved into a "marked-for-deletion" directory. Another program will do the actual work for us later. This way we might be able to deal with the situation where we have more than one copy of a file in the cluster but not all relevant nodes are up and running.
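The server-side half of a delete could be sketched as follows. It covers only the link-only case named in the text; the function name and the flat layout of the marked directory are assumptions, and a real version would have to avoid name clashes between links coming from different directories.

```python
import os
import shutil


def mark_for_deletion(directory, name, marked_dir):
    """Move the link file for `name` into marked_dir and return its new path.

    Only the link-only case is handled; a local real file is left alone.
    """
    if os.path.exists(os.path.join(directory, name)):
        raise ValueError("real file present locally; not covered by this sketch")
    link = os.path.join(directory, name + "#")
    os.makedirs(marked_dir, exist_ok=True)
    # Flat target directory: an assumption, see the note above on name clashes.
    target = os.path.join(marked_dir, name + "#")
    shutil.move(link, target)
    return target
```

Because the link file still records where the copies live, the later cleanup program can retry until every relevant node has been reachable once.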

Renaming

Renaming files or directories is not really difficult. Since the cluster server has the real directories on its disk, it can rename them as usual. For link files and real files the situation is almost the same.
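A minimal sketch of a rename on the server's disk, assuming the trailing-# naming from the Listings section: whatever exists under the old name (directory, real file, link file, or both files) is renamed together. The function name is made up for this example. Since the link file stores the file's location, the copy on the node would not have to be touched.

```python
import os


def rename_entry(directory, old, new):
    """Rename a directory or file, together with its link file if present."""
    renamed = []
    for suffix in ("", "#"):
        source = os.path.join(directory, old + suffix)
        if os.path.exists(source):
            os.rename(source, os.path.join(directory, new + suffix))
            renamed.append(new + suffix)
    if not renamed:
        raise FileNotFoundError(old)
    return renamed
```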