The following text describes the design of the cluster as it was developed. Details described here might be changed or removed in a later release version. Consider parts of this text as "historical". On the other hand this text gives some background information about the hows and whys.

Files, formats and FTP - running the cluster, version 1.0.0

After this first introduction it's time to look at how the cluster programs do their job. Read this if you want to know a little more about the details behind the cluster.

FTP operations

So again, the basic idea behind the cluster is that the cluster only holds links that contain the real file locations. Every time a client requests a file it is redirected to the cluster node that has the file. This gives us an implementation of the RETR command.

We don't expect anything special from a cluster node beyond being able to create directories and to store and delete files. The files of a single cluster directory can be spread across any number of cluster nodes, in arbitrary directories. Cluster directories are not created on the nodes.

Where do we then put the information we need to list directories? We put it in the link files. Here is an example:

rfc-0959.txt 147316 root 1044844761
server 192.168.0.2:21 61/47/030210-033921-rfc-0959.txt
server 192.168.0.3:21 61/47/030210-033921-rfc-0959.txt

It says that the file the link is for (the "real" file) is named rfc-0959.txt, is 147316 bytes in size, belongs to root and carries the timestamp 1044844761 (seconds since the UNIX epoch). Furthermore the real file is stored in two locations on our cluster, on servers 192.168.0.2 and 192.168.0.3. The FTP server's port is 21 in both cases. The directory location on both servers is 61/47 and the file's node filename is 030210-033921-rfc-0959.txt. Almost obvious, isn't it?
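
To make the format a little more tangible, here is a minimal Python sketch of a parser for the example above. The class and function names are made up for illustration and are not part of the cluster programs. Treating the fourth field of the first line as a UNIX timestamp is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    host: str
    port: int
    path: str  # directory plus node filename on that node


@dataclass
class LinkInfo:
    name: str       # filename at upload time
    size: int       # size in bytes
    owner: str      # owner as recorded in the link file
    timestamp: int  # assumed: seconds since the UNIX epoch
    replicas: list = field(default_factory=list)


def parse_link(text: str) -> LinkInfo:
    """Parse link file contents laid out like the example above."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    name, size, owner, timestamp = lines[0].split()
    info = LinkInfo(name, int(size), owner, int(timestamp))
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != 3 or fields[0] != "server":
            continue  # skip line types this sketch does not know
        host, port = fields[1].rsplit(":", 1)
        info.replicas.append(Replica(host, int(port), fields[2]))
    return info


EXAMPLE = """\
rfc-0959.txt 147316 root 1044844761
server 192.168.0.2:21 61/47/030210-033921-rfc-0959.txt
server 192.168.0.3:21 61/47/030210-033921-rfc-0959.txt
"""

info = parse_link(EXAMPLE)
print(info.name, info.size, info.owner, len(info.replicas))
```

A RETR redirect would then simply pick one of info.replicas.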

A note on the relation between the real file's name and the name we find in the link file. The filename we find in the link file reflects the real file's name immediately after it was uploaded. If the file is renamed, the contents of the link file are not recomputed. If I decide to rename my rfc-0959.txt to ftp-rfc.txt later, the link file will still show rfc-0959.txt.

So the link file contents give us the information we need to create directory listings. Well, they do as soon as we have a way to map filenames to link filenames. Or in other words: which link file belongs to which real file? This is simple. The link filename is the name of the real file with an additional # at the end. Going back to our example RFC, if the real filename is rfc-0959.txt the link filename is rfc-0959.txt#.

This brings us to a short aside: the FTP server is very picky about filenames. It allows only digits, letters, minuses, pluses, underscores and dots in file and directory names. In particular no #. And no blanks, but this will change in the near future.
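
The two rules fit together nicely: because # is never allowed in a real filename, a link filename can never be mistaken for one. A small Python sketch of both rules follows. The function names are hypothetical, and reading "letters" as ASCII letters only is an assumption.

```python
import re

# Digits, letters, minus, plus, underscore and dot, as listed in the text.
# Restricting "letters" to ASCII is an assumption of this sketch.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9+_.-]+")
LINK_SUFFIX = "#"


def is_valid_name(name: str) -> bool:
    """True if name uses only the characters the server accepts."""
    return ALLOWED_NAME.fullmatch(name) is not None


def link_name(real_name: str) -> str:
    """Name of the link file that belongs to a real file."""
    if not is_valid_name(real_name):
        raise ValueError(f"invalid filename: {real_name!r}")
    return real_name + LINK_SUFFIX


def real_name(link: str) -> str:
    """Inverse of link_name(): strip the trailing '#'."""
    if not link.endswith(LINK_SUFFIX):
        raise ValueError(f"not a link filename: {link!r}")
    return link[: -len(LINK_SUFFIX)]


print(link_name("rfc-0959.txt"))
print(is_valid_name("my file.txt"))
```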

But back to filenames. If a file is stored on a cluster node it's stored in a certain directory. This directory is computed by filemover using a simple function that distributes all files over a number of directories on the cluster node, to prevent directories from having too many files in them. This scheme might however change in the future. Think of it as a black box function. The filename itself is made of the date and time at which the file is stored on the first node, followed by the real filename (the original one). This may also change in the future in favour of a better naming scheme.
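
As an illustration only, the naming scheme could look like the Python sketch below. The timestamp layout is read off the example node filename 030210-033921-rfc-0959.txt. The directory function is pure guesswork: the text deliberately leaves it a black box, so the hash-based node_directory() here is a hypothetical stand-in and not what filemover does.

```python
import hashlib
from datetime import datetime


def node_filename(real_name: str, stored_at: datetime) -> str:
    """Date and time of the first store, followed by the original name."""
    return f"{stored_at.strftime('%y%m%d-%H%M%S')}-{real_name}"


def node_directory(real_name: str, buckets: int = 100) -> str:
    """HYPOTHETICAL spreading function, not filemover's real one.

    Hashes the name into a two-level directory such as "61/47" so that
    no single directory on a node collects too many files.
    """
    digest = hashlib.sha256(real_name.encode("utf-8")).digest()
    first = int.from_bytes(digest[:4], "big") % buckets
    second = int.from_bytes(digest[4:8], "big") % buckets
    return f"{first:02d}/{second:02d}"


stored = datetime(2003, 2, 10, 3, 39, 21)
filename = node_filename("rfc-0959.txt", stored)
directory = node_directory("rfc-0959.txt")
print(f"{directory}/{filename}")
```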

You might ask "Why a naming scheme?" when there is the link on the cluster server. You're right, good point. But I expect that sooner or later a cluster/node operation will fail for some reason, leaving orphaned files (files on a node without a link on the cluster) on the nodes. A good naming scheme for the cluster files might help to decide what the original file was, and whether the orphaned file is really orphaned or there is another problem in the cluster.

Let's see what we have so far.

This means that we can generate directory listings as soon ... yes, as soon as we know which file belongs in which directory. Aha, the next problem. The answer is that the cluster server maintains the whole cluster directory structure; link files are stored where the real file would be stored if the cluster were a normal FTP server. That is, if I store my rfc-0959.txt in the directory RFCs, the link file is stored in that directory.

Now we have everything to create directory listings or, speaking FTP, the LIST and NLST commands.
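
Put together, a listing is little more than a walk over the link files of one directory. The Python sketch below is a simplified illustration, not the server's code: list_directory() is a made-up name, only files are handled (subdirectories are left out), and the displayed name is taken from the link filename so that a renamed file shows its current name.

```python
import os
import tempfile


def list_directory(path: str) -> list:
    """Return (name, size, owner, timestamp) for every link file in path."""
    entries = []
    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        if not entry.endswith("#") or not os.path.isfile(full):
            continue  # not a link file
        with open(full, encoding="utf-8") as fh:
            first = fh.readline().split()
        if len(first) != 4:
            continue  # not in the expected format
        _uploaded_name, size, owner, timestamp = first
        # The visible name comes from the link filename, minus the '#'.
        entries.append((entry[:-1], int(size), owner, int(timestamp)))
    return entries


# Demo: a file uploaded as rfc-0959.txt and later renamed to ftp-rfc.txt.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "ftp-rfc.txt#"), "w", encoding="utf-8") as fh:
        fh.write("rfc-0959.txt 147316 root 1044844761\n")
        fh.write("server 192.168.0.2:21 61/47/030210-033921-rfc-0959.txt\n")
    listing = list_directory(root)

print(listing)
```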

Since the cluster server implements the directory structure in its own directory tree, we immediately get the commands MKD, RMD, PWD, CWD and CDUP.

For file deletion it's ok if we remove the link file from the directory tree. Should the cluster server delete the files on the nodes? It could do so, but consider that a file is stored on two nodes. One is up and running and the cluster can delete the file, but the second is down, so the deletion has to be tried again later. To get some flexibility the server only moves the link file to a special directory and a later process will do the real deletion. The delete job goes to some kind of queue. We would have to use a queue anyway if we think about the two server problem above. Let's add the DELE command to our list.

Fine. Now let's rename files. If we simply rename the link file everything looks good. Since directory listings take the filename from the link file's name and not from its contents, we are done. Ok, but let's now consider that a file with the new filename of the rename operation already exists. We can't simply overwrite its link file because it carries information about files we have to delete on the nodes. So what we do here is move the link file that is going to be overwritten to the ready-for-delete folder we introduced above. In other words we run a DELE operation on it. RNFR and RNTO are now also on the command list.
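
Both operations only ever touch link files on the cluster server, which the following Python sketch tries to show. Everything in it is an assumption made for illustration: the function names, the layout of the ready-for-delete folder, and the random prefix that keeps two queued link files with the same name apart.

```python
import os
import shutil
import tempfile
import uuid


def queue_for_deletion(link_path: str, trash_dir: str) -> str:
    """Move a link file into the ready-for-delete folder and return its new path."""
    os.makedirs(trash_dir, exist_ok=True)
    unique = f"{uuid.uuid4().hex}-{os.path.basename(link_path)}"
    target = os.path.join(trash_dir, unique)
    shutil.move(link_path, target)
    return target


def dele(directory: str, name: str, trash_dir: str) -> str:
    """DELE: the node files are removed later by a separate process."""
    link = os.path.join(directory, name + "#")
    if not os.path.isfile(link):
        raise FileNotFoundError(name)
    return queue_for_deletion(link, trash_dir)


def rename(directory: str, old: str, new: str, trash_dir: str) -> None:
    """RNFR/RNTO: rename the link file, queueing any link it would overwrite."""
    src = os.path.join(directory, old + "#")
    dst = os.path.join(directory, new + "#")
    if not os.path.isfile(src):
        raise FileNotFoundError(old)
    if src == dst:
        return
    if os.path.isfile(dst):
        queue_for_deletion(dst, trash_dir)
    os.rename(src, dst)


with tempfile.TemporaryDirectory() as root:
    tree = os.path.join(root, "tree")
    trash = os.path.join(root, "trash")
    os.makedirs(tree)
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(tree, name + "#"), "w", encoding="utf-8") as fh:
            fh.write(f"{name} 1 root 0\n")
    rename(tree, "a.txt", "b.txt", trash)
    after_rename = sorted(os.listdir(tree))
    queued_after_rename = len(os.listdir(trash))
    dele(tree, "b.txt", trash)
    after_dele = os.listdir(tree)
    queued_after_dele = len(os.listdir(trash))
```

After the rename the old b.txt# sits in the queue with its node locations intact, so the later delete process still knows what to remove.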

We are almost through it but we still have to consider uploads. Interesting detail of FTP, isn't it? Uploads can be implemented like downloads: the server chooses a node that receives the file and redirects the client. This is a possible solution. But there might also be problems with it. The receiving node could run out of space before completing the upload. What happens then? After a successful upload we have to write the file's size to the info/link file. But how large is the file? Are we able to parse every node's LIST output, or do all our nodes support the SIZE command?

To keep things simple for now (they are already difficult) I decided that the cluster server itself takes the upload. Here we can control how much free space is left without bigger problems. The cluster server creates a job description for moving the file to a node, which is then done later (as for file deletes). This also gives the real storage process time to decide where to put the file. But I might change this to the redirection variant in the future, or better, give the choice to the admin, which is probably the best. Anyway, we have a STOR command.
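
A rough Python sketch of this variant follows. The text does not say what a job description looks like, so the JSON layout, the .job extension, the spool directory and all function names are hypothetical. The point is only that the server can check its own free space and knows the file size once the upload is complete.

```python
import json
import os
import shutil
import tempfile
import uuid


def can_accept_upload(spool_dir: str, reserve_bytes: int) -> bool:
    """True while the spool filesystem has more free space than the reserve."""
    return shutil.disk_usage(spool_dir).free > reserve_bytes


def queue_move_job(jobs_dir: str, spool_path: str, cluster_path: str, owner: str) -> str:
    """Write a (hypothetical) job description for the later move to a node."""
    os.makedirs(jobs_dir, exist_ok=True)
    job = {
        "action": "move",
        "source": spool_path,
        "cluster_path": cluster_path,
        "owner": owner,
        # The upload landed on the cluster server, so the size is known.
        "size": os.path.getsize(spool_path),
    }
    job_path = os.path.join(jobs_dir, uuid.uuid4().hex + ".job")
    with open(job_path, "w", encoding="utf-8") as fh:
        json.dump(job, fh)
    return job_path


with tempfile.TemporaryDirectory() as root:
    spool_file = os.path.join(root, "upload.tmp")
    with open(spool_file, "wb") as fh:
        fh.write(b"hello")
    job_file = queue_move_job(os.path.join(root, "jobs"), spool_file,
                              "RFCs/rfc-0959.txt", "root")
    with open(job_file, encoding="utf-8") as fh:
        job = json.load(fh)

print(job["action"], job["cluster_path"], job["size"])
```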

And this is also the implemented FTP command set we have so far. Well, there are also QUIT, TYPE (the server only pretends to support TYPE), NOOP, SYST and PORT, but these are not related to the cluster operations.

Updates for version 1.0.3

After the initial release I started thinking about user authentication. Ok, a passwd file is the obvious place to store usernames and their passwords. Sounds simple. If a username changes we simply edit the passwd file and that's all.

But wait. The info/link files contain the username of the file's owner. So if we change a username, his or her files become orphaned in the sense that they lose their owner. Changing the owner as it is written in the info/link files is possible but a bad idea. So the first change in version 1.0.0+ was the use of user ids instead of names in info/link files.
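
The effect of the change can be shown in a few lines of Python. The passwd layout used here (id:name:password, one user per line) and the user names are invented for the example; the real file may look different.

```python
def load_users(passwd_text: str) -> dict:
    """Map user id to username from a HYPOTHETICAL id:name:password file."""
    users = {}
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        uid, name = line.split(":")[:2]
        users[int(uid)] = name
    return users


def owner_name(users: dict, uid: int) -> str:
    """Resolve an id at listing time; fall back to the bare id if unknown."""
    return users.get(uid, str(uid))


# The link file stores the id 1000, not a name.
LINK_LINE = "rfc-0959.txt 147316 1000 1044844761"
uid = int(LINK_LINE.split()[2])

before = owner_name(load_users("1000:alice:x\n"), uid)
after = owner_name(load_users("1000:alicia:x\n"), uid)  # user renamed
print(before, after)
```

The link file is untouched by the rename, yet the listing shows the new name.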

As an immediate result portions of list had to be changed. When these changes were made I noticed that directories didn't show their owner but the UNIX user running the cluster. Well, what did I expect? If I don't store the usernames (or better, ids), how can a program tell the owner? So the next change was the introduction of info/link files for directories, with the owner's user id as the only really useful information.

Now we have info/link files for everything our users create. But we still have to accept that there are files created outside of our cluster. Worse, they could have been created by a different user, and the cluster server user might not have the permissions to modify them. How should the server deal with such files? The answer is simple. Since there's no info/link file attached, the file wasn't created from within the cluster. Such files are considered immutable, not the cluster's business. The server is willing to serve such files (if permitted) but will not even try to modify them.
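
In code the rule comes down to a single existence check. The sketch below is an illustration with made-up function names. It covers plain files only and assumes the # naming convention for link files described earlier.

```python
import os
import tempfile


def classify(directory: str, name: str) -> str:
    """Return 'cluster', 'foreign' or 'missing' for a name in a directory."""
    if os.path.isfile(os.path.join(directory, name + "#")):
        return "cluster"   # has an info/link file: managed by the cluster
    if os.path.exists(os.path.join(directory, name)):
        return "foreign"   # created outside the cluster: immutable
    return "missing"


def may_modify(directory: str, name: str) -> bool:
    """Only files with an info/link file may be renamed, deleted or replaced."""
    return classify(directory, name) == "cluster"


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "rfc-0959.txt#"), "w", encoding="utf-8") as fh:
        fh.write("rfc-0959.txt 147316 1000 1044844761\n")
    with open(os.path.join(root, "README"), "w", encoding="utf-8") as fh:
        fh.write("placed here by hand\n")
    kinds = {n: classify(root, n) for n in ("rfc-0959.txt", "README", "nope")}
    readme_writable = may_modify(root, "README")

print(kinds, readme_writable)
```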

These changes together required a major rewrite of list. But in the end list was good enough at this job that the cluster server now calls list to generate directory listings.

To repeat, here's the list of changes in short.

Updates for version 1.0.7

This release didn't bring major updates to server. Support for MDTM and SIZE was added to make some clients happy. The partial support for REST is perhaps more notable. Uploads can now be restarted if the file exists on the cluster server and not on one of the nodes. Downloads are always restartable.