One of the nice things about ftpcluster is that you can build a large FTP-based file server from (more or less) cheap individual servers. But what happens to your files when one of your servers breaks?

File replication

The first thing to mention here is that if one server has a problem, your cluster continues to work. But, and it's obvious, if a server is not running you cannot access the files that are (or were) stored on it.

But there is help, even for this prototype. For me the question of replication is fundamental to the whole idea of an FTP cluster. Ok, how does it work? But stop, let's first look at some replication scenarios.

So we have some scenarios. How many copies do we want? Two or three or more? Which scheme should be implemented? The answer is that I do not implement any particular replication scheme. I leave this up to you and your decisions.

For this reason replicate reads only description files, or queued batch files, that tell it what to do. replicate doesn't care where these files come from or what the underlying scheme is.

replicate looks in the .queue directory for filenames starting with replicator- and processes them line by line. Jobs that can't be executed now (because a node is down) will be resubmitted.
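If you want to see what is still waiting in the queue, a small shell helper is enough. This is only a sketch: pending_jobs is a hypothetical name and not part of ftpcluster. It relies on nothing but the replicator- prefix described above and skips blank lines.

```shell
# pending_jobs [QUEUE_DIR]
# Hypothetical helper (not part of ftpcluster): print every non-blank job
# line found in files named replicator-* inside QUEUE_DIR (default: .queue).
pending_jobs() {
    queue_dir=${1:-.queue}
    for f in "$queue_dir"/replicator-*; do
        [ -f "$f" ] || continue    # the glob matched nothing
        awk 'NF > 0' "$f"          # print lines that have at least one field
    done
}
```

Called as `pending_jobs /path/to/cluster/.queue | wc -l` it gives the number of jobs that have not been processed yet.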

Job descriptions have the format

replicate linkfile size mtime [nodelist]
where

Ok, enough theory. How can we create replication job files? You have read about the list command? The following example creates an additional replica of all existing files in your cluster.

# su - cluster
$ echo $PWD
/path/to/cluster
$ list -lr | awk '$1 > 0 { print "replicate", $2, $3, $4 }' >data
$ mv data .queue/replicator-1
$ replicate -i

You get the idea? You filter the output of list according to your own replication policy and feed this into the replication queue.
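The first example writes the jobs to a scratch file and only then moves it into .queue, presumably so that replicate never picks up a half-written file. The same idea can be wrapped in a function. Take it as a sketch under assumptions: enqueue_jobs and the incoming.* scratch name are made up for illustration, and the only thing taken from the text is that queue files must start with replicator-.

```shell
# enqueue_jobs QUEUE_DIR
# Hypothetical helper: read job lines on stdin, write them to a scratch file
# inside QUEUE_DIR, then rename the scratch file to replicator-<suffix>.
# The rename stays on one filesystem, so the queue file appears all at once.
enqueue_jobs() {
    queue_dir=$1
    tmp=$(mktemp "$queue_dir/incoming.XXXXXX") || return 1
    cat > "$tmp" || { rm -f "$tmp"; return 1; }
    if [ ! -s "$tmp" ]; then       # no jobs on stdin: leave nothing behind
        rm -f "$tmp"
        return 0
    fi
    # Reuse the random suffix that mktemp chose for the final name.
    target="$queue_dir/replicator-${tmp##*.}"
    mv "$tmp" "$target" && printf '%s\n' "$target"
}
```

Any filter that prints job lines can be piped into it, for example `... | enqueue_jobs /path/to/cluster/.queue`. It prints the name of the queue file it created.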

Another example: create a replica of each file that exists only once on our nodes.

$ list -l | awk '$1 == 1 { print "replicate", $2, $3, $4 }' | queue -e
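A policy can of course combine several conditions. The following sketch replicates only files that exist once and are not larger than a given size. The column layout ($1 = number of copies, $2 = link file, $3 = size, $4 = mtime) is inferred from the examples above, the assumption that size is a plain number of bytes is mine, and jobs_for_small_files is a hypothetical name.

```shell
# jobs_for_small_files MAX_SIZE
# Hypothetical filter: read list-style lines on stdin and print a replicate
# job for every file that has exactly one copy and is at most MAX_SIZE big.
# Assumed columns: $1 = copies, $2 = link file, $3 = size, $4 = mtime.
jobs_for_small_files() {
    awk -v max="$1" '
        $1 == 1 && ($3 + 0) <= (max + 0) {
            print "replicate", $2, $3, $4
        }'
}
```

It is used like the awk one-liners above, for example `list -l | jobs_for_small_files 1048576 | queue -e`.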

Job descriptions may have an additional nodelist argument. This is a blank-separated list of node names where you want a copy. You can also prefix a node name with a minus to signal that replicate should remove the file from this node after the additional copy has been created:

$ list -l | awk '$6 == "node1:21" {
		   print "replicate", $2, $3, $4, "node2:21", "-node1:21"
		   }' | queue -e
This copies all files from node1 to node2 and then deletes them from node1. When this job is finished you can power down node1 and do whatever you want with it.
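Before you actually pull the plug it doesn't hurt to check that nothing is listed on the node anymore. A minimal sketch, assuming (as in the example above) that the sixth column of the list output is the node name. count_on_node is a hypothetical helper, not an ftpcluster command.

```shell
# count_on_node NODE
# Hypothetical check: count list-style lines on stdin whose node column
# (assumed to be $6, as in the example above) equals NODE.
count_on_node() {
    awk -v node="$1" '$6 == node { n++ } END { print n + 0 }'
}
```

With this, `list -l | count_on_node node1:21` should print 0 once all jobs for node1 have been processed.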

Notice that replicate requires that your node servers can reach each other, because it copies the files directly from server to server. In other words, if replicate runs on your cluster server and should copy a file from node1 to node2, it will not (!) first copy the file to the cluster server and then to node2. It talks to both nodes so that node1 sends and node2 receives the file.
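In plain FTP terms such a server-to-server transfer is usually done by putting one server into passive mode and handing the address it reports to the other server in a PORT command. Whether replicate uses exactly this sequence is an assumption on my side. The sketch below only prints the commands a client would send on the two control connections, and port_from_pasv and fxp_plan are hypothetical names.

```shell
# port_from_pasv 'REPLY'
# Turn a "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)" reply into the
# matching PORT command. The data port is p1 * 256 + p2.
# Prints nothing if REPLY is not a well-formed 227 reply.
port_from_pasv() {
    printf '%s\n' "$1" |
        sed -n 's/^227 .*(\([0-9][0-9]*\(,[0-9][0-9]*\)\{5\}\)).*$/PORT \1/p'
}

# fxp_plan FILE 'PASV-REPLY-FROM-RECEIVER'
# Print the command order for sending FILE from the sender to the receiver.
fxp_plan() {
    port_cmd=$(port_from_pasv "$2")
    [ -n "$port_cmd" ] || return 1          # receiver did not enter passive mode
    printf 'receiver: PASV\n'               # receiver opens a listening data port
    printf 'sender:   %s\n' "$port_cmd"     # sender is told to connect there
    printf 'receiver: STOR %s\n' "$1"       # receiver waits for the data
    printf 'sender:   RETR %s\n' "$1"       # sender connects and sends the file
}
```

The data connection goes from the sender straight to the receiver, which is why the nodes must be able to reach each other. Many FTP servers refuse such third-party transfers unless they are configured to allow them, so check your node servers if jobs keep failing.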