Installation and configuration

Ok, let's see how the FTP cluster is installed and configured.

Cluster Installation

Notice: This section applies only to your FTP cluster server, not to your cluster nodes (hence the label "Cluster Installation").

  1. Fetch the required files:
    Get file-1.0.0.tar.gz and ftpcluster-1.0.7.tar.gz.

    Notice to upgraders: The REST support in version 1.0.7 requires an updated connect. Since this update is cluster-specific and I want the cluster installation to be as simple as possible, I packed the connect sources into the ftpcluster tarball. The new connect is installed automatically under /usr/local/ftpcluster. If you didn't like having connect under /usr/local/bin, it's safe to remove it from there after the update.

    You also need gawk version 3.1.0 or above.

    # gawk -W version
    
    shows which gawk version is installed. If it's older than 3.1.0, grab a newer source release from www.gnu.org, then configure, compile and install it.

  2. Compiling connect is no longer required; see the notice above.

  3. gawk's file extension:
    This step is more involved, because compiling the library requires the gawk sources.
    # tar xzvf file-1.0.0.tar.gz
    # cd file-1.0.0
    
    Now stop: before you try to build the library, check whether the pre-compiled awk.file.so library works for you. Try something like
    #!/usr/bin/gawk -f
    #
    # test.awk - see if your awk recognizes awk.file.so
    #
    
    BEGIN {
    	# the second argument names the library's init function and must be a quoted string
    	extension("./awk.file.so", "dlload");
    	
    	sbuf[0] = dir[0] = "";
    	n = scandir(".", dir);
    	for (i=1; i<=n; i++) {
    		stat(dir[i], sbuf);
    		printf ("%s %s %s\n", dir[i], sbuf["mtime"], sbuf["size"]);
    		}
    
    	exit (0);
    	}
    
    If it prints the files in the current directory together with their modification times and sizes, the library may work. This is not a real test; it only checks that your gawk recognizes the library and can call the functions it provides.

    If this does not work, first heave a deep sigh and get some coffee. Then fetch the gawk source code, unpack, configure and compile it. After that, read file-1.0.0's README.

    However you managed to get a usable awk.file.so, finish with

    # make install
    
    to install the library under /usr/local/lib.

  4. Install the cluster software
    Ok, if you've made it this far, unpack and install the cluster software. Where you unpack the sources doesn't matter; the commands below are just an example.
    # tar xzvf ftpcluster-1.0.7.tar.gz
    # cd ftpcluster-1.0.7
    # make
    # make install
    
    The first make compiles two helper programs, cluster and mpec; the following make install puts the cluster programs into /usr/local/ftpcluster.

    Since version 1.0.4, the cluster programs must be installed under /usr/local/ftpcluster. Changing this location is possible, but it requires modifying the awk scripts. Ask me if you really want this.

  5. Create a cluster user
    Before you do this, decide where in your filesystem you want to put the cluster root directory. Let's assume you have chosen /path/to/cluster. Also choose the user's name; I assume cluster, but you may decide on something else. Now add the user:
    # useradd -d /path/to/cluster cluster
    # mkdir -p /path/to/cluster
    # chown cluster /path/to/cluster
    
    Important note: whatever cluster maintenance you do, do it as the cluster user, not as root or anyone else.

    Let's do the first maintenance task. Notice the command prompt change when we become the cluster user. I'll keep using this prompt convention throughout this document.

    # su - cluster
    $ pwd
    /path/to/cluster
    $ cat >.bash_profile
    PATH=/usr/local/ftpcluster:$PATH
    alias lu='list -u'
    press CTRL+D here
    $
    
    This adds /usr/local/ftpcluster, where make install put the cluster programs, to cluster's PATH so that the cluster programs are always at hand. The lu alias is optional, but you'll like it.

  6. Create an FTP cluster passwd
    We need an administrator account for the cluster:
    $ cat >.passwd
    admin::0:/u/admin
    press CTRL+D here
    
    This creates the user admin with an empty password (you should set one), user id 0 (which makes the user the administrator) and the home directory /u/admin.

  7. Install the cluster server
    Ok, now you are root again (watch your command prompt). Load /etc/inetd.conf into your favourite editor, comment out any ftp line that is already there, and add the following line:
    ftp  stream  tcp  nowait  cluster /usr/local/ftpcluster/server server
    
    and make inetd reread its configuration:
    # ps ax | grep inetd
    # kill -HUP process-id-of-inetd
    

  8. Test your setup
    Ok, we are ready for a first test. Log in to your FTP cluster as admin and create your home directory ...
    # ftp localhost
    Connected to localhost.
    220 server ready sid is here - login please
    Name (localhost:root): admin
    331 send password
    Password:
    Remote system type is UNIX.
    Using binary mode to transfer files.
    ftp> pwd
    257 "/"
    ftp> mkdir /u
    257 directory created
    ftp> mkdir /u/admin
    257 directory created
    ftp> quit
    221 server terminating
    
    ... and check if it's working.
    # ftp localhost
    Connected to localhost.
    220 server ready sid is here - login please
    Name (localhost:root): admin
    331 send password
    Password:
    Remote system type is UNIX.
    Using binary mode to transfer files.
    ftp> pwd
    257 "/u/admin"
    ftp> quit
    221 server terminating
    
    Everything ok? You should see the directory structure under /path/to/cluster. Check it with lu -r as the cluster user. Is it there? Fine. Now let's configure our node servers.
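The checks from this section can be collected into a small script that you run as the cluster user before moving on to the nodes. This is a minimal sketch, not part of ftpcluster: the function names (version_ge, gawk_version, empty_passwords, inetd_has_ftpcluster, preflight) are hypothetical, and the .passwd layout (name:password:uid:home) is inferred from the admin entry above. inetd_has_ftpcluster only compares the first and sixth fields of each line, which matches the inetd.conf entry added in step 7.

```shell
#!/bin/sh
# Preflight checks for the cluster server. Hypothetical helper, not shipped
# with ftpcluster; run it as the cluster user.

# version_ge A B: succeed if dotted version A is at least B (3.1.8 >= 3.1.0).
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

# gawk_version: print the version number from the first line of
# "gawk -W version", e.g. "GNU Awk 3.1.8" -> 3.1.8 (a trailing comma is removed).
gawk_version() {
    gawk -W version 2>/dev/null | awk 'NR == 1 { v = $3; sub(/,$/, "", v); print v; exit }'
}

# empty_passwords FILE: print the names of entries whose password field is
# empty. Assumed layout, as in the admin entry: name:password:uid:home
empty_passwords() {
    awk -F: 'NF >= 4 && $2 == "" { print $1 }' "$1"
}

# inetd_has_ftpcluster FILE: succeed if FILE contains an active ftp line
# whose server program is /usr/local/ftpcluster/server.
inetd_has_ftpcluster() {
    awk '$1 == "ftp" && $6 == "/usr/local/ftpcluster/server" { found = 1 }
         END { exit !found }' "$1"
}

# preflight: print one status line per check.
preflight() {
    v=$(gawk_version)
    if [ -n "$v" ] && version_ge "$v" 3.1.0; then
        echo "gawk $v: ok"
    else
        echo "gawk ${v:-not found}: need 3.1.0 or above"
    fi
    for name in $(empty_passwords "$HOME/.passwd"); do
        echo ".passwd: $name has an empty password"
    done
    if inetd_has_ftpcluster /etc/inetd.conf; then
        echo "inetd.conf: ftp entry ok"
    else
        echo "inetd.conf: no ftp entry for /usr/local/ftpcluster/server"
    fi
}
```

Source the functions into the cluster user's shell and call preflight. A missing patch level counts as zero, so version_ge treats 3.1 the same as 3.1.0.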

Node configuration

There is good news: if you've made it this far, the rest is easy.

  1. Getting the node servers
    Decide which of your servers will become cluster nodes. You need at least two of them (one server also works, but we want to run a cluster); three are better, so you can watch the replicator work.

    The only requirement for a node server is that it runs an FTP server. Any FTP server should work, at least in theory. For this release, use one that does not send multi-line responses. Standard FTP servers should do the job.

  2. Creating the cluster user
    Every node needs a cluster user with which the cluster programs log in to it. You can use the same username and password on all your nodes or a different login on each. If you prefer different logins, write down which username can log on to which server with which password.

    Decide for each node where the cluster will store its files. Now log on to the node and create the user:

    # useradd -d /usr/cluster node1user
    # mkdir -p /usr/cluster
    # chown node1user /usr/cluster
    # passwd node1user
    enter node's password here
    
    This assumes you have chosen node1user as the cluster user's name and /usr/cluster as the data location. For the cluster configuration later on, we'll assume that this server is named node1 and that your cluster server can resolve the name node1 to it.

  3. Verify your setup
    You configured your nodes and still have an overview? Great. Now verify your setup. Make sure that you can log on to each node with the account information you configured, and correct your configuration here if required. Let me stress this: using paper and a pen isn't a sign of weakness.

    Now think about the following: can node1 reach node2? If they are on the same network, the answer is probably "yes". Otherwise, you might want to ping the other nodes from each node.
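A quick way to check both points above (does the node's FTP server answer, and does it avoid multi-line replies?) is to read the first line the server sends. In FTP (RFC 959), a reply code followed by a hyphen, such as "220-", starts a multi-line reply, while "220 " is a single-line greeting. The sketch below is hypothetical and not part of ftpcluster. It assumes bash (for its /dev/tcp redirection) and the timeout command from GNU coreutils, and it inspects only the greeting, so a server may still send multi-line replies to later commands such as a login banner. Running it on each node against the others also answers the reachability question.

```shell
#!/bin/sh
# Probe an FTP node's greeting. Hypothetical helper, not shipped with ftpcluster.

cr=$(printf '\r')   # FTP lines end in CR LF; read strips only the LF

# greeting_kind LINE: classify the first line an FTP server sends.
greeting_kind() {
    case "$1" in
        "220 "*) echo single ;;   # one-line greeting: fine for this release
        "220-"*) echo multi ;;    # multi-line greeting: avoid this server
        *)       echo bad ;;      # not a "service ready" reply
    esac
}

# probe_node HOST PORT: print single, multi, bad, or unreachable
# (unreachable also covers a server that sends nothing within 5 seconds).
# Needs bash (for /dev/tcp) and timeout (GNU coreutils).
probe_node() {
    line=$(timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2" && IFS= read -r l <&3 && printf "%s" "$l"' probe "$1" "$2") || {
        echo unreachable
        return 1
    }
    greeting_kind "${line%"$cr"}"
}
```

For example, probe_node node1 21 should print single for a node whose FTP server greets with a one-line 220 reply.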

Cluster configuration

Good news again: we don't need a numbered list here.

Ok, we go back to our cluster server. We have to edit its configuration file. Its name is .cluster.conf and it's in the $HOME directory of our cluster user. You remember that you should do all maintenance work as the cluster user?

# su - cluster
$ vi .cluster.conf

The .cluster.conf should look like this:

server node1:21 node1user:password
server node2:21 node2user:password
server node3:21 node3user:password

The example configures three nodes, node1, node2 and node3 (each with its FTP server on port 21), with a different cluster user on each node and password standing in for the real passwords. The .cluster.conf is plain text. Don't worry that a user could fetch this file: ftpcluster will not serve files or directories whose names start with a dot.
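If you want to double-check the file, the format above is easy to take apart with awk. This is a hypothetical helper, not part of ftpcluster. It assumes exactly the layout shown (server host:port user:password, separated by blanks) and splits host:port and user:password at the first colon, so a password may contain colons but not blanks. Lines that do not start with server are simply skipped here, which says nothing about how ftpcluster itself treats them.

```shell
#!/bin/sh
# parse_cluster_conf FILE: print "host port user password" for every
# "server host:port user:password" line. Hypothetical helper.
parse_cluster_conf() {
    awk '$1 == "server" {
        hp = index($2, ":"); up = index($3, ":")
        if (NF != 3 || hp == 0 || up == 0) {
            # report the bad line on stderr and skip it
            printf "%s:%d: malformed server line\n", FILENAME, NR > "/dev/stderr"
            next
        }
        print substr($2, 1, hp - 1), substr($2, hp + 1), substr($3, 1, up - 1), substr($3, up + 1)
    }' "$1"
}
```

Piping its output into a while read -r host port user pass loop gives you one node per iteration, which is handy when you log on to each node by hand to verify the accounts.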

Ready to go

Finally, we are ready. Log in to your cluster server with FTP, upload some files, then log out. Now, on the cluster server, make sure you are the cluster user:

$ pwd
/path/to/cluster
$ ls -l
you should see your uploaded file and its info/link file
$ list -l
your uploaded file should appear as local
$ store -i
you should see debugging output; it should be FTP protocol traffic
$ list -l
your uploaded file should now be somewhere in your cluster
Now try to fetch the file. Does it work? Then let's replicate your uploaded file.
$ list -l | awk '$NF > 2 { print "replicate", $2, $3, $4 }' | queue -e
$ replicate -i
again, you should see debugging output
$ list -l
your file should now reside on two nodes
Did everything work so far? Ok, then you are in business.