OK, let's assume your cluster is up and running. You have made some uploads and moved them to your nodes with store. Now you probably want to know where your files are.

The list command

list is your friend for this kind of question. I'll explain its use with the following node setup:

# su - cluster
$ pwd
/path/to/cluster
$ cat .cluster.conf
server 192.168.0.2:210 cluster1:secret-pw
server 192.168.0.3:210 cluster2:more-secret-pw
$

I have two node servers, at 192.168.0.2 and 192.168.0.3, both listening on port 210.
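Judging from the example, each server line in .cluster.conf holds a node's host:port followed by a user:password pair. Assuming exactly that layout, a one-line awk filter lists the configured nodes without echoing the passwords. node_servers is a hypothetical helper name, not part of the cluster tools:

```shell
# Hypothetical helper: print the host:port of every node server listed
# in a cluster configuration file (default: .cluster.conf in the current
# directory). Assumed line format: "server host:port user:password".
node_servers() {
    awk '$1 == "server" { print $2 }' "${1:-.cluster.conf}"
}

# Usage, from /path/to/cluster with the configuration above:
#   $ node_servers
#   192.168.0.2:210
#   192.168.0.3:210
```

Printing only the second field keeps the credentials off the terminal.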

There are only a few files, but enough for an example:

Connected to localhost.
220 server ready [030224-012930-01A4] - login please
331 send password
230 login accepted
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> pwd
257 "/u/admin"
ftp> ls
200 ok
150 listing follows
drixr-xr-x    1 admin    ftp             0 Feb 24 01:40 ..
-ri-r--r--    1 1007     ftp           933 Feb 24 01:40 README
-rw-r--r--    0 admin    ftp          5132 Feb 24 01:39 cluster.lib
-rw-r--r--    1 admin    ftp        193660 Feb 24 01:36 ftpcluster-1.0.3.tar.gz
-rw-r--r--    1 admin    ftp          5430 Feb 24 01:38 list
-rw-r--r--    0 admin    ftp          4136 Feb 24 01:39 mpec.lib
-rw-r--r--    1 admin    ftp          5337 Feb 24 01:37 passwd
-rw-r--r--    1 admin    ftp          9167 Feb 24 01:37 replicate
-rw-r--r--    1 admin    ftp        147316 Feb 24 01:36 rfc-0959.txt
-rw-r--r--    2 admin    ftp         22678 Feb 24 01:37 server
-rw-r--r--    2 admin    ftp          7645 Feb 24 01:37 store
drwxr-xr-x    1 admin    ftp          4096 Feb 24 01:38 tmp
226 transfer complete
ftp> quit
221 server terminating

Does the output look like a regular ls -l listing? It is, in fact, completely synthetic. You may notice the zeros where a normal ls shows the file's link count, and the `i' in README's permission field. The `i' signals that README is immutable: it is an existing file, but there is no info/link file attached to it. If you try to write to this file, the server will refuse. This is also why the owner is shown as 1007: it is the Linux user id of the file, not a cluster user's id.

The link count field shows the number of copies of the file in the cluster. 0 means that the file is stored only on the cluster server itself and has not yet been moved to a node. The files server and store are each stored on two cluster nodes, cluster.lib and mpec.lib have not been moved yet, and the remaining files are stored once.
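These rules can be checked mechanically. The sketch below, a hypothetical copy_report helper, reads ls-style lines like the ones in the FTP session on standard input. It assumes the file name is the ninth field (so names containing blanks would break it) and that an immutable file is marked by an i as the third character of the permission field, as in README's entry:

```shell
# Hypothetical helper: classify the entries of the cluster's synthetic
# ls listing (as shown by the FTP server) read from stdin.
# Assumed layout: permissions, count, owner, group, size, month, day,
# time, name; names containing blanks are not handled.
copy_report() {
    awk '
        $1 ~ /^-/ {                         # regular files only, skip directories
            if (substr($1, 3, 1) == "i")    # an i in place of w marks an immutable file
                print $9, "immutable"
            else if ($2 == 0)               # count 0: still only on the cluster server
                print $9, "local-only"
            else                            # count N: stored on N nodes
                print $9, $2, "node(s)"
        }
    '
}
```

Fed the FTP listing above, it should report README as immutable, cluster.lib and mpec.lib as local-only, and server and store with 2 node(s).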

Now we go back to the cluster's command shell and the list command.

$ pwd
/path/to/cluster
$ cd u/admin
$ list
/u/admin:
file README 933 1046047220 1007 - -
0 cluster.lib# 5132 1046047182 admin - -
1 ftpcluster-1.0.3.tar.gz# 193660 1046047007 admin 192.168.0.2:210 07/70/030224-013647-ftpcluster-1.0.3.tar.gz
1 list# 5430 1046047091 admin 192.168.0.2:210 91/70/030224-013811-list
0 mpec.lib# 4136 1046047189 admin - -
1 passwd# 5337 1046047075 admin 192.168.0.2:210 75/70/030224-013755-passwd
1 replicate# 9167 1046047078 admin 192.168.0.3:210 78/70/030224-013758-replicate
1 rfc-0959.txt# 147316 1046047015 admin 192.168.0.3:210 15/70/030224-013655-rfc-0959.txt
2 server# 22678 1046047047 admin 192.168.0.2:210 47/70/030224-013727-server
2 server# 22678 1046047047 admin 192.168.0.3:210 47/70/030224-013727-server
2 store# 7645 1046047070 admin 192.168.0.3:210 70/70/030224-013750-store
2 store# 7645 1046047070 admin 192.168.0.2:210 70/70/030224-013750-store
dir tmp# 4096 1046047121 admin - -

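If you have used version 1.0.0, you'll notice that the format has changed a little, but the basic layout is still the same. For each file in your cluster, list shows one line with these fields:

  1. the number of copies of the file in the cluster,
  2. the name of the file's link file,
  3. the file size in bytes,
  4. the file's time stamp in seconds since the epoch,
  5. the file's owner,
  6. the node server holding the copy, and
  7. the path of the copy on that server.

The values are separated by blanks, and blanks never appear inside a value. So although such a listing is not easy to read, it can easily be parsed by an awk script. You might want to reformat the output for better reading.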
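A minimal pretty-printer, assuming every data line consists of seven blank-separated values as in the output above (type or copy count, link file name, size, time stamp, owner, server, path). pretty_list is a hypothetical helper; its column widths are arbitrary and it leaves out the raw time stamp:

```shell
# Hypothetical pretty-printer for list output read from stdin.
# Assumes seven blank-separated fields per data line:
# count/type, name, size, time stamp, owner, server, path.
pretty_list() {
    awk '
        NF == 7 {
            # the raw time stamp ($4) is left out; widths are arbitrary
            printf "%-5s %-32s %10s %-8s %-16s %s\n", $1, $2, $3, $5, $6, $7
        }
        NF != 7 { print }                   # pass headings like /u/admin: through unchanged
    '
}

# Usage:
#   $ list | pretty_list
```

Since blanks never occur inside a value, awk's default field splitting is all the parsing needed.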

Did you notice where the actual output deviates from this description? The meaning of each line can (and must) be determined from its first field:

dir
is the output line for a directory. Directories are never copied to nodes, so the last two fields are always -.

file
describes an immutable file, that is, a file in the cluster space without an info/link file. Although these files are inside the cluster, they are not (and cannot be) subject to node storage, so their last two fields are - as well.

0
is printed if the file is a regular cluster file that has not yet been stored on a node. Its last two fields are -.

any other number
is the number of copies of the file in the cluster; the last two fields hold the server and the path of the copy on that server. Each copy is listed on a line of its own.

If you now look back at list's output above, the meaning of each line should be clear.
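The rules for the first field translate directly into an awk dispatch. A sketch, with classify_list as a hypothetical helper name; because a file with several copies appears once per copy, only its first line is reported:

```shell
# Hypothetical helper: summarize each entry of list output (stdin)
# according to its first field: dir, file, 0, or a copy count.
classify_list() {
    awk '
        NF < 7       { next }               # skip the directory heading and blank lines
        $1 == "dir"  { print $2, "directory"; next }
        $1 == "file" { print $2, "immutable"; next }
        $1 == 0      { print $2, "not yet stored on a node"; next }
        !seen[$2]++  { print $2, $1, "copies" }  # first line of a file with N copies
    '
}

# Usage:
#   $ list | classify_list
```

Testing dir and file before 0 keeps the string and numeric cases apart.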

list has one important option, -l, which selects the long listing format. It is the same as above, except that the link file's name (second field) is prefixed with its directory, relative to the cluster's root directory:

$ list -l
/u/admin:
file /u/admin/README 933 1046047220 1007 - -
0 /u/admin/cluster.lib# 5132 1046047182 admin - -
1 /u/admin/ftpcluster-1.0.3.tar.gz# 193660 1046047007 admin 192.168.0.2:210 07/70/030224-013647-ftpcluster-1.0.3.tar.gz
1 /u/admin/list# 5430 1046047091 admin 192.168.0.2:210 91/70/030224-013811-list
0 /u/admin/mpec.lib# 4136 1046047189 admin - -
1 /u/admin/passwd# 5337 1046047075 admin 192.168.0.2:210 75/70/030224-013755-passwd
1 /u/admin/replicate# 9167 1046047078 admin 192.168.0.3:210 78/70/030224-013758-replicate
1 /u/admin/rfc-0959.txt# 147316 1046047015 admin 192.168.0.3:210 15/70/030224-013655-rfc-0959.txt
2 /u/admin/server# 22678 1046047047 admin 192.168.0.2:210 47/70/030224-013727-server
2 /u/admin/server# 22678 1046047047 admin 192.168.0.3:210 47/70/030224-013727-server
2 /u/admin/store# 7645 1046047070 admin 192.168.0.3:210 70/70/030224-013750-store
2 /u/admin/store# 7645 1046047070 admin 192.168.0.2:210 70/70/030224-013750-store
dir /u/admin/tmp# 4096 1046047121 admin - -

You will need this option for file replication. Another important switch is -r, which makes list descend recursively into subdirectories.
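With -l the second field carries the full cluster path, which makes it easy to ask where the copies of a given file live. The sketch assumes that a link file's name is the file's name plus a trailing #, as the listings suggest (cluster.lib in the FTP listing appears as cluster.lib# in list's output); where_is is a hypothetical helper name:

```shell
# Hypothetical helper: read `list -l` output on stdin and print
# "server path" for every node copy of the cluster file named in $1.
# Assumes link file name = cluster path + trailing "#".
where_is() {
    awk -v want="$1" '
        $1 ~ /^[1-9][0-9]*$/ {              # only lines describing node copies
            name = $2
            sub(/#$/, "", name)             # drop the link-file marker
            if (name == want) print $6, $7
        }
    '
}

# Usage, in /u/admin with the files shown above:
#   $ list -l | where_is /u/admin/server
#   192.168.0.2:210 47/70/030224-013727-server
#   192.168.0.3:210 47/70/030224-013727-server
```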

As I said above, list's output is meant for processing with awk (if you want human-readable output, try the -u option). Here are some examples:

  1. Which files are held locally on the cluster server?
    $ list -l | awk '$1 == 0'
    

  2. Which files live on a particular server, let's say 192.168.0.2:210?
    $ list -l . | awk '$6 == "192.168.0.2:210"'
    

  3. How much disk space is allocated on a server?
    $ list -l . | awk '$6 == "192.168.0.2:210" { size += $3 }
    		   END { print size + 0 }'
    
    Files and allocated space of a particular user can be computed the same way by matching the owner field ($5) instead of the server field.
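Both questions can be answered in one pass. A sketch, with space_report as a hypothetical helper name, that totals the bytes stored per node server and per owner. Only lines with a positive copy count are summed, so a file with two copies counts once for each server holding it and twice for its owner:

```shell
# Hypothetical helper: read `list -l` output on stdin and print the
# bytes held per node server and per owner, sorted for stable output.
space_report() {
    awk '
        $1 ~ /^[1-9][0-9]*$/ {              # node copies only (skip dir, file, 0)
            server[$6] += $3                # field 6: server, field 3: size
            owner[$5]  += $3                # field 5: owner
        }
        END {
            for (s in server) print "server", s, server[s]
            for (o in owner)  print "owner",  o, owner[o]
        }
    ' | LC_ALL=C sort
}

# Usage, in /u/admin with the files shown above:
#   $ list -l . | space_report
#   owner admin 421556
#   server 192.168.0.2:210 234750
#   server 192.168.0.3:210 186806
```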

Generally speaking, list is the source of information that is further processed in a pipe to extract what we are really interested in. Since this is an early prototype, no such scripts are available yet, but I plan to provide some in a future release.