ftpsync - synchronize files via FTP

ftpsync synchronizes a local directory to a directory on an FTP server.

More precisely, ftpsync does bi-directional syncing between a client (called node) and a server (called peer) with simple conflict resolution. Syncing is not limited to one client and one server: ftpsync supports multiple nodes syncing to the same peer server and/or one node syncing to multiple peers.

The only requirement for ftpsync is an FTP server supporting the standard commands listed in RFC 959 and, additionally, the MDTM and SIZE commands.

This text explains some ideas behind ftpsync and how it's configured and used. ftpsync started as a gawk prototype, which is included in the distribution.

One way syncing

The "theory of operation" is as follows. There is a directory on the local machine and another on a remote server (assumed to be empty at first). In the first run all local files are copied to the remote directory. In each later run only the files that have been modified since the previous run are copied to the remote system. That is, the syncer copies only those files that have to be copied to update the remote end.

Obviously our syncer has to keep track of each file's size and modification date after it has been copied to the remote server. If this information is stored in the format

filename <space> size <space> mtime

in a file in the current directory, the syncer can determine in a later run whether a file was modified, created (which is treated the same as modified), deleted, or is unchanged.
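The bookkeeping can be sketched in a few lines of Python. This is only an illustration, not ftpsync's own code: the function names parse_status and classify are made up for this sketch, and it assumes that size and mtime are stored as plain integers.

```python
def parse_status(text):
    """Parse 'filename size mtime' lines into {filename: (size, mtime)}."""
    status = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split from the right so that only the last two fields are numeric.
        name, size, mtime = line.rsplit(" ", 2)
        status[name] = (int(size), int(mtime))
    return status


def classify(previous, current):
    """Compare the stored status list with the directory's current state."""
    result = {}
    for name in previous.keys() | current.keys():
        if name not in current:
            result[name] = "deleted"
        elif previous.get(name) != current[name]:
            # A new file has no previous entry, so it counts as changed.
            result[name] = "changed"
        else:
            result[name] = "unchanged"
    return result


previous = parse_status("a.txt 10 1000\nb.txt 20 1000\nc.txt 5 1000\n")
current = {"a.txt": (10, 1000), "b.txt": (25, 2000), "new.txt": (1, 3000)}
states = classify(previous, current)
```

Here a.txt comes out as unchanged, b.txt and new.txt as changed, and c.txt as deleted.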

Two way synchronization

For true synchronization we have to consider the other end as well. Files might also be modified or deleted there. In this case we have to update our local files by either getting the updated file from the server or deleting our local copy.

Basically we do the same thing for the remote server that we did for our local files. That is, we keep track of the files' sizes and modification times, and we retrieve the current file information from the server to compute each remote file's status. Only the way we get the current file information is different, since we cannot simply stat() the files. ftpsync uses the MDTM and SIZE FTP commands for this. Although they are not part of RFC 959 they are in common use.
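As a rough illustration of how a client can ask for this information, here is a sketch using Python's standard ftplib module (ftpsync itself does not use it). The helper names parse_mdtm and remote_status are made up; the sketch assumes the server answers MDTM with the usual "213 YYYYMMDDhhmmss" reply and that this timestamp is UTC.

```python
from datetime import datetime, timezone


def parse_mdtm(response):
    """Turn a reply such as '213 20240131120000' into a UTC datetime."""
    code, _, stamp = response.partition(" ")
    if code != "213":
        raise ValueError("unexpected MDTM reply: " + response)
    # Some servers append fractional seconds; keep the first 14 digits.
    parsed = datetime.strptime(stamp.strip()[:14], "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)


def remote_status(ftp, name):
    """Return (size, mtime) for one remote file.

    `ftp` is expected to behave like a connected ftplib.FTP object.
    """
    ftp.voidcmd("TYPE I")  # many servers answer SIZE only in binary mode
    size = ftp.size(name)
    mtime = parse_mdtm(ftp.sendcmd("MDTM " + name))
    return size, mtime
```

With a real server one would create the connection with ftplib.FTP(host), call login(), and then pass the object to remote_status().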

Synchronizing

Equipped with the file status information for the node and the peer, ftpsync determines the action for each file from the following table:

local/remote     unchanged   changed     deleted   doesn't exist
unchanged        nothing     get         remove    put
changed          put         duplicate   put       put
deleted          remove      get         ignore    ignore
doesn't exist    get         get         ignore    ignore

To explain this table: the rows show the local node's file status and the columns the remote peer's, and the values inside the table are the actions as seen by the node running ftpsync. E.g. get means that the file is retrieved from the peer server, put means uploading. Notice that the remove action (used twice) is not exact: it doesn't say whether the file has to be deleted locally or on the server.
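The table lookup can be written down directly. The following Python sketch is an illustration only; the function name action is made up, and the way it resolves the inexact remove entry (delete the copy that survived on the other side) is my reading of the table, not something taken from ftpsync's source.

```python
STATES = ("unchanged", "changed", "deleted", "doesn't exist")

# Rows are the node's status, columns the peer's status.
SYNC_TABLE = (
    ("nothing", "get", "remove", "put"),
    ("put", "duplicate", "put", "put"),
    ("remove", "get", "ignore", "ignore"),
    ("get", "get", "ignore", "ignore"),
)


def action(local, remote):
    """Look up the action for one file, resolving where 'remove' applies."""
    act = SYNC_TABLE[STATES.index(local)][STATES.index(remote)]
    if act == "remove":
        # The file vanished on one side, so delete the surviving copy.
        return "remove local" if remote == "deleted" else "remove remote"
    return act


examples = [
    action("unchanged", "deleted"),
    action("deleted", "unchanged"),
    action("changed", "changed"),
]
```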

The "doesn't exist" status means that a certain file does not exist on one side, neither as a file in the current directory nor in the previous status list. This happens if a file is created on one side before the directory is synchronized. The usual action is then to copy the file to the other side.

The situation is perhaps more difficult if the file does not exist on one side and is on the delete list of the other side. Under usual circumstances this cannot happen, but ftpsync has to deal with it. The solution is simple: we would have to delete a file that is already deleted on one end and does not exist on the other -- the right action is to ignore it. The other ignore entries also refer to situations where nothing has to be done but (in contrast to the unchanged/unchanged nothing) it's not clear how the system entered this state.

Version conflicts

More interesting is the duplicate action. In this case we have two changed copies, one local and one on the server. How can this conflict be resolved, and which copy wins? The answer is that both win. If a duplicate situation is recognized, the server's file is retrieved, but the peer's name is appended to the filename to show that this file is the server copy. The server receives the local file, but again the name is modified: this time the node's name is appended to the filename. In other words, both sides keep their copy and receive the other end's version under a different filename. It's then up to the user to decide which of the versions is better. These conflict resolution files are not versioned; they are overwritten in the next conflict situation.

Notice that ftpsync first retrieves the file from the peer, which is then compared against the node's copy. If it turns out that both copies are the same, ftpsync removes the server copy it just fetched and does not upload the node's version, since it's equal to the peer's file. This keeps you from getting unnecessary duplicates if you had problems with your system time.
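The compare-before-upload step might look like the following Python sketch. It is not ftpsync's implementation: conflict_name and resolve_duplicate are made-up names, and the "-" separator between filename and appended name is purely an assumption, since the text above does not say how the name is appended.

```python
import filecmp
import os
import tempfile


def conflict_name(filename, side, sep="-"):
    """Hypothetical naming scheme: append a side's name to the filename."""
    return filename + sep + side


def resolve_duplicate(local_path, fetched_path, upload_tag):
    """Decide what to do after the peer's copy was fetched to fetched_path.

    Returns the name to upload the local file under, or None when both
    copies are identical and no upload is needed.
    """
    if filecmp.cmp(local_path, fetched_path, shallow=False):
        os.remove(fetched_path)  # identical content: drop the fetched copy
        return None
    return conflict_name(os.path.basename(local_path), upload_tag)


with tempfile.TemporaryDirectory() as tmp:
    local = os.path.join(tmp, "notes.txt")
    fetched = os.path.join(tmp, conflict_name("notes.txt", "server"))
    for path in (local, fetched):
        with open(path, "w") as handle:
            handle.write("same content\n")
    upload_as = resolve_duplicate(local, fetched, "pc")
    fetched_still_there = os.path.exists(fetched)
```

Because both files have the same content here, nothing is uploaded and the fetched copy is removed again. Passing shallow=False makes filecmp compare the file contents instead of trusting size and timestamp, which is the point of this check.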

Symmetry

The action table above is symmetric. This means that neither side is preferred. Basically both sides could run the synchronizer, exchanging the client and server roles. The conflict resolver is also symmetric; more than this, it's "multi-symmetric". If you have a given number of nodes synchronizing with the same server, each node has its own conflict resolution which does not interfere with another node's resolution. The only additional requirement is that each node has its own unique name.

Variants

There are some possible modifications to the action table above. The unchanged/deleted/remove entry (abbreviated udR) could be changed to udP (put instead of remove) and ccD could become ccP. With these two changes ftpsync becomes a simple backup program: a backup program because files that need to be stored on the server are uploaded to it (files that are deleted or changed on the server are refreshed), and simple because there is no file versioning.

local/remote     unchanged   changed   deleted   doesn't exist
unchanged        nothing     get       put       put
changed          put         put       put       put
deleted          remove      get       ignore    ignore
doesn't exist    get         get       ignore    ignore

Changing the symmetric entries duR and ccD of the original action table to duG and ccG instead would turn the system running ftpsync into a simple backup system for the FTP server. I call these two modes "master" and "slave" mode.
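Both variants are small overrides of the original table, which the following Python sketch makes explicit. It is an illustration, not ftpsync's code: the compact string encoding, the function name override, and the assumption that "master" is the first variant (udP, ccP) and "slave" the second (duG, ccG) are mine.

```python
STATES = "ucdx"  # unchanged, changed, deleted, doesn't exist

# One string per local status; each letter is the action for one remote
# status: N nothing, G get, P put, D duplicate, R remove, I ignore.
SYNC = {"u": "NGRP", "c": "PDPP", "d": "RGII", "x": "GGII"}


def override(table, changes):
    """Return a copy of table with entries such as 'udP' applied."""
    rows = {local: list(actions) for local, actions in table.items()}
    for local, remote, act in changes:
        rows[local][STATES.index(remote)] = act
    return {local: "".join(actions) for local, actions in rows.items()}


MASTER = override(SYNC, ["udP", "ccP"])
SLAVE = override(SYNC, ["duG", "ccG"])
```

MASTER reproduces the second table above row by row; SLAVE differs from the original table only in the rows for locally changed and locally deleted files.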

The synchronizer can also be reconfigured to run as a "mirror", copying the files from the peer FTP server to the local directory, by changing the synchronization action table to the following.

local/remote     unchanged   changed   deleted   doesn't exist
unchanged        nothing     get       remove    ignore
changed          get         get       remove    ignore
deleted          get         get       ignore    ignore
doesn't exist    get         get       ignore    ignore

If we apply the symmetric changes to the action table we get the "original" (mirror local to peer FTP server) mode.
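What "symmetric" means can be checked mechanically: exchange the roles of node and peer (transpose the table) and swap get with put. The Python sketch below does this; the function name exchange_roles and the string encoding of the tables are made up for the illustration, and the resulting "original" table is derived by this rule rather than copied from ftpsync.

```python
STATES = "ucdx"  # unchanged, changed, deleted, doesn't exist
SWAP = {"G": "P", "P": "G"}  # get and put trade places; other actions stay

# One string per local status, one letter per remote status.
SYNC = {"u": "NGRP", "c": "PDPP", "d": "RGII", "x": "GGII"}
MIRROR = {"u": "NGRI", "c": "GGRI", "d": "GGII", "x": "GGII"}


def exchange_roles(table):
    """Transpose the table and exchange G and P."""
    result = {}
    for i, local in enumerate(STATES):
        row = ""
        for remote in STATES:
            act = table[remote][i]  # entry with the two roles exchanged
            row += SWAP.get(act, act)
        result[local] = row
    return result


sync_is_symmetric = exchange_roles(SYNC) == SYNC
ORIGINAL = exchange_roles(MIRROR)
```

The sync table maps onto itself, while the mirror table turns into a table consisting only of nothing, put, remove and ignore entries.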

symsync Mode

ftpsync stores the file status information in two files: .sync-nodename:peername and .sync-peername:nodename. If symsync is turned on, ftpsync also puts these files on the peer with their names swapped. This way node and peer can exchange their roles in a later ftpsync run.

Directory recursion

ftpsync can also synchronize subdirectories (recurse configuration option). By default it synchronizes only directories found on both ends; directory creation has to be enabled separately (createdirs option).

But notice that ftpsync will not delete directories. If you have synced a subdirectory and later remove it on either the node or the peer, ftpsync will reconstruct the whole directory structure with the files from the other end.

Configuration file

ftpsync needs a configuration file to synchronize a directory. This file is named

.sync.conf

and is located in the directory that should be synced. The file has the typical UN*X style: comments starting with a "#" are allowed, and so are empty lines. All other lines are of the form "key value" with whitespace between key and value.
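A file in this style is easy to read. The following Python sketch shows one way to do it; the function name parse_config is made up, and the sketch assumes that comments occupy whole lines (whether ftpsync accepts a trailing "#" comment after a value is not stated here).

```python
def parse_config(text):
    """Parse 'key value' lines; skip empty lines and '#' comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Key and value are separated by the first run of whitespace.
        parts = line.split(None, 1)
        options[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return options


sample = """\
#
# sample configuration
#
nodename      pc
peername      server

server        192.168.0.4
login         my-ftp-account
password      my-secret-password
mode          sync
"""
config = parse_config(sample)
```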

The mandatory configuration parameters are:

nodename nodename
The name of the local host running ftpsync. This doesn't have to be the node's DNS hostname; it can be anything as long as it's unique among all nodes syncing to the same peer server location. You should use only letters, digits and dashes (minus signs) here.

peername peername
The FTP server's name. Again you don't have to enter the peer's DNS name here (although you can). Choose any name you like as long as it contains only letters, digits and dashes.

server servername
This is either the peer's hostname (e.g. its fully qualified domain name) or its IP address. In contrast to peername, the servername parameter is used to connect to the server.

login username
The login name on the FTP server.

password password
The password belonging to username.

Furthermore ftpsync recognizes the following configuration options:

createdirs yes|no
If set to yes ftpsync will create missing subdirectories in recursion mode.

dir directory
The directory on the FTP server to synchronize to. If unset, username's home directory is used.

includedots yes|no
If set to yes, files beginning with a dot will also be subject to synchronization. Files beginning with ".sync." or ".sync-" (ftpsync's own configuration and status files) are still excluded.

mode syncmode
Defines ftpsync's synchronization mode, can be one of "sync", "master", "slave", "mirror" and "original", the default is "sync".

passive yes|no
If set to yes ftpsync will try passive mode data connections.

recurse yes|no
If set to yes ftpsync descends into every subdirectory that exists on both ends (node and peer) and synchronizes them too.

symsync yes|no
If set to yes ftpsync's file information files are copied and swapped to the server.

An example for a configuration file is

#
# .sync.conf - ftpsync configuration file.
#
nodename      pc
peername      server

server        192.168.0.4
login         my-ftp-account
password      my-secret-password

mode          sync
recurse       yes
includedots   yes
symsync       yes
passive       no

If the login password is given in the configuration file, ftpsync insists on restrictive permissions for that file: if they allow read/write access for anyone other than the file's owner, it refuses to use the file. The same applies to your ~/.netrc. If ftpsync can't find the password either on its command line or in the configuration file, it tries to read it from ~/.netrc, but again it will terminate if this file has the wrong permissions.
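Such a permission check amounts to testing the group and other bits of the file mode. The Python sketch below shows the idea on a POSIX system; the function name is_private is made up, and exactly which bits ftpsync tests is an assumption based on the description above.

```python
import os
import stat
import tempfile


def is_private(path):
    """True if neither group nor others may read or write the file."""
    mode = os.stat(path).st_mode
    exposed = stat.S_IRGRP | stat.S_IWGRP | stat.S_IROTH | stat.S_IWOTH
    return (mode & exposed) == 0


with tempfile.NamedTemporaryFile(delete=False) as handle:
    path = handle.name
os.chmod(path, 0o600)
private_ok = is_private(path)
os.chmod(path, 0o644)
world_readable_ok = is_private(path)
os.remove(path)
```

A file with mode 600 passes the check; the same file with mode 644 is rejected because group and others can read it.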

Command line options

ftpsync supports some command line options.

-b
sets symsync mode.

-i
ignores ~/.netrc for password determination.

-l [username][:password]
sets username and password for the server login.

-m syncmode
sets the synchronization mode.

-q
sets query mode, makes ftpsync show only what would be done. If -q is specified twice ftpsync prints only syncmode's action matrix and terminates.

-Q
with -Q ftpsync will also list what would be done (as for -q) but also update the file status information. After a -Q run all files will look synchronized even if they are different.

-r
sets recurse mode. If -r is given twice ftpsync will also create missing subdirectories (createdirs option).

-s
sets silent mode, unchanged files are not listed in ftpsync's output.

Invocation and usage

ftpsync is invoked with an optional peername on the command line:

ftpsync [options] [peer]

If peer is given, ftpsync reads its configuration from the file .sync.conf-peer instead of the file .sync.conf. Although ftpsync supports configuration options as command line options, it is intended to be used with a configuration file.

The -q option is perhaps the most useful: it can tell you what ftpsync would do, and it can list the compiled action matrix.

When you have configured directory recursion with directory creation, ftpsync will synchronize every directory it can find, regardless of the content. If this is not what you want, you can set createdirs to no and run ftpsync with the options -rrq. This lists all first-level subdirectories that would be created if createdirs were enabled. This "dry run" then gives you the opportunity to create the directories you want to synchronize manually.

The -Q option is only of use if you had real problems on either the node or the peer and you are sure that the files are the same but ftpsync thinks they are different.

Output format

ftpsync prints one line for each file it finds. This line shows the file's status on the node, the status on the peer (both abbreviated with a single letter)

u unchanged
c changed
d deleted
x doesn't exist

the computed action (again abbreviated)

E internal error
N nothing
G get file from peer
P put file to peer
D duplicate file
R remove file from peer or node
I ignore file

followed by the filename. In case of a duplicate situation where the file comparison shows that the files are equal, ftpsync prints a line with an equal sign as action indicator.
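The single-letter abbreviations combine into three-character codes such as udR, as used in the Variants section. The following Python sketch expands such a code; the function name decode is made up, and the sketch deliberately does not try to parse a whole output line, since the exact spacing of ftpsync's output is not described here.

```python
STATUS = {"u": "unchanged", "c": "changed", "d": "deleted", "x": "doesn't exist"}
ACTION = {
    "E": "internal error",
    "N": "nothing",
    "G": "get file from peer",
    "P": "put file to peer",
    "D": "duplicate file",
    "R": "remove file from peer or node",
    "I": "ignore file",
    "=": "duplicate, copies are equal",
}


def decode(code):
    """Expand a three-character code: node status, peer status, action."""
    node, peer, act = code
    return STATUS[node], STATUS[peer], ACTION[act]


example = decode("udR")
```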

Other output lines start with a single letter:

M mismatch on file type
< peer directory creation
> node directory creation

Furthermore ftpsync prints an empty line followed by a line beginning with a star followed by the directory name for each subdirectory it synchronizes. With the -s option information lines regarding files with either the N or I action are hidden from the output.

< dag | at | awk-scripting.de >