Having your way with rsync


Maybe you want the directories that you synchronize to be exact copies of each other, or maybe you don't. Let's dig a little more deeply into the rsync command and see if we can't find just the right mix of options for what you want to do.

Basic copying

The rsync command can replicate collections of files from one place to another in every possible detail or it can allow you to control exactly how that replication flows -- what it replicates and what it doesn't.

In its simplest form, the rsync command will copy files from the file source to the file destination. It will not remove files on the destination side that aren't on the source and it won't recreate all of the metadata (e.g., ownership and group details) unless your rsync command includes just the right set of options. So let's follow along behind a series of rsync commands to see just what they do and don't do in response to our synchronization requests.

First, if you specify the name of a directory as a source, rsync is going to create (or update) a directory by that name at the destination. In this example, we instead pass the files inside the directory (orig/*), so they land directly in the destination directory. Here, we're working with two folders on the same system. When we start, both the orig and copy directories exist, but they have only some files in common.

$ ls orig
a  b  c  d  e  f
$ ls copy
a  c  e  g

When we run the rsync command with the -v (verbose) option, we can see that the command is copying all of the files from one directory to the other.

$ rsync -av orig/* copy
sending incremental file list
a
b
c
d
e
f

sent 311 bytes  received 126 bytes  874.00 bytes/sec
total size is 0  speedup is 0.00

Because these files are empty (the total size is 0), almost no data is transferred and the reported speedup is 0.00. Look at the destination directory afterwards and we can see that it now contains all of the files in the orig folder. It also retains the one file that was on the destination before the rsync operation (and doesn't exist on the source).

$ ls copy
a  b  c  d  e  f  g
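
The difference between naming a directory as the source and naming its contents can be tried out on scratch directories. The sketch below is only an illustration; the directory names (by_name, by_contents) are made up for the purpose and are not part of the example above.

```shell
# Scratch area; every name below is hypothetical.
work=$(mktemp -d)
mkdir -p "$work/orig" "$work/by_name" "$work/by_contents"
touch "$work/orig/a" "$work/orig/b"

# Directory name as the source: rsync creates by_name/orig/
rsync -a "$work/orig" "$work/by_name"

# Trailing slash on the source: only the contents are copied
rsync -a "$work/orig/" "$work/by_contents"

ls "$work/by_name"       # lists the orig directory
ls "$work/by_contents"   # lists a and b
```

Using orig/* as in the example above behaves much like the trailing-slash form, except that the shell expands the wildcard and so skips hidden files.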

Staying in sync

In the next example, we are working between two systems and replicating a directory called "archive". The first operation copies everything, creating the archive directory on the remote system. We use just ~unixdweeb as the destination on the remote system (the home directory of the user called "unixdweeb").

orig$ rsync -av archive remhost:~unixdweeb
building file list ... done
archive/
archive/try1
archive/tryme
archive/tryme1
archive/tryme2

sent 1486 bytes  received 114 bytes  1066.67 bytes/sec
total size is 1170  speedup is 0.73

When there's nothing to be done

The next time we run the rsync command, there is nothing to copy. Nothing has changed in the local archive directory, so nothing is changed on the remote copy. Still, we see some bytes going back and forth because the rsync processes need to compare notes and determine whether any files or file content need to move.

orig$ rsync -av archive remhost:~unixdweeb
building file list ... done
archive/

sent 132 bytes  received 26 bytes  63.20 bytes/sec
total size is 1170  speedup is 7.41
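
The speedup figure is the total size of the files divided by the number of bytes that actually crossed the wire. As a quick check, the numbers reported by the two runs above can be fed through awk:

```shell
# speedup = total size / (bytes sent + bytes received)
first_run=$(awk 'BEGIN { printf "%.2f", 1170 / (1486 + 114) }')
second_run=$(awk 'BEGIN { printf "%.2f", 1170 / (132 + 26) }')

echo "first run:  $first_run"    # 0.73
echo "second run: $second_run"   # 7.41
```

A speedup below 1 on the first run simply reflects protocol overhead on top of very small files.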

Dealing with new content

When a new file shows up on the remote archive, we have to adjust our rsync command if we want our two archive directories to remain exactly the same. Our next rsync operation shows that's just what happens. By adding the --delete option, we tell rsync to delete any files on the destination system that don't exist on the source system.

dest$ ls archive
newfile  try1  tryme  tryme1  tryme2
orig$ rsync -av --delete archive remhost:~unixdweeb
building file list ... done
deleting archive/newfile
archive/

Notice the "deleting archive/newfile" message that appears in our verbose output. If you want your rsync operation to use the local system as the content authority and make sure that the remote copy looks exactly the same as the original one, --delete is the option to use.

Next, we see the bytes transferred statistic and a very modest speedup.

sent 136 bytes  received 26 bytes  64.80 bytes/sec
total size is 1170  speedup is 7.22

And, as you'd expect, the new file that somehow snuck into the remote system's archive directory is now gone.

dest$ ls archive
try1  tryme  tryme1  tryme2

In addition, were we to examine the files on both systems, the permissions and ownership would be the same because we are using the -a (archive) option, which preserves them. Note that ownership is only preserved when the receiving rsync runs as the superuser.
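
Since --delete removes files, it can be worth rehearsing on local scratch directories first. This is a minimal sketch that uses two local directories (src and dst, both hypothetical) in place of the two systems above.

```shell
# Scratch area; src and dst stand in for the local and remote systems.
work=$(mktemp -d)
mkdir -p "$work/src/archive" "$work/dst/archive"
touch "$work/src/archive/try1" "$work/src/archive/tryme"
touch "$work/dst/archive/newfile"     # exists only on the destination

# Without --delete, the stray file survives the sync
rsync -a "$work/src/archive" "$work/dst"
before=$(ls "$work/dst/archive")

# With --delete, the destination is trimmed to match the source
rsync -a --delete "$work/src/archive" "$work/dst"
after=$(ls "$work/dst/archive")

echo "before: $before"
echo "after:  $after"
```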

To replicate or not

If, instead, we want to maintain whichever files in the two archive directories are the newest, there are options for that as well.

The --existing option tells rsync to only update files that already exist on the remote system and not to create new files. Notice in the example below that newfile2 is not replicated.

orig$ echo hello > archive/newfile2
orig$ rsync -av archive --existing sea-aveksa-1:~shs
building file list ... done
archive/

sent 163 bytes  received 26 bytes  126.00 bytes/sec
total size is 1176  speedup is 6.22
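
The same behavior can be reproduced locally. In this sketch (directory names are hypothetical stand-ins for the two systems), a file that already exists on the destination is refreshed while a file that exists only on the source is left behind.

```shell
# Scratch area; src and dst stand in for the local and remote systems.
work=$(mktemp -d)
mkdir -p "$work/src/archive" "$work/dst/archive"
echo old > "$work/dst/archive/try1"
echo updated-contents > "$work/src/archive/try1"
echo hello > "$work/src/archive/newfile2"

# --existing: update what is already there, create nothing new
rsync -a --existing "$work/src/archive" "$work/dst"

cat "$work/dst/archive/try1"   # updated-contents
ls "$work/dst/archive"         # try1 only; newfile2 was not created
```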

We might also want to tell rsync not to touch files that are newer on the destination side. To demonstrate how this works, we first create a new file called "newfile2" on the destination server.

dest$ echo "don't touch that" > archive/newfile2

And then, on the local system, we run rsync again and notice that newfile2 on the destination server is not overwritten by the file of the same name on the source system.

orig$ rsync -av archive --update sea-aveksa-1:~shs
building file list ... done
archive/

sent 163 bytes  received 26 bytes  75.60 bytes/sec
total size is 1176  speedup is 6.22
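
A local sketch of the same idea follows. The source file is backdated with touch so that the destination copy is unambiguously newer; the directory names are again hypothetical.

```shell
# Scratch area; src and dst stand in for the local and remote systems.
work=$(mktemp -d)
mkdir -p "$work/src/archive" "$work/dst/archive"
echo hello > "$work/src/archive/newfile2"
echo "don't touch that" > "$work/dst/archive/newfile2"

# Backdate the source copy so the destination copy is clearly newer
touch -t 202001010000 "$work/src/archive/newfile2"

# --update: skip files that are newer on the receiving side
rsync -a --update "$work/src/archive" "$work/dst"

cat "$work/dst/archive/newfile2"   # don't touch that
```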

To ensure that we don't create new files and don't overwrite files that are newer on the destination side, we can combine these two options. Notice in the output shown below that no changes were made.

orig$ rsync -av --update --existing archive sea-aveksa-1:~shs
building file list ... done

sent 157 bytes  received 20 bytes  70.80 bytes/sec
total size is 1176  speedup is 6.64

Excluding content

We can also exclude portions of a directory that we don't want copied from one system (or file system location) to another by using the --exclude option. An example of --exclude is shown in the command below.

$ rsync -aAXv --exclude 'junk' upgrade /backups/

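We can exclude multiple directories with just a little more effort. The command below excludes both the junk directory and one called "notes". The braces are expanded by the shell (bash, for example) into two separate --exclude options, and the patterns are matched against paths within the transfer, which here begin with upgrade/.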

$ rsync -aAXv --exclude={'upgrade/junk/*','upgrade/notes/*'} upgrade /tmp/

Because we included the /* after each of the directories to be excluded, the directories themselves are replicated, but not their content.
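
To see the difference between excluding a directory and excluding only its contents, here is a small sketch on scratch directories. The destination names (whole, contents) and file names are invented for the illustration, and two separate --exclude options are used instead of brace expansion so that it runs in any POSIX shell.

```shell
# Scratch area; all directory and file names are hypothetical.
work=$(mktemp -d)
mkdir -p "$work/upgrade/junk" "$work/upgrade/notes"
mkdir -p "$work/whole" "$work/contents"
touch "$work/upgrade/keep" "$work/upgrade/junk/j1" "$work/upgrade/notes/n1"

# Excluding the directory name leaves junk out entirely
rsync -a --exclude 'junk' "$work/upgrade" "$work/whole/"

# Excluding junk/* and notes/* keeps the directories but not their contents
rsync -a --exclude 'upgrade/junk/*' --exclude 'upgrade/notes/*' \
    "$work/upgrade" "$work/contents/"

ls "$work/whole/upgrade"      # keep and notes, no junk
ls "$work/contents/upgrade"   # junk, keep and notes
```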

When in doubt, check it out

Whenever you're struggling to get the syntax right on an rsync command that is at all complex, remember that you can try out the command without actually making any changes by using the --dry-run option along with -v (verbose). This will show you what rsync would be doing if you ran the command for real, but won't actually replicate anything.
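
As a rough illustration, the sketch below previews a --delete sync on scratch directories (names are hypothetical) and then confirms that the destination is untouched.

```shell
# Scratch area; src and dst stand in for the local and remote systems.
work=$(mktemp -d)
mkdir -p "$work/src/archive" "$work/dst/archive"
touch "$work/src/archive/try1"
touch "$work/dst/archive/newfile"

# Preview only: reports what would be copied and deleted, changes nothing
rsync -av --delete --dry-run "$work/src/archive" "$work/dst"

ls "$work/dst/archive"   # newfile is still there; try1 has not been copied
```

Once the preview looks right, dropping --dry-run runs the same command for real.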

This story, "Having your way with rsync" was originally published by Computerworld.
