Backing Up Files Using rdiff-backup

rdiff-backup is a Python-based backup application that can create incremental backups over a network using ssh. It requires librsync and, obviously, Python. Each of these should come standard with your distribution, so you’ll just need to follow the usual process for installing applications for your distribution.

In this tutorial, you will:

  • Set up passwordless ssh on both the server and client;
  • Set up rdiff-backup to do incremental backups via a bash script;
  • Set up a scheduled task to run the backup at regular intervals.

remote server setup

In this howto, for clarity’s sake, I’m going to refer to the computer you back up the data to as the server and the computer you back up data from as the client.

If you are going to do incremental backups to a server, then you’ll need rdiff-backup installed on both the server and the client. You will also need an ssh client on the server and an ssh server running on the client. I know that sounds arse-backwards, but in this setup the server is the machine that connects to the client and pulls the data across. Any distribution worth its salt will come with both the ssh client and server. The ssh client should be installed by default, but the ssh server may not be installed by default on the client. If not, install it through whatever package manager you have on your distribution.

You will then need to get it running. Hopefully the package manager will have done this for you. If not, then you will need to set the sshd daemon to run on startup in the default runlevel. Some distributions have graphical programmes to configure startup scripts (technically referred to as init scripts), and some don’t. Setting init scripts to run is beyond the scope of this howto, so search the web for instructions for your distribution.
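Before going further, it can help to confirm that the programs mentioned above are actually installed. The following is only a rough sketch: check_tools is a hypothetical helper name, and it simply looks each program up on the PATH, so on some distributions sshd (which often lives in /usr/sbin) may be reported as missing for an ordinary user even though it is installed.

```shell
# Report whether each named program can be found on this machine's PATH.
# check_tools is a made-up helper for this howto, not part of rdiff-backup.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Non-zero return value if anything was missing.
    return "$missing"
}

# On the server you might run:  check_tools ssh rdiff-backup
# On the client you might run:  check_tools sshd rdiff-backup
```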

Now we’ve got ssh installed and running where it should be, we need to set up passwordless ssh. Passwordless ssh is required, so that the automatically scheduled backup can run when you’re away from the computer. There is a howto on setting up passwordless ssh here.
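The linked howto is the place to go for the details, but as a rough sketch, the usual approach is to create a key pair on the server and copy the public key to the client. The login and address below are the same placeholders used later in the backup script, the key file name is an assumption, and setup_passwordless_ssh is a hypothetical function name. Run it as the user who will own the cron job.

```shell
# Placeholders: replace with your own client login and address.
CLIENT="remote_user@192.168.7.250"
# Assumed key location; any key file that ssh picks up will do.
KEY="$HOME/.ssh/id_ed25519"

setup_passwordless_ssh() {
    # Create a key pair with an empty passphrase, unless one already exists.
    [ -f "$KEY" ] || ssh-keygen -t ed25519 -N "" -f "$KEY"
    # Install the public key on the client (asks for the password one last time).
    ssh-copy-id -i "$KEY.pub" "$CLIENT"
    # BatchMode makes ssh fail rather than prompt, so success here means
    # the login really works unattended.
    ssh -o BatchMode=yes "$CLIENT" true && echo "passwordless ssh OK"
}

# Run on the server:  setup_passwordless_ssh
```

A key with an empty passphrase is what lets the backup run unattended, so keep the private key file readable only by its owner.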

a bit about rdiff-backup

rdiff-backup can be run to back up local and remote directories. Here are some basic commands that show how it works:

 rdiff-backup foo bar

will back up local directory foo to local directory bar. bar will end up a copy of foo, except it will contain the directory bar/rdiff-backup-data, which will allow rdiff-backup to restore previous states.
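If you want to see this for yourself without touching real data, something like the following should do. It is only a sketch: the scratch directory and file name are made up for the demo, and the backup step is skipped if rdiff-backup is not installed.

```shell
# Scratch directory so the demo does not touch real data.
work=$(mktemp -d)
mkdir "$work/foo"
echo "hello" > "$work/foo/file.txt"

if command -v rdiff-backup >/dev/null 2>&1; then
    # The first run makes a full copy of foo in bar.
    if rdiff-backup "$work/foo" "$work/bar"; then
        # bar should now hold file.txt plus the rdiff-backup-data directory.
        ls -A "$work/bar"
    else
        echo "rdiff-backup reported an error"
    fi
else
    echo "rdiff-backup is not installed here, skipping the demo"
fi
```

Delete the scratch directory when you have finished looking around.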

rdiff-backup /local/dir hostname.net::/remote/dir

will back up directory /local/dir to the directory /remote/dir on the machine hostname.net. It uses ssh to open the necessary pipe to the copy of rdiff-backup on hostname.net. This is just like the first example, except that the destination directory is on a remote machine.

rdiff-backup -v5 --print-statistics user1@host1::/source-dir user2@host2::/dest-dir

will back up source-dir from one remote computer to dest-dir on another remote computer. The -v5 switch has been added for greater verbosity (verbosity settings go from 0 to 9, with 3 as the default), and the --print-statistics switch displays some statistics at the end (even without this switch, the statistics will still be saved in the rdiff-backup-data directory).

the script

I have made a script to do my rdiff-backups. This is mostly to keep my crontab entries tidy, and also to make changing the backup behaviour a bit easier.

The first thing you need to think about is what you want to back up on the client machine. This will be entirely up to you. I back up personal stuff that is irreplaceable, like photos and videos of the kids. I also back up email and documents. I even back up my .kde directory, just in case I do something stupid with it. You set out each directory that you want to back up in the script. Please bear in mind that the script will back up everything in that directory, so be mindful of this if you’ve got sub-directories that you don’t want to back up.
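If a directory does contain something you would rather leave out, rdiff-backup’s --exclude option is one way to skip it. The sketch below is an illustration only: the cache sub-directory is hypothetical, the login and address are the same placeholders used in the script, and the function wrapper (a made-up name) is just there so that nothing runs until you call it.

```shell
# Hypothetical example: back up Pictures but skip its "cache" sub-directory.
# The excluded path is written as it appears on the client's filesystem.
backup_pictures_without_cache() {
    rdiff-backup --print-statistics \
        --exclude /home/remote_user/Pictures/cache \
        remote_user@192.168.7.250::/home/remote_user/Pictures Pictures
}

# Run from inside the destination directory on the server:
#   backup_pictures_without_cache
```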

The script looks like this:

#!/bin/bash
# Work inside the backup destination; stop if it is missing.
cd /destination/directory || exit 1
echo "Backing up Pictures"
echo ""
rdiff-backup --print-statistics \
    remote_user@192.168.7.250::/home/remote_user/Pictures Pictures
echo "Backing up Mail"
echo ""
rdiff-backup --print-statistics \
    remote_user@192.168.7.250::/home/remote_user/Mail Mail
echo ""
echo "**********rdiff-backup completed************"
echo ""
echo "Disk usage:"
echo ""
# Show free space on the partition that holds the backups.
df /dev/hdb1

You will obviously need to replace the destination directory, the remote user name, the IP address and the remote paths with your own details (and /dev/hdb1 with the partition that holds your backups). You can also repeat the section:

echo "Backing up Pictures"
echo ""
rdiff-backup --print-statistics \
    remote_user@192.168.7.250::/home/remote_user/Pictures Pictures

once for each directory you want to back up. Cut and paste the above code into a text editor (my favourite is Kate) and save it somewhere on the server as rdiff-backup.sh. Then change the permissions so that it’s executable. Open up a terminal, and type:

chmod +x /path/to/script/rdiff-backup.sh

You should then run it once by hand to make sure that it works.
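If the list of directories grows, the repeated section can be folded into a loop. This is just one possible variation, not a replacement for the script above: the variable names, the backup_dir function and the RDIFF override are all my own inventions, and the login, address and paths are the same placeholders as before.

```shell
#!/bin/bash
# Placeholders: change these to your own details.
REMOTE="remote_user@192.168.7.250"
REMOTE_HOME="/home/remote_user"
DEST="/destination/directory"
# Set RDIFF=echo in the environment for a dry run that only prints the command.
RDIFF="${RDIFF:-rdiff-backup}"

# Back up one directory from the client into $DEST on the server.
backup_dir() {
    echo "Backing up $1"
    echo ""
    "$RDIFF" --print-statistics "${REMOTE}::${REMOTE_HOME}/$1" "${DEST}/$1"
}

# Loop over every directory name given on the command line.
for dir in "$@"; do
    backup_dir "$dir" || echo "Backup of $dir failed" >&2
done
```

With this variation you would name the directories when calling the script, for example /path/to/script/rdiff-backup.sh Pictures Mail, and putting RDIFF=echo in front of that command shows what would be run without transferring anything.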

scheduling the script to run regularly

For this bit, you need to run what’s called a cron job so that the script runs automatically at regular intervals. Cron is a programme that runs in the background, and runs scheduled tasks. The easiest way to set up a cron job is to edit the crontab file. You can do this by opening a terminal and typing:

crontab -e

This should open the crontab file for the current user ready for editing. Each cron job sits on a single line. The line starts with five fields separated by spaces: minute, hour, day of the month, month and day of the week. An asterisk in a field means "every". The rest of the line is the command to run. To run the backup script nightly at 4am, add this line to your crontab file:

0 4 * * * /full/path/to/rdiff-backup.sh
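To make the layout of that line a bit more concrete, here is a small sketch that splits it into its parts and prints them with labels. It does not touch your crontab, and the variable names are mine, chosen to match the usual field order.

```shell
# The crontab line from above, held in a variable for inspection.
line='0 4 * * * /full/path/to/rdiff-backup.sh'

# Split the line on whitespace: five time fields, then the command.
read -r minute hour day_of_month month day_of_week command <<EOF
$line
EOF

echo "minute:       $minute"
echo "hour:         $hour"
echo "day of month: $day_of_month"
echo "month:        $month"
echo "day of week:  $day_of_week"
echo "command:      $command"
```

Cron normally tries to mail any output from a job to the user. If you would rather keep it in a file, you could append something like >> /full/path/to/rdiff-backup.log 2>&1 to the crontab line (the log path is only an example).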

That’s it. To learn how to restore files from the backup, have a look here.