Oct 28, '03 10:38:00AM • Contributed by: newkid
This hint describes a method for generating automatic and rotating local snapshots of a file system with remote copies on UNIX systems using cp (or cpio), rsync, ssh and cron. It is intended for servers, but works on the desktop too.
Making a full copy of a large file system can be a time-consuming and expensive process. Therefore it is common to run full backups only once a week or once a month, and to store only changes on the other days. These are called "incremental" backups, and are supported by the venerable dump and tar utilities, along with many others. However, you don't have to use tape as your backup media; it is quite handy to use hard disks or a remote server instead. Read the rest of the hint for the remainder of the walk-through...
[robg adds: I have not tested this one in any way, but the info looks interesting and useful!]
More importantly, hard drives and remote shares allow random reads and writes, something that linear commands like cp and tar can't take advantage of. It is much more efficient to perform incremental backups with rsync because this utility leverages the random-access capability of the media. For network-based backups, rsync provides another advantage: a full backup is only necessary once, instead of once per week.
About rsync
rsync is a program that can be used in many ways to easily do fully automated and readily available "live" backups. It is secure, even over the Internet, especially when used in conjunction with secure connections (ssh) and appropriate firewall rules (iptables) and/or an IPSec tunnel (freeswan or kame).
rsync must be installed on all machines that will be doing the backups. One machine acts as the server and runs rsync as a daemon that sits and waits for connections. The other machines run rsync to connect to the remote share and issue commands to upload or download files.
About cp -al, cpio -pdl and hard links
We usually think of a file's name as being the file itself, but really the name is a hard link, an entry in a directory. A physical file can have more than one directory entry pointing to it: for example, a directory has at least two hard links: its name in the parent directory and the "." entry inside it (for when you're in it), plus one ".." entry for each of its sub-directories (for when you are inside one of them).
With cpio -pdl (or GNU cp -l), hard-linking a file is similar to copying it, but the contents of the file are only stored once, so you don't use twice the space. To the end user, the only differences are that the copy takes almost no disk space and almost no time to generate.
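The effect is easy to see with plain ln, which creates the same kind of hard link that cp -l and cpio -pdl do. A throwaway sketch in a temporary directory (all paths here are invented for the demonstration):

```shell
# Create a file, then a second hard link to it, in a scratch directory.
dir=`mktemp -d`
echo "hello" > "$dir/original"
ln "$dir/original" "$dir/copy"   # a second directory entry, same inode
# Both names now refer to one physical file: the inode numbers match
# and the link count shown by ls -l is 2.
ls -li "$dir"
```

Deleting either name leaves the data intact until the last link is removed, which is exactly why rotated snapshots can share unchanged files.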
The cornerstone of the current technique is that rsync always unlinks a file before overwriting it. Therefore, we can use hard links to create what appear to be multiple full backups of a file system without wasting disk space on duplicates. Each subsequent copy only costs the incremental content: the files that rsync unlinked from the original and rewrote (because they had changed). In outline, each rotation does:
# rm -rf backup.3
# mv backup.2 backup.3
# mv backup.1 backup.2
# cp -al backup.0 backup.1
# rsync -a --delete source_directory/ backup.0/
Inetd configuration (/etc/inetd.conf)
While it is possible to run rsync as a standalone daemon that starts at boot, in most cases it makes more sense to have the rsync daemon started automatically as needed by inetd (or xinetd on some systems). All one needs to do is make sure the following line appears in the file /etc/inetd.conf, then tell inetd to reread its configuration (for example with kill -HUP).
rsync stream tcp nowait root /usr/bin/rsync rsyncd --daemon
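On systems that use xinetd instead of inetd, the equivalent is a small service file; a sketch (the path /etc/xinetd.d/rsync is the conventional location, adjust to taste):

```
service rsync
{
        disable         = no
        socket_type     = stream
        wait            = no
        user            = root
        server          = /usr/bin/rsync
        server_args     = --daemon
}
```

After editing, restart or reload xinetd so it picks up the new service.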
Daemon configuration (/etc/rsyncd.conf)
This configuration file must exist on the machine that waits for connections. The current solution contains one module called "encore", but other modules could follow, each beginning with the module's name in square brackets.
#/etc/rsyncd.conf
#Deny everything to be on the safe side...
hosts deny = *
uid = nobody
gid = nobody
read only = yes
list = false
[encore]
comment = encore backup environment
path = /home1/encore.0
hosts allow = 10.0.1.201
uid = root
gid = system
read only = no
Crontab entry (crontab -e)
cron is used for regularly scheduled automated tasks; in our case, it tells the client machine when and how to do the backups. A cron job can usually be created in the file /etc/crontab (or with crontab -e). With the one line below, the script /etc/rsync_daily.sh runs as root every day at 2:00 AM.
0 2 * * * root /etc/rsync_daily.sh
Secure key (ssh-keygen)
These steps allow you to use ssh and rsync to your remote host without having to enter a password. On the client, type:
# ssh-keygen -t dsa -f ~/.ssh/id_dsa
# cat ~/.ssh/id_dsa.pub | ssh root@remote 'cat - >> ~/.ssh/authorized_keys'
# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
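The key-generation step can be rehearsed harmlessly against a temporary directory instead of ~/.ssh. Note that recent OpenSSH releases have dropped DSA key generation, so this sketch substitutes an ed25519 key; the procedure is otherwise identical:

```shell
# Generate a throwaway key pair non-interactively in a scratch directory.
# -N "" sets an empty passphrase (needed for unattended cron use);
# -q keeps ssh-keygen quiet.
dir=`mktemp -d`
ssh-keygen -t ed25519 -f "$dir/id_key" -N "" -q
ls "$dir"    # id_key (private) and id_key.pub (public)
```

The .pub half is what gets appended to the remote host's authorized_keys, as in the cat-over-ssh pipeline above.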
You can read up on this if you want to understand it more.
Backup script (/etc/rsync_daily.sh)
This is the script used with the above crontab entry. The command
# chmod 700 /etc/rsync_daily.sh
will give this file the appropriate permissions it needs.
#!/bin/sh
# rsync_daily.sh
# daily local rotating backup with remote script using rsync
#
# changes:
# Oct 17 2003 - JF Paradis - creation
#
# the process is:
# 1. rotate local backups
# rm -rf backup.3
# mv backup.2 backup.3
# mv backup.1 backup.2
# cp -al backup.0 backup.1
# 2. maintain a local copy using rsync
# rsync -a --delete source_directory/ backup.0/
# 3. maintain a remote copy using rsync
# rsync -a --delete source_directory/ remote_user@remote_host::target_share/
FOLDER=encore;
LOCAL_SOURCE=/home;
LOCAL_TARGET=/home1;
REMOTE_HOST=10.0.1.202;
REMOTE_SHARE=encore;
REMOTE_USER=root;
# make sure we're running as root
# id options are effective (u)ser ID
if [ "`id -u`" != 0 ]; then
  echo "Sorry, must be root. Exiting..."; exit 1;
fi;
# Rotating backups:
# step 1: delete the oldest backup, if it exists
# rm options are (r)ecursive and (f)orce
if [ -d $LOCAL_TARGET/$FOLDER.3 ] ; then
rm -rf $LOCAL_TARGET/$FOLDER.3 ;
fi;
# step 2: shift (rename) the middle backup(s) back by one, if they exist
if [ -d $LOCAL_TARGET/$FOLDER.2 ] ; then
mv $LOCAL_TARGET/$FOLDER.2 $LOCAL_TARGET/$FOLDER.3 ;
fi;
if [ -d $LOCAL_TARGET/$FOLDER.1 ] ; then
mv $LOCAL_TARGET/$FOLDER.1 $LOCAL_TARGET/$FOLDER.2 ;
fi;
# step 3: make a hard-link-only copy of the latest backup, if it exists
# cpio options are single (p)ass, create dir and (l)ink files
if [ -d $LOCAL_TARGET/$FOLDER.0 ] ; then
# the next 2 lines are for AIX
cd $LOCAL_TARGET/$FOLDER.0 && find . -print |
cpio -pdl $LOCAL_TARGET/$FOLDER.1 ;
# the next line is for GNU cp
# cp -adl $LOCAL_TARGET/$FOLDER.0 $LOCAL_TARGET/$FOLDER.1
fi;
# step 4: create backup by updating previous
# rsync options are (a)rchive and (delete) extra
rsync \
  -a --delete \
  $LOCAL_SOURCE/$FOLDER/ \
  $LOCAL_TARGET/$FOLDER.0/ ;
# step 5: update backup.0 to reflect the backup date and time
touch $LOCAL_TARGET/$FOLDER.0 ;
# Remote backup
# rsync options are (a)rchive, (z) compress and (delete) extra
rsync \
  -e ssh \
  -az --delete \
  $LOCAL_TARGET/$FOLDER.0/ \
  $REMOTE_USER@$REMOTE_HOST::$REMOTE_SHARE ;
About resource forks
rsync, like most UNIX commands, is not aware of resource forks. In my environment, I consider resource forks things of the past: most current OS X applications whose files are binary-compatible with PC formats (Word, Excel, Photoshop, Acrobat, MP3, MPEG, etc.) will accept a file that has lost its resource fork, as long as the proper file extension is present.
There is a version of rsync that is aware of resource forks, but I am not confident enough in it to use it. Therefore, I have taken the path of asking my users to always include file extensions. It is a compromise, not the best one, but it is reliable.
