
A detailed walkthrough on creating backups using rsync
Introduction

This hint describes a method for generating automatic, rotating local snapshots of a file system, with remote copies, on UNIX systems using cp (or cpio), rsync, ssh and cron. It is intended for servers, but works on the desktop too.

Making a full copy of a large file system can be a time-consuming and expensive process. It is therefore common to run full backups only once a week or once a month, and to store only changes on the other days. These are called "incremental" backups, and they are supported by the old dump and tar utilities, along with many others. However, you don't have to use tape as your backup medium; it is quite handy to use hard disks or a remote server instead. Read the rest of the hint for the remainder of the walkthrough...

[robg adds: I have not tested this one in any way, but the info looks interesting and useful!]

More importantly, hard drives and remote shares allow random reads and writes, something that linear commands like cp and tar can't take advantage of. It is much more efficient to perform incremental backups with rsync because this utility leverages the random-access capability of the media. For network-based backups, rsync provides another advantage: it is only necessary to do a full backup once, instead of once per week.

About rsync

rsync is a program that can be used in many ways to do fully automated and readily available "live" backups. It is secure, even over the Internet, when used in conjunction with secure connections (ssh), appropriate firewall rules (iptables), and/or an IPSec tunnel (freeswan or kame).

rsync must be installed on all machines that will be doing the backups. One machine acts as the server and runs rsync as a daemon that sits and waits for connections. The other machines run rsync to connect to the remote share and issue commands to upload or download files.
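For clarity, the two transports look like this on the client command line (a quick sketch reusing the host and module names from later in this hint; adjust to taste):

# daemon (module) syntax: double colon, connects to a listening rsync daemon
rsync -av /home/encore/ 10.0.1.202::encore/

# remote-shell syntax: single colon, runs rsync over ssh instead
rsync -av -e ssh /home/encore/ root@10.0.1.202:/home1/encore.0/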

About cp -al, cpio -pdl and hard links

We usually think of a file's name as being the file itself, but really the name is a hard link: an entry in a directory. A physical file can have more than one directory entry pointing to it. For example, a directory has at least two hard links: its name and its "." entry (for when you're inside it), plus one ".." entry in each of its sub-directories (for when you are inside any one of them).
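You can see this in action with ln and ls -li (a tiny illustration; the file names are made up):

# create a file and a second hard link to it
echo hello > original
ln original copy
# both names show the same inode number and a link count of 2
ls -li original copy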

With cpio -pl (or the GNU version of cp with its -l option), "copying" a file actually creates a new hard link to it: the contents of the file are only stored once, so you don't use twice the space. To the end user, the only differences are that the copy takes almost no disk space and almost no time to generate.
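For instance, both of the following produce a hard-linked "copy" of a whole directory tree (a sketch; GNU cp is not available on every system, which is why the script below also offers the cpio variant):

# GNU cp: (a)rchive mode plus (l)ink instead of copy
cp -al backup.0 backup.1

# portable cpio equivalent: (p)ass-through, make (d)irectories, (l)ink files
cd backup.0 && find . -print | cpio -pdl ../backup.1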

The cornerstone of the current technique is that rsync always unlinks a file before overwriting it. Therefore, we can use hard links to create what appear to be multiple full backups of a file system without wasting disk space on duplicates. Each subsequent copy only consumes space for the incremental content: the files that rsync unlinked from the original and replaced (because they had changed).

# rm -rf backup.3
# mv backup.2 backup.3
# mv backup.1 backup.2
# cp -al backup.0 backup.1
# rsync -a --delete source_directory/ backup.0/

Inetd configuration (/etc/inetd.conf)

While it is possible to run rsync as a daemon that starts up at boot, in most cases it makes more sense to have the rsync daemon started automatically as needed by inetd (or xinetd on some systems). All one needs to do is make sure the following line appears in the file /etc/inetd.conf:

rsync stream tcp nowait root /usr/bin/rsync rsyncd --daemon
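On systems that ship xinetd rather than inetd, the equivalent is a small service file; the following is a hedged sketch, not part of the original hint, so check your xinetd documentation before relying on it:

# /etc/xinetd.d/rsync (sketch for xinetd-based systems)
service rsync
{
        disable         = no
        socket_type     = stream
        wait            = no
        user            = root
        server          = /usr/bin/rsync
        server_args     = --daemon
}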

Daemon configuration (/etc/rsyncd.conf)

This configuration file must exist on the machine that waits for connections. The current solution contains one module (rsync's term for a named configuration section) called "encore", but it could be followed by other modules, each beginning with its name in brackets.

#/etc/rsyncd.conf

#Deny everything to be on the safe side...
hosts deny = *
uid = nobody
gid = nobody
read only = yes
list = false

[encore]
comment = encore backup environment
path = /home1/encore.0
hosts allow = 10.0.1.201
uid = root
gid = system
read only = no
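Once the daemon is answering, you can sanity-check the module from the allowed client; this is a hypothetical test (with "list = false" the module won't appear in a bare module listing, but a direct pull by name should still work):

# dry-run pull of the module into a scratch directory
rsync -av --dry-run 10.0.1.202::encore /tmp/encore-test/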

Crontab entry (crontab -e)

cron is used for regularly scheduled automated tasks; in our case, it tells the client machine when and how to do the backups. It is usually possible to create a cron job using the file /etc/crontab (which, unlike a per-user crontab, takes an extra user field). With the one line below, the script /etc/rsync_daily.sh runs as root every day at 2:00 AM.

0 2 * * * root /etc/rsync_daily.sh

Secure key (ssh-keygen)

These steps allow you to use ssh and rsync with your remote host without having to enter a password. On the client, type:

# ssh-keygen -t dsa -f ~/.ssh/id_dsa
# cat ~/.ssh/id_dsa.pub | ssh root@remote 'cat - >> ~/.ssh/authorized_keys'
# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

You can read up on public-key authentication if you want to understand this more.
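To confirm the key is in place, a quick hedged check is to run a remote command; it should complete without prompting for a password:

ssh root@remote 'echo key-based login works'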

Backup script (/etc/rsync_daily.sh)

This is the script used with the above crontab entry. The command

# chmod 700 /etc/rsync_daily.sh

will give this file the permissions it needs.

#!/bin/sh
# rsync_daily.sh
# daily local rotating backup with remote script using rsync
#
# changes:
# Oct 17 2003 - JF Paradis - creation
#
# the process is:
# 1. rotate local backups
# rm -rf backup.3
# mv backup.2 backup.3
# mv backup.1 backup.2
# cp -al backup.0 backup.1
# 2. maintain a local copy using rsync
# rsync -a --delete source_directory/ backup.0/
# 3. maintain a remote copy using rsync
# rsync -a --delete source_directory/ remote_user@remote_host::target_share/

FOLDER=encore;

LOCAL_SOURCE=/home;
LOCAL_TARGET=/home1;

REMOTE_HOST=10.0.1.202;
REMOTE_SHARE=encore;
REMOTE_USER=root;

# make sure we're running as root
# the id option -u prints the effective user ID
if [ `id -u` -ne 0 ]; then
{ echo "Sorry, must be root. Exiting..."; exit 1; }
fi;

# Rotating backups:

# step 1: delete the oldest backup, if it exists
# rm options are (r)ecursive and (f)orce
if [ -d $LOCAL_TARGET/$FOLDER.3 ] ; then
rm -rf $LOCAL_TARGET/$FOLDER.3 ;
fi;

# step 2: shift (rename) the middle backup(s) back by one, if they exist
if [ -d $LOCAL_TARGET/$FOLDER.2 ] ; then
mv $LOCAL_TARGET/$FOLDER.2 $LOCAL_TARGET/$FOLDER.3 ;
fi;

if [ -d $LOCAL_TARGET/$FOLDER.1 ] ; then
mv $LOCAL_TARGET/$FOLDER.1 $LOCAL_TARGET/$FOLDER.2 ;
fi;

# step 3: make a hard-link-only copy of the latest backup, if it exists
# cpio options are (p)ass-through, create (d)irectories and (l)ink files
if [ -d $LOCAL_TARGET/$FOLDER.0 ] ; then
# the next 2 lines are for AIX
cd $LOCAL_TARGET/$FOLDER.0 && find . -print |
cpio -pdl $LOCAL_TARGET/$FOLDER.1 ;
# the next line is for GNU cp
# cp -adl $LOCAL_TARGET/$FOLDER.0 $LOCAL_TARGET/$FOLDER.1
fi;

# step 4: create backup by updating previous
# rsync options are (a)rchive and (delete) extra
rsync \
-a --delete \
$LOCAL_SOURCE/$FOLDER/ \
$LOCAL_TARGET/$FOLDER.0/ ;

# step 5: update backup.0 to reflect the backup date and time
touch $LOCAL_TARGET/$FOLDER.0 ;

# Remote backup
# rsync options are (a)rchive, (z) compress and (delete) extra
rsync \
-e ssh \
-az --delete \
$LOCAL_TARGET/$FOLDER.0/ \
$REMOTE_USER@$REMOTE_HOST::$REMOTE_SHARE ;
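
One nice property of this layout is that restoring needs no special tooling: every backup.N directory looks like a complete copy, so you can simply copy a file back out. A sketch, with hypothetical paths:

# restore yesterday's copy of a file from the .1 snapshot
cp -p /home1/encore.1/some/file /home/encore/some/file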

About resource forks

rsync, like most UNIX commands, is not aware of resource forks. In my environment, I consider resource forks a thing of the past, because most current OS X applications whose files are binary-compatible with PC files (Word, Excel, Photoshop, Acrobat, MP3, MPEG, etc.) will accept a file that has lost its resource fork, as long as the proper file extension is present.

There is a version of rsync that is aware of resource forks, but I am not confident enough in it to use it. Therefore, I have taken the path of asking my users to always include extensions. It is a choice, and perhaps not the best one, but it is reliable.


The following comments are owned by whoever posted them. This site is not responsible for what they say.
resource forks still a problem
Authored by: redjar on Oct 28, '03 01:30:55PM

Just a note... as touched upon above, this should work alright if you are just backing up data and always use extensions. However, if you are hoping to use it to make incremental snapshot images of your entire drive, it won't work.

I attempted this setup a while back with my PowerBook (I already use it to back up all our servers, and it works great).

However, I wanted to make an exact image, bootable and all, using the above techniques. Unfortunately, the resource fork prevented this.

RsyncX crapped out with malloc errors. The native cp doesn't support resource forks. I found a cp that did support resource forks, but it didn't support hard links, and psync doesn't have the advanced features of rsync.

If anyone knows of a way to do this (commercial app would be fine) please share. Remember, it must support hard linking to really be feasible.



resource forks still a problem
Authored by: newkid on Oct 28, '03 11:25:58PM

You are right, and thanks for mentioning it: this technique is only for data (software installers can be copied in some cases, for example if they are inside a .dmg).

It is not intended for systems, but it will back up databases as long as you stop the database beforehand.

If you want to back up your system, you must boot from a separate partition and create a .dmg with Disk Copy, or use dd to output the appropriate /dev in raw mode into a file.
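
A hedged sketch of the dd approach (the device name here is hypothetical; identify yours first, and make sure the partition is not mounted while you image it):

# raw-copy a partition into an image file
dd if=/dev/disk0s9 of=/Volumes/Backup/system.img bs=1m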



psync
Authored by: SOX on Oct 29, '03 12:51:02PM

I use psync instead of rsync or RsyncX (damn malloc errors!). psync seems to work more stably and is HFS resource-fork aware.

The drawback with psync is that, unlike rsync, it won't work across the network to a remote computer. You have to mount the remote computer's drive, and in that case you may run into problems with root squashing if you are trying to transfer root-owned files.

psync does have a cute way of dealing with files owned by users that don't exist on the remote machine, which can also be used to work around the root-squash issue. One of its options places all the ownership and privilege info into a single file, and in restore mode psync will read this file and reassign the ownership and privileges.

The cpio trick works with psync just as it does with rsync. psync is a tad slower than rsync, which is slower than rdiff-backup, but in my experience the slow step in backups is the cpio step, not the psync step.



resource forks still a problem
Authored by: mazatty on Oct 30, '03 03:37:47AM

You might want to look at Carbon Copy Cloner.

It can make bootable copies of drives. You can schedule copies. It can use psync to synchronize the source to the target. Prefix and postfix scripts can be assigned.

If you want to get this hint working with bootable backups, Mike Bombich explains how to do what CCC does; take a look here.



resource forks still a problem
Authored by: ramsperger on Nov 06, '03 07:40:23PM
If you are looking for a commercial app to do backups, I recommend Retrospect. I have successfully restored individual files and entire machines using it. It is made by Dantz.

pax will create hard links
Authored by: kd4ttc on Apr 17, '05 02:13:02PM

The pax utility will do hard links! Just discovered this a short while ago. See man pax for details.

Steve



A detailed walkthrough on creating backups using rsync
Authored by: gustou on Oct 28, '03 01:38:06PM

Since you can use ssh under (over?) rsync, I was wondering how you can use rsync when both of the hosts are behind a firewall (i.e., you need to connect to a firewall before connecting to the host).

There is a trick for CVS, when the repository is inside a network that is not directly reachable, that uses a connection script instead of ssh.

Is it possible to use the same kind of trick for rsync?



A detailed walkthrough on creating backups using rsync
Authored by: migurski on Oct 28, '03 08:40:40PM
Since you can use ssh under (over?) rsync, I was wondering how you can use rsync when both of the hosts are behind a firewall (i.e., you need to connect to a firewall before connecting to the host).

To connect between two hosts separated by firewalls, you can use an SSH tunnel. For example, if host_A is living behind firewall_A, and host_B is living behind firewall_B, and you need to rsync from A to B, you can do something like the following (from host_A):

ssh -g -N -L 7777:host_B:22 user@firewall_B

This maps your local (host_A) port 7777 to host_B's port 22 (ssh) within firewall_B. Note that '7777' can be any unused, unprivileged port, and 'host_B' is any hostname meaningful within firewall_B, including internal IPs in the 192.168.0.0 range.

Test this connection by connecting to host_B:

ssh -p 7777 user@localhost

...that should get you into host_B, even though it looks like you're connecting to localhost. Now modify your rsync command to use the same:

rsync -e "ssh -p 7777" local_dir/ user@localhost:remote_dir/

There are a number of caveats and shortcuts involving conflicting entries for localhost in ~/.ssh/known_hosts and the use of & with ssh-agent to make the first step more transparent.
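
One hedged workaround for the known_hosts conflict mentioned above is to give the tunnel its own alias in ~/.ssh/config; HostKeyAlias keeps the host-key check pointed at host_B rather than localhost (the alias name is made up for illustration):

# ~/.ssh/config (sketch): a dedicated alias for the tunnel endpoint
Host host_B-tunnel
    HostName localhost
    Port 7777
    HostKeyAlias host_B

after which the rsync invocation becomes rsync -e ssh local_dir/ user@host_B-tunnel:remote_dir/.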



More on rsync snapshot backups
Authored by: garybu0 on Oct 29, '03 01:06:14AM
Mike Rubel's site is a good source of information on rsync snapshot backup.

A detailed walkthrough on creating backups using rsync
Authored by: BlackPenguin on Oct 29, '03 11:12:13AM
MAKE SURE YOU HAVE THE RIGHT VERSION OF RSYNC.

OS X comes with rsync preinstalled, but the version it has is most likely 2.5.2, which does not support HFS+ resource forks. If you use that version, any files with resource forks will be corrupted.

A version updated by Kevin Boyd has HFS+ support included. This version is 2.5.5 protocol version 26. It is available as part of the installation of RsyncX, which you can find at MacOSXLabs.org. You can install the tools and use them from the command line if you don't want to use the Aqua RsyncX client.

Check the rsync version before you use it, with this command:
rsync --version
The version notice will say if it supports HFS+. Otherwise, this tool is great. I use it daily for backups and syncing between my G4 desktop and iBook.

A detailed walkthrough on creating backups using rsync
Authored by: russh on Oct 30, '03 06:25:32AM

FWIW, Panther uses rsync version 2.5.6 protocol version 26.

--
russh



A detailed walkthrough on creating backups using rsync
Authored by: BlackPenguin on Oct 30, '03 06:23:57PM
I just learned that. Yes, Panther installs rsync 2.5.6. Unfortunately this version also does not support HFS+! If you try copying a file with a resource fork using this version, you will lose the resource fork, corrupting the file. Try it with a text clipping.

Despite the newer version on Panther, you need to install the patched rsync 2.5.5 distribution that has support for HFS+.

A detailed walkthrough on creating backups using rsync
Authored by: bluehz on Nov 02, '03 07:06:42AM

Also - I don't believe the version installed by Fink is HFS+ aware.

rsync --version
rsync version 2.5.5 protocol version 26
Copyright (C) 1996-2002 by Andrew Tridgell and others
<http://rsync.samba.org/>
Capabilities: 64-bit files, socketpairs, hard links, symlinks, batchfiles,
no IPv6, 32-bit system inums, 64-bit internal inums



A detailed walkthrough on creating backups using rsync
Authored by: bluehz on Nov 02, '03 09:36:13AM

I really like the script posted above, and the info at mikerubel is also excellent. I will be implementing these strategies on my Linux server... but this is MacOSXHints. I am not so sure that these scripts have much value in OS X unless you are SURE that everything you are backing up is without resource forks. I mean, you may be able to get an HFS+-aware rsync running with these scripts, but the cp command is still gonna kill you and strip any resource forks. Is there something with the same functionality as cp (hard links, etc.) that could be substituted in these scripts, along with an HFS+-aware rsync, to make them completely OS X safe?



A detailed walkthrough on creating backups using rsync
Authored by: TvE on Dec 21, '03 06:31:25AM
FWI(also)W
10.3.2 (build 7D24) uses rsync version 2.5.7 protocol version 26...

;-) TvE

A detailed walkthrough on creating backups using rsync
Authored by: kd4ttc on Nov 17, '03 01:32:04PM

The man page for cp doesn't include the -l option for links. A man page for cp I found on the internet describes the option. Are there any instability issues if I use a feature of cp that is not documented in the man pages on my computer?

Steve Holland



A detailed walkthrough on creating backups using rsync
Authored by: bluehz on Nov 17, '03 06:45:48PM

I have been using the above scripts on my Linux server for about 2 weeks now and it is PERFECT! I just wish I could use it reliably and retain resource forks on the Mac. Anyone have any hints on getting this to work with resource forks?



On using rsync: Is CP OK to use on the Mac?
Authored by: kd4ttc on Nov 17, '03 10:21:55PM

The man page on my Mac makes no mention of the -l option. Is there a stability issue to be concerned about when using a feature documented in man pages found on the internet but not in the Mac OS X man page on the Mac itself?

Steve



On using rsync: Is CP OK to use on the Mac?
Authored by: bluehz on Dec 19, '03 08:56:51PM

The man page I have for the Apple-installed Panther cp says...

-l, --link
link files instead of copying



On using rsync: Is CP OK to use on the Mac?
Authored by: sjk on Dec 19, '03 09:31:09PM
Definitely no "--link" or "-l" options for the default cp command on Panther:

% which cp;
/bin/cp
% man -w cp
/usr/share/man/man1/cp.1
% cp --link
usage: cp [-R [-H | -L | -P]] [-f | -i | -n] [-pv] src target
       cp [-R [-H | -L | -P]] [-f | -i | -n] [-pv] src1 ... srcN directory
% cp -l
cp: illegal option -- l
usage: cp [-R [-H | -L | -P]] [-f | -i | -n] [-pv] src target
       cp [-R [-H | -L | -P]] [-f | -i | -n] [-pv] src1 ... srcN directory


On using rsync: Is CP OK to use on the Mac?
Authored by: bluehz on Dec 20, '03 08:40:26PM

Hmm, that is odd. Well, whatever version of cp I have, it's actually working pretty well with the above script. I was skeptical, and it ain't perfect, but it's a start. I really like this backup strategy.

CAVEATS: for some reason, when rotating, cp refuses to copy symbolic links:

cp: cannot create link `/Volumes/WDC80G/rsync-dailys/sw/hourly.1/src/postgresql-7.3.3-4/postgresql-7.3.3/src/include/parser/parse.h': Cross-device link

When I look at the original file from above... it is a symbolic link. It looks like cp is trying to hard-link it across devices, which isn't allowed, or something.

Some application icons are borked when using cp, but the apps function normally.



Use cpio
Authored by: kd4ttc on Sep 20, '04 09:33:18PM

The script requires that cpio be used rather than cp.

Steve



Is there a file limit with a pipe?
Authored by: kd4ttc on Sep 20, '04 09:40:12PM

On the version that you would use on a Mac, which is the one where cpio is used rather than cp, the script calls for changing directory, then using find on the current directory. find lists all the files in the directory recursively. Is there a limit in Unix on the size of a text listing passed via a pipe? If so, this script would crap out with large numbers of files. (My situation calls for archiving 20,000 files, and will grow to 200,000 files in the future.) Perhaps a temporary file could be used to receive the file list to be fed to cpio; a sketch of that variant follows.
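
(For illustration only, reusing the variables from the script above; as the reply below notes, the pipe itself has no size limit, so this is belt-and-suspenders:)

# write the file list to a temporary file instead of piping it directly
cd $LOCAL_TARGET/$FOLDER.0 &&
find . -print > /tmp/filelist &&
cpio -pdl $LOCAL_TARGET/$FOLDER.1 < /tmp/filelist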

Steve



Is there a file limit with a pipe?
Authored by: BobHarris on Dec 10, '06 08:08:51PM

Way late, but no, there is no limit on how much data you can send through a pipe. You pour stuff in one end and it comes out the other.

Bob Harris



A detailed walkthrough on creating backups using rsync
Authored by: kd4ttc on Nov 28, '04 11:52:45PM

This does not work. There are a few problems. The OS now uses xinetd, which needs to be incorporated. However, the big problem is that when running rsync from a client, it dies with the error: unable to open configuration file "rsyncd.conf": no such file or directory. That error comes up on the client, which is decidedly odd, since that file ought to be read by the rsync daemon. Yes, I have the rsyncd.conf file in /etc/ on the server. For now, this hint does not work on the Mac. I will look around for a method that works and get back with a reply and further info. If anyone is using rsync, let me know. Steve



Does work, but be carefull of user and group
Authored by: kd4ttc on Dec 02, '04 04:25:10PM

rsync in this example does work, but the script is misleading in this way: the user specification must be root, and you must have the root user enabled on the server. That is because when ssh logs in, it logs in as root, and when writing the files the rsync daemon expects root permission to be used. If you want to use rsync without the root user, it can be done, but the user and group in the rsync profile must be set appropriately. The reason the example is misleading is that REMOTE_USER is specified as a variable set to root; in this case it must be root, and you cannot change it in the script and expect things to work with the given daemon configuration files.

There is a way to log in with ssh under one user's name but then run rsync as root. See the ssh man pages for how to do this.

Steve



A detailed walkthrough on creating backups using rsync
Authored by: GlowingApple on Apr 18, '05 09:43:33PM

Another method to consider is Disk Utility. It has an option to back up your Mac to a .dmg image. I once backed my laptop up to a Samba share and restored using an install on my iPod. Not as scriptable as the above, and certainly not as customizable (it grabs the entire drive), but for a simple backup of the whole drive it works well.

---
Jayson --When Microsoft asks you, "Where do you want to go today?" tell them "Apple."



A detailed walkthrough on creating backups using rsync
Authored by: lnadon on May 16, '05 12:54:25PM

Does anybody know the file extension for Now Contacts (PowerOn Software)? I am trying to sync my calendar and contacts files (with the rsync command) between my home and work computers. When the files arrive they have lost their icons; I can open the calendar by giving it the .nud extension, or by opening the application first, or by using the "Get Info" command to tell it what application to use. None of this works with the contacts file: when I ask for the file to be opened by Now Contacts, Now Contacts loads but the file does not open; if I try to open the file with the Open menu item, the contacts file is greyed out, even if I have previously set it to open with Now Contacts. Thanks for any insights into this problem.


