
Simple versioned backups with RsyncX
I recently bought a 300 GB external hard drive for backup and storage purposes, and I was looking for a tool for versioned backups that doesn't use too much space, preferably by using hardlinks. In short: RsyncX (some documentation) is just what I was looking for, but there are some glitches you should know about to make it work as it should.

Read on for an explanation of what versioned backups and hardlinks are, why they are good, and what you have to do to make RsyncX do what it should.

First, let's turn to versioned backups: This means that you won't have only one backup of your hard drive, but several ones -- the newest one, plus older versions, too, in order to be able to find files you have deleted some time ago. They won't be present in your latest backup, as they were already deleted at that time, but you can still retrieve them from older backups. If every version of your backup used the complete space of all the files on your internal (or main) hard drive, however, you'd soon have massive amounts of space wasted on your backup drive.

Enter hardlinks. Hardlinks are a very clever concept that Unix file systems use. (NTFS is able to use them, too, though Windows doesn't come with any tool to actually use that feature.) In short: You may have several directory entries (in the same or different directories, with different names, if you want) pointing to the same actual data on your drive. For example, you have the files /foo/bar/one and /bar/baz/two that really are the same file. Both directory entries point to the same data on the drive.

And now the really clever part: The file system knows how many directory entries point to an inode, that is (simply put) the actual data of the file. If you delete /foo/bar/one, /bar/baz/two will still exist and the inode won't be deleted. Only if you also delete /bar/baz/two will the space on the drive be freed, and the data deleted.
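
To see this behaviour for yourself, here is a minimal sketch in Python. It works in a temporary directory, and the file names "one" and "two" are only placeholders echoing the example above. It creates a file, adds a second directory entry for it, deletes the first entry, and checks that the data is still reachable through the second.

```python
import os
import tempfile


def demo_hardlink():
    """Create a file, hard-link it under a second name, then delete the first name."""
    with tempfile.TemporaryDirectory() as tmp:
        one = os.path.join(tmp, "one")
        two = os.path.join(tmp, "two")

        with open(one, "w") as f:
            f.write("important data")

        # Add a second directory entry pointing at the same inode.
        os.link(one, two)

        same_inode = os.stat(one).st_ino == os.stat(two).st_ino
        links_before = os.stat(one).st_nlink

        # Remove one directory entry; the inode still has another entry.
        os.remove(one)

        links_after = os.stat(two).st_nlink
        with open(two) as f:
            surviving = f.read()

        return same_inode, links_before, links_after, surviving


if __name__ == "__main__":
    print(demo_hardlink())
```

The link count reported by the file system drops by one for each deleted entry; the space is only freed when it reaches zero.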

Now that's perfect for versioned backups! Suppose you want three backup versions on your backup drive, residing in three directories called backup1, backup2, and backup3. When you back up for the first time, all files from your main drive will be copied to the backup drive, into directory backup1. When you back up for the second time, backup1 will be renamed to backup2. All files in the newly-created backup1 that did not change will just be hardlinks to files already present in backup2 -- and thus won't use any additional space. Both backup1 and backup2 are fully usable copies of your main drive, though. This means you could just copy them back to your main drive without the hassle of incremental backups, where you have to restore a full backup and then reapply all the changes saved in the incremental backups since then.
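
The idea can be sketched in a few lines of Python. This is not how rsync or RsyncX is implemented, only an illustration of the technique: the function name make_snapshot and its parameters are made up for this example, files are compared by content only, the target directory is assumed to be empty, and symlinks, ownership, and resource forks are ignored.

```python
import filecmp
import os
import shutil


def make_snapshot(source, previous, new):
    """Copy `source` into `new`, hard-linking files that are unchanged in `previous`.

    `previous` is the path of the last snapshot, or None for the first backup.
    `new` is assumed not to contain any files yet.
    """
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        target_dir = os.path.normpath(os.path.join(new, rel))
        os.makedirs(target_dir, exist_ok=True)

        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            old = os.path.join(previous, rel, name) if previous else None

            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                # Unchanged: add a directory entry only, no new data is written.
                os.link(old, dst)
            else:
                # New or changed: make a real copy.
                shutil.copy2(src, dst)
```

Calling make_snapshot(main, None, backup2) and later make_snapshot(main, backup2, backup1) would leave two complete directory trees, with unchanged files stored only once.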

Suppose you back up for the fourth time. backup3 will be deleted -- which means that only the data that was different from backup2 will be _actually_ deleted, the rest is still there, as directory entries from backup2 (and maybe backup1) still point to the data. Now backup2 is renamed to backup3, backup1 is renamed to backup2, and the new backup can be copied/hardlinked to your backup drive.
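
The rotation step described here might look like the following Python sketch. The function name rotate and the keep parameter are assumptions for illustration; the directory names backup1, backup2, and so on follow the example above. The script RsyncX generates may do this differently.

```python
import os
import shutil


def rotate(root, keep=3):
    """Shift backup1..backupN up by one slot and drop the oldest, freeing backup1."""
    oldest = os.path.join(root, f"backup{keep}")
    if os.path.isdir(oldest):
        # Only data that no newer snapshot links to is actually freed here.
        shutil.rmtree(oldest)

    # Rename from the oldest remaining slot downwards so nothing is overwritten.
    for i in range(keep - 1, 0, -1):
        current = os.path.join(root, f"backup{i}")
        if os.path.isdir(current):
            os.rename(current, os.path.join(root, f"backup{i + 1}"))
```

After rotate(root) returns, there is no backup1 directory, so the next backup can be written into a fresh one.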

This is a copy, of course, so the files are not compressed. It's not a very good idea to compress backups, though, as you normally want the best data safety you can get. In compressed files, one wrong bit may render the whole file unusable, which is even worse if your whole backup consists of only one file.

I had this idea of versioned backups with hardlinks, and I was looking for a tool that used this technique -- which rsync does. rsync is a Unix tool that can mirror directories, or whole file systems, to other directories, other file systems, and even over the network. The rsync that comes with Mac OS X isn't HFS+-aware, though -- it won't copy resource forks. And it requires a good amount of manpage reading and command line fiddling to have it do what you want it to.

Enter RsyncX. First, it provides you with an rsync version that will copy resource forks. Second, it comes with a neat graphical UI where you just have to point and click to tell it what you want, and it will produce a shell script that does just that. It can even set up a cron job for you that will execute your backup script every night, or whenever you want to. There are some things you should know, though, to make it work as it should.

First, hardlinks for backup purposes will only work if the user and file permissions are not ignored on your backup drive. Simple reason: If they are ignored, your script (run as root by crond) will see all backed up files as belonging to root:wheel. The files on your main drive have different owners and permissions, though, so there IS a difference between the file from the last backup and the file on your main drive. This means that rsync will copy, not hardlink, the file. So you will end up with a full copy of the file, instead of a hardlink, even if the file hasn't changed.
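
The reason is that owner, group, and permissions are stored in the inode, so two hardlinked names can never differ in them. A backup tool therefore has to compare this metadata, not just the contents, before it may reuse an old file. The Python sketch below shows such a check; the function name metadata_matches is made up for this example, and it is not rsync's actual comparison logic.

```python
import os
import stat


def metadata_matches(src, old):
    """Return True if owner, group, and permission bits of both files are identical."""
    a = os.stat(src)
    b = os.stat(old)
    return (a.st_uid, a.st_gid, stat.S_IMODE(a.st_mode)) == (
        b.st_uid,
        b.st_gid,
        stat.S_IMODE(b.st_mode),
    )
```

If the backup volume reports every file as root:wheel, a check like this fails for every file owned by a normal user, and every file is copied in full.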

So before you run your backup script for the first time, be sure to control-click on the icon of your backup drive, select Get Info, and then uncheck the box next to "Ignore owners on this volume," near the bottom of the window.

Second, at least on Tiger, RsyncX will save your backup script as TheNameYouChose.scpt, but set up a cron job with only /Path/To/TheNameYouChose. You will have to rename TheNameYouChose.scpt to TheNameYouChose for the cron job to work. I submitted this bug to the developers, but never got an answer -- maybe they silently corrected it meanwhile, maybe not.

That's it.

You now have a perfect versioned backup solution, it's free, and it works just great. Personally, I have the cron job run every night, and I have seven versions of my main drive in my backups. Which makes me feel much safer than before.

Simple versioned backups with RsyncX | 14 comments
Simple versioned backups with RsyncX
Authored by: placain on May 24, '06 08:17:10AM
In his excitement, the poster didn't get around to actually telling anyone how to do this.

Some background reading:

Simple versioned backups with RsyncX
Authored by: syko on May 24, '06 08:25:09AM

But, is it compatible with 10.4?

I use rsyncx on several 10.3 machines, but had read that it needed to be updated for 10.4.

Can anyone confirm/deny this?

Resource fork stuff is very important still.



Simple versioned backups with RsyncX
Authored by: ocdinsomniac on May 24, '06 08:57:06AM

So far, RsyncX has worked just fine for me in 10.4. In fact, it works better than the built-in version that ships with Tiger, which did not work well for me at all.

-systemsboy



Simple versioned backups with RsyncX
Authored by: rflo on May 24, '06 08:40:49AM

The author writes: "The rsync that comes with Mac OS X isn't HFS+-aware, though -- it won't copy resource forks." The rsync shipped with MacOS 10.4 is resource-aware: see the man page, especially the -E option.

Also, this hard-link scheme depends on having the backups and the original on the same file-system (hard-links do not work across file-systems), which is not a great idea for back-ups. A proper incremental backup tool, even the tar supplied with MacOS, would enable backups based on changes since the last backup, without depending on a dubious use of hard-links.

---
Ronald Florence



Simple versioned backups with RsyncX
Authored by: Syco on May 24, '06 09:18:51AM

I had the same reaction you did, but I think what he wants is something like this:

Main Drive:
/foo
/bar
/baz

Let's say before backup2, he changes foo.

Backup Drive:
/backup1
/backup1/foo
/backup1/bar
/backup1/baz
/backup2
/backup2/foo
/backup2/bar <--- this is hardlinked to /backup1/bar, because it's the same file, and this preserves the entire backup structure
/backup2/baz <--- same as above



Even simpler: use rsnapshot
Authored by: dbs on May 24, '06 08:52:47AM

There is a script written to do exactly this. It's called "rsnapshot". You simply tell it where your source is and where the destination is and how many of each type of snapshot to keep (i.e., 4 hourly, 7 daily, 4 weekly, whatever) and then execute "rsnapshot hourly" or "rsnapshot daily" and it takes care of rotating backups, copying with links, and rsyncing. Normally you add cron jobs to do the appropriate rsnapshots automatically, but it works fine to execute it manually.

I use rsnapshot with rsync-hfs to an external FireWire drive to keep snapshots of my PowerBook and my wife's. It will not create bootable backups, though.

I also use rsnapshot remotely to snapshot my cluster's file system. You do have to have all the backups stored on one local file system, but since it uses rsync you can grab data from remote file systems.



just use psync
Authored by: jt777 on May 24, '06 01:08:44PM

I gave up on rsync a while ago. Besides the resource fork issue, it was just flaky at the time. Then I found psync. There is a GUI client called PsyncX, which uses the command line psync. Psync is perfect. Just do: sudo psync source destination.

There are more options. Check out the GUI, do a man psync, etc. Just grab PsyncX, which will also install psync, and go from there.
http://psyncx.sourceforge.net/



just use psync -- nope
Authored by: sjk on May 24, '06 03:20:56PM

No, psync isn't perfect and Carbon Copy Cloner inherits its weaknesses when it uses it. See "The State of Backup and Cloning Tools under Mac OS X" article germ linked to for more information.



But.....
Authored by: germ on May 24, '06 03:04:20PM
Please see this detailed discussion of backup tools on Mac OS X. The conclusion is that SuperDuper and ASR are the only reliable backup tools. Do you agree? If not, why?

But.....
Authored by: sjk on May 24, '06 04:05:12PM

Choosing a "reliable" backup utility (and strategy) will depend on specific requirements and usage. All the utilities mentioned in the plasticsfuture article can be appropriate in certain contexts and it's wise to be aware of the differences. I'm grateful for the article's long overdue analysis of criteria that many users and even some developers tend to overlook, which can be crucial information if [meta]data integrity of backups/restores is a priority.

For me it would be foolish to ignore those details in discussions about pros/cons/comparisons of different OS X backup utilities but other people may not care that much.



Ok
Authored by: zottel on May 24, '06 05:40:46PM

I came to OS X from Unix, and resource forks are the only difference from unix file systems I was aware of. The article you linked to is very good -- thanks for that insight.

I hope my hint is still useful for other users by explaining some stuff like hardlinks to those that didn't know Unix before.

It seems, though, that RsyncX actually isn't the tool of choice if you want to have reliable backups that really back up everything there is.



tool of choice
Authored by: sjk on May 25, '06 02:59:51AM
zottel wrote: "It seems, though, that RsyncX actually isn't the tool of choice if you want to have reliable backups that really back up everything there is."
RsyncX may be a tool of choice when any information it doesn't preserve is irrelevant for your purposes and works reliably for what it does preserve. It's unreliable if problems occur because of insufficient requirements for specific backup/restore tasks.

I'd like people to realize that certain backup/restore tools may give them a false sense of data integrity because of unexpected, undesirable results that aren't necessarily obvious. And I hope developers become more aware of those integrity issues and make their products less vulnerable to them. Ideally, OS X backup utilities would have a "guaranteed" standard of [meta]data preservation so most users wouldn't have to be concerned with any technical details to ensure reliability.

Simple versioned backups with RsyncX
Authored by: macosx4me on May 24, '06 06:13:44PM

The version of rsync that comes with RsyncX is now seriously dated.

10.4.6, with all updates, is now supposed to reliably handle dual-fork files via the Apple-supplied rsync.

However, it will not reliably copy both dual-fork files and ACLs, which is still a known limitation. One option is to split the files manually first.

If you're running Mac OS X prior to 10.4, or don't need ACLs, the following version is recommended:

http://www.lartmaker.nl/rsync/



rdiffbackup
Authored by: SOX on May 25, '06 10:42:13AM

rdiffbackup is much better than anything suggested here. It's in Fink. It has the functionality of rsync, is HFS-aware, and fixes illegal filename characters if it goes across different OSes or file systems. It saves as much space as hardlinking but has none of the downsides of hardlinking (e.g., with hardlinks it's painful to move the archive to another disk).


