10.5: Organize data files for networked Time Machine use
May 23, '08 07:30:02AM
Contributed by: Anonymous
Time Machine is great, and being able to use Time Machine on a network volume is just amazing, but if you don't take care with how your data is organized, it can quickly become a huge CPU and network bandwidth eater. To understand why, you must understand how Time Machine works. The first backup copies everything except a few exclusions covered in previous tips -- it's a simple full backup, in other words.
Then come the incremental backups. The main mechanism used to get consistent 'snapshots' of a volume over its life is as follows. Working recursively from the root directory, the system checks whether each directory changed (files added or deleted), then:
- If the directory changed, each file in it is inspected for changes:
  - File changed? If so, copy the new version.
  - File not changed? Create a hard link that points to the existing copy of the file on the backup disk.
- If the directory did not change, create a hard link that points to the existing copy of the directory on the backup disk.
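As a rough illustration of the copy-or-link decision above, here is a minimal shell sketch. It is not Time Machine's actual implementation: the snapshot function and its three arguments are hypothetical, it handles a single flat folder only, it uses cmp as a stand-in for however Time Machine detects a changed file, and it only hard-links files (the ln command cannot hard-link directories the way Time Machine does).

```shell
# snapshot PREV SRC NEXT  (hypothetical helper, for illustration only)
# Build NEXT from SRC, reusing unchanged files from the earlier snapshot PREV.
snapshot() {
    prev=$1 src=$2 next=$3
    mkdir -p "$next"
    for f in "$src"/*; do
        [ -f "$f" ] || continue              # this sketch skips subfolders
        name=$(basename "$f")
        if [ -f "$prev/$name" ] && cmp -s "$f" "$prev/$name"; then
            # Unchanged: one hard link to the copy already on the backup disk
            ln "$prev/$name" "$next/$name"
        else
            # New or changed: copy the current version
            cp -p "$f" "$next/$name"
        fi
    done
}
```

Note that a folder holding thousands of unchanged files still costs thousands of ln operations as soon as one file in it changes, which is the case the rest of this tip tries to avoid.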
This is not a big issue on a local drive, because hard links are really quick to create, but over a network it can become a huge bottleneck if there are thousands of hard links to create, since each one is a separate request to the server. That's why a big file can be relatively quick to copy, while creating a large number of little files or hard links is time-consuming.
When can this become an issue, and what can you do to help prevent it? Read on for those answers...
Here are some relatively common cases that will cause Time Machine to work hard when backing up to a network volume:
- Do you rarely clean out your mailbox? If so, understand that each new email can make the next backup create thousands of hard links, one for each message already in that mailbox. This also applies to sent mail, trash, etc., depending on your configuration.
- Do you have old iChat logs from before the 10.5 days? The first message of each day will make the next backup create thousands of hard links, one for each pre-10.5 conversation you have archived.
- Did you set up Mail to never delete old RSS articles? Each new article will make the next backup create a hard link for each existing article.
Those are the examples that came to my mind, but depending on the applications you are using and how they organize data, you may have more reasons to worry. Thanks to Unix, there is a simple way to list the big directories on your system, meaning directories that contain a lot of files -- these are the directories that may cause issues when using Time Machine over the network, depending on how they're modified. Open a Terminal window and type:
sudo find / -type d -size +35000c
This command will ask for your administrator password, and display every directory that contains (approximately) 1,000 or more files. Take a look at the output, and safely ignore those whose content does not change regularly (such as application bundles, documentation folders, etc.). For the rest of them, however, be aware that they may cause delays in your networked Time Machine backups.
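Because the size test above only estimates the file count, you may prefer to count entries directly. The following is a minimal sketch, not a tested replacement for the command above: list_big_dirs is a hypothetical helper name, the 1,000 default simply mirrors the figure used in this tip, and the loop will misread directory names that contain newlines.

```shell
# list_big_dirs ROOT [MIN]  (hypothetical helper, for illustration only)
# Print "count<TAB>path" for each directory under ROOT that directly
# contains at least MIN entries (default 1000).
list_big_dirs() {
    root=$1
    min=${2:-1000}
    find "$root" -type d 2>/dev/null | while IFS= read -r dir; do
        # Count direct entries, hidden ones included; strip wc's padding
        count=$(ls -A "$dir" 2>/dev/null | wc -l | tr -d ' ')
        if [ "$count" -ge "$min" ]; then
            printf '%s\t%s\n' "$count" "$dir"
        fi
    done
}
```

For example, list_big_dirs "$HOME" 1000 checks only your home folder, which is where most of the frequently changing data lives. Expect it to be much slower than the find command above, since it lists every directory it visits.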
So what's the solution? Subfolders. Move static data into subfolders. For example, take your inbox, create mailboxes by year, and put every message from prior years into a year-named subfolder. This subfolder will never change again, and will be backed up only once. Take your Documents » iChats/ directory and move every pre-10.5 file in it into a Backup folder. (In 10.5, iChats are already sorted into date-based folders, so you can leave those alone.)
In simple terms, make static those directories containing a lot of files -- don't allow directories with thousands of files in them to change every day if you can help it. I think this tip may be useful for a lot of Mac users who keep their data for years when changing machines, and don't like to clean or archive their data.
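For a folder of plain files such as old chat logs, the move into a Backup subfolder can be scripted. This is only a sketch under some assumptions: archive_old is a hypothetical helper, it picks files by modification age (365 days by default) rather than by "pre-10.5", and it is meant for ordinary file folders. Mailboxes are probably safer to reorganize from within Mail itself.

```shell
# archive_old DIR [DAYS]  (hypothetical helper, for illustration only)
# Move files directly inside DIR that were last modified more than DAYS
# days ago (default 365) into DIR/Backup, leaving newer files in place.
archive_old() {
    dir=$1
    days=${2:-365}
    dest="$dir/Backup"
    mkdir -p "$dest"
    # -maxdepth 1 keeps find out of subfolders; mv -n never overwrites
    find "$dir" -maxdepth 1 -type f -mtime +"$days" -exec mv -n {} "$dest"/ \;
}
```

Try it on a copy of the folder first, for example archive_old ~/Documents/iChats 365, and check that the application still finds what it needs before relying on it.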
Mac OS X Hints
http://hints.macworld.com/article.php?story=20080522021747160