
10.5: Organize data files for networked Time Machine use System 10.5
Time Machine is great, and being able to use it on a network volume is even better, but if you don't pay attention to how your data is organized, backups can quickly eat a lot of CPU time and network bandwidth. To see why, you need to understand how Time Machine works. The first backup copies everything except a few exclusions covered in previous hints; in other words, it's a simple full backup.

Then come the incremental backups. The main mechanism used to get consistent 'snapshots' of a volume over time is as follows. Starting at the root directory and working recursively, the system checks whether each directory has changed (files added or deleted), then:
  • If the directory changed, each file in it is inspected for changes.
    • File changed? Copy the new version.
    • File not changed? Create a hard link that points to the initial location of the file on disk.
  • If the directory did not change, create a hard link that points to the initial location of the directory on disk.
This is not a big issue on a local drive, because hard links are really quick to create, but over a network it can become a huge bottleneck if there are thousands of hard links to create, since each one is a separate request to the server. That's why, while one big file can be relatively quick to copy, creating a large number of small files or hard links is time-consuming.
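
The scheme above can be modeled in a few lines of shell. This is only a simplified sketch, not Time Machine's actual implementation: the `snapshot` function and its three arguments are hypothetical names, it compares file contents with `cmp` rather than using whatever change detection Time Machine relies on, and it always creates real directories because ordinary tools cannot hard-link directories. It does show why a changed directory holding thousands of unchanged files still costs thousands of link operations.

```shell
#!/bin/sh
# Simplified model of one incremental snapshot (an illustration only, not
# Time Machine's real code). Usage: snapshot SOURCE PREVIOUS_SNAPSHOT NEW_SNAPSHOT
snapshot() {
    src=$1 prev=$2 new=$3
    mkdir -p "$new"
    # Recreate the directory tree. Unlike Time Machine, this sketch never
    # hard-links a whole directory; it always makes a real one.
    (cd "$src" && find . -type d) | while IFS= read -r d; do
        mkdir -p "$new/$d"
    done
    # Link unchanged files to the previous snapshot, copy everything else.
    (cd "$src" && find . -type f) | while IFS= read -r f; do
        if [ -f "$prev/$f" ] && cmp -s "$src/$f" "$prev/$f"; then
            ln "$prev/$f" "$new/$f"     # unchanged: one hard link per file
        else
            cp -p "$src/$f" "$new/$f"   # new or changed: copy the data
        fi
    done
}
```

Called as `snapshot SOURCE OLD NEW`, it builds NEW so that every file identical to its counterpart in OLD shares the same data on disk, and only new or modified files take up extra space.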

When can this become an issue, and what can you do to help prevent it? Read on for those answers...

Here are some relatively common cases that will cause Time Machine to work hard when backing up to a network volume:

  • Do you clean out your mailbox from time to time? If not, understand that each new email can make the next backup create thousands of hard links, one for each message already in the mailbox. The same applies to sent mail, trash, etc., depending on your configuration.
  • Do you have old iChat logs from before the 10.5 days? The first message of each day will make the next backup create thousands of hard links, one for each pre-10.5 conversation you have archived.
  • Did you set up Mail to never delete old RSS articles? Each new article will make the next backup create a hard link for each existing article.

Those are the examples that came to mind, but depending on the applications you use and how they organize their data, you may have more to worry about. Thanks to Unix, there is a simple way to list the big directories on your system, meaning directories that contain a lot of files. These are the directories that may cause issues when using Time Machine over the network, depending on how often they're modified. Open a Terminal window and type:

sudo find /  -type d -size +35000c
This command will ask for your administrator password (sudo prompts for your own account password, not a separate root password) and display every directory that contains approximately 1,000 or more files. Take a look at the output, and safely ignore the directories whose content does not change regularly (such as application bundles, documentation folders, etc.). For the rest of them, however, be aware that they may cause delays in your networked Time Machine backups.
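
Judging by the numbers above (35,000 bytes for about 1,000 files), the size test relies on the file system reporting roughly 35 bytes per directory entry, which may not hold on other file systems. As a slower but more direct alternative, the sketch below counts the entries in each directory itself. `big_dirs` and its arguments are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# big_dirs ROOT MIN: print "COUNT PATH" for every directory under ROOT that
# directly contains at least MIN entries (files and subdirectories).
big_dirs() {
    find "$1" -type d 2>/dev/null | while IFS= read -r dir; do
        # Count direct children only; $(( )) strips the padding wc may add.
        n=$(( $(find "$dir" -mindepth 1 -maxdepth 1 2>/dev/null | wc -l) ))
        if [ "$n" -ge "$2" ]; then
            printf '%s %s\n' "$n" "$dir"
        fi
    done
}
```

For instance, `big_dirs "$HOME" 1000 | sort -rn` would list the crowded directories in your home folder, largest first, without needing sudo.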

So what's the solution? Subfolders. Move static data into subfolders. For example, take your inbox, create mailboxes by year, and move every message from prior years into the matching year-named mailbox. That mailbox will never change again, so it will be backed up only once. Likewise, take your Documents » iChats/ directory and move every pre-10.5 file in it into a Backup folder. (In 10.5, iChats are already sorted into date-based folders, so you can leave those alone.)
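
For plain files such as old chat logs, the move can be scripted. The sketch below rests on several assumptions: `archive_old` is a hypothetical helper, the `Backup` folder name follows the suggestion above, and it selects files by modification age rather than by the OS version that created them. It is not meant for Mail's own message store, where creating mailboxes inside Mail is the safer route.

```shell
#!/bin/sh
# archive_old DIR DAYS: move regular files directly inside DIR that were last
# modified more than DAYS days ago into DIR/Backup.
archive_old() {
    dir=$1 days=$2
    mkdir -p "$dir/Backup"
    # -maxdepth 1 keeps find out of subfolders, including Backup itself.
    find "$dir" -maxdepth 1 -type f -mtime +"$days" -exec mv {} "$dir/Backup/" \;
}
```

Something like `archive_old "$HOME/Documents/iChats" 365` would leave only the last year's logs at the top level; try it on a copy of the folder first.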

In simple terms, make directories that contain a lot of files static: don't allow directories with thousands of files in them to change every day if you can help it. I think this tip may be useful for the many Mac users who keep their data for years, carrying it from machine to machine, and who don't like to clean out or archive it.

7 comments
Authored by: matsw on May 23, '08 08:13:43AM

Time Machine does not scan the source volume recursively for changes: this would take way too long. It uses fsevents to immediately discover changes.



Authored by: taxi on May 23, '08 08:31:32AM

It will do a 'deep scan', if it finds a discrepancy between the fsevents report and what it was expecting.

This happens for me quite regularly. I do have a backup sparseimage rather than just a backup device, though, and I really think that is what is causing it.



Authored by: Anonymous on May 24, '08 07:50:44AM

If it has to do a "deep traversal" often for you, then there's something wrong with your computer or the way you use it. Deep traversal will occur after crashes, removing disks without unmounting, directory corruption, or corrupted fseventsd databases. FYI, /var/log/system.log carries some very informative messages about what's happening when Time Machine is running.



Authored by: tempel on May 23, '08 09:31:28AM

While TM usually does not rescan the entire volume as long as the system keeps running, the described mechanism for creating hard links is still valid. Hence, the article loses none of its message.



Authored by: stokessd on May 23, '08 09:35:41AM

It's a good discussion, but "change the way you work to make time machine happier" is not a hint or a tip. Time machine works for me, I don't work for it.

I have my directories laid out nicely and my mail all organized, and I'll be damned if I'm going to change it. In the case of Mail.app, I thought going from the universal mbox format to a zillion files was pretty dumb when it happened, and I stand by that today.

I've got about 90 GB of personal files (including music) and my Time Machine backup via wireless takes about 5 minutes an hour. I also get about 100 spams a day, and my mailbox is frequently changing.

Sheldon



Authored by: lowbatteries on May 23, '08 11:46:23AM

I agree with you on the first point: with Spotlight, Time Machine, and metadata, the need to organize files at all is getting less and less. I have a folder called "drop box" on my desktop that holds 50% of my incoming files. Cleaning your inbox? Haven't done that in years, and won't. Smart folders are the way to go.

However, the MBOX format is a dinosaur, and good riddance. It's like the Windows registry: one file to corrupt them all. Many little files are always better than one unreadable BLOB. It's a huge reason I moved from Thunderbird to Mail.



Authored by: stokessd on May 24, '08 07:52:58AM

An mbox file is a plain text file, easy to parse. It's not like Outlook's binary blob. You can grep right through it. I'll take a handful of mbox files over a zillion tiny files that aren't stored on the drive efficiently any day.

By the same token, should SQL records all be individual files? Think of the carnage...

Sheldon


