
Scripts to create encrypted backups to online services UNIX
I've been testing mozy.com for online backups, but they charge once you go over 2GB of data, and their Mac client is still in beta and not quite stable. Then I read a guide to using a dreamhost account for storage, which interested me as I have a dreamhost account with 200GB+ of available space. That guide, written by Michael Lee, uses rsync, which does incremental backups so only changed files are uploaded. Unfortunately, it stores your data as is, so it could potentially be read by anyone who manages to get access to your dreamhost account.

A better solution is to use an open source package called duplicity, which also uses librsync for incremental backups, but additionally encrypts your data with gpg. It does have some limitations (for example, around hard links); these shouldn't be an issue for backing up your documents and photos, but may make it unsuitable for a full system backup including the OS. See the duplicity docs for more info.

I'm using dreamhost, but this hint should work with any server to which you can write files.

Duplicity supports ssh, scp, sftp, ftp, Amazon S3, etc. The following makes extensive use of Terminal, so it's not suitable for absolute beginners, but I've tried to make it fairly easy to follow for anyone willing to get their hands dirty. I'm using these scripts for doing my backups without any issues. But I do recommend you test this all with a small folder of files which you've backed up to physical media first (DVD, CD, external hard drive).

Also, an online backup should not be your only backup. If something goes wrong, you could be looking at many hours (or even days) to get your data back, and if for whatever reason you can no longer access their server, you can't get to your data at all. Finally, do a full backup and restore and compare the files to make sure the whole process works as expected. Don't wait until you need to restore your data to discover your restore script doesn't work. Here's how it works...
  1. As per Michael's instructions above, create a separate user on the dreamhost server from the webpanel. Make them a shell user so you can use full ssh access. Enable enhanced security and pick a good password. Disallow ftp and set a total disk usage limit if you want.
  2. While those changes are taking effect, you need to download and install MacPorts (aka DarwinPorts). Download the binary installer as a dmg file and install from that. You'll need to install this as an admin user on your machine. I installed version 1.5.0, and if you need detailed install help, installation instructions are available.

    However, since we won't need X11 for duplicity, I didn't install it. They also recommend making changes to your shell profile so that the executables are in your path, which we'll do below.
  3. Once MacPorts is installed, we need to install duplicity and two packages it relies on. From Terminal (again as an admin user), run these commands:
    $ sudo port install duplicity
    $ sudo port install py-gnupg
    $ sudo port install py-pexpect
    You'll need the password of the admin user for sudo to work.
  4. Once these programs are installed, you can log back in as a normal user. As per the MacPorts install instructions, edit your .bash_profile file (or other profile file, depending on which shell you use) using your editor of choice (vi, nano, etc.) and add the following at the end of the file:
    export PATH=/opt/local/bin:/opt/local/sbin:$PATH
    We don't call duplicity directly from the shell in these scripts, but being able to is helpful for testing purposes. You'll need to close your Terminal window and open a new one for the change above to take effect.
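    As a quick sanity check, you can confirm duplicity is now on your path before going further:
    $ which duplicity
    $ port installed duplicity py-gnupg py-pexpect
    The first command should print /opt/local/bin/duplicity, and the second should list the three ports as installed.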
  5. Create a small bash shell script to do the backup. For example, say we want to back up the Documents folder in our home folder. We'll back this up to a remote folder called Documents (it's simpler to give the backup a folder of its own, and using a similar name on the server makes things easier as well). We'll be using gpg to do symmetric encryption, which is suitable since we won't be transmitting the documents to anyone else. I've chosen AES 256-bit encryption, but Twofish or any other modern algorithm should do as well. The script should be as follows:
    export PATH=/opt/local/bin:/usr/local/bin:$PATH
    export FTP_PASSWORD=XXXX
    export PASSPHRASE=YYYY
    duplicity -v9 --ssh-askpass --gpg-options "--cipher-algo=AES256" ~/Documents scp://USER@SERVER:22/Documents
    Where XXXX is your ssh login password and YYYY is your encryption passphrase (the longer the better -- a memorable quote, song lyric, or phrase from a book would be ideal). USER and SERVER are the username and server for the ssh account you set up in step one. I'm using scp to copy up the files, but you can replace this with another protocol if needed (see the duplicity docs for instructions).

    The first line is similar to the one we added to our bash profile above; we include it in the script so duplicity can be found when it's run from cron, Automator, etc. I've used -v9 for full verbosity; once we're happy the script is working, we can turn this down or remove it. Save the script as backup_documents.bash in your home directory, and make it executable by typing chmod +x backup_documents.bash from that directory. Next, execute the script with:
    ./backup_documents.bash
    And you should see loads of output scroll past. Depending on the amount of data being transmitted, this could take a long time (even hours). If all goes well, it'll end with a section of statistics and the line:
    Errors 0
    If there were any errors or the script failed to execute, you'll need to work out what went wrong and fix it. Mistyped usernames, passwords, and server addresses are the most likely causes.
  6. If there were no errors, you can now go and look at the files on the server. Connect to the server with ssh USER@SERVER and then do ls -l, which should show a Documents folder which duplicity will have created for you. To look inside it, type ls -lh Documents, which should show a number of .gpg files, each about 5MB in size, which contain your encrypted data. You'll also see some manifest files etc.

    Use the command exit to get out of the remote shell and back to your local machine. From there, execute the script as above to test an incremental backup of your data, which should be a lot quicker as only changes will be uploaded. If you ssh back onto the server and look inside the Documents folder, you'll see diff files now as well.
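    You can also ask duplicity itself what's in the backup, rather than poking around on the server. A minimal sketch, using the same XXXX/YYYY and USER/SERVER placeholders as the backup script:
    export PATH=/opt/local/bin:/usr/local/bin:$PATH
    export FTP_PASSWORD=XXXX
    export PASSPHRASE=YYYY
    duplicity --list-current-files --ssh-askpass scp://USER@SERVER:22/Documents
    This should print a list of every file currently recorded in the backup.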
  7. The following script will verify your data, making sure what's on the server is the same as what's on your home machine, and give you various bits of information on the number of backups, times, volumes contained, etc. Save it as verify_documents.bash alongside the backup script created earlier, and make sure it's executable with chmod as above.
    export PATH=/opt/local/bin:/usr/local/bin:$PATH
    export FTP_PASSWORD=XXXX
    export PASSPHRASE=YYYY
    duplicity -v9 --ssh-askpass --verify scp://USER@SERVER:22/Documents ~/Documents
    Note that this puts duplicity in verify mode: we pass --verify, and the remote address is specified before the local file location. We also don't need to specify AES256 as the cipher this time; duplicity works that out automatically. When executed, it should finish with a line similar to this:
    Verify complete: 100 files compared, 0 differences found
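    If you just want the summary (how many backup sets exist and when they were made) without comparing every file, and your version of duplicity supports it, the --collection-status option gives exactly that; with the same exports as above, the command line is:
    duplicity --collection-status --ssh-askpass scp://USER@SERVER:22/Documents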
  8. To test the restore, create the following script as restore_documents.bash, and again make sure it's executable as above.
    export PATH=/opt/local/bin:/usr/local/bin:$PATH
    export FTP_PASSWORD=XXXX
    export PASSPHRASE=YYYY
    duplicity -v9 --ssh-askpass scp://USER@SERVER:22/Documents ~/Documents_restored
    Note that this is the same as the verify script, but we don't specify --verify. I've also set the restore to a new directory named Documents_restored in your user's home folder. This is so if there's a problem it won't overwrite any existing files. If you've backed up a lot of data, make sure you have space for the restore as well. Once that has run, you should have a fully restored copy of your files. Compare the files to make sure everything looks fine.
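    You don't have to restore everything, either. As a sketch (work/report.doc and the output names are just examples; check your version's docs for the exact option names), with the same three export lines at the top you can pull out a single file or an older snapshot:
    # restore one file (path relative to the backed-up folder)
    duplicity --ssh-askpass --file-to-restore work/report.doc scp://USER@SERVER:22/Documents restored_report.doc
    # restore the whole folder as it looked three days ago
    duplicity -t 3D --ssh-askpass scp://USER@SERVER:22/Documents ~/Documents_3days_ago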
  9. You can just do backups by hand as needed, but if you'd rather automate it, we'll be using cron. Alternatively, you can create an iCal calendar event to run the shell script. Michael's article covers this nicely, so I won't repeat it here. In order to edit your cron file (called the crontab), type crontab -e which will bring it up in your chosen default editor, usually vi. To change this to nano, type:
    $ export EDITOR=nano
    $ crontab -e
    Add a line similar to the following to the file:
    00 23 * * * ~/backup_documents.bash
    This will run your script at 23:00 every day. See the crontab man page (man 5 crontab) if you want it run weekly, etc.; there's a sketch of a weekly entry at the end of this step.

    To test that this works, set the time in the crontab entry to a couple of minutes past your current system time and save the crontab. Once the time has passed, ssh onto the server and verify that the files have changed. You can do this by looking for new files in the Documents folder, or by running the verify script. Once you're satisfied the cron entry works, set the real time on it, and it will do the backup automatically for you.
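    As that sketch of a weekly entry, and to keep a simple log of each run as well (the log file name is up to you), something like the following would run the backup at 23:00 every Sunday (day 0):
    00 23 * * 0 ~/backup_documents.bash >> ~/backup_documents.log 2>&1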
Note: Your passphrase and ssh password are now stored in your scripts in plain text. In this situation, this shouldn't be a major security issue, as if someone has access to your scripts, then they'll have access to your files anyway. But be careful emailing your script or leaving copies around. Comments and suggestions are of course always appreciated :)

Ideas for future enhancements:
  1. You can add more folders to your script to also back up your music, photos, etc. I believe storing copyrighted material on the dreamhost servers is prohibited under their terms of service, so you might want to be careful about what you back up with them, though technically they won't be able to see what's in the files as they're encrypted.
  2. Exclude files we don't need to back up. You can pass options to duplicity to ignore files you don't need backed up; for example, if you're a developer, you can skip any intermediate build files (see the sketch after this list). This reduces the space used and makes backups quicker.
  3. Remove use of ssh password -- we shouldn't need to specify the ssh passwords in the scripts, and it's possible to create keys to login automatically and safely, but it's beyond the scope of this hint. (robg says: See this much older hint for some coverage of this topic.)
  4. Use asymmetric public/private key encryption -- We can use public/private key encryption, and should be able to do this without having to specify a passphrase for the encryption part using a signing-only public key. See the gpg docs for more information.
  5. Duplicity can back up to local storage as well as remote servers, so you should be able to automatically back up to an external hard drive as well.
  6. Enhance the scripts to save details of daily backups to a log file so you can easily check if the backups are happening as expected.
  7. The above scripts have a lot of duplication. The first three lines could be put in a separate script, which could be sourced from the individual scripts (see the sketch below).
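A rough sketch of ideas 2 and 7 combined (the file name backup_settings.bash and the exclude patterns are just examples): put the shared settings in one file that each script sources, and pass --exclude options to trim what gets uploaded:
    # backup_settings.bash -- shared settings sourced by each script
    export PATH=/opt/local/bin:/usr/local/bin:$PATH
    export FTP_PASSWORD=XXXX
    export PASSPHRASE=YYYY

    # backup_documents.bash, rewritten to use it and to skip build output
    source ~/backup_settings.bash
    duplicity -v9 --ssh-askpass --gpg-options "--cipher-algo=AES256" \
      --exclude '**/build' --exclude '**.o' \
      ~/Documents scp://USER@SERVER:22/Documents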
[robg adds: I haven't tested this one.]

Scripts to create encrypted backups to online services | 20 comments
The following comments are owned by whoever posted them. This site is not responsible for what they say.
Scripts to create encrypted backups to online services
Authored by: MtnBiker on Sep 28, '07 09:11:15AM

To further reinforce the data security problem, a recent article pointed out how many hard drives are being resold without being wiped first. So even if you trust your ISP on a day-to-day basis, the server hard drives might eventually be resold with all of your data nicely organized on them.

Thanks for the hint.

---
Hermosa Beach, CA USA



[ Reply to This | # ]
Use an encrypted disk image
Authored by: dbs on Sep 28, '07 12:49:53PM

A simpler approach is to install MacFuse and use the ssh filesystem (sshfs) to mount the remote filesystem on your machine. Then just use CarbonCopyCloner to back up to an encrypted disk image. You'll keep all your HFS+ metadata (previews, icons, etc.), get incremental backups, and have everything encrypted, with the passwords stored in your Mac keychain. I use this approach to back up to a local NAS over samba.
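A rough sketch of the pieces involved (the folder names backups and ~/remote_backup and the 20g size are just placeholders; CarbonCopyCloner then clones onto the mounted image):

mkdir -p ~/remote_backup
sshfs USER@SERVER:backups ~/remote_backup
hdiutil create -encryption -type SPARSE -fs HFS+J -size 20g -volname Backup ~/remote_backup/backup.sparseimage
hdiutil attach ~/remote_backup/backup.sparseimage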



[ Reply to This | # ]
Use an encrypted disk image
Authored by: heggaton on Sep 28, '07 05:20:28PM

Yeah, but when you're backing up 10G+ every day, it's a little impractical to back up the entire 10G disk image when you've only changed 350KB of data.

I have a private remote server that I use rsync + ssh to send my changed data to every night. If I had to do the entire 10G, it'd take a full night. Using rsync, it takes about 5-20 minutes (depending on how much data is being sent).

OP, Thanks for the tip btw :)



[ Reply to This | # ]
Use an encrypted disk image
Authored by: hamarkus on Sep 28, '07 05:23:18PM

You don't have to copy the whole disk image whenever you change something on it. You simply mount the image remotely, change what you want to change and unmount it again.

Mounting and unmounting takes somewhat longer with encrypted images but not dramatically so.



[ Reply to This | # ]
Use an encrypted disk image
Authored by: hamarkus on Sep 28, '07 05:20:40PM

As even a very small corruption of this encrypted disk image file can make your data un-retrievable, I would not want to rely on this method as my sole back-up and at least combine it with a local, un-encrypted back-up, or simply with other independent back-ups.



[ Reply to This | # ]
Use an encrypted disk image
Authored by: pxb on Sep 29, '07 01:14:41AM

Precisely, it's not ideal to rely on this as your only backup method. I'm using this in addition to my local (non-encrypted) backup to an external drive. The idea is it's a good insurance policy in case your home gets robbed and they take your external drive as well. Hopefully it will never be needed anyway, but I sleep a bit safer at night knowing I have something out there just in case...



[ Reply to This | # ]
Resource Forks...
Authored by: Mechanist on Sep 28, '07 04:31:50PM

Unfortunately resource forks are still quite common on Mac OS X. As far as I've been able to determine, this scheme won't back them up. That's not much of a surprise given that not many Unix tools take resource forks into account (except for the modified versions Apple has included on Mac OS X). Before considering a scheme like this you should really consider whether this limitation would be a problem for you.



[ Reply to This | # ]
Resource Forks...
Authored by: sjk on Sep 28, '07 05:27:13PM
Before considering a scheme like this you should really consider whether this limitation would be a problem for you.
In some cases it can be difficult to accurately determine whether or not preserving resource forks is necessary. And even if it's not an issue now it might unexpectedly become one in the future. For me it's been easier having backup strategies that always preserve them to avoid unforeseeable problems if they weren't.

[ Reply to This | # ]
Resource Forks...
Authored by: pxb on Sep 29, '07 01:01:29AM
Hi,

thanks for the warning about resource forks. On the duplicity website (http://duplicity.nongnu.org/new_format.html) they mention it being a limitation of the tar format it uses. However, the tar on OSX 10.4 is supposed to support Resource Forks, though whether this happens with the version of tar that duplicity uses, I don't know at present. I'll do some more research. I guess the simplest way would be to backup a file with known resource fork data and see what happens.

pxb

[ Reply to This | # ]
Resource Forks...
Authored by: Mechanist on Sep 29, '07 04:28:16PM

As far as I know it's correct that the tar file format doesn't support resource forks. What the Mac OS X version of tar does is transparently convert files with resource forks to "AppleDouble" format, which just means that the resource fork gets put into the data fork of a secondary file. So if foo.jpg has a resource fork, the tar file contains foo.jpg with no resource fork as well as ._foo.jpg, which is a data-fork file containing the resource-fork info. When you extract the file the forks are restored.

If duplicity makes use of the on-board Mac OS X version of tar then it should be OK for resource forks, but if it builds the files some other way then it probably isn't.
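A quick way to check is to create a file with a resource fork in a scratch directory, tar it with the system tar, and see whether a ._ entry shows up (a sketch; foo.txt is just a throwaway test file):

echo data > foo.txt
echo rsrc > foo.txt/rsrc
/usr/bin/tar -cf foo.tar foo.txt
tar -tf foo.tar

If the listing shows ._foo.txt alongside foo.txt, the fork was captured.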



[ Reply to This | # ]
Resource Forks...
Authored by: asparria on Sep 30, '07 10:14:58AM
I use Backup Bouncer (http://www.n8gray.org/blog/2007/04/27/introducing-backup-bouncer) to test backup reliability. It checks resource forks, hard links, ownership, pipes, BSD flags, etc...

Just check it out and see if your backup software solution is as good as it seems.

[ Reply to This | # ]
Resource Forks...
Authored by: pxb on Oct 16, '07 02:43:31PM

Thanks for the link. I'll use it to check if my backups are copying what I need



[ Reply to This | # ]
Scripts to create encrypted backups to online services
Authored by: mastige on Sep 29, '07 07:28:13PM

Standard UNIX tools do have their limitations. I made regular use of rsync to backup my home folder, including my Parallels folder. When my Windows system became corrupted, I discovered that the rsync backup was also useless, presumably because of lack of copying of resource forks or other important data. That left me with no usable copy and a Windows installer telling me that I had exceeded my maximum installs. An expensive lesson. Carbon Copy Cloner and the MacFuse solution makes all kinds of sense for important data.



[ Reply to This | # ]
Scripts to create encrypted backups to online services
Authored by: pxb on Oct 16, '07 02:42:00PM

Just a quick update on the resource fork issue. From my research, it looks like resource forks aren't all that common any more, but it'd still be nice to (a) know whether you need to back them up, and (b) back them up if you do, or think you may need that information in the future.

Personally I'm using a local external drive as my main backup, and the online backup as an emergency second-level solution. Hopefully I'll never need to use the online backup. My aim is to backup my photos, emails, dev work etc. If I lose the thumbnail preview on my photos or some file permissions that's not the end of the world. As long as the important data is saved.

Anyway, I edited a bash script I found online (unfortunately I can't remember where I found the script, so can't credit the author properly - apologies for that), which will search for resource forks in any of the files for a given directory:

#!/bin/bash
# search for resource forks in any of the files under the directory given as $1
find "$1" | while IFS= read -r file
do
    # send ls errors to /dev/null to hide the "No such file or directory"
    # errors thrown when this is called on directories, which have no /rsrc
    FILE_INFO=`ls -l "$file"/rsrc 2>/dev/null`
    # the size is the fifth field of ls -l output; if it is 0, no fork exists
    FILE_SIZE=`echo "$FILE_INFO" | awk '{print $5}'`
    if [[ $FILE_SIZE -ne "0" ]]; then
        # show the size of the fork and the file it is attached to
        echo "$FILE_SIZE $file"
    fi
done

save that as an executable file findresforks.bash, and run as something like:

findresforks.bash /Users/Bob/Desktop

and it'll give the size and name of any file with a resource fork. I ran it over my mail directory (~/Library/Mail) and it only found one file with a resource fork (~/Library/Mail/Bundles/Letterbox.mailbundle/Icon), which belongs to a plugin for Mail.app. Given that I'm interested in saving my emails, I don't really care if that plugin's icon gets lost.

Running the same command across my photos gave quite a few files, which I'm guessing from the file size of each (50-80k) would be thumbnails, which I wouldn't miss really.

You can test this with:

echo 12345 > test.txt
echo 1234567890 > test.txt/rsrc

Which should give a 6 byte data fork and 11 byte resource fork. Running the above script on the directory with test.txt in it should give something like:

11 ./test.txt



[ Reply to This | # ]
dreamhost backups prohibited
Authored by: kyngchaos on Oct 17, '07 02:32:44PM

Now there's a bummer.

New item in the Dreamhost Status blog today: they "clarified" their policy by expressly prohibiting the storage of files for backup or personal purposes. They didn't say if or how they would enforce that, but I'll play it safe. And I was just getting all psyched up to backup to my DH account (I had even started something with sftp before finding this hint) :(

You can only store files using their Files Forever feature, which is not suitable for a regular backup.



[ Reply to This | # ]
dreamhost backups prohibited
Authored by: pxb on Oct 18, '07 01:59:50PM

They helpfully picked the day I'm finally happy with my scripts to announce this...

To be honest, I never knew for certain they'd be fine with this. So, I guess I've not lost anything. I'm not furious, just disappointed.

I'm not storing any copyrighted material (or anything else illegal) and am using only a GB or so of my allotted 260GB of storage, so I'm not abusing the service in any way.

Their front page offers 500GB of storage space right now - what website needs that much space? The iTunes Music Store?

I guess I'll continue to use it and if they ask me to delete my files I will, and go somewhere else. Maybe they're only going to go after people who seriously abuse the privilege. Still, it's annoying to be left in limbo.

I've been with Dreamhost for 7 years now and have recommended them to loads of people...



[ Reply to This | # ]
dreamhost backups prohibited
Authored by: abobrow on Jan 01, '08 03:21:32PM

I just found this out accidentally when I emailed DH a question about my account. I mentioned I was using it for backup and the tech politely advised me to remove my backup before the admins found it.

I did a little research and found Bluehost.com gives you 600GB of storage and expressly states that it's okay to use for offsite backup. I cancelled Dreamhost and switched. I don't like the bluehost control panel as much, but it works.



[ Reply to This | # ]
dreamhost backups prohibited
Authored by: abobrow on Dec 11, '08 01:20:03PM

Update. Bluehost froze my account with no warning for using my space for backups. I told the tech their TOS did not prohibit this use, but he directed me to a page that says exactly the opposite. I don't know if they changed their TOS or if I just missed it the first time.



[ Reply to This | # ]
dreamhost backups prohibited
Authored by: pxb on Nov 05, '08 12:10:04AM

I've just noticed in the latest Dreamhost newsletter that they're expressly allowing backups now.

The limitations/features are:

- must be done as a special backups user, which must be enabled from the control panel
- you get 50GB free, then it's $0.10/GB/month after that
- they don't do backups of the data
- backups can be done via ftp or scp/rsync/sftp



[ Reply to This | # ]
Scripts to create encrypted backups to online services
Authored by: lucidsystems on Nov 15, '09 03:13:06PM
You may be interested in LBackup, as it supports synchronizing sparse bundles (available on Mac OS X 10.5 and later) to a remote host (or hosts).

[ Reply to This | # ]