Scripts to create encrypted backups to online services
Sep 28, '07 07:30:03AM
Contributed by: pxb
I've been testing mozy.com for online backups, but they charge once you go over 2GB of data, and their Mac client is still in beta and not quite stable. Then I read about using a dreamhost account for storage, which interested me as I have a dreamhost account with 200GB+ of available storage. The guide I linked to above was published by Michael Lee; the method he suggests uses rsync, which does incremental backups, so it only backs up changed files. Unfortunately, it stores your data as-is, which means it could potentially be read by anyone who manages to get access to your dreamhost account.
A better solution is to use an open source package called duplicity, which uses librsync for incremental backups as well, but also uses gpg to encrypt your data. It does have some limitations related to hard links. That should hopefully not be an issue for backing up your documents and photos, but may mean it's not suitable for full system backup including the OS. See the duplicity docs for more info.
I'm using dreamhost, but this hint should work with any server to which you can write files.
Duplicity supports ssh, scp, sftp, ftp, Amazon S3, etc. The following makes extensive use of Terminal, so it's not suitable for absolute beginners, but I've tried to make it fairly easy to follow for anyone willing to get their hands dirty. I've been using these scripts for my backups without any issues, but I do recommend you first test all of this with a small folder of files which you've already backed up to physical media (DVD, CD, external hard drive).
Also, an online backup should not be your only backup. If something goes wrong, you could be looking at many hours (or even days) to get your data back, and if for whatever reason you can no longer access their server, you can't get access to your data. Finally, do a full backup and restore and compare the files to make sure the whole process works as expected. Don't leave it until you need a restore of your data only to discover your restore script doesn't work. Here's how it works...
- As per Michael's instructions above, create a separate user on the dreamhost server from the webpanel. Make them a shell user so you can use full ssh access. Enable enhanced security and pick a good password. Disallow ftp and set a total disk usage limit if you want.
- While those changes are taking effect, you need to download and install MacPorts (aka DarwinPorts). Download the binary installer as a dmg file and install from that. You'll need to install this as an admin user on your machine. I installed version 1.5.0, and if you need detailed install help, installation instructions are available.
However, since we won't need X11 for duplicity, I didn't install it. They also recommend making changes to your shell profile so that the executables are in your path, which we'll do below.
- Once MacPorts is installed, we need to install duplicity and two packages it relies on. From Terminal (again as an admin user), run these commands:
$ sudo port install duplicity
$ sudo port install py-gnupg
$ sudo port install py-pexpect
You'll need the password of the admin user for sudo to work.
- Once these programs are installed, you can log back in as a normal user. As per the MacPort install instructions, edit your .bash_profile file (or other profile file, depending on which shell you use) using your editor of choice (vi, nano, etc.) to include the following at the end of the file:
export PATH=/opt/local/bin:/opt/local/sbin:$PATH
We won't call duplicity directly from the shell in this hint, but being able to is helpful for testing purposes. You'll need to close your Terminal window and open a new one for the changes above to take effect.
- Create a small bash shell script to do the backup. For example, say we want to back up the Documents folder in our home folder. We'll back this up to a remote folder called Documents (it's simpler to put it in a folder of its own, and using a similar name on the server makes things easier as well). We'll be using gpg to do symmetric encryption, which is suitable when we won't be transmitting the documents to anyone else. I've chosen AES 256-bit encryption, but Twofish or any other modern algorithm should do as well. The script should be as follows:
#!/bin/bash
export PATH=/opt/local/bin:/usr/local/bin:$PATH
export FTP_PASSWORD=XXXX
export PASSPHRASE=YYYY
duplicity -v9 --ssh-askpass --gpg-options "--cipher-algo=AES256" ~/Documents scp://USER@SERVER:22/Documents
Here, XXXX is your ssh login password and YYYY is your encryption passphrase (the longer the better -- a memorable quote, song lyric, or phrase from a book would be ideal), while USER and SERVER are the user and server for the ssh account you set up in step one. I'm using scp to copy the files up, but you can replace this with another protocol if needed (see the duplicity docs for instructions).
The first line is the same as the one we added to our bash profile above. We do this so we can run the script from cron, Automator etc. I've used -v9 for full verbosity. Once we're happy the script is working, we can turn this down or remove it. Save the script as backup_documents.bash in your home directory, and make it executable by typing chmod +x backup_documents.bash from that directory. Next, execute the script with:
./backup_documents.bash
And you should see loads of output scroll past. Depending on the amount of data being transmitted, this could take a long time (even hours). If all goes well, it'll end with a section of statistics and the line:
Errors 0
If there were any errors or the script failed to execute, you'll need to work out what went wrong and fix it. A mistyped username, password, or server address is the most probable cause.
- If there were no errors, you can now go and look at the files on the server. Connect to the server with ssh USER@SERVER and then do ls -l, which should show a Documents folder which duplicity will have created for you. To look inside it, type ls -lh Documents, which should show a number of .gpg files, each about 5MB in size, which contain your encrypted data. You'll also see some manifest files etc.
Use the command exit to get out of the remote shell and back to your local machine. From there, execute the script as above to test an incremental backup of your data, which should be a lot quicker as only changes will be uploaded. If you ssh back onto the server and look inside the Documents folder, you'll see diff files now as well.
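Rather than ssh-ing in, duplicity can also list the backup's contents itself. Here's a sketch (the option names below match the option-style commands used elsewhere in this hint, such as --verify; newer duplicity versions may expect them as bare commands like list-current-files instead):

```shell
#!/bin/bash
# Sketch: list the files in the latest backup, then show the backup
# chains and sets stored on the server. XXXX and YYYY are the same
# placeholders as in the backup script.
export PATH=/opt/local/bin:/usr/local/bin:$PATH
export FTP_PASSWORD=XXXX
export PASSPHRASE=YYYY
duplicity --ssh-askpass --list-current-files scp://USER@SERVER:22/Documents
duplicity --ssh-askpass --collection-status scp://USER@SERVER:22/Documents
```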
- The following script will verify your data, making sure that what's on the server is the same as what's on your home machine, and give you various bits of information on the number of backups, times, volumes contained, etc. Save it as verify_documents.bash alongside the backup script created earlier. Make sure it's executable with chmod as above.
#!/bin/bash
export PATH=/opt/local/bin:/usr/local/bin:$PATH
export FTP_PASSWORD=XXXX
export PASSPHRASE=YYYY
duplicity -v9 --ssh-askpass --verify scp://USER@SERVER:22/Documents ~/Documents
Note that we're in verify mode: the remote address is specified before the local file location, and we've given --verify as the command. We also don't need to specify the cipher algorithm as AES256 this time; duplicity will work that out automatically. When executed, it should finish with a line similar to this:
Verify complete: 100 files compared, 0 differences found
- To test the restore, create the following script as restore_documents.bash, and again make sure it's executable as above.
#!/bin/bash
export PATH=/opt/local/bin:/usr/local/bin:$PATH
export FTP_PASSWORD=XXXX
export PASSPHRASE=YYYY
duplicity -v9 --ssh-askpass scp://USER@SERVER:22/Documents ~/Documents_restored
Note that this is the same as the verify script, but we don't specify --verify. I've also set the restore to a new directory named Documents_restored in your user's home folder. This is so if there's a problem it won't overwrite any existing files. If you've backed up a lot of data, make sure you have space for the restore as well. Once that has run, you should have a fully restored copy of your files. Compare the files to make sure everything looks fine.
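You don't have to restore the whole backup at once. As a sketch, the --file-to-restore option pulls out a single file; the path 'letters/resume.doc' below is hypothetical, and is given relative to the backed-up Documents folder:

```shell
#!/bin/bash
# Sketch: restore a single file from the backup rather than everything.
# 'letters/resume.doc' is a hypothetical path relative to ~/Documents.
export PATH=/opt/local/bin:/usr/local/bin:$PATH
export FTP_PASSWORD=XXXX
export PASSPHRASE=YYYY
duplicity -v9 --ssh-askpass --file-to-restore letters/resume.doc \
    scp://USER@SERVER:22/Documents ~/Desktop/resume.doc
```

Duplicity also has a -t/--restore-time option for restoring from an earlier point in time (e.g. -t 3D for the state three days ago); check the man page for the time formats your version supports.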
- You can just do backups by hand as needed, but if you'd rather automate it, we'll be using cron. Alternatively, you can create an iCal calendar event to run the shell script. Michael's article covers this nicely, so I won't repeat it here. In order to edit your cron file (called the crontab), type crontab -e which will bring it up in your chosen default editor, usually vi. To change this to nano, type:
$ export EDITOR=nano
$ crontab -e
Add a line similar to following to the file:
00 23 * * * ~/backup_documents.bash
This will run your script at 23:00 every day of the week. Read the manual for crontab if you want this done weekly etc.
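For example, a weekly run at 23:00 every Sunday would look like this (the fifth field is the day of the week, with 0 meaning Sunday):

```
00 23 * * 0 ~/backup_documents.bash
```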
To test that this works, set the time in the crontab entry to a couple of minutes past your current system time and save the crontab. Once the time has passed, ssh onto the server and verify that the files have changed. You can do this by looking for new files in the Documents folder, or by running the verify script. Once you're satisfied the cron entry works, set the real time on it, and it should do the backup automatically for you.
Note: Your passphrase and ssh password are now stored in your scripts in plain text. In this situation, this shouldn't be a major security issue, as if someone has access to your scripts, then they'll have access to your files anyway. But be careful emailing your script or leaving copies around. Comments and suggestions are of course always appreciated :)
Ideas for future enhancements:
- You can add more folders to your script to also back up your music, photos, etc. I think storing copyrighted material on the dreamhost servers is prohibited under their terms of service, so you might want to be careful about what you back up with them -- though technically they won't be able to see what's in the files, as they're encrypted.
- Exclude files we don't need to backup. You can pass options to duplicity to ignore files you don't need backed up. For example, if you're a developer, you can skip any intermediary build files. This reduces the space used and makes backups quicker.
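As a sketch, duplicity's --exclude option takes shell-style glob patterns, so the backup line in the script could become something like the following (the 'build' and 'Scratch' paths are hypothetical -- adjust the patterns to your own layout):

```shell
# Sketch: skip files that don't need to be backed up.
duplicity -v9 --ssh-askpass --gpg-options "--cipher-algo=AES256" \
    --exclude '**/build' \
    --exclude "$HOME/Documents/Scratch" \
    ~/Documents scp://USER@SERVER:22/Documents
```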
- Remove use of the ssh password -- we shouldn't need to specify the ssh password in the scripts, and it's possible to create keys to log in automatically and safely, but it's beyond the scope of this hint. (robg says: See this much older hint for some coverage of this topic.)
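A rough sketch of the key-based approach: generate a key pair locally, then append the public key to the remote account's authorized_keys file. Once this works, FTP_PASSWORD can be dropped from the scripts.

```shell
# Sketch: set up passwordless ssh logins with a key pair.
# Generate the pair (accept the default location; a key passphrase is optional):
ssh-keygen -t rsa
# Append the public half to the remote account's authorized_keys file:
cat ~/.ssh/id_rsa.pub | ssh USER@SERVER 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'
```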
- Use asymmetric public/private key encryption -- with public/private key encryption, we shouldn't have to specify a passphrase for the encryption step, since only the public key is needed to encrypt. See the gpg docs for more information.
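A sketch of how that might look: create a gpg key pair, then pass its id to duplicity with --encrypt-key. The backup itself then only needs the public key; the private key and its passphrase are only required for verify and restore.

```shell
# Sketch: asymmetric encryption with a gpg key pair.
# Create a key pair interactively first:
gpg --gen-key
# Then back up with the key (KEYID is the id gpg reported when it was created):
duplicity -v9 --ssh-askpass --encrypt-key KEYID \
    ~/Documents scp://USER@SERVER:22/Documents
```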
- Duplicity can back up to local storage as well as remote servers, so you should also be able to automatically back up to an external hard drive.
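A sketch using duplicity's file:// scheme; '/Volumes/BackupDrive' is a hypothetical mount point for the external drive:

```shell
#!/bin/bash
# Sketch: back up to a locally mounted external drive instead of a server.
export PATH=/opt/local/bin:/usr/local/bin:$PATH
export PASSPHRASE=YYYY
duplicity -v9 --gpg-options "--cipher-algo=AES256" \
    ~/Documents file:///Volumes/BackupDrive/Documents
```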
- Enhance the scripts to save details of daily backups to a log file so you can easily check if the backups are happening as expected.
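One simple way to do this, as a sketch: a small wrapper that timestamps each run and appends all of duplicity's output to a log file.

```shell
#!/bin/bash
# Sketch: wrapper that records each backup run in ~/backup.log.
LOG="$HOME/backup.log"
echo "=== Backup started $(date) ===" >> "$LOG"
"$HOME/backup_documents.bash" >> "$LOG" 2>&1
STATUS=$?
echo "=== Backup finished $(date), exit status $STATUS ===" >> "$LOG"
```

Point the cron entry at this wrapper instead of backup_documents.bash, then check ~/backup.log now and again to see how recent runs went.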
- The above scripts have a lot of duplication. The first three lines could be put in a separate script, which could be sourced from the individual scripts.
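As a sketch, the shared settings could live in a file of their own, say backup_common.bash, saved alongside the other scripts:

```shell
# backup_common.bash -- settings shared by the backup, verify, and restore
# scripts. XXXX and YYYY are placeholders, as before.
export PATH=/opt/local/bin:/usr/local/bin:$PATH
export FTP_PASSWORD=XXXX
export PASSPHRASE=YYYY
```

Each script would then start with `. ~/backup_common.bash` in place of its first three lines, so a password change only has to be made in one place.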
[robg adds: I haven't tested this one.]
Mac OS X Hints
http://hints.macworld.com/article.php?story=20070925155109846