tecosystems

My Backup: Dropbox, JungleDisk, S3, and a Desperate Need for Deduping


When it comes to backing up my music, I have four problems. Apart from the backup, I mean.

  1. My music acquisition is done on a Linux laptop (both Amazon and the eMusic store provide Linux clients)
  2. My Linux laptop’s hard drive is only 128 GB, much of which is devoted to virtual machine images, ergo space for music is tight
  3. My primary music library is housed on a Mac (so as to be portable to my iPhone)
  4. My music library (~80 GB) is too large to be entirely synced with Dropbox (50 GB max)

In other words, I currently download to one machine – an Ubuntu-equipped X301 – which has only enough space to contain a subset of the master library. As for the master library, it’s attached to a Mac Mini running OS X, which in turn needs to be regularly updated with the newly acquired tracks from the Ubuntu machine.

Got all that?

Besides migrating newly acquired content to the master library, one of the requirements of my backup process is to push the content to the cloud. As I noted in 2007, my music may just be my most valuable material possession, and it’s certainly the least replaceable, so having an offsite copy is critical to me. Just as it will be to a horde of consumers in the years ahead, but that’s a different matter entirely.

So here’s my solution, warts and all – and yes, I’ll get to those.

Tools

  • Paid Dropbox account: $99/year
  • Paid Amazon S3 account: $0.150/GB storage | $0.100/GB transfer
  • JungleDisk client: have a lifetime account, which appears to be no longer available
  • Dropbox client
  • Rsync

Process

  • [Ubuntu laptop] Download music from Amazon/eMusic/etc into ~/Music
  • [Ubuntu laptop] ~/Music is symlinked to my Dropbox directory (instructions here), meaning everything downloaded is pushed to the cloud and to my other Dropbox-equipped machines, including the Mac Mini
  • [Mac Mini] Dropbox directory is synced to the master music library on an external drive using the following process. All credit for the process – scripts included – goes to Chris Tirpak. Anything wrong, particularly with the rsync options, is purely my stupidity:
    1. Create a backup rsync script. Mine is as follows:

      #!/bin/bash
      #
      # backup the home directory to copper
      #
      SUBJECT="Daily Backup Log"
      EMAIL="[email protected]"
      BACKUPLOGFILE="/Users/sog/bin/backupMac.log"

      # remove the old log file in case it is still there
      rm -f "$BACKUPLOGFILE"

      echo "Begin backup at: " > "$BACKUPLOGFILE"
      date >> "$BACKUPLOGFILE"

      # first pass: everything except the Amazon MP3 folder; second pass:
      # the Amazon MP3 folder itself. Both passes log to the same file.
      rsync -rltvz /Users/sog/Dropbox/Music/ --exclude "Amazon MP3" \
          /Volumes/"NO NAME"/"MUSICDIR"/ >> "$BACKUPLOGFILE" 2>&1

      rsync -rltvz /Users/sog/Dropbox/Music/"Amazon MP3"/ \
          /Volumes/"NO NAME"/"MUSICDIR"/ >> "$BACKUPLOGFILE" 2>&1

      echo "End backup at: " >> $BACKUPLOGFILE
      date >> $BACKUPLOGFILE

      # send the log in an email using /bin/mail
      /usr/bin/mail -s "$SUBJECT" "$EMAIL" < $BACKUPLOGFILE

      rm $BACKUPLOGFILE

    2. Copy this file to ~/bin
    3. Get launchd to run the script every night by creating a plist entry for it
    4. Put the plist file in ~/Library/LaunchAgents
    5. In a terminal: launchctl unload net.ogrady.backupMacSilent.plist
    6. In a terminal: launchctl load net.ogrady.backupMacSilent.plist
    7. In a terminal: launchctl list | grep -i net.ogrady
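The plist from step 3 is a short XML file. Here is a minimal sketch, assuming the script above is saved as ~/bin/backupMac.sh and a 2:00 a.m. run time (both assumptions; the Label matches the filename used in steps 5–7):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <!-- should match the plist filename passed to launchctl load/unload -->
  <key>Label</key>
  <string>net.ogrady.backupMacSilent</string>
  <!-- path to the backup script is an assumption; adjust to your ~/bin -->
  <key>ProgramArguments</key>
  <array>
    <string>/Users/sog/bin/backupMac.sh</string>
  </array>
  <!-- run nightly at 2:00 a.m.; pick any quiet hour -->
  <key>StartCalendarInterval</key>
  <dict>
    <key>Hour</key>
    <integer>2</integer>
    <key>Minute</key>
    <integer>0</integer>
  </dict>
</dict>
</plist>
```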
  • With the master directory thus updated with any newly downloaded tracks, JungleDisk then reflects the master directory up to S3 for permanent backup nightly.
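For what it's worth, the symlink trick in the second step boils down to a couple of commands. This is a sketch using the stock paths, not the exact instructions linked above; adjust MUSIC and DROPBOX to your own layout:

```shell
# One-time setup: relocate the existing music folder into Dropbox,
# then symlink it back so downloads keep landing in ~/Music.
MUSIC="${MUSIC:-$HOME/Music}"
DROPBOX="${DROPBOX:-$HOME/Dropbox}"

mkdir -p "$MUSIC" "$DROPBOX"        # ensure both exist before moving
mv "$MUSIC" "$DROPBOX/Music"        # library now physically lives in Dropbox
ln -s "$DROPBOX/Music" "$MUSIC"     # ~/Music is now a pointer into Dropbox
```

From then on, anything saved to ~/Music is really saved inside Dropbox, so the client syncs it without the music player or download clients noticing anything changed.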

The good news about the above: it will dutifully run rsync nightly to grab the target Dropbox directories and copy them over to the master directory. The bad news? It creates duplicate files. Lots of duplicates. My master music directory – both from this process and from previous backup efforts – has a massive duplication problem, probably on the order of several thousand duplicate files.

Which brings me to the question: anyone got an outstanding de-duplication procedure that will let me preview the files to be removed? Because I need some serious help.
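For the record, a preview-only first pass needs nothing more than GNU coreutils. The sketch below (my own, not a tool from any of the above) checksums every file under a directory and prints only the groups that share a checksum; it deletes nothing:

```shell
# dedup_preview DIR: list groups of byte-identical files, one group per
# blank-line-separated block. Requires GNU md5sum/uniq; purely read-only.
dedup_preview() {
  find "$1" -type f -print0 \
    | xargs -0 -r md5sum \
    | sort \
    | uniq -w32 --all-repeated=separate \
    | cut -c35-
}
```

Dedicated tools such as fdupes do the same checksum grouping, and its interactive delete mode prompts per group, which fits the preview requirement.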

Otherwise, what would you improve, and where? What’s your backup routine look like?