tecosystems

My Backup: Dropbox, JungleDisk, S3, and a Desperate Need for Deduping


When it comes to backing up my music, I have four problems. Apart from the backup, I mean.

  1. My music acquisition is done on a Linux laptop (both Amazon and the eMusic store provide Linux clients)
  2. My Linux laptop’s hard drive is only 128 GB, much of which is devoted to virtual machine images, ergo space for music is tight
  3. My primary music library is housed on a Mac (so as to be portable to my iPhone)
  4. My music library (~80 GB) is too large to be entirely synced with Dropbox (50 GB max)

In other words, I currently download to one machine – an Ubuntu-equipped X301 – which has only enough space to contain a subset of the master library. As for the master library, it’s attached to a Mac Mini running OS X, which in turn needs to be regularly updated with the newly acquired tracks from the Ubuntu machine.

Got all that?

Besides migrating newly acquired content to the master library, one of the requirements of my backup process is to push the content to the cloud. As I noted in 2007, my music may just be my most valuable material possession, and it’s certainly the least replaceable, so having an offsite copy is critical to me. Just as it will be to a horde of consumers in the years ahead, but that’s a different matter entirely.

So here’s my solution, warts and all – and yes, I’ll get to those.

Tools

  • Paid Dropbox account: $99/year
  • Paid Amazon S3 account: $0.150/GB storage | $0.100/GB transfer
  • JungleDisk client: have a lifetime account, which appears to be no longer available
  • Dropbox client
  • Rsync

Process

  • [Ubuntu laptop] Download music from Amazon/eMusic/etc into ~/Music
  • [Ubuntu laptop] ~/Music is symlinked to my Dropbox directory (instructions here), meaning everything downloaded is pushed to the cloud and to my other Dropbox-equipped machines, including the Mac Mini
  • [Mac Mini] Dropbox directory is synced to the master music library on an external drive using the following process. All credit for the process – scripts included – goes to Chris Tirpak. Anything wrong, particularly with the rsync options, is purely my stupidity:
    1. Create a backup rsync script. Mine is as follows:

      #!/bin/bash
      #
      # sync the Dropbox music directory to the master library, logging as we go
      #
      SUBJECT="Daily Backup Log"
      EMAIL="[email protected]"
      BACKUPLOGFILE="/Users/sog/bin/backupMac.log"

      # remove the old log file in case it is still there
      rm -f "$BACKUPLOGFILE"

      echo "Begin backup at: " > "$BACKUPLOGFILE"
      date >> "$BACKUPLOGFILE"

      rsync -rltvz --exclude "Amazon MP3" /Users/sog/Dropbox/Music/ \
          "/Volumes/NO NAME/MUSICDIR/" >> "$BACKUPLOGFILE" 2>&1

      rsync -rltvz "/Users/sog/Dropbox/Music/Amazon MP3/" \
          "/Volumes/NO NAME/MUSICDIR/" >> "$BACKUPLOGFILE" 2>&1

      echo "End backup at: " >> "$BACKUPLOGFILE"
      date >> "$BACKUPLOGFILE"

      # send the log in an email using /bin/mail
      /usr/bin/mail -s "$SUBJECT" "$EMAIL" < "$BACKUPLOGFILE"

      rm -f "$BACKUPLOGFILE"

    2. Copy this file to ~/bin
    3. Tell the script to run every night to sync the directories by getting launchd to execute a plist entry. Here’s my plist script:
    4. Put the plist file in ~/Library/LaunchAgents
    5. In a terminal: launchctl unload net.ogrady.backupMacSilent.plist
    6. In a terminal: launchctl load net.ogrady.backupMacSilent.plist
    7. In a terminal: launchctl list | grep -i net.ogrady
  • With the master directory thus updated with any newly downloaded tracks, JungleDisk then reflects the master directory up to S3 for permanent backup nightly.
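The plist referenced in step 3 doesn’t appear inline above, so here is a minimal sketch of what such a launchd agent could look like. The script path and the 2:00 AM schedule are placeholders, not the actual values:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Label should match the filename, minus the .plist extension -->
    <key>Label</key>
    <string>net.ogrady.backupMacSilent</string>
    <!-- Path to the backup script from step 1 (illustrative) -->
    <key>ProgramArguments</key>
    <array>
        <string>/Users/sog/bin/backupMac.sh</string>
    </array>
    <!-- Run nightly at 2:00 AM -->
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>2</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>
</dict>
</plist>
```

Saved to ~/Library/LaunchAgents per step 4, the launchctl commands in steps 5–7 then load it and confirm it’s registered.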

The good news about the above: it will dutifully run rsync nightly to grab the target Dropbox directories and copy them over to the master directory. The bad news? It creates duplicate files. Lots of duplicates. My master music directory – both from this process and from previous backup efforts – has a massive duplication problem, probably on the order of several thousand duplicate files.

Which brings me to the question: anyone got an outstanding de-duplication procedure that will let me preview the files to be removed? Because I need some serious help.
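A preview-only pass along these lines is roughly what I’m after – hash everything, then print only the groups of files that share a checksum, deleting nothing. This sketch assumes GNU coreutils (md5sum, and uniq with -w), so on a stock Mac you’d install coreutils first:

```shell
#!/bin/bash
# Preview duplicate files by checksum -- nothing is deleted.
# Assumes GNU coreutils (md5sum, uniq -w); stock OS X ships BSD tools instead.
preview_dupes() {
  # Hash every file under the given directory, sort by hash, and print only
  # the groups of paths whose first 32 chars (the md5) repeat, with blank
  # lines between groups.
  find "$1" -type f -exec md5sum {} + \
    | sort \
    | uniq -w32 --all-repeated=separate
}

# Example (path is illustrative):
# preview_dupes "/Volumes/NO NAME/MUSICDIR"
```

That only catches bit-for-bit copies, though – re-tagged or re-encoded duplicates would sail right through.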

Otherwise, what would you improve, and where? What’s your backup routine look like?

9 comments

  1. Managing large libraries is where iTunes falls flat (I’m sure there are other areas too, like Apple lock-in). I’ve had this same sort of problem, and it blows my mind that iTunes isn’t smart enough to dynamically add in more storage, span multiple computers, and so on. It seems like the key problem is that iTunes wants to be associated with one computer and one hard drive, which I’m sure has some sort of RIAA conspiracy theory behind it. The effect is to make it incredibly tedious to deal with once you have a “real” music collection.

    Also, it seems like Apple has a great chance to do backup of your music, though, once again, they don’t seem to really “get” (or want to service) such huge storage needs.

  2. My solution spans 3 machines (two Macs and a dual-boot PC with Windows and Ubuntu). Like yours, one of my Macs contains the master library, but this is also where I do all of my purchasing. From there, I use Time Machine for local backups and Backblaze for online backup.

    If I am in the home, I use iTunes library sharing and a WiFi network to share the full library across machines, with the exception of Ubuntu, of course.

    I also use a free Dropbox account to sync whatever I’m listening to at the moment. I do this manually but I have considered creating a playlist and using a script or Automator workflow to automatically sync up my Dropbox directory with this playlist.

  3. Did you try emailing dropbox and seeing if you could pay them more for more space? I’d expect that’s something they’ve run into more than a few times.

  4. Being a fully Mac house, I have a wholly different solution set. I use Drobos for my music collection—one is lossy [AAC], the other is lossless [ALAC]. The former is tied to my iMac and feeds my iPod and iPhone; the latter is tied to a current-generation mini and touches my home theater.

  5. And no, I’m not really off-site-ing right now, but if/when I get that going, it will be with CrashPlan and a couple Mac minis I have located at friends’ houses locally after an initial seed.

  6. De-Duping is CrashPlan’s specialty

    You won’t find a better de-duplication / backup solution than CrashPlan – it will find partial duplicates across your entire filesystem! (i.e., change the header info on a duplicate mp3 and we’ll not back up the mp3 stream again; add an image in a Photoshop layer and we’ll recognize it’s a duplicate of the original, etc.)

  7. What if you step back and think whether you really want the burden of managing 80 GB of your “most valuable material possession”.

    Why not just use something like Pandora? True, it won’t play a particular song on demand, but the fact that it helps you discover new songs more than makes up for that, IMO. Plus, it relieves me of the need to choose songs. At the end of a hard day, I can sit back passively and enjoy some music.

    You’ll probably say, “It can’t be used offline, like when I’m going for a run.” Well, it’s conceivable that Pandora could offer an offline mode; it could keep a 100 MB queue of songs.

    “It is about a martial artist that cannot be disarmed, because he is the weapon, the comfort of a possession that cannot be taken away.” —
    http://blogs.sun.com/hendel/entry/the_unbearable_lightness_of_being

  8. > My master music directory – both from this process and from previous backup efforts – has a massive duplication problem, probably on the order of several thousand duplicate files.

    > Which brings me to the question: anyone got an outstanding de-duplication procedure that will let me preview the files to be removed

    I assume these are bit-for-bit copies?

    http://en.wikipedia.org/wiki/Fdupes

    It includes an auto-delete option too.
