{"id":2788,"date":"2009-05-12T08:59:35","date_gmt":"2009-05-12T15:59:35","guid":{"rendered":"http:\/\/redmonk.com\/sogrady\/?p=2788"},"modified":"2009-05-12T08:59:35","modified_gmt":"2009-05-12T15:59:35","slug":"my-backup","status":"publish","type":"post","link":"https:\/\/redmonk.com\/sogrady\/2009\/05\/12\/my-backup\/","title":{"rendered":"My Backup: Dropbox, JungleDisk, S3, and a Desperate Need for Deduping"},"content":{"rendered":"<p>When it comes to backing up my music, I have four problems. Apart from the backup, I mean. <\/p>\n<ol>\n<li>My music acquisition is done on a Linux laptop (both Amazon and the eMusic store provide Linux clients)<\/li>\n<li>My Linux laptop&#8217;s hard drive is only 128 GB, much of which is devoted to virtual images, ergo space for music is tight<\/li>\n<li>My primary music library is housed on a Mac (so as to be portable to my iPhone)<\/li>\n<li>My music library (~80 GB) is too large to be entirely synced with Dropbox (50 GB max)<\/li>\n<\/ol>\n<p>In other words, I currently download to one machine &#8211; an Ubuntu-equipped X301 &#8211; which has only enough space to contain a subset of the master library. As for the master library, it&#8217;s attached to a Mac Mini running OS X, which in turn needs to be regularly updated with the newly acquired tracks from the Ubuntu machine. <\/p>\n<p>Got all that? <\/p>\n<p>Besides migrating newly acquired content to the master library, one of the requirements of my backup process is to push the content to the cloud. As I noted in 2007, my music may just be my most <a href=\"http:\/\/redmonk.com\/sogrady\/2007\/01\/12\/friday-grab-bag-from-frigid-denver\/\">valuable material possession<\/a>, and it&#8217;s certainly the least replaceable, so having an offsite copy is critical to me. Just as it will be to a horde of consumers in the years ahead, but that&#8217;s a different matter entirely. <\/p>\n<p>So here&#8217;s my solution, warts and all &#8211; and yes, I&#8217;ll get to those. 
<\/p>\n<h2>Tools<\/h2>\n<ul>\n<li>Paid Dropbox account: $99\/year<\/li>\n<li>Paid Amazon S3 account: $0.150\/GB storage | $0.100\/GB transfer<\/li>\n<li>JungleDisk client: I have a lifetime account, which no longer appears to be available<\/li>\n<li>Dropbox client<\/li>\n<li>Rsync<\/li>\n<\/ul>\n<h2>Process<\/h2>\n<ul>\n<li>[Ubuntu laptop] Download music from Amazon\/eMusic\/etc. into ~\/Music<\/li>\n<li>[Ubuntu laptop]  ~\/Music is symlinked to my Dropbox directory (instructions <a href=\"http:\/\/redmonk.com\/sogrady\/2008\/10\/17\/trick_dropbox\/\">here<\/a>), meaning everything downloaded is pushed to the cloud and to my other Dropbox-equipped machines, including the Mac Mini<\/li>\n<li>[Mac Mini] Dropbox directory is synced to the master music library on an external drive using the following process. All credit for the process &#8211; scripts included &#8211; goes to <a href=\"http:\/\/twitter.com\/ctirpak\">Chris Tirpak<\/a>. Anything wrong, particularly with the rsync options, is purely my stupidity:<\/li>\n<ol>\n<li>Create a backup rsync script. 
Mine is as follows:<br \/>\n<blockquote><p><code>#!\/bin\/bash<br \/>\n#<br \/>\n# sync the Dropbox music directory to the master music library<br \/>\n#<br \/>\nSUBJECT=\"Daily Backup Log\"<br \/>\nEMAIL=\"sogrady@gmail.com\"<br \/>\nBACKUPLOGFILE=\"\/Users\/sog\/bin\/backupMac.log\"<\/p>\n<p># remove the old log file in case it is still there (-f: no error if it is missing)<br \/>\nrm -f $BACKUPLOGFILE<\/p>\n<p>echo \"Begin backup at: \" &gt; $BACKUPLOGFILE<br \/>\ndate &gt;&gt; $BACKUPLOGFILE<\/p>\n<p># redirects stay on the same line as rsync so its output actually lands in the log<br \/>\nrsync -rltvz \/Users\/sog\/Dropbox\/Music\/ --exclude \"Amazon MP3\" \/Volumes\/\"NO NAME\"\/\"MUSICDIR\"\/ &gt;&gt; $BACKUPLOGFILE 2&gt;&amp;1<\/p>\n<p>rsync -rltvz \/Users\/sog\/Dropbox\/Music\/\"Amazon MP3\"\/ \/Volumes\/\"NO NAME\"\/\"MUSICDIR\"\/ &gt;&gt; $BACKUPLOGFILE 2&gt;&amp;1<\/p>\n<p>echo \"End backup at: \" &gt;&gt; $BACKUPLOGFILE<br \/>\ndate &gt;&gt; $BACKUPLOGFILE<\/p>\n<p># send the log in an email using \/usr\/bin\/mail<br \/>\n\/usr\/bin\/mail -s \"$SUBJECT\" \"$EMAIL\" &lt; $BACKUPLOGFILE<\/p>\n<p>rm $BACKUPLOGFILE<\/code><\/p><\/blockquote>\n<\/li>\n<li>Copy this file to ~\/bin<\/li>\n<li>Tell the script to run every night to sync the directories by getting launchd to execute a plist entry. Here&#8217;s my plist script:\n<\/li>\n<li>Put the plist file in ~\/Library\/LaunchAgents<\/li>\n<li>In a terminal: launchctl unload net.ogrady.backupMacSilent.plist<\/li>\n<li>In a terminal: launchctl load net.ogrady.backupMacSilent.plist<\/li>\n<li>In a terminal: launchctl list | grep -i net.ogrady<\/li>\n<\/ol>\n<\/li>\n<li>With the master directory thus updated with any newly downloaded tracks, JungleDisk then mirrors the master directory up to S3 nightly for permanent backup. <\/li>\n<\/ul>\n<p>The good news about the above: it will dutifully run rsync nightly to grab the target Dropbox directories and copy them over to the master directory. The bad news? It creates duplicate files. <i>Lots<\/i> of duplicates. 
My master music directory &#8211; both from this process and from previous backup efforts &#8211; has a massive duplication problem, probably on the order of several thousand duplicate files. <\/p>\n<p>Which brings me to the question: anyone got an outstanding de-duplication procedure that will let me preview the files to be removed? Because I need some serious help. <\/p>\n<p>Otherwise, what would you improve, and where? What&#8217;s your backup routine look like?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>When it comes to backing up my music, I have four problems. Apart from the backup, I mean. My music acquisition is done on a Linux laptop (both Amazon and the eMusic store provide Linux clients) My Linux laptop&#8217;s harddrive is only 128 GB, much of which is devoted to virtual images, ergo space for<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false},"categories":[77],"tags":[],"class_list":["post-2788","post","type-post","status-publish","format-standard","hentry","category-redmonk-it-report"],"jetpack_featured_media_url":"","jetpack_publicize_connections":[],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/redmonk.com\/sogrady\/wp-json\/wp\/v2\/posts\/2788","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/redmonk.com\/sogrady\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/redmonk.com\/sogrady\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/redmonk.com\/sogrady\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/redmonk.com\/sogrady\/wp-json\/wp\/v2\/comments?post=2788"}],"version-history":[{"count":0,"href":"https:\/\/redmonk.com\/sogrady\/wp-json\/wp\/v2\/posts\/2788\/revisions"}],"wp:attachment":[{"href":"https:\/\/redmonk.com\/sogrady\/wp-json\/wp\
/v2\/media?parent=2788"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/redmonk.com\/sogrady\/wp-json\/wp\/v2\/categories?post=2788"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/redmonk.com\/sogrady\/wp-json\/wp\/v2\/tags?post=2788"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}