RedMonk

Running Alpha Lucid on the Dell T7500

Lucid Workstation Screenshot

The past few years, as I’ve written about, I’ve worked primarily off Thinkpad laptops, with an aging Sun Opteron workstation available for more computation-heavy tasks. Neither of those pieces of hardware, however, is up to the workloads I’ve been engaged with lately: virtualizing multiple operating system instances, working on large datasets, or tinkering with big data software such as Hadoop. To be honest, it was all either machine could do to run Chromium with my usual 60+ tabs open. Sure, the cloud helps, but when you can barely keep a browser up and running, it’s time for new gear.

Hence my call to Dell, who for the sake of full disclosure is a customer of ours. In response to my inquiries, Dell shipped me a loaner top of the line workstation to test, the Dell Precision T7500. I’ll have more on what, specifically, the machine is for later. For now, a quick rundown on the specs, setup and software choices.

The Hardware

This beast comes equipped with two quad core Intel Xeons running at 3.2 GHz, 24 GB memory, and 3×300 GB 10K RPM hard disks. It’s easily the most powerful box I’ve had locally since my mainframe days, in other words. Plugged in to this are the 30″ and 24″ monitors I already had on hand.

The Operating System Choice

The first thing I did when I got the box was try to get my beloved Thinkpad USB keyboard working with the Windows 7 Home Premium instance preloaded on the workstation. I failed. Even after manually installing drivers from the CD that came with the keyboard, Windows insisted that my U, I, O, J, K, L and M keys were instead 4, 5, 6, 1, 2, 3, and 0, respectively. So rather than waste more time tinkering, I gave up and installed Ubuntu. Being a latest and greatest guy, I picked the still Alpha Lucid release of the distribution.

As an aside, please note that I’m sure that Windows can be made to work with that hardware just fine, and that I’m not recommending that people use Ubuntu simply because they can’t get a piece of hardware to work. If you like Windows, use Windows. I happen to prefer Ubuntu, so that’s the context for this decision. Your mileage may vary, as always.

What Works

Anyway, Ubuntu recognized the keyboard perfectly, to the point that even the volume up/down/mute buttons work properly. Everything on the machine works out of the box, actually, with but a few exceptions. A quick rundown of the hardware:

  • Wifi: Atheros AR5001X+, just works
  • Graphics: NVidia Quadro FX 3800, just worked with the single 30″; had to enable the non-free drivers to get compositing working and the 24″ online as a second monitor
  • Sound: Sound Blaster X-Fi XtremeMusic (D), just worked
  • Internal USB card reader: just worked
  • External Hard Drives: 2xSeagate 1.5 TB, just worked (this bug has been satisfactorily addressed for me)
  • iPhone: while I don’t use this functionality, Lucid sees my iPhone perfectly, will play music off of it, and even offers primitive music support

System-wise, Ubuntu 64 bit sees all of the available memory and cores correctly as evidenced by this htop capture. As mentioned, I had to enable the proprietary NVidia driver to get the fancy graphics and second monitor working, but the driver installation is completely automated.
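For anyone who would rather confirm the same thing from a terminal than from an htop capture, here is a minimal sketch. It assumes a Linux system that exposes /proc/cpuinfo and /proc/meminfo; the function names count_cpus and mem_total_gib are made up for illustration and are not part of any tool mentioned in this post.

```shell
#!/bin/sh
# Count logical CPUs by counting "processor" entries.
# Reads /proc/cpuinfo unless another file is passed in (handy for testing).
count_cpus() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Report total memory in GiB from the MemTotal line, which the kernel
# reports in kB (1 GiB = 1048576 kB).
mem_total_gib() {
    awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

# Typical use on the machine itself:
#   count_cpus       # logical CPUs the kernel sees
#   mem_total_gib    # memory the kernel sees, in GiB
```

Two caveats: the CPU count is of logical processors, so it can exceed the number of physical cores if hyperthreading is enabled, and MemTotal usually reads a little under the installed amount because the kernel reserves some memory for itself.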

Simply put, Ubuntu Lucid 64 bit pretty much just works on the T7500, at least with the configuration I chose.

The Software

From there, I did my usual Linux install. First, the software to be installed:

  1. Amazon MP3 Downloader
  2. Chromium (my default browser now)
  3. Deluge (my now default BitTorrent client)
  4. Dropbox (a staple of my existence these days)
  5. Emacs (my editor of choice)
  6. eMusic Download Manager (for downloading eMusic tracks)
  7. Flash (still necessary)
  8. GNOME Do (a Quicksilver like application for Linux)
  9. htop (top, but pretty and visual)
  10. revolution-r (R, in other words, for statistical analysis)
  11. VirtualBox (a really excellent free virtualization package, I only wish that a.) I could resize hard drives and b.) that Aero would be enabled for Vista/Windows 7 as it is both in Parallels and VMware Workstation/Fusion)
  12. VLC (will play anything, as they say)

Eventually I’ll get around to installing all the infrastructure stuff I use to test like Apache, MySQL, and so on, but these are the day to day basics I need. Next, the software to be removed:

  1. Evolution (don’t use a mail client, and don’t particularly care for Evolution)
  2. OnBoard (don’t need it)
  3. Transmission (my experiences with this BitTorrent client have been very poor)

The Configuration

After using one of Bisigi’s Pretty Themes for a while, I’ve cut over at least for the time being to one of Ubuntu’s new “Light” themes, Ambiance. I know some people are a little bent out of shape about the window controls, but I just assumed what we were subsequently told: that they looked heavily at how existing operating systems did things, Apple in particular. While I agree that not everything Apple does is perfect, the fact is that they’ve invested a ton of time and energy into user interface research over the years, and they are, at least in my view, the best in the world at UI. Meaning that if Apple believes the controls should be on the left, I don’t think it can hurt to try it.

Because if I decide I don’t like it, I don’t have to use it.

Finally, a bit of quick configuration to pull in my emacs settings and so on.

  1. ln -s ~/Dropbox/.bash_aliases .bash_aliases: pulls in my bash aliases from my Dropbox copy
  2. ln -s ~/Dropbox/.emacs .emacs: pulls in my .emacs file from Dropbox
  3. ln -s ~/Dropbox/emacs emacs: pulls in my emacs directory (w/ themes, etc) from Dropbox
  4. ssh-keygen -t rsa: generates an SSH key pair for the box
  5. ssh-copy-id username@host: copies my public key to our various servers so I don’t have to enter a password each time
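The three ln -s commands above use a relative link name, so they only do the right thing when run from the home directory, and they fail with "File exists" if run a second time. Below is a minimal sketch of a re-runnable version. The helper name link_dotfile is hypothetical, and the ~/Dropbox layout is simply the one described in the list.

```shell
#!/bin/sh
# link_dotfile SRC_DIR DEST_DIR NAME
# Symlink DEST_DIR/NAME -> SRC_DIR/NAME, refusing to clobber anything
# that already exists at the destination.
link_dotfile() {
    src="$1/$3"
    dest="$2/$3"
    if [ ! -e "$src" ]; then
        echo "missing source: $src" >&2
        return 1
    fi
    # -L also catches a dangling symlink, which -e alone would miss.
    if [ -e "$dest" ] || [ -L "$dest" ]; then
        echo "skipping $dest (already exists)" >&2
        return 0
    fi
    ln -s "$src" "$dest"
}

# Example use, mirroring the three links in the list:
#   for name in .bash_aliases .emacs emacs; do
#       link_dotfile "$HOME/Dropbox" "$HOME" "$name"
#   done
```

Because both ends of the link are built from explicit directories, the current working directory no longer matters, and running it again on an already configured box is harmless.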

And that’s about it, apart from migrating a few VirtualBox hard drives from the old workstation to the new one.

I had very few problems, and having done this multiple times, most of the above takes less than five minutes of effort because it’s all handled by package management tools. Not everything was perfect, however.

The Issues

  1. Amazon still doesn’t provide 64 bit versions of its MP3 store downloader, and the usual fix of using getlibs didn’t work on Lucid. Nor did Pymazon, a Python based alternative, work properly. Still don’t have a fix for that, as I’d prefer not to get in the habit of copying libraries around.
  2. Brasero, the CD/DVD burning tool, meanwhile broke as it always does during Alphas of new releases, so I’m temporarily using the less user friendly GNOMEBaker to burn CDs and DVDs.

Other Comments

Two other items of note for the Ubuntu geeks in the audience. With this migration, I’ve officially dropped Banshee in favor of Rhythmbox. I liked Banshee, and still prefer it in many ways, but after a couple of ugly crashes that corrupted the library thus losing my playlists, I needed a replacement. Rhythmbox isn’t perfect, but it works and is nicely integrated into Ubuntu.

Second, I haven’t (yet) installed Pidgin, the IM client that OS X’s Adium is based on. In part because Ubuntu has transitioned to Empathy and because some of the underlying technology is interesting, I’m giving that project a shot. But there are some serious usability issues with the interface, and how it’s woven into the Ubuntu desktop. The integration into the Me Menu is suboptimal, the user interface – specifically its usage in the Indicator applet – is terribly confusing, and the account creation process was clunky.

It’s not clear to me that Empathy is ready for prime time from a UX standpoint, so I’ll be curious to see how that aspect evolves within Ubuntu and the other distributions that leverage it.

Overall Impressions

The T7500 is just stupid fast, and Lucid’s a nice interface for the hardware. I don’t have enough up and running yet to do any legitimate comparative benchmarking versus my usual hardware, but it’s impressive even on trivial applications. The disk usage analyzer, for example, scans the entire filesystem in less than ten seconds; with either of my old machines, runtime was a minute to two, depending on what else was running. The rendering of an eight minute video in Pitivi, the video editor included with Ubuntu, took about forty seconds. Chewing through the entire works of Shakespeare to count the frequency of the word “Zounds” using Hadoop took about fifteen seconds, but that was on a virtualized instance with more limited resources. And as you can see from the screenshot above, virtualization is not much of a challenge for this machine.
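For a sense of what the “Zounds” job computes, here is a single-machine stand-in written as a shell pipeline. It is not the Hadoop job itself, only a sketch of the same map-then-reduce shape: split the text into words, then count the ones that match. The function name count_word and the input file name in the usage comment are hypothetical.

```shell
#!/bin/sh
# count_word WORD
# Read text on stdin and print how many times WORD occurs,
# ignoring case and punctuation.
count_word() {
    target=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    # "Map": break the input into one lowercase word per line.
    # "Reduce": count the lines that are exactly the target word.
    # grep exits non-zero when the count is 0, hence the "|| true".
    tr -cs 'A-Za-z' '\n' | tr 'A-Z' 'a-z' | grep -cxF -- "$target" || true
}

# Example use (file name is a placeholder):
#   count_word zounds < shakespeare.txt
```

Splitting on anything that is not a letter means an apostrophe-prefixed spelling such as 'Zounds is counted too, while longer words that merely contain the target are not.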

I’ll have more on how the box will be used both on virtualization and big data later, but for now the Linux compatibility report for the hardware is excellent, as is the performance.

by-nc-sa

Finally: The Google Apps Marketplace Q&A

For about five years now, I’ve been anticipating the rise of application and developer marketplaces. It baffles me that no one has perceived commercial opportunity in the massive inefficiencies in a customer’s ability to discover, procure and purchase apps or services for a given platform. I’ve done Q&A’s on the subject, I’ve used speaking engagements to evangelize the topic, and I’ve even articulated the pieces you might need and how most of them are available from Amazon. What has been the response to my business model for the 21st century?

Crickets, mostly.

Mobile’s an obvious exception. Once Apple built themselves a (massively successful) marketplace, vendors couldn’t build their own fast enough. And we’ve seen steps in the right direction here and there, such as the Eclipse marketplace or the Ubuntu Software Center. But the enterprise market has largely yawned at the concept of marketplaces.

But as of Tuesday, it would appear I finally have an answer to this question. The answer, at least for now, is Google.

As predicted by the Wall Street Journal last month, Google announced this week the immediate availability of what they call the Google Apps Marketplace. To explore what this means, let’s turn to the Q&A.

Q: Before we begin, do you have anything to disclose?
A: For once, not a ton. Some of the vendors who might be impacted by this announcement, such as IBM and Microsoft, are RedMonk customers, as are the mentioned Canonical and Eclipse, but Google is not a RedMonk customer, nor is Amazon. RedMonk, however, is a Google Apps customer.

Q: What is a marketplace?
A: There is no set definition, because it depends on the nature of the platform. But here are some rough stages of evolution for marketplace type offerings.

  • The Repository:
    An aggregation of applications, libraries, themes and other packages appropriate – and generally packaged for – a given platform. Optimally, installation is integrated as in the case of Linux distributions (apt, Portage, YaST, Yum, etc), but it may simply be a central portal collecting related assets, as is the case with WordPress. Users can browse, discover and (optionally) install free applications which will improve their platform, but the experience essentially ends there.
  • The Store:
    The Store also aggregates platform related applications and packages, but introduces the commercial element. Paid applications are made available, and generally the purchasing process is integrated. The canonical example in this category is the iTunes App Store. Users can browse free and paid applications using the interface, acquiring/purchasing and installing either type.
  • The Marketplace:
    The Marketplace – a true marketplace – includes all of the above, but adds one critical element: people. While Repositories or Stores make the process of acquiring and installing applications more efficient, they offer little to no benefit subsequent to that process. Need help with set up? Configuration? Tuning? Integration? You’re on your own: try Craigslist or eLance. A true Marketplace, however, adds the critical element of people to this equation, connecting customers to individuals or companies with related expertise.

Q: And you think Google’s built a true marketplace?
A: They appear to be pretty close, yes. It’s not quite a wide open marketplace for human resources yet; there don’t appear to be multiple individual implementers for the different software options, for example. But just look at the Small Business Implementation services: lots of small shops in there competing on pricing for implementation. And that single example illustrates to me why this is a good idea for everyone involved.

Q: How’s that?
A: Consider the small business market. Many SMBs are using ISP based email to run their business, which is why you email them at something like joesflowershop67@comcast.net. They’re doing this not necessarily because they believe it’s the best solution for their business, but because everything else is too hard. But what if you could tell them that they could have mail, calendaring, online documents and everything else that comes with Google Apps for around $20 or $30? How many would go for that? Quite a few, I’d bet, and I doubt Google would take that bet.

But the setup for Apps, which involves making changes to domain MX settings and such, is a little too much to ask from the kind of people that are trying to log in to Facebook from Google.

For someone who’s experienced at it, however, the setup is trivial: about 30 seconds’ work. So you arbitrage the delta in difficulty and come up with a fair price (as determined by a competitive market), and everyone wins. The SMB gets a better infrastructure at a very reasonable cost, the implementer gets a decent margin on a few minutes’ worth of work. Google, meanwhile, gets $100 per developer/org and customers that may or may not pay them but will provide them with more data in the meantime while not using a competitor’s solution. Everybody wins.

Q: But as popular as Gmail and Apps might be, isn’t their ecosystem dwarfed by that around products like Exchange?
A: Undoubtedly. And that’s one reason the Marketplace is so interesting.

How do you find solutions if you’re using Exchange? Where do you go? If you’re relatively knowledgeable, you might think to visit the Exchange homepage and search for partners. Which might lead you here, to an uninspiring partner search page. If you keep hunting, you might wind up at Microsoft’s version of the Solution Marketplace. Perhaps from there I find a listing: what are my options at that point? Show a phone number, send an email inquiry. The pricing? Per seat, that’s all I’m told.

Contrast that with similar products from its Google counterpart, where the pricing is described, you can read actual reviews of the product or services and – most differentiating – you can click to install the product. The SaaS nature of Google Apps has its disadvantages, particularly for conservative buyers, but ease of installation is not one of them.

Q: The primary advantage of Google’s Marketplace then, vs something like Microsoft Exchange, is in product acquisition?
A: The advantages of a true marketplace as Google’s built it exist at every layer: inefficiencies are significantly reduced or eliminated in discovery, acquisition/implementation, purchase, and experiential ratings/context.

Q: So do you think that Google’s community is going to approach the size of Microsoft’s soon?
A: Rome wasn’t built in a day, and Google’s got a long way to go before it can rival the size and scope of competitive communities like Microsoft’s. That said, if I were competing with Microsoft, this is how I would do it.

Q: What didn’t Google get right?
A: It’s a little difficult to discover more general resources: I have questions or needs, say, around complex spreadsheets, or the construction of templates in Docs. It’s also vendor focused at the expense of individual practitioners. Nor is it obvious what the Google Qualifications mark is. And as far as the applications implementation experience goes, it can be a bit clunky: our TripIt installation is still hung up.

Q: What’s in it for developers?
A: Access to the market. Say you do Google Apps implementations; how do you do customer acquisition? Locally or regionally, you might employ traditional mechanisms such as print or radio, but it’s probably just as easy for you to set up apps for a customer in Auckland as it is next door. By listing yourself in the marketplace, you have the opportunity to access a wider and more targeted market than you would with, say, AdWords.

Q: What about the competition? Won’t it be difficult to be heard? How differentiated can you be as someone who sets up Apps, for instance?
A: That’s an excellent question, and the answer is we don’t yet know. It will be interesting to see how Google manages the volume of developers. They’re already acting to throttle participation with the one time $100 signup fee, which should keep out most of the drive-by, non-serious implementers.

But whether they manage the volume tightly or loosely, this will be an important channel, simply by virtue of the volume that apps can generate.

Q: What about the applications space? Is it in a provider’s best interest to list with Google?
A: Why wouldn’t you? Google’s claiming that 2M businesses have already chosen Apps as their platform, and even if that assessment is inflated by 50% you’re still talking about a sizable market. Even better, a sizable market that is likely underpenetrated relative to more traditional marketplaces.

Q: Underpenetrated how?
A: The businesses running on top of platforms from IBM or Microsoft have had years, decades in some cases, to build up solutions around Notes or Exchange. Google Apps, on the other hand, is a relative greenfield simply because the integration of applications on the server side is new. It’s kind of a gold rush, in that respect.

Q: Big picture, what does all of this mean?
A: Let’s look at it by audience. For the customer, it means either a streamlined route to implementing Google Apps, an exponentially more accessible portfolio of resources once you get there, or both. For developers or application providers, it’s a relatively less crowded channel opportunity, one where the infrastructure for both installation and payment is already built for you. And for Google, it’s nothing less than an opportunity to dramatically expand their ecosystem, which – in addition to the obvious direct benefits in terms of customer acquisition and partner opportunities – acts to reassure conservative enterprise buyers that this platform is for real.

Q: And for competitors?
A: It means that they should have been building marketplaces all along, as I’ve been saying ;)

The View from NoSQL Live: Three Takeaways

NoSQL Live was a very different show from those I’ve been to in recent months. It had very little in common with, for example, HadoopWorld, where the audience was largely already intimately familiar with the technology and value proposition. The NoSQL Live audience, by contrast, to judge from the questions, was mostly there to learn. With many of the usual suspects from the NoSQL world in attendance, along with substantial representation from projects like Cassandra, HBase, memcached, Riak, Voldemort and so on, the show certainly did not lack for subject matter expertise.

But the number of those generally unfamiliar with NoSQL was as surprising as it was gratifying. Gratifying because it serves as a proxy for interest: besides the experts, there were a substantial number of people there looking to get up to speed on the space. Which they certainly had an opportunity to do.

Adam Marcus, a graduate student at MIT’s Computer Science and Artificial Intelligence Laboratory, did a much better job than I could have taking notes on the show here, so I won’t rehash that. Instead, three quick takeaways from the show.

The NoSQL Term is a Problem

In his remarks to open the show, Dwight Merriman – the CEO at 10gen (the company behind MongoDB) – asserted that while the term “NoSQL” had problems, for better or for worse, the name had stuck. Which might indeed be true, and if so the projects may as well make the best of the situation, as he suggested. But if that’s true, the so-called NoSQL projects – all of them – are going to have problems.

Witness Merriman’s definition of NoSQL: no joins and light transactional semantics. Even were we to accept that definition – and even that is problematic as the support varies from project to project – we still have issues. Clearly column databases are differentiated from graph databases, just as both are differentiated from key value stores and document databases.

Currently, however, they are all referred to – marketed, even – under the blanket NoSQL. Hence some of the confusion heard from users yesterday, who were struggling to grasp why all these NoSQL tools had seemingly nothing in common with one another.

The good news is that there is – as evidenced by this and other NoSQL events – substantial interest and traction in data storage software that is not a relational database. The problem is that the naming is likely to become a serious problem if it isn’t already.

Consider slide 13 of Tim Anglade’s excellent presentation embedded above. If he’s correct, and we’re just this side of Gartner’s trough of disillusionment – and I believe that’s a reasonable assertion – the NoSQL term is going to be one of the reasons for the fall. Most of the current NoSQL adopters are sufficiently up to date on developments in the data persistence space that the name is not much of an issue. The next wave of adopters is guaranteed to be less familiar with the distinctions between the project approaches and more frustrated by the inherent educational challenges therein.

I know quibbling over a name seems inane to a great many technologists out there, but you’d be surprised at how much difference a name makes in this industry. Remember what the mere application of the term Ajax did to discussion of that technical approach? Now consider if Ajax had attempted to encompass that and native client side development. That’s what NoSQL is doing at present, and it’s a problem.

MySQL is a Target

Mark Callaghan recently said:

I think that MySQL+memcached is still the default choice and I don’t think it is going away in the high-scale market.

Eric Day, Drizzle developer, likewise said that that project is complementary to many NoSQL efforts when I spoke to him on Monday. Clearly his new employer (and yes, more on that later) believes that, significantly contributing as it does to both NoSQL (Cassandra) and SQL (Drizzle) projects.

I think they’re right. The maturity of the MySQL ecosystem and its basic ubiquity will not easily be thrown over, if ever. That said, the commentary from Twitter’s Ryan King was really eye-opening.

It’s no secret that Twitter has been moving slowly towards Cassandra and away from MySQL. This is from an interview that King did previously with myNoSQL, describing the motivations:

We have a lot of data, the growth factor in that data is huge and the rate of growth is accelerating.

We have a system in place based on shared mysql + memcache but its quickly becoming prohibitively costly (in terms of manpower) to operate. We need a system that can grow in a more automated fashion and be highly available.

After yesterday, we can add to that some numbers. Twitter’s Cassandra infrastructure is at 45 nodes, which is handling – in parallel with the MySQL/memcached infrastructure – some 600 to 700 Tweets (i.e. writes) per second (50M/day), with massive spikes (like for SXSW, for example), and nine or ten billion rows.
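As a rough sanity check on how the per-second and per-day figures relate, the conversion is a single multiplication. This is a sketch only: the helper name per_day is made up, and it assumes the write rate is sustained evenly around the clock, which the mention of spikes suggests it is not.

```shell
#!/bin/sh
# Convert a sustained per-second write rate into a per-day total.
per_day() {
    echo $(( $1 * 86400 ))   # 86,400 seconds in a day
}

per_day 600   # prints 51840000
per_day 700   # prints 60480000
```

So a steady 600 to 700 writes per second works out to roughly 52M to 60M per day, which puts the quoted 50M/day at about the low end of that range.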

The MySQL infrastructure – largely thanks to a massive memcached presence, according to what we heard yesterday – was still handling this load. But much of the real pain comes apparently in manageability. The MySQL cluster could, in the words of King, “never be taken down,” as the restarts were too painful. The Cassandra nodes, meanwhile, are rebooted regularly with rolling restarts.

What does this mean? Nothing, yet. Twitter is a traffic outlier that 99% of MySQL or NoSQL users will never see. But the number of higher traffic properties that are leaving MySQL based infrastructure for NoSQL alternatives is worth monitoring, just as was their original takeup of MySQL.

NoSQL and the Cloud

One of the panels yesterday was on the subject of NoSQL in the Cloud. The panelists were Benjamin Day (consultant with Azure experience), Jonathan Ellis (Rackspace, Cassandra lead), Adam Kocoloski (Cloudant, a CouchDB vendor), and Daniel Rinehart (Allurent, a startup using AWS’ SimpleDB).

Predictably, opinions varied on the suitability of NoSQL technologies for the cloud. The vendors offering or leveraging NoSQL services in the cloud – Allurent/Cloudant/Rackspace – were more or less positive on the concept. Ellis, meanwhile, was less enthusiastic, urging workload based deployment: elastic, transient needs to the cloud, general, sustained workloads to the datacenter.

What I was surprised to hear little about, except from a questioner, is the question of operations. For many cloud users, questions of workload or the suitability of a given technology such as NoSQL for the cloud come second. The primary concern is operational costs, or the lack thereof. Put simply: the cloud takes important operational elements and makes them someone else’s problem. This may be even more compelling with NoSQL, because the relative immaturity of the projects means that they are often suboptimally packaged. Being able to spin up a prebuilt image on AWS or Rackspace is likely to be significantly preferable to an alternative of hand assembling all of the necessary pieces of your NoSQL infrastructure.

This is why I find services like Bradford Stephens’ Drawn to Scale interesting (again, more on them later): being able to offload the operational costs – both in dollars and learning curve – of software that can more efficiently attack large or unstructured datasets is likely to be an interesting proposition.

Whether NoSQL technologies are ideally suited to multi-tenant cloud environments, then, seems to be beside the point: they will be used there – heavily – regardless. If they’re not well suited to that, from a customer’s perspective, that’s the provider’s problem.

Anyway, thanks to the folks from 10gen, Cloudant, Hashrocket, O’Reilly, GigaOm, myNoSQL et al who put on the conference. Well worth the trip down.

What Sports Can Teach Us About Analytics: The MIT Sloan Conference

Last weekend’s MIT Sports Analytics conference was, pretty easily, the best conference I’ve attended. And I attend a lot of conferences. With a 9:00 AM session on Baseball Analytics featuring one former Red Sox General Manager and the club’s current Director of Baseball Information Services, obviously it had a potentially unfair advantage over the competition, what with my unhealthy obsession with that sport. But it’s a common misperception that all there is to a conference are the speakers or, in this case, panelists. The reality is that there’s a lot more to putting on a good conference. Venue matters. Session topics matter. Session placement matters. Sponsors matter (the Bloomberg lunch demo was unreal). For some people, things like the food probably matter.

Happily, the folks from Sloan could not, in my view, have done a better job with the show. It was professional, it was tightly executed, and it was so compelling that I either stood (a few sessions were standing room only) or sat in a chair for basically the entire day. No breaks, no calls, nothing. It was like a Pedro Martinez start, circa 2000: you didn’t dare get up for fear of missing something important. How many shows are you going to attend that feature a panel of ESPN columnist Bill Simmons, Indianapolis Colts President Bill Polian, Houston Rockets GM Daryl Morey, Kraft Group (owners of the New England Patriots) President and Williams alum Jonathan Kraft and Dallas Mavericks owner and technology entrepreneur Mark Cuban, moderated by Michael Lewis, debating Moneyball, the latter’s bestselling book?

Not too many, I should think. So kudos to the folks from Sloan: I cannot recommend their show highly enough.

What was particularly interesting for me as a technologist, as opposed to a baseball fan, was the fact that on some level, the subject matter was incidental. What we were there to talk about was how to collect and use data to make more informed decisions; that the context happened to be sports was interesting, but hardly unique. Analytics usage in sports has accelerated as salaries and payrolls have escalated. When it’s time to sign free agents to guaranteed contracts of tens of millions of dollars, it behooves the club to make the best decision it can. How? By using the data it can collect, obtain or derive.

As they say when a popular player is traded or not re-signed: baseball is a business. A different business than, say, heavy industry manufacturing or pharmaceutical research, yes, but when it comes to using data to make better decisions, business is business.

Here are ten lessons, then, I think traditional businesses might learn about analytics from their counterparts in sport:

Culture as an Obstacle

Simon Wilson, the Head of Performance Analysis from Manchester City: “We’re especially jealous when we come over and look at the culture of using data in sport.” Ironically, American sport is perhaps the best illustration of the challenges that culture can present. Bill James started publishing the Bill James abstract in 1977. Thirty-plus years later there are still clubs that regard the conventional wisdom that James dared challenge as sacrosanct and infallible.

The lesson? No matter how good your data, a portion of the population will not accept it. If you want to drive analytics into your industry, be prepared to fight an uphill battle. The bad news is that this can make life difficult for analytics converts and evangelists. The good news is that it can be an opportunity.

Look For an Edge

While there are curious exceptions – the NFL, for example, apparently forbids technology of any kind (even a calculator) in the coaches box – for the most part sport allows teams to compete off the field as well as on. If the culture is anti-analytics, then, this can potentially be a good thing: it gives you an edge.

John Abbamondi, the Assistant General Manager of the St. Louis Cardinals, acknowledged this when talking about FieldF/X, a new system being put in place to provide significantly better metrics for measuring defensive performance. His concern? “One of the things I worry about is that it’ll make measuring defense too easy.” If everyone has access to the same excellent metrics, in other words, there’s very little opportunity to gain a competitive edge.

The lesson? When looking for an edge, don’t look to areas that are commoditized. Focus instead on areas where it’s difficult to measure. Even if you do it poorly, the odds are that you’ll still have better intelligence than your competitor who’s not looking there at all.

“Emotion Dooms Analytics”

Paraag Marathe, the San Francisco 49ers’ Executive Vice President of Football & Business Operations, said that, and he’s right. It’s very difficult to make good business decisions if you’re making them emotionally.

The lesson? Leave that to your competitors. Make the best decisions you can based on actual data. The Boston Red Sox have made some wrenching emotional decisions the past decade, after eighty some odd years of courting fan sentiment. The results? Two World Series titles.

Consider Context

Aaron Schatz, the Editor in Chief of Football Outsiders, discussed the draft valuations of SEC running backs versus Big Ten running backs. And while he can’t prove it yet, he has a working hypothesis which asserts that running backs from the SEC tend to be undervalued in the draft, while those from the Big Ten tend to be overvalued. Why? Because of context. Film is a huge component of the scouting process in the NFL, but it’s difficult to account for the relative differences in league, from average offensive and defensive line size and weight to offensive schemes and more. Big Ten running backs seem to look better than they actually are; SEC backs, worse.

The lesson? Context matters a lot. Try to consider not just the data, but where it came from, and how it might potentially be biased. Then use the data to weight and adjust for those biases.

If You’re Not Using Analytics, Your Competitor Will

John Dewan, the Owner of Baseball Info Solutions, summed up the differences between baseball teams diplomatically: “Not every team appreciates the value of defense equally.” Those that do, the evidence suggests, have a significant advantage over those that don’t.

The lesson? If you’re not using analytics in all areas of your organization, you can be sure that your competitor will be. Which will be his advantage and your handicap.

It’s As Much About What Data You Don’t Present as What You Do

Just using publicly available data – forget all the extra proprietary information the clubs collect – I could tell you what the batting average is against Josh Beckett’s two seam fastball located in the bottom half of the zone in the third inning in day games at home against hitters in the bottom half of the order.

Does that actually help anyone? Probably not. The challenge, in sports as in every other business we speak with, generally isn’t too little data. It’s too much. The question becomes how you determine what to present and what not to.

The lesson? Don’t overwhelm with statistics. Work backwards from what you want to know, or might want to know: that will inform your choice of data.

Speak English

Indianapolis Colts president Bill Polian: “Speak English, please.” Bill Simmons, ESPN Columnist: “For stats to make it to the next level, they’ll have to be able to relate to everyone, not just people with Math degrees.” Statistics and analytical professionals, like people in a variety of specialized disciplines, can sometimes forget that not everyone speaks their language. If I told you that Jon Lester was a 5.6 WAR player last year, would that mean anything to you? It would if you’re a baseball nerd; for everyone else that’s just Greek. As Tom Tippett, the Red Sox Director of Baseball Information Services, put it, “There are a lot of people in baseball operations that don’t have degrees in math or get how these work. The challenge is to make it usable.”

The lesson? Duh: speak English. Abstract the terminology where you can and attempt to explain in practical terms what your analytics actually mean. If you can’t communicate that, it really doesn’t matter how good your data is.

Making Unpopular Decisions

The question I asked the baseball analytics panel was, at its essence, pretty simple: how do you handle making unpopular decisions that are nevertheless the right decision to make, according to the data? Bill Belichick, for example, was widely excoriated by Patriots fans this past season for going for it on 4th down against the Colts and failing, ultimately losing the game. Why is this interesting? Because as Schatz discussed, the data said that was unquestionably the right decision to make. More, Bill Polian, the General Manager of the Colts – the team that benefitted from the failure – agreed.

The lesson? Some decisions will be less popular than others, invariably. The key is to keep the big picture firmly in view, because if you’re making the right decisions consistently, you should win. And if you win, everyone forgets the unpopular decisions.

Measure Everything

Because throwing a ball overhand is, in a biomechanical sense, an unnatural motion, pitchers – particularly young ones – are a serious injury risk. In an effort to keep them healthy, teams are increasingly employing a variety of statistical measures – both general and individualized – to build training and throwing programs designed to maximize their health. Key to this is data: having orthopedic data on stresses to the motion generally, to delivery types more specifically, and finally to an individual athlete. Observational surveys are being conducted which accumulate more and more data on who got hurt, when, and how, from which we can attempt to extract patterns of injury and thus identify potential risks.

The lesson? Measure absolutely everything you can. You will not be able to anticipate what you might need data on, so collecting as much as you can in advance is likely to be your hedge against such future needs.

The Challenge of Integration

Data is almost always more valuable in combination than it is in a vacuum. The Red Sox, for example, have constructed essentially a single database for players – their own and other clubs’ – that incorporates just about anything you could want to know about a player. Scouting reports, performance data, video, contract status: everything. Because while it’s nice to know who the best hitting outfielders are, it’s even better to know – per their contract status – which ones are available and which ones aren’t.

The lesson? Look for opportunities to integrate all of your data. Not just for the convenience of a single repository, but because the sum of data is usually more than its component parts. Potentially far more.

by-nc-sa

del.icio.us, Thank You : Pinboard, Welcome

Really, all I wanted to do yesterday morning was find a copy of the del.icio.us Chrome extension I was used to on my old machine. Paging through the Chrome extensions repository, however, all I turned up were a bunch of glorified bookmarklets. Eventually I found the “official” Chrome extensions Google Group, the last update to which was August 6th, 2009.

That moment is, more or less, why I’m leaving del.icio.us. The del.icio.us part of it, anyway.

My history with the del.icio.us service is long. I’ve been using the tool since 2004 or so, and in that time I put 7102 bookmarks up there. With notes and tags included, the export came to 2.2 MB. Which doesn’t sound like much, until you consider that it’s just links. We used to test browsers, actually, by trying to load my del.icio.us bookmarks page: before they started paginating things, it would crash pretty much all of them. Ah, the old days of the web.

On one occasion, at the kind invitation of its creator, Josh Schachter, I had the opportunity to visit the del.icio.us NYC HQ, prior to its acquisition by Yahoo and subsequent transition to California. Small office, small team, ragtag architecture – one box of which we helped acquire for them – and yet a product superior, I think, to what we have today.

To say that I have a personal affection for the service, then, is understating things.

But with Josh now departed, less than enthused about the direction Yahoo has pursued with del.icio.us, that attachment has waned. As with other Yahoo properties such as Flickr, the social bookmarking service has seemed to get little attention from its Yahoo parent in recent years. Innovations have been few and far between, and where they have tried to update things – most notably with the UI – I haven’t appreciated those changes. Periodically, features like the blog autoposting have broken, to be repaired eventually.

When I looked at that Chrome extension last updated in August, then, I had to ask myself whether I really had confidence that del.icio.us was the right tool for my needs going forward. The reluctant answer was that it was not.

Which left the obvious question: what would I replace it with, and how?

While I briefly considered more robust solutions like Evernote as well as radically simpler approaches like a dedicated Identi.ca, Posterous or Tumblr account, I’ve instead decided to proceed with Pinboard.in. Pinboard has its own views of why you might prefer Pinboard to del.icio.us, as well – interestingly – as why you might prefer del.icio.us to Pinboard.

But here’s why I switched.

  • Triangulation:
    Here’s how I’ve described this before:

    The concept is simple: a single pointer to a new technology, service or whatever – even from a trusted source – is likely to have minimal impact, particularly if it requires effort to explore. But the second notice, from a trusted party, triggers a little click of recognition, and is far more likely to register. Further mentions only escalate this, until the interest to skepticism ratio tilts in favor of a trial.

    Pinboard’s been on my radar for a while, but the more reviews I read like Nat’s, the better I felt about it as a potential replacement. Even one that I didn’t know I was looking for.

    I originally discovered del.icio.us through alpha geek triangulation, so it seems only fitting that its replacement be discovered in similar fashion.

  • Twitter:
    My linking behavior has changed significantly since the introduction of Twitter. As I’ve grown to use that service more, there is a certain class of links that I was posting to Twitter but not to del.icio.us and vice versa, meaning that there was no longer a single service that collected everything I pointed to.

    Pinboard, unlike del.icio.us, has a feature that monitors my Twitter feed and will automatically collect the links I post there. Problem solved.

  • Speed:
    Speed, remember, is a feature. One Pinboard has in spades: the UI is just fast. No other way to describe it.
  • Link by Mail:
    Speaking of Twitter, one of the use cases that’s been bothering me has been stories that I read via Twitter but on my iPhone. Invariably, someone will link to something on Twitter, which I’ll read about on my iPhone using Tweetie, and it’s long enough that I don’t want to read it on the small screen. My solution to date? Mail it to myself. Which is a poor idea, since I’m no more likely to read my own email than anyone else’s.

    Pinboard, meanwhile, provides me with a Posterous-like email address to which I can mail the story to have it automatically collected. Simple, but very nice feature.

  • Other Sources:
    If I want to bookmark or otherwise mark links via other services such as Read It Later or Google Reader, Pinboard will automatically collect those as well. It will even monitor del.icio.us in case I want to use the two in parallel.
  • del.icio.us Import:
    Speaking of del.icio.us, one of the obstacles to leaving the service was the content I’d amassed there. But to their credit, del.icio.us allows you to bulk export your entire archive. Pinboard, meanwhile, allows you to import your entire history, so in a minute or two I was up and running on Pinboard as if I’d been using it since 2004.
  • Export:
    And speaking of export, Pinboard allows me to export everything I put into it, via the same format as del.icio.us or simply RSS.

What about the downsides? Well, there are two from what I can tell.

  1. Autopost:
    Unlike del.icio.us, Pinboard does not natively support the collection and publishing of links nightly to a blog. While I’ve fallen out of this practice lately, I do still use it from time to time and some folks do appreciate them, so the lack of it is a potential issue. That said, it shouldn’t be too difficult to aggregate the links and publish them if I like, or I could move – as a lot of folks have – to a more manual, potentially higher value links post.
  2. Price:
    For some, this is undoubtedly a concern: Pinboard is not free. As its creators describe the pricing:

    Users pay a one-time signup fee that goes up by a small amount with each new signup. In return, they get a fast, spam-free service and prompt support.

    It’s about $6 at present, though I paid for the $25 service which fully archives all the pages I link to.

    I certainly respect the right of users to favor free services over those that are paid. For my purposes, however, I am happy to pay individual developers to help fund services that are useful to me: I offered the same to Josh when he was launching del.icio.us.

    I also find the pricing model interesting, in that it attempts to align user psychology with the costs of scaling. In the beginning, a low price acts to throttle drive-by, low-value users but keeps the barriers to purchase reasonable for legitimate early adopters. As the volume of users grows, however, and the costs of scaling increase, the pricing escalates both to offset the increased costs and to keep adoption at manageable levels. It’s not clear how the model itself will scale over time, but it’s creative, and I give them credit for that.
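
For the curious, here is a rough sketch of how a signup fee that creeps up with every new user might behave. To be clear, this is my own illustration, not Pinboard's actual formula: the base fee, the per-signup increment, and the function names are all made-up assumptions, picked only so the arithmetic is easy to follow.

```python
from decimal import Decimal

# Hypothetical parameters for illustration only; NOT Pinboard's real numbers.
BASE_FEE = Decimal("3.00")    # assumed fee charged to the very first user
INCREMENT = Decimal("0.001")  # assumed increase added by each prior signup


def signup_fee(prior_signups: int) -> Decimal:
    """One-time fee for the next user, given how many have already signed up."""
    if prior_signups < 0:
        raise ValueError("prior_signups must be non-negative")
    return BASE_FEE + INCREMENT * prior_signups


def total_revenue(signups: int) -> Decimal:
    """Sum of the fees paid by the first `signups` users.

    User k (counting from 0) pays BASE_FEE + INCREMENT * k, so the total is
    signups * BASE_FEE + INCREMENT * (0 + 1 + ... + (signups - 1)).
    """
    if signups < 0:
        raise ValueError("signups must be non-negative")
    return BASE_FEE * signups + INCREMENT * (signups * (signups - 1) // 2)


if __name__ == "__main__":
    for n in (0, 1000, 3000):
        print(f"after {n} signups the next user pays ${signup_fee(n)}")
```

Under these invented numbers the fee grows linearly with the user count while cumulative revenue grows roughly with its square, which is one way to read the "offset the increased costs" half of the model. Whether the real service uses a linear step, or anything like these values, I can't say.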

Will I stick with Pinboard as long as I have with del.icio.us? Or will I revert to that service? Should you switch?

Who knows. Personally, I’m quite happy with Pinboard and recommend it if you’re in the market for a (new) bookmarking service. If you’re happy with del.icio.us, by all means stay.

In the meantime, if you’re interested in what I’m reading, check me out over at Pinboard. If you’d prefer to subscribe in a reader, the feed can be found here.

by-nc-sa