
Redmonk

links for 2010-02-06

by-nc-sa

Facebook Rolls Their Own PHP: What HipHop Means

It is marginally interesting that Facebook continues to open source its technology. Anybody surprised by the fact that Facebook is producing software, however, simply hasn’t been paying attention. Facebook is, and always has been, a software company. Ask Savio.

The four hundred million-ish people Facebook provides with social networking functionality are a byproduct of their software, not the reverse. The sooner that vendors realize this, the better, because Facebook will be the least of their problems.

Take HipHop for PHP. That’s a disaster for Zend, right? Hardly. Compiled dynamic languages are, in technical terms, old hat. Advocates of runtimes like the JVM or .NET have been touting the advantages of managed, compiled equivalents to their interpreted counterparts for years. As Haiping Zhao, the author of HipHop, acknowledged:

Even compiling PHP isn’t a new idea, open source projects like Roadsend and phc compile PHP to C, Quercus compiles PHP to Java, and Phalanger compiles PHP to .Net.

Compile your dynamic language, we’re told, and it will be more secure. More manageable. More performant. More better.

What they leave out is that it will be more harder, too.

To contend that a better performing PHP – even one that offers, as HipHop reportedly does, a 50% improvement on CPU usage – will negatively impact usage of the non-compiled alternative misses the point of PHP, which has little to do with performance. PHP has never been the first choice for high scale environments; it is, in many respects, why caching systems from memcached to WP-Supercache exist, and have evolved so quickly. PHP owes its ascension and current ubiquity to a variety of factors, accessibility included, but its staying power is as much a function of the ecosystem as the technology. Which is the problem all of the compiled dynamic language alternatives contend with.

They’re like the parent language, but they’re not the parent language. Run into trouble? Unlike PHP, where you’re guaranteed that Google will turn up someone who’s had and addressed the exact problem that you have, it’s possible – probable, even – that on the compiled alternative, you’re the first to run into it. Translation: good luck with that, because you’re on your own.

Even if we assumed, counterfactually, that HipHop was a perfect translation of PHP, how many of the traditional PHP users do you imagine would even notice a 50% reduction in their CPU usage? How about a 50X improvement? How many do you think even know what their current CPU usage is, versus those that heard, at some point, from someone, their website runs on something called “WordPad? TextPress? you know, that thing?”

My guess is that the users who would notice a 50% bump are a fraction of those who wouldn’t, and not a big fraction at that.

Which is not to say that the technology is uninteresting, my instinctive mistrust of code generation functionality aside. First, because Zend has clearly carved itself out a nice business optimizing PHP performance for those who do care, and would notice a 50% reduction in CPU usage. And more importantly because those that do care about CPU usage are likely, like Facebook, to care quite a lot. As they should, when they’re serving 400 billion PHP-based page views…a month. Fifty percent less CPU on that workload means a lot fewer dollars, and lord knows how much less carbon emitted. It also means more resources for other compute tasks. Fifty percent better performance is transformative, in other words, if you’re the type of customer that measures performance in the first place.
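To make the arithmetic concrete, here is a rough sketch of what halving CPU time per request does to server count on a purely CPU-bound fleet. Every number in it (requests per second, milliseconds of CPU per request, cores per server) is a hypothetical placeholder, not a Facebook figure, and the function name `servers_needed` is made up for illustration.

```python
import math


def servers_needed(requests_per_sec: int, cpu_ms_per_request: int, cores_per_server: int) -> int:
    """Servers required for a CPU-bound workload, rounded up.

    Assumes every core can be kept fully busy and that CPU is the only
    bottleneck, which is a simplification.
    """
    # CPU milliseconds consumed per wall-clock second, across all requests.
    cpu_ms_per_sec = requests_per_sec * cpu_ms_per_request
    # Each core supplies 1000 ms of CPU time per second.
    capacity_ms_per_server = 1000 * cores_per_server
    return math.ceil(cpu_ms_per_sec / capacity_ms_per_server)


# Hypothetical workload: the inputs are placeholders, not measurements.
before = servers_needed(requests_per_sec=100_000, cpu_ms_per_request=80, cores_per_server=8)
after = servers_needed(requests_per_sec=100_000, cpu_ms_per_request=40, cores_per_server=8)

print(f"servers before: {before}, after: {after}, saved: {before - after}")
```

Under those assumptions the fleet shrinks by half. For a site that never comes close to saturating a single core, the same 50% changes nothing visible.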

Which leaves an obvious question: that type of customer, the type that would find fifty percent better performance interesting, tends to be relatively well capitalized and therefore predisposed to commercial support. But who will offer commercial services around Facebook’s HipHop?

Not Facebook, presumably. While they are every inch a software company in my book, they are absolutely not a software company involved in licensing, selling and supporting software. They’ve got four hundred million-ish other distractions, remember. Which leaves who? Savio’s quick to identify the problems for potential third party support, saying:

However, without control of the project and the project’s copyright and trademark, it’s difficult to monetize usage.

While those are issues indeed, I think the evidence suggests that these impediments are no obstacle to commercial opportunity.

Percona doesn’t control MySQL’s copyright or trademark, yet is a going commercial concern. Likewise CentOS, whose commercial ecosystem leverages Red Hat’s product while sidestepping the cited trademark concerns. As does Oracle, on a larger and more explicit scale. Even in the dynamic language space we have an example of a similar economic model playing out successfully: ActiveState’s primary revenue derives from sales of QA’d and supported runtimes for the likes of Perl and TCL.

History indicates, then, that HipHop could easily be supported, on a for-profit basis, by a third party. Who might the logical candidate be? Well, why not Zend? They know the runtime as well as or better than any other third party, leaving them presumably well positioned to understand how the code can – and can’t – be translated to C++. They have existing support agreements in place with many of the potential HipHop users. So what if they didn’t author the project? The market is littered with companies supporting products written by others.

Which is not to say Zend will support the project, of course – I haven’t spoken to them yet on the subject. Merely that they could, and should in my view at least consider it, depending on the current quality of the PHP==>C++ conversion.

Even if it is no threat to Zend, however, HipHop should be a warning shot across the bows of a great many software vendors. HipHop, by design, is aimed at a narrow, if high margin, section of the market in question. As a vendor, I’m not losing sleep over open source projects like HipHop which are really attractive only to the top 1% or so of customers. No, my concern would be the potential release of something that’s relevant to, say, 50% of the market. What if Facebook open-sourced Project Titan, for example?

Customers, more than ever before, are going to be your competitors. Prepare accordingly.

Disclosure: Zend is a RedMonk customer, Facebook is not.


Netbook Applications: The Bare Necessities

The more time I spend crunching numbers on even moderately sized datasets, the more frustrating my current hardware setup becomes. Things are desperate enough that my Thinkpad X301, which maxes out at a mere 4 GB of RAM, is becoming a better performing alternative to my aging, dying Sun Ultra 20 workstation. How sad is that? But at least I have a schedule to address that situation: a crazy tricked out workstation is – at least in theory – on the way.

With my workstation needs in all likelihood addressed, the big remaining question for me is what’s at the opposite end of the spectrum. Back in December, I discussed the fact that I was transitioning from a general purpose laptop only model to one characterized by more specialized hardware. Big, monstrous workstation for analytics, virtualization, testing and the like, complemented with something much smaller and lighter for all the travel. Something netbook or smartbook-like, in other words.

Something that probably won’t run most or all of the applications I’m used to relying on, with the notable exception of a browser.

When I first began looking at Moblin a while back, people told me that it was nice, but that I’d miss the standard application set of my operating system of choice, Ubuntu. Interestingly, the trend in the space is actually towards even fewer applications than Moblin allows; the Lenovo Skylight, one intriguing option, is preloaded not with a standard application set, but with widgets running on top of a thin Linux film. Chrome OS, of course, goes even further, dispensing with the idea of applications entirely and pushing a browser only experience.

True, there’s the iPad, but that’s more or less a non-starter because I require a physical keyboard to be even remotely useful. The touchscreen keyboard works on my iPhone because that’s a device I use for reading, not writing: whatever I end up taking on the road will need to be equally comfortable with both.

The good news is that my application needs are actually relatively few. I have no intention to ask a netbook to handle a general purpose laptop’s workload. Things like a media player (Banshee) or virtualization platform (VirtualBox) that I use now are certainly not must haves in a mobile device. I’m more reluctant, however, to give up Emacs. Mark Pilgrim is aggressively agnostic when it comes to the choice of text editors (not to mention frustrated with the recent crops of new writing tools), saying:

Picking the right text editor will not make you a better writer. Writing will make you a better writer.

Which is true. But I’m not looking for a text editor to make me a better writer. I’m looking for a text editor to make the task of writing more enjoyable. Easier. Simpler. Less complicated. And so on. Emacs does that for me, mostly, though I need to spend some time looking up how to make writing HTML natively simpler. Google Docs, on the other hand, does not match the Emacs authoring experience.

Could be I’ll find a browser based text editor I like as much or more than Emacs: stranger things have certainly happened. But I’m not counting on it, and I’m not looking forward to it. Ymacs is interesting, but not the same. Bespin doesn’t have Etherpad’s zero latency. And Etherpad is just plain going away, unless you want to host your own Scala codebase.

The other missing piece for me will be a terminal (no, I don’t use Emacs for this, generally). True, on something like Chrome OS that abstracts the underlying operating system away, it’s kind of pointless for local work. But I’ll still need to spend a fair amount of time administering remote servers. And do I really want to SSH into my servers from a third party, browser based SaaS terminal service? No I do not.

None of this means that I’ll miss the applications enough to forgo the hardware form factor. The chances are excellent that by the end of the second quarter at the latest, you’ll spot me at a conference touting some kind of new, lightweight device. But the transition is going to be interesting. While I don’t use all that many applications in general, the ones I use, I use a lot.

So get to work recreating Emacs and a terminal in the browser, will you? You’ve got until the end of Q2.


Ubiquitous Analytics and Tableau Public

Tableau Public

Well, that didn’t take long. One of my 2010 predictions foretold ubiquitous analytics, and here we are not even two months in and that’s already about to happen in the form of Tableau Public.

For those of you unfamiliar with the vendor, Tableau’s a bit of a midpoint product: more statistically and visualization oriented than Excel, more accessible in usage and cost than high end BI alternatives like Business Objects, Cognos or SAS.

Where vendors like Data Applied, GoodData and so on are taking a SaaS approach to analytics and visualization, Tableau is primarily a rich client experience. Sadly for this Ubuntu using analyst, it’s Windows only, but apart from that I have few complaints. As you’d expect from a rich client vendor, there are the standard Desktop and Server product offerings.

And then there’s Tableau Public.

Still in beta, but slated for near term public availability, Tableau Public is essentially a desktop analytics experience coupled to a web back end that handles hosting of the dataset and display of the visualization. Tableau Public is a lot like their standard desktop client, except that you can handle only a hundred thousand rows of data and that you can save only – publicly – to the web. Hence the name.

If you’re familiar with IBM’s Many Eyes, Tableau Public is very similar except that it’s got a rich client editor and that the embeddable analytics don’t require a Java enabled browser. Which means that you can share your visualizations even with users of Chromium.

The client you can see above. But what can Tableau Public actually do? A lot more than I can show, I’m sure, in part because I’m new to the whole visualization game, but more because I’ve only been using it for a few days. But note a few things as we run through some examples:

  1. The displays are rendered via Ajax. No Flash, no Java.
  2. You can view and/or download the data visualized.
  3. You can interact with the data in context.

First, a simple cross tab visualization of some Maine population data, which I obtained here.

Nothing fancy. Sure, it’s better than the comma separated value file served up by the State of Maine, but otherwise it just looks like a table. Which it is. But try clicking a row, then clicking the funnel icon at the bottom and selecting Keep-Only. The ability to work on data, or download it for offline manipulation, is potentially big.

Next is one that’s hopefully slightly more interesting: a visualization of the moose related crashes from 2006-2008 (that one’s pulled from here).

This one’s significantly more valuable than a CSV datafile, unquestionably, because it’s been married to map data. At the suggestion of my better half, I layered in the streets and highways, so while they’re faint it’s easy to see the (expected) correlation between highway corridors and moose related accidents. You can also use the arrow in the bottom toolbar to pan and zoom, if you’re interested in a higher level of detail.

Last we have a chart you’ve likely seen before, here, which has the sad duty of depicting deaths from the most deadly earthquakes globally through 2008 (data can be found here).

The difference between the above and the images I wrote up last time is probably obvious: interactivity. A static image can only tell you so much: the embeds above will not only link you directly to the data I used, but give you the opportunity to ask questions of the visualization. How many have died in the US, for example? Malaysia?

Clearly all of the above are but the most simplistic of visualizations; the NY Times “A Peek into NetFlix Queues,” this is not.

But that’s what’s excellent about public analytics: if you’re so inclined, you can pull down the datasets and better my primitive efforts. Which is how we’ll all learn. I look forward to a world where it’s as easy to share, comment on, and improve analytical visualizations as it is to do the same with a YouTube video.

And Tableau Public clearly is helping advance that cause.

A few issues, apart from the Windows only nature:

  • Tableau can be intimidating to new users. Templates and other “Getting Started” features would help a lot.
  • Logging in and posting / retrieving data is a bit cumbersome, because I need to log in for each dataset I’m working off of, rather than once.
  • It can be extremely picky with data, and it’s not always obvious which data’s causing the problem. It took me a couple of cycles through to learn that Tableau expects the country name “Russian Federation” in spite of the fact that the map background says Russia.
  • I haven’t seen – yet – the public back end, so it’s not clear how easy it will be to find new visualizations and collaborate with others on improvements.
  • If there’s a way to annotate the visualization with notes, thoughts, original sources for the data, I haven’t found it yet.
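One way around the data pickiness noted above is to clean a file before handing it to Tableau. What follows is a minimal sketch, not anything Tableau ships: the `prepare_rows` helper, the `Country` column name and the alias table are all assumptions for illustration. The only mapping taken from my own experience is Russia to “Russian Federation”, and the row limit is the hundred thousand figure mentioned earlier.

```python
import csv
import io

# Row ceiling cited for Tableau Public in this post.
ROW_LIMIT = 100_000

# Hypothetical alias table. Only the Russia entry comes from the post;
# extend it as other mismatches turn up.
COUNTRY_ALIASES = {"Russia": "Russian Federation"}


def prepare_rows(csv_text: str, country_column: str = "Country") -> list[dict]:
    """Normalize country names and enforce the row limit on CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        # Trim stray whitespace, then swap in the expected name if known.
        name = row[country_column].strip()
        row[country_column] = COUNTRY_ALIASES.get(name, name)
        rows.append(row)
    if len(rows) > ROW_LIMIT:
        raise ValueError(f"{len(rows)} rows exceeds the {ROW_LIMIT} row limit")
    return rows


# Placeholder data purely to show the call; the values mean nothing.
sample = "Country,Value\nRussia,1\n Malaysia ,2\n"
for row in prepare_rows(sample):
    print(row["Country"], row["Value"])
```

Keeping the aliases in one dictionary means each new mismatch costs a single line rather than another trial-and-error cycle through the import.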

But overall, it’s a very nice – and very interesting – tool: one that I expect we’ll be using quite a bit around here at RedMonk HQ. I’m heading down the R route as well, but Tableau is significantly more accessible than everyone’s favorite statistics language.

Disclosure: Tableau is not a RedMonk client, but granted me early access to the beta and a trial license for the desktop product.


links for 2010-01-29

  • "If you pay any attention to the endless debates over intellectual property policy in the United States, you'll hear two numbers invoked over and over again, like the stuttering chorus of some Philip Glass opera: 750,000 and $200 to $250 billion. The first is the number of U.S. jobs supposedly lost to intellectual property theft; the second is the annual dollar cost of IP infringement to the U.S. economy. These statistics are brandished like a talisman each time Congress is asked to step up enforcement to protect the ever-beleaguered U.S. content industry. And both, as far as an extended investigation by Ars Technica has been able to determine, are utterly bogus." – sadly, i couldn't be less surprised
  • i'm torn. i have to admit that my affection for the open web is warring with my pragmatism.
  • "Let's be clear: It's fine to say that Flash is flawed; it is. (You know who'd agree? The Flash team.) It's fine to hope for alternatives to take root. (Competition makes everyone better.) But let's also be honest and say that Flash is the reason we all have fast, reliable, ubiquitous online video today. It's the reason that YouTube took off & video consumption exploded four years ago. It's the reason we have Hulu, Vimeo, and all the rest–and the reason that people now watch billions of videos per day (and nearly 10 hours apiece per month) online. Without it, we'd all still be bumbling along." – love or hate Flash, and i fall into neither bucket, it should be given its due. compete with it, criticize it for not being open/standards/etc enough, all fine.

    but arguing that it hasn't been important is silly.
