tecosystems

Beyond Cassandra: Facebook, Twitter and the Future of Development

By Stephen O'Grady | @sogrady | May 17, 2010

Whenever an internet company elects to build its own database, web server or language framework (coverage), the inevitable result is a discussion of the relative merits of the new technologies versus those not chosen. The canonical example here is the endless comparison of NoSQL infrastructure to the more traditional relational database approach (coverage). However interesting such conversations might be, they’re obscuring the longer-term implications of a fundamental shift in the way that software is produced, and why.

Historically, businesses – web and otherwise – have used a variety of mechanisms to protect their assets, to preserve their competitive advantage. In the software world, we’ve seen copyright, licensing, non-disclosure agreements, patents, trademarks, and a host of other legal tools employed. If you’ve been in the industry for any length of time, chances are you’ve been on one end or the other of one or all of the above at some point.

In the past ten or fifteen years or so, however, we’ve seen software firms increasingly ask a simple but profound question: what are the assets I must protect? Twenty or more years ago, the answer was simple: protect everything. Ten years ago, as open source was on the way to becoming a mainstream software development practice and companies built upon the resulting projects grew exponentially in size, the reply was more nuanced. A lot of software needed to be protected, but there were substantial chunks that could be shared. Today, firms appear to be asking a different question: is my value in data, or source code? And if the answer is data, what should my software development practices look like?

Facebook and Twitter, as high profile properties that grew up without the legacy protectionist mindset, might be illustrative here.

If we examine developers.facebook.com/opensource/, for example, there are several very obvious trends.

Facebook has a strong preference for permissive licensing. Wherever possible, Facebook avoids copyleft licensing in favor of more liberal alternatives. The default licensing choice, in fact, appears to be Apache 2.0 (e.g. Cassandra, Scribe, Tornado and Thrift), with other licenses employed tactically for compliance or compatibility reasons (e.g. GPL and Flashcache or PHP and Hip-Hop).
Between their contributions to previously existing projects (e.g. Hadoop, Cfengine, memcached, MySQL, and PHP) and releases of software they built (e.g. Cassandra, Hip-Hop, Hive, Scribe, Tornado, Thrift) the core of Facebook’s infrastructure is built on non-differentiating, publicly available code (Update: ~~just for reference, we’re told via email that Facebook, “no longer contributes to nor uses Cassandra.”~~ Update 2: we are now being told – and Facebook has confirmed – that Cassandra is actually still employed by the company for, among other things, Inbox Search.)
Language usage at Facebook is fairly heterogeneous, with both dynamic languages (e.g. Javascript, PHP, Python) and traditional alternatives (e.g. C, C++, Java) represented. Perhaps because of Facebook’s emphasis on performance, however, the latter are significantly more common than the former.
Facebook hosts very few of their own assets; Tornado appears to be the notable exception (possibly because it came from FriendFeed). Some assets are hosted with Github (coverage); those that are not are typically housed at Apache.

As for Twitter’s twitter.com/about/opensource:

Twitter, like Facebook, has an affinity for permissive licenses in general and the Apache license specifically. Twurl, a Twitter-specific flavor of Curl, is MIT licensed, but FlockDB, Gizzard, Murder and even its GC trace script jvm-gc-stats are Apache 2.0 licensed.
By all accounts, Twitter runs on similarly undifferentiated infrastructure. Its primary data storage, for example, has been MySQL-based with a parallel implementation of Cassandra (which Twitter contributes to). Their social features are likewise enabled via a graph database, FlockDB, whose source is available.
Languages at Twitter are similarly heterogeneous, though Twitter appears to rely more heavily on dynamic languages than does Facebook (Murder is 97% Python / 3% Ruby, for example) resorting to Scala when performance is at a premium (FlockDB is 83% Scala, Gizzard 100%).
Effectively none of Twitter’s released open source projects are self-hosted; Twitter has instead outsourced this task to Github. There does not appear to be any predisposition to existing open source foundations, Apache or otherwise.

Though Facebook and Twitter clearly have some differentiation in their operational priorities and philosophies, then, the similarities far outweigh the differences. Following on the heels of Amazon, Google, Yahoo and the other early web firms, Facebook, Twitter et al are pushing the envelope even further: Google publishes their algorithms (e.g. MapReduce), Facebook their software (e.g. Cassandra).

If they are at all representative of the direction of application development in web native firms, then, we might reasonably expect the following:

Default to Open Source:
Rather than ask whether a given asset should be open source, firms are likely to increasingly try to identify which pieces should not be. We don’t see many businesses running off of an entirely open source foundation, but the differentiation points are typically further up the stack. In practical terms, then, this means that it will be difficult to differentiate, competitively, on infrastructure software. And if there is no competitive advantage in your infrastructure, the benefits to using or releasing open source software – whether those are better resource availability or the ability to amortize development costs – are likely to outweigh the marginal benefit of developing it strictly in house.
Language Heterogeneity:
Traditional development best practices – which typically anoint a language or set of languages as the permitted options – will likely become workload specific. Performance or scale sensitive applications, for example, would be restricted to a set of predetermined language options (e.g. C/C++ at Facebook, Scala at Twitter, etc). Glue languages, however, are likely to be far less homogeneous, and reflective of different influencers (e.g. developer preferences, available bindings/libraries, etc).
No Core Competencies in Project Hosting:
Few if any web firms are attempting to specialize in project hosting. This task is increasingly being left to specialized hosts (e.g. Github) or governance oriented foundations (e.g. Apache). This is preferable from a developer standpoint, because centralized project hosting simplifies discovery/cross-pollination and enables network effects such as social, collaborative development. Source code control is increasingly likely to be distributed by default, as well.
Permissive Licensing Standard:
Much has been made in some quarters over the decline of the GPL. While the “decline” is unquestionably overstated (coverage) considering that the license is more popular than the next ten licenses combined, my expectation entering 2010 was that permissive licensing would continue to grow at the expense of reciprocal licensing (coverage). The behavior of web firms generally validates this assertion.
Precise Identification of Value:
It would be absurd to argue that the value of a Twitter was in no part related to the software that powers it. But it would be equally foolish to suggest, when we have open source Twitter clones such as StatusNet freely available, that the value was all, or even mostly, in the software. The value of a Facebook or a Twitter is ultimately in the data they generate, not the code. In a very real sense, their users are their asset. When application development is considered, then, it will be considered with this in mind. If code isn’t ultimately your differentiating asset, then the dynamics of development are irrevocably altered.

Many of you are doubtless curious as to how relevant the application development experiences of unique, web native businesses such as Facebook and Twitter are to traditional enterprise customers. The answer depends largely on timeframe. In the short term, the impact will be minimal both because enterprises move slowly and because their attention to web firms is fairly minimal. In the longer term, however, the web firms have the ability to substantially influence developer best practices, product direction and so on. Witness the mainstream popularity within enterprises today of dynamic languages, once popularized by web firms, or the accelerating adoption of projects such as Hadoop.

We will not see within the foreseeable future a world in which all software is open source, nor one in which there is no differentiation to be found in development. It is likely, however, that as our understanding and appreciation of what, precisely, is differentiating improves, our software development practices will evolve along with it. Where better to look, then, to understand where things are going than to firms that have grown up without preconceived notions of what must be protected at all costs?

17 comments

Martijn Linssen says:

May 27, 2010 at 8:26 am

Steve,

great inventory! But I miss your point, I guess. I see 2 greenfield companies that built something from scratch with as many free tools as were available. I see opportunism, a lack of strategy and focus, way too much diversity, next to a weighed choice of tools-for-the-job and outsourcing / cloudsourcing where apt

But. Businesses have always been about the data, never about software development. Software is one on those tricky things that isn’t core business to a firm, but business critical. So we’ve been dancing around how to develop software, for decades now

What I see when looking at Facebook and Twitter is a very opportune choice and use of tools that are up-for-grabs. I’d love to see their 5-year roadmap on how all those different languages and tools are going to help support their case, strengthen their competitive position, etc. etc. etc…

Messylaneous for 2010/05/27: destroying flash, Unix, programming · DragonFly BSD Digest says:

May 27, 2010 at 10:29 pm

[…] Lavigne linked to this article about the future of software development, and I agree with her: it’s a good prediction of the very near future. Categories […]

tecosystems » The Future of Development: It’s About Differentiation says:

May 28, 2010 at 8:11 am

[…] Martin Linssen, commenting on “Beyond Cassandra: Facebook, Twitter and the Future of […]

When you should open-source your internal apps « IM Ninja Dude says:

June 4, 2010 at 7:43 am

[…] IT departments should revisit their application development strategies to follow some of the approaches used by Facebook and Twitter, argues RedMonk analyst Stephen […]

When you should open-source your internal apps says:

June 4, 2010 at 7:57 am

[…] IT departments should revisit their application development strategies to follow some of the approaches used by Facebook and Twitter, argues RedMonk analyst Stephen […]

When you should open-source your internal apps | Florida, Technology Information, Reviews, News, Software Downloads and More says:

June 4, 2010 at 3:19 pm

[…] IT departments should revisit their application development strategies to follow some of the approaches used by Facebook and Twitter, argues RedMonk analyst Stephen […]

Ned Wolpert says:

June 9, 2010 at 6:02 pm

I was wondering about the updated note saying the Facebook does not use Cassandra anymore. Does that mean that they don’t use what they originally built at any level, or only that they don’t have anything to do with the Apache Cassandra project but still use their internally developed one at Facebook. I’d like to know what the reason Facebook had for not using Cassandra anymore.

tecosystems » The Economics of Open Source: Why the Billion Dollar Barrier is Irrelevant says:

June 21, 2010 at 4:27 pm

[…] clear that open source as a development model for non-differentiating software is gaining steam (coverage). The trend is most obvious in web native players collaboratively developing their infrastructure […]

Algunos apuntes sobre NoSql says:

June 28, 2010 at 7:02 am

[…] Facebook y Twitter, la adopción de Cassandra. […]

tecosystems » Open Core is the New Dual Licensing says:

June 30, 2010 at 10:56 pm

[…] Pure play open source software is today, and always has been, more popular. Which is why I categorically reject arguments that assert that open core-style mechanisms are necessary for the production of open source software. That they are beneficial from a revenue standpoint to organizations built to monetize open source software is indisputable; but open source is, of course, not strictly or even primarily authored for purposes of sale (coverage). […]

tecosystems » Frictionless Computing: What it Means for Infrastructure says:

July 7, 2010 at 3:35 pm

[…] and Twitter are increasingly leveraging open source as a primary, mainstream development path (coverage) – spiking the volume of available code – and there has never been a better time to be […]

Resultado de ALT.NET Hispano VAN sobre NoSQL - Angel "Java" Lopez says:

July 17, 2010 at 12:25 pm

[…] Tutorial: Getting started with Cassandra NoSQL in Twitter Twitter, Facebook and Cassandra, and Open Source Cassandra and Twitter: interview with Ryan King (Twitter está reviendo si usa NoSQL o […]

NoSQL Resources « Angel “Java” Lopez on Blog says:

July 19, 2010 at 6:01 am

[…] Tutorial: Getting started with Cassandra NoSQL in Twitter Twitter, Facebook and Cassandra, and Open Source Cassandra and Twitter: interview with Ryan King (Twitter is reviewing if they use or not […]

Algunos apuntes sobre NoSql - maxilovera.com.ar says:

September 6, 2010 at 6:29 pm

[…] Facebook y Twitter, la adopción de Cassandra. […]

Why There Won’t Be a LAMP For Big Data – tecosystems says:

October 2, 2010 at 9:15 pm

[…] you look at web properties such as Facebook, LinkedIn and Twitter, this is evident [coverage]. Portions of their Big Data workload are serviced by Hadoop implementations, while others are […]

Raodeve says:

May 21, 2012 at 4:48 am

giving same junk in new box is what this article is

How Important of Software? | orlhyburgos says:

October 12, 2015 at 2:46 am

[…] to release key pieces of their existing infrastructure as open source projects – e.g. Cassandra [coverage], Hive, Hip-Hop [coverage] or Thrift – there is the fact that Facebook is built, effectively, on […]

