RedMonk

You Won’t Get Fired for Using Apache

Git, Svn and CVS Usage on Debian

In March of 2010, I sat on a panel with Justin Erenkrantz (Apache), Mårten Mickos (Eucalyptus), and Jason van Zyl (Maven/Sonatype) at the Eclipse Conference debating the future of open source [coverage]. The audience asked questions on licensing, development models and the direction of open source generally. One of the questions concerned the role of foundations like Eclipse, and whether they represented the future or if that would be written instead by commercial producers of open source.

My answer to that question was simple: neither. GitHub, instead, is the shape of things to come.

Mikeal Rogers came to a similar conclusion in Apache Considered Harmful. His arguments concerning the importance of GitHub are compelling, as the evidence in favor of decentralized version control generally and Git/GitHub specifically is overwhelming. Quantitatively, virtually all of the metrics available to us reflect trajectories similar to those reflecting the relative popularity amongst Debian installations depicted above. As Chris Aniszczyk documents, legacy centralized tools reflect volume usage but Git’s massive growth has it poised to eclipse them. Git-based services, meanwhile, are both profiting from and fueling said growth; an analysis of first half commit data for four major forges collected by Black Duck indicates that GitHub has in three years become the most popular open source repository.

Qualitatively, the benefits of decentralized development have been apparent to us since at least 2007. What GitHub calls “social coding” is the ability of decentralized version control tools like Git or Mercurial to inject collaborative development into a previously individual practice. The byproduct of which is the parallelization of development; like bacteria, GitHub developers may now evolve by swapping material directly to and from one another as required [coverage]. Which means, in turn, that forking is no longer a bad word but the way development should be done.

In the face of GitHub’s ascendance, the implications for open source foundations are unclear. In Apache Considered Harmful, Rogers argues that Apache specifically has essentially outlived its usefulness.

The problem here is less about git and more about the chasm between Apache and the new culture of open source. There is a growing community of young new open source developers that Apache continues to distance itself from and as the ASF plants itself firmly in this position the growing community drifts farther away.

But while I also subscribe to Clay Shirky’s maxim that “Institutions will try to preserve the problem to which they are the solution,” it is far from clear that it applies in this case. For disclosure purposes, I’ll note here that we count both GitHub and open source foundations like Apache and Eclipse as clients.

The argument that foundations have become vestigial in a post-GitHub world necessarily focuses on functional overlaps. Historically, project hosting has been one of the services offered by foundations. The data suggests that foundations that reject decentralized version control systems will fall behind, which is why structures like Eclipse are implementing Git. Even assuming a given foundation is able to transition from centralized to decentralized mechanisms, however, there is no guarantee that this will be sufficient. It’s unrealistic to expect any foundation to compete with GitHub on functionality or community size, given the respective areas of focus.

The value of foundations, however, has never been principally hosting. They are, rather, the manifestation of a particular mission. The Free Software Foundation, the Apache Software Foundation, the Eclipse Foundation: all serve as the focal point for a group of developers. It is possible that their respective purposes have been fulfilled by GitHub, but the surging code repository seems likely to make foundations more relevant, rather than less.

As Rogers states:

Apache was founded about 12 years ago, a time when companies were still very afraid of open source and many people in the open source community were very afraid of companies. The world hasn’t changed that tremendously, big companies still use an open source stamp as a marketing tool, commonly referred to as “open washing”, and some in the enterprise are still wary about open source, particularly when it comes to certain kinds of licensing.

But, you would be hard pressed to find a single company that didn’t use some amount of open source software nowadays.

The obvious implication is that as mainstream acceptance of open source increases, the importance and relevance of Apache’s mission decreases. I believe this to be incorrect. Aside from the many important non-infrastructure services offered by foundations – including IP management, project governance, legal counsel, event planning, and predictable release schedules – foundations have value as brands.

As the volume of open source assets grows, the paradox of choice presents itself to users. This problem in part already sustains a sizable marketplace of commercial products from vendors such as Black Duck, Open Logic, Palamida, and Sonatype. With GitHub and similar tools reducing the friction associated with development, we are likely to see selection problems get worse rather than better; as the volume of open source rises, fueled in part by GitHub’s success, so too does the difficulty of making a choice.

Which is one reason foundations are important. Much as McKinsey advantages Baker and Rhodes scholars in their hiring process, certain audiences will prefer to leverage open source software associated with a known brand such as Apache or Eclipse.

We have long argued that developers are the new kingmakers. As developers begin to make more choices within the enterprises they populate, they will inevitably face the same dilemma that their management predecessors did, which is the burden of choice. GitHub is a center of gravity with respect to development, but it is by design intensely non-prescriptive and inclusive, and thus home to projects of varying degrees of quality, maturity and seriousness. Consider the following:

GitHub, Inc. (“GitHub”) supports the protection of intellectual property and asks the users of the website GitHub.com to do the same. It is the policy of GitHub to respond to all notices of alleged copyright infringement.

Notice is specifically given that GitHub is not responsible for the content on other websites that any user may find or access when using GitHub.com.

GitHub, in other words, disavows responsibility for the projects hosted on the site. Foundations, conversely, explicitly assume it, hence their typically strict IP policies. These exclusive models offer a filter to volume inclusive models such as GitHub’s.

If your continued employment depends not just on the quality of the software you employ, then, but perceptions of the quality of the software you employ, the halo effect offered by foundations that actively triage their assets is likely to be of benefit. For better or for worse. If you’re choosing between one project of indeterminate pedigree hosted at GitHub and an equivalent maintained by a foundation like Apache, the brand is likely to be a feature. Managers used to say “you won’t get fired for buying IBM.” The developers making the decisions in the future may well have their own version: “you won’t get fired for using Apache.”

Rogers appears to have legitimate concerns regarding Apache’s acceptance of Git, and I concur that it’s a GitHub world and we’re all living in it. But I find it difficult to build the case that foundations more broadly won’t have a role to play in it.

by-sa

Categories: Open Source.

The Extracted Software Model

“IBM designed IMS with Rockwell and Caterpillar starting in 1966 for the Apollo program. IMS’s challenge was to inventory the very large bill of materials (BOM) for the Saturn V moon rocket and Apollo space vehicle.” – Wikipedia, IBM Information Management System

While the practice has a long history, the formalization – and more particularly, popularization – of open source is a relatively recent development. Until the process of releasing Netscape Navigator rallied supporters of the term open source, the industry lacked a consistent vocabulary for dealing with the concept of source code available under a license. Which gave proprietary software the stage by default.

In the absence of a widely understood model for the generation and consumption of shared code, innovation in software was largely the product of commercial vendors producing proprietary software. Starting from scratch, with few exceptions most organizations outside of finance and defense could not reasonably expect to sustainably compete with vendors whose business it was to write software. Rather than try, businesses predictably chose to outsource the process of development to vendors like IBM, Microsoft, Oracle and SAP. This is the dynamic that gave us a PwC Top 10 Global Software leaderboard with a median age of 34.5 years.

With open source systemically lowering the overhead associated with development over the last decade, however, in-house development has become increasingly viable from an economic perspective. The inevitable result has been greater availability of open source code. As users and vendors alike discover the benefits of open source – via models direct or indirect – the number of projects has swelled along with the rate of contributions. The database market, for example, has gone from fewer than half a dozen relevant open source projects to several dozen.

None of which is news even to casual observers of the open source market. What is interesting, however, is how these projects are being developed. In some ways, development today is a return to its roots. Consider the following list of open source projects:

  • Cassandra
  • Git
  • Hadoop
  • MongoDB
  • Nginx
  • Rails

Besides the availability of their source, what do these projects have in common? None were originally authored to be sold. All were built for purpose rather than sale; this is the return of roll-your-own [coverage]. Much as IBM once extracted IMS from an engagement that sent Americans to the moon, each of the above widely used projects was the byproduct of a business problem. Cassandra was written to manage the Facebook Inbox, Git to manage the Linux kernel source tree, Hadoop to power Yahoo’s search indexing, MongoDB to back 10gen’s original Java cloud vision, Nginx to serve pages for Rambler. And as for Rails, it was extracted from Basecamp.

This “extracted” software model is becoming routine. The innovation inherent in these projects, however, is anything but. Extracted software typically exists because a perfect solution does not, which means that in many cases it is introducing new capabilities rather than recreating existing products.

With substantial history behind it, the extracted model seems to be fairly well understood, conceptually. The open question now is about the volume of latent innovation that might emerge from extracted software in the years ahead. Projects like the list above indicate that internal innovation has accelerated over the last decade, driven by trends ranging from the greater availability of open source code to an industry-wide shift towards horizontally scaled-out architectures.

But as concern about the risks of open source thaws and is offset by wider understanding of the benefits, it is probable that waves of new internally developed projects will be released as open source. The majority of which will generate little activity and interest. But from the volume, we might expect the next Git, Hadoop or Rails.

As open source trends go, then, the extracted software model is one to watch.

Categories: Open Source.

Sonatype Insight: Data as the Product

Sonatype Insight Heatmap

There is no shortage of evidence concerning the value of data, generally. From predicting the flu to the outcome of elections (PDF) to the best practices for dating websites, it’s obvious that knowledge really is power. What’s been lacking, at least according to the conventional wisdom, has been proof points of data being a direct source of revenue.

Apart from telemetry-collecting pioneers like Spiceworks or commercial data marketplaces such as Infochimps, examples of software vendors leveraging their data byproducts have been less common.

That’s about to change. With the release of its Insight product, Sonatype – the company behind Maven – has started down the path towards leveraging its data for the benefit of customers. By monitoring activity in their central repository, Sonatype is in a position to provide fascinating metrics on traction by industry vertical; see the heatmap above. But imagine what they’ll be able to tell their customers about which library version is most popular generally, within customers of their size and within customers in their vertical. Or tell third parties about the popularity of one component versus another. And so on.

Software has value, obviously. But increasingly it will be the data that the software and its users generate that will be the differentiator and the product for vendors. Consider, for example, the level of insight 10gen will be able to provide its support and service customers once it has the ability to analyze the monitoring telemetry from thousands of running Mongo instances, via their recently launched MongoDB Monitoring Service.

We’ve been arguing for data-based revenue streams since 2007. The emergence of models like Sonatype Insight in 2011 just proves out our standard maxim regarding predictions: we can always tell you what’s going to happen, we just can’t tell you when.

Disclosure: Sonatype is a RedMonk customer, while 10gen is not.

Categories: Analytics, Data.

What Would Concern Me About Android if I Worked for Google

Android Growth

The growth of the Android platform undoubtedly masks some of its shortcomings. As Chris DiBona summarized, “the only thing that really matters is how many of these we ship…There is a linear relationship between the number of phones you ship and the number of developers.”

Assuming shipping volume is the metric of success, this hypothesis has been correct thus far. Regardless of what one thinks about Android the platform or its various hardware instantiations, what’s not arguable is that it is a success as measured by volume. The question for Google, Android developers and competitive platforms is whether this is sustainable, or if there are cracks in the foundation that will slow the above trajectory moving forward.

Here’s what would concern me about Android if I worked for Google:

Clustering of Handsets

For the coming holiday season, Android has several new handsets that are certain to be positioned as legitimate iPhone 4S competitors: the Samsung-manufactured Galaxy Nexus, the Motorola Droid RAZR and the HTC Rezound. Besides being Android vehicles, all of these handsets have one thing in common: they’re all being released on Verizon.

Granting that Verizon is the largest carrier in the US, the merits of this strategy seem questionable if the goal is maximizing the addressable market for Android. One or more should have been simultaneously shipped on AT&T, the second largest network, or at least on second tier players like T-Mobile or Sprint.

It may yet come to pass that one or more of these handsets ship on a carrier competitive with Verizon – details of carrier coverage and availability have been scarce, itself an issue – but time is a factor during a busy holiday season.

The limitations of this handset clustering must be obvious to Google, which means that either they are unable to exert sufficient control over the manufacturers and carriers to maximize their penetration, or that the carriers have incented Google sufficiently financially to make a tactical sacrifice.

Neither possibility is encouraging, for Google or users of the Android platform.

Fragmentation

Google’s attitude towards fragmentation has been consistently dismissive; top to bottom, the organization has been willing to trade API consistency and compatibility for rate of innovation. And as discussed above, this strategy has been successful.

That said, as the platform iterates, the versioning and thus fragmentation challenges multiply. Michael DeGusta visually depicted the challenges Android developers and users face with respect to fragmentation.

This problem can only be expected to get worse, as versions proliferate and the carriers who are incented to not update their users remain responsible for operating system distribution. Amazon’s effective fork of Android, which will underpin its Kindle Fire tablets, is likely to further complicate an already problematic developer story.

Google’s promised solution to this problem, meanwhile, has yet to show material benefits.

Patent Issues

Google is obviously not responsible for our fundamentally broken patent system [coverage], and it is to their credit that they have been as publicly outspoken against patents as they have. But their historical lack of focus on intellectual property accumulation has proven to be, in retrospect, a mistake. Fair or unfair, the current patent system is the current market reality. Worse, there is little evidence that we will see a solution within the foreseeable future, and substantial reason to believe that we will not.

While Google has employed a variety of mechanisms to address this strategic shortcoming – from discrete patent purchases to accumulation by acquisition to combat via partners – the perception is that they are losing. Building the case otherwise is challenging with the Microsoft claim that over half of Android OEMs are paying an intellectual property tax to Microsoft.

Android is far from the only player to have patent issues in mobile, to be sure, as this graphic indicates. But they may have the most to lose.

Tablet Application Volume

When the Xoom launched, the Android Market contained a mere 16 tablet-specific applications. According to market.android.com, in the eight months since, they’ve added 150 new tablet applications, yielding a growth rate of roughly 940%. Which sounds positive until you realize that the iPad more than doubled its 65,000 application catalog, which is now 140,000+ strong.
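The arithmetic behind that comparison can be sketched in a few lines of Python. This is only a back-of-the-envelope check using the figures quoted above; the function and variable names are illustrative, and the iPad's "140,000+" is treated as exactly 140,000, which is an assumption that makes the result a lower bound.

```python
def growth_rate_percent(initial: int, added: int) -> float:
    """Return growth as a percentage of the starting count."""
    return added / initial * 100


# Android tablet catalog: figures quoted in the text.
android_initial = 16
android_added = 150
android_total = android_initial + android_added
android_growth = growth_rate_percent(android_initial, android_added)

# iPad catalog: "140,000+" is taken as exactly 140,000 (assumption, lower bound).
ipad_initial = 65_000
ipad_current = 140_000
ipad_added = ipad_current - ipad_initial
ipad_growth = growth_rate_percent(ipad_initial, ipad_added)

# Absolute gap between the two catalogs.
catalog_gap = ipad_current - android_total

print(f"Android tablet apps: {android_total} total, {android_growth:.1f}% growth")
print(f"iPad apps: at least {ipad_added} added, {ipad_growth:.1f}% growth")
print(f"Gap between catalogs: at least {catalog_gap} applications")
```

The percentage favors Android only because its starting base was tiny; in absolute terms the iPad added several hundred times as many applications over the same period.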

To some extent, this is unsurprising. Application generation is typically a function of adoption; as DiBona asserted above, there is a linear relationship between hardware shipments and developer interest. With Android tablet shipments anemic at present, in spite of Google’s efforts to seed the markets at events like I/O, developer interest could be expected to lag.

What is curious, however, is Google’s inability or unwillingness to get key, flagship applications ported to the platform. Nearly three quarters of a year after the official release, there is still no tablet-optimized official Twitter client. Ditto for Facebook, although they were inexplicably late to market with their iPad offering. MLB initially promised a tablet-optimized version of their At Bat application similar to what was available for the iPad, but never delivered it.

So while Android has had notable wins like the NFL ’11 application for the tablet form factor, it remains tens of thousands of applications from being competitive from an applications perspective. The question is why.

Apart from the obvious answer in hardware volume, some of the explanation may lie in developer trepidation around the nature of the Honeycomb platform. With Google admitting that it took shortcuts to get Honeycomb to market in time, some developers may have chosen to wait for the unified Ice Cream Sandwich platform. But it remains curious that Google hasn’t been able to incent – financially or otherwise – key application partners to make the platform more compelling to users and potential developers.

With application availability proving to be crucial to the iPad’s ongoing success, it will be critical that Google resource itself appropriately to remedy this issue.

Categories: Mobile.