RedMonk

On the Decline of the GPL

Guess which open source license is more popular than the MIT, Artistic, BSD, Apache, MPL and EPL put together? Surprise: it’s the GPL. True, usage appears to be in steep decline. Since August of 2009, GPL usage has fallen around 8%, according to data from Black Duck. Over that same span, usage of permissive licenses has risen: MIT by 8%, Apache by 2% and BSD by 1%. But while developers may be increasing their usage of non-copyleft licenses, is this a problem?

With all due respect to Red Hat counsel Richard Fontana, for whom the waning usage of the GPL appears to be alarming (as an aside, I find it mildly ironic that the project that built those slides is itself permissively licensed), this seems to be little more than a normal market adjustment. The unnaturally dominant role copyleft licensing played for many years was, in my view, as much an artifact of the extraordinary visibility of projects like the Linux kernel and the MySQL database as project owners’ affection for the reciprocal protections offered by copyleft licensing. As such, it never appeared to be sustainable from this vantage point, which is why we predicted back in 2009 precisely what has occurred: gains from permissively styled licenses at the expense of reciprocal alternatives.

The GPL is an enormously important mechanism, as we’ve asserted since at least 2005. It simply could not expect to be the only mechanism indefinitely. Licenses are tools, and should be selected and employed based on a desired outcome. As those desired outcomes have changed over time, it’s only logical that licensing patterns have changed to accommodate them.

We have argued, both observationally and based on public market valuations, that the value of software as a differentiated asset is in decline. The evidence suggests that web-native businesses assign a substantially lower value to written software than did their predecessors. Facebook, for example, originally wrote Cassandra to power inbox search within their messaging feature, subsequently releasing the code into an Apache project. When they rebuilt Messages, they chose HBase – an Apache project originally created by a separate organization, Powerset – over their own Cassandra. GitHub’s Tom Preston-Werner, for his part, recommends open sourcing everything except those features that represent “core business value.”

What both organizations have realized is that very little code, in practice, is competitively differentiating. Which makes open source a logical course of action, because the potential benefits of making the source code available are likely to substantially outweigh the costs. And as far as licensing is concerned, if the code is not a competitive advantage, it is likely not worth protecting. For those who view the code they produce as a generally fungible asset, the additional protections afforded by a reciprocal license may not only be unnecessary, but unwanted. In this scenario, permissive licenses are a perfect alternative.

Which should be fine. Open source licenses are, ultimately, different tools, and employing them towards different ends is simply logical.

by-sa

Categories: licensing, Open Source.

AWS, Y Combinator and the Startup Boom

In San Francisco cafes and bars, even on the street, I overhear people talking about their startup ideas, business plans, and goals. And there are tons of incubators, Angels, wannabe Angels, VC firms, making investments in startups.
- “Silicon Valley’s dirty little secret: The ‘Startup Boom’ is a disguised jobs fair for big corporations,” Tom Foremski: February, 2012

The conventional wisdom in this industry is that we are in the midst of a startup boom. Besides the piece above, there’s Bloomberg: “New York Is Having ‘Incredible’ Startup Boom, Hippeau Says”; Forbes: “How Cloud Computing is Fueling the Next Startup Boom”; GigaOm: “Why the big data startup boom will likely be short-lived”; the Wall Street Journal: “Veteran Investor Defends Start-Up Boom,” and so on.

One of the things missing from these articles, however, is an attempt to examine the question quantitatively. Is there, in fact, a boom, and if so, what does it look like? Ever since running across the Kauffman Foundation’s report “Starting Smaller; Staying Smaller: America’s Slow Leak in Job Creation” via Andrew McAfee and Erik Brynjolfsson’s “Race Against the Machine,” I’ve been curious about how job creation in the technology industry compares to the overall rate. Working backwards from the same source of data that the Kauffman Foundation used, the US Census, it was simple enough to collect basic job creation data for the US. What was missing was insight into technology company starts. Fortunately, AppFog’s Chris Tacy had sourced relevant data from the National Venture Capital Association and the UNH Center for Venture Research. When overall business creation as measured by the Census is plotted against the volume of seed and venture deals, with both series normalized, this is what we find (Census data is only available through 2009).

The trendlines are more gradual than “startup boom” rhetoric might lead one to believe, but they do support the assertion that business creation in the technology sector exceeds that of the wider job market. There are two notable spikes in venture activity; the first, from 2002-2004, roughly coincides with the dot com boom. The second, beginning in 2006, is likely attributable to something else entirely.

Over on the AppFog blog, Tacy notes an odd phenomenon: while the normalized volume and deal size metrics initially track closely, beginning in early 2006 they are effectively decoupled as the volume of deals spikes while the average deal size falls dramatically. A replication of his dataset is below. The question is what caused this phenomenon.
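The decoupling Tacy describes comes down to two derived series: deal volume, and average deal size (total dollars invested divided by the number of deals), each normalized so they can share an axis. The sketch below indexes each series to its first period (first value = 100). That normalization method is an assumption, since we don't know which one the original charts used, and the figures are hypothetical placeholders rather than NVCA or Census data.

```python
def average_deal_size(dollars, deals):
    """Average deal size per period: total dollars invested / number of deals."""
    return [d / n for d, n in zip(dollars, deals)]


def index_to_base(series, base=100.0):
    """Rescale a series so its first value equals `base` (one common normalization)."""
    first = series[0]
    return [base * x / first for x in series]


# Hypothetical placeholder figures (NOT real NVCA data): deal volume climbs
# while total dollars invested stay nearly flat.
deals = [100, 110, 150, 200]
dollars = [500.0, 540.0, 560.0, 580.0]  # millions of dollars

volume_idx = index_to_base(deals)
size_idx = index_to_base(average_deal_size(dollars, deals))

for period, (v, s) in enumerate(zip(volume_idx, size_idx)):
    print(f"period {period}: volume index {v:6.1f}, deal size index {s:6.1f}")
```

With these placeholder numbers the two indexes start together at 100 and then move in opposite directions. That is the shape of the pattern described above: more deals, each smaller.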

Tacy attributes this to the rise of Amazon Web Services and the subsequent rise of the cloud market. And it’s difficult to argue that S3, launched March 2006, and EC2, launched in August of the same year – not to mention the market they helped to create – haven’t had a profound impact on the costs associated with business formation, and thus, the attendant risk. As Flip Kromer put it: “EC2 means anyone with a $10 bill can rent a 10-machine cluster with 1TB of distributed storage for 8 hours.” The cost model is inarguably more disruptive than the underlying technologies that power the cloud.
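Taking Kromer’s figures at face value, the implied hourly rate per machine works out as follows. This is back-of-the-envelope arithmetic on the quote, not a published EC2 price:

```latex
\text{cost per machine-hour} = \frac{\$10}{10 \text{ machines} \times 8 \text{ hours}} = \frac{\$10}{80 \text{ machine-hours}} = \$0.125
```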

It is also likely, however, that the rise of seed-stage startup funding firms like Y Combinator (founded March 2005) was a catalyst in the above equation. It’s difficult to envision the comparatively small funding levels having had the same impact in the absence of cloud platforms that minimize upfront capital expenses. But with 316 companies funded from 2005 through 2011, Y Combinator alone would account for approximately 13% of the total venture deals in that span, according to the data above. Factor in the firms such as TechStars that have followed, and the impact of Y Combinator is undeniable.
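Working backwards from the two figures in that paragraph shows roughly how many total deals the 13% share implies. This is simple arithmetic on the quoted numbers, not an independent count. The bounds assume, hypothetically, that “approximately 13%” means anything from 12.5% to 13.5%:

```latex
\text{implied total deals, 2005--2011} \approx \frac{316}{0.13} \approx 2{,}431
\qquad
\frac{316}{0.135} \approx 2{,}341 \;\le\; \text{total} \;\le\; \frac{316}{0.125} = 2{,}528
```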

However the trends are accounted for, the takeaways are clear. Business creation in the technology sector is outperforming the wider market, while the capital required for formation – and thus the risk attached – is in clear decline. Both of which bode well for an economy desperate for an increase in employment opportunities.

Categories: Cloud, Startups.

The RedMonk Programming Language Rankings: February 2012

For years now, it has been self-evident to us at RedMonk that programming language usage and adoption has been fragmenting at an accelerating rate [coverage]. As traditional barriers to technology procurement have eroded [coverage], developers have been empowered to leverage the runtimes they choose rather than those chosen for them. This has led to a sea change in the programming language landscape, with traditional language choices increasingly competing for attention with newer, more dynamic competitors.

The natural consequence of this tectonic shift has been uncertainty. Vendors for whom supporting Java and Microsoft-based stacks was once sufficient are being forced to evaluate the array of alternatives in an effort to maximize their addressable audience. Platform-as-a-Service (PaaS) stacks like Cloud Foundry and OpenShift are perhaps the best example of this; at launch, each differentiated itself in part through support for multiple independent runtimes, from JavaScript to Ruby.

While the question is obvious – which languages should I support? – the answer, and mechanisms for determining an answer, have been considerably less so. There is no canonical metric for determining platform traction; we employ half a dozen or more internally at RedMonk, depending on context, which incorporate everything from GitHub LOC rankings to LinkedIn group membership data.

But one of our favorites is the one originally developed by Drew Conway in 2010. It compares and contrasts the rankings of programming languages on GitHub and Stack Overflow to provide a broader view of language popularity. Our first snapshot using this model came in September. Five months later, we recompiled the data and plotted it to see what – if anything – had changed. Herewith the updated plot.
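At its core, the Conway approach ranks each language by two independent popularity signals and plots one rank against the other. A minimal sketch follows. It assumes you already have per-language counts, for example repository counts from GitHub and question-tag counts from Stack Overflow. The data collection step is left out, and the languages and numbers below are hypothetical placeholders, not the data behind our plot.

```python
def rank(counts):
    """Map each language to its rank (1 = most popular) by descending count."""
    ordered = sorted(counts, key=lambda lang: (-counts[lang], lang))
    return {lang: i + 1 for i, lang in enumerate(ordered)}


def paired_ranks(github_counts, so_counts):
    """(language, GitHub rank, Stack Overflow rank) for languages present in both sources."""
    gh, so = rank(github_counts), rank(so_counts)
    common = sorted(set(gh) & set(so), key=lambda lang: (gh[lang] + so[lang], lang))
    return [(lang, gh[lang], so[lang]) for lang in common]


# Hypothetical placeholder counts, not real GitHub or Stack Overflow figures.
github = {"JavaScript": 900, "Ruby": 800, "Java": 500, "Haskell": 60}
stackoverflow = {"JavaScript": 700, "Java": 750, "Ruby": 200, "Haskell": 40}

for lang, gh_rank, so_rank in paired_ranks(github, stackoverflow):
    print(f"{lang:<10} GitHub #{gh_rank:<3} Stack Overflow #{so_rank}")
```

Each (GitHub rank, Stack Overflow rank) pair becomes one point on the scatter plot. Languages near the diagonal are roughly equally popular on both sites, and languages far from it are favored by one community over the other.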

[Plot: GitHub vs. Stack Overflow programming language rankings]

Apart from the addition of languages like Dylan, Turing and Rust, little has changed five months on. We still have two clearly defined upper language tiers, with two or three less visible tiers below them. There are, however, several developments worth discussing in more detail.

  • CoffeeScript: billed as a more syntactically approachable alternative that compiles to JavaScript, CoffeeScript made substantial gains in Stack Overflow tag volume (63%), and also jumped significantly in popularity on GitHub. Since September 1st, CoffeeScript was not only one of 11 languages to climb GitHub’s rankings, it climbed the furthest, going from #19 to #13. The jump is even more significant given that six new languages were added to GitHub’s list in that span. With all due apologies to Bryan Cantrill, the numbers indicate that CoffeeScript is one of the fastest growing platforms by this metric.
  • Java: as recently as a year ago, Java was widely regarded as a language with a limited future. Facing increased competition from both dynamic languages and JVM-based alternatives to Java, even conservative, enterprise-buyer-oriented analysts (the constituency most predisposed to defend Java) were writing its obituary, even as the JVM itself had a clearly projectable future. As we argued at FOSDEM last February, however, our data indicated that these conclusions were premature. One year in, the data continues to validate that assertion.

    Apart from posting the second-highest growth on GitHub, behind CoffeeScript, Java (already the language with the second most associated tags on Stack Overflow) outpaced the median tag volume growth rate of 23%. This growth is supported elsewhere; on LinkedIn, the Java user group added members faster than every other tracked programming language excepting C# and Java. This chart, for example, depicts the percentage growth in LinkedIn user group membership for Java and JVM-based alternatives since November of 2011.

    [Chart: percentage growth in LinkedIn user group membership, Java and JVM-based alternatives]

    This outperformance is even more impressive when the overall member numbers are factored in.

    [Chart: LinkedIn user group member counts]

    Our data, then, indicates that Java remains – in spite of the fragmented programming language landscape – a viable, growing language.

  • Rust: originally developed in 2006, with its compiler reaching version 0.1 only last month, this language with C/C++-like syntax has surprising traction on GitHub. On February 1st, it sat at #21 on GitHub Explore, ahead of Clojure, Groovy, Erlang, R, Go and a half dozen other relatively popular languages. While this is almost certainly a product of Mozilla’s involvement in the project, it has caught the eye of more than a few prominent technologists. There are a mere 4 questions tagged with Rust on Stack Overflow, so it’s clearly early days, but Rust is on the radar.

Other quick hit observations:

  • C dropped 2 spots on GitHub’s rankings, from 5 to 7
  • Go posted the fourth highest growth percentage on Stack Overflow, R was sixth
  • Java passed PHP
  • Prolog jumped six spots on GitHub from 30 to 24
  • Scala may be separating itself from the other Tier 2 languages
  • VimL is popular on GitHub
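The movement figures in this post come from two simple calculations. The first is rank change, computed as earlier rank minus later rank, so a climb is positive. The second is percentage growth in a raw count, compared against the median across languages. The sketch below uses the GitHub rank moves cited above. The Stack Overflow tag counts are hypothetical placeholders (`lang_a` and so on are made-up names), chosen only to show how a single growth rate compares with the median.

```python
from statistics import median


def rank_change(before, after):
    """Positive when a language climbs (e.g. #19 -> #13 is +6)."""
    return before - after


def pct_growth(old, new):
    """Percentage growth from old to new."""
    return 100.0 * (new - old) / old


# GitHub rank moves cited in this post (September 1 -> February 1).
github_moves = {"CoffeeScript": (19, 13), "Prolog": (30, 24), "C": (5, 7)}
for lang, (before, after) in github_moves.items():
    print(f"{lang}: {rank_change(before, after):+d} spots")

# Hypothetical Stack Overflow tag counts (old, new); NOT real data.
tags = {"lang_a": (1000, 1230), "lang_b": (400, 460), "lang_c": (200, 326)}
growth = {lang: pct_growth(old, new) for lang, (old, new) in tags.items()}
median_growth = median(growth.values())
above_median = sorted(lang for lang, g in growth.items() if g > median_growth)
print(f"median growth {median_growth:.1f}%, above median: {above_median}")
```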

Categories: Programming Languages.

Amazon DynamoDB: First Look

This paper described Dynamo, a highly available and scalable data store, used for storing state of a number of core services of Amazon.com’s e-commerce platform. Dynamo has provided the desired levels of availability and performance and has been successful in handling server failures, data center failures and network partitions. Dynamo is incrementally scalable and allows service owners to scale up and down based on their current request load.

Dynamo allows service owners to customize their storage system to meet their desired performance, durability and consistency SLAs by allowing them to tune the parameters N, R, and W.
- “Dynamo: Amazon’s Highly Available Key-value Store [PDF],” Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels
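In the quote above, N is the number of replicas, R is the number of replicas that must respond to a read, and W is the number that must acknowledge a write. The Dynamo paper notes that choosing R + W > N yields a quorum-like system, because every read set must then overlap every write set. The minimal sketch below checks that rule both arithmetically and by brute force. The function names are ours, for illustration only, and are not part of any Amazon API.

```python
from itertools import combinations


def overlaps_guaranteed(n, r, w):
    """Dynamo-style quorum rule: with R + W > N, every read set intersects every write set."""
    return r + w > n


def check_by_enumeration(n, r, w):
    """Brute force: does every possible R-replica read share a node with every W-replica write?"""
    replicas = range(n)
    return all(
        set(read) & set(write)
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )


# (3, 2, 2) is the configuration commonly cited from the paper.
for n, r, w in [(3, 2, 2), (3, 1, 1), (3, 1, 3), (5, 2, 3)]:
    print(n, r, w, overlaps_guaranteed(n, r, w), check_by_enumeration(n, r, w))
```

Lowering R or W reduces latency and raises availability, but it gives up the guaranteed overlap. That is the performance, durability and consistency trade-off the quote refers to.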


In October 2007, Amazon published a paper describing an internal data store called Dynamo. Incorporating ideas from both the database and key-value store worlds, the paper served as the inspiration for a number of open source projects, Cassandra and Riak being perhaps the most visible of the implementations. Until yesterday, these and other derivative projects were the only Dynamo implementations available to the public, because Amazon did not expose its internally developed database as an external service. With Wednesday’s launch of Amazon DynamoDB, however, that is no longer true. Customers are now able to add Amazon to their list of potential NoSQL suppliers, although, to be fair, Amazon has technically been in the market previously with SimpleDB.

The following are some points of consideration regarding the release, its impact on the market and likely customer questions.

AWS versus Hosted

The most obvious advantage of DynamoDB versus its current market competition is the fact that it’s already in the cloud, managed, and offering consolidated billing for AWS customers. Because it requires minimal setup and configuration compared with deploying native tooling, a subset of the addressable market is likely to share the mindset of this commenter on the DataStax blog:

“Cassandra’s tech is superior, as far as I can tell. But we’ll probably be using DynamoDB until there is an equivalent managed host service for Cassandra. Moving to Cassandra is simply too expensive right now.

All those are clearly better served by a service like DynamoDB than trying to run their own Cassandra clusters unless they happen to be very proficient in Cassandra administration and want to dedicate precious human resources to administration. That takes a lot of the benefits of “cloud” away from small and mid-sized companies where cost and management are the limiting factors.”

For many, outsourcing the installation, configuration and ongoing management of a data infrastructure is a major attraction, one that easily offsets a reduced featureset. Like Platform-as-a-Service (PaaS) offerings, DynamoDB offers time to market and theoretical cost advantages when required capital expense and resource loading are factored in.

Like the initial wave of PaaS platforms, however, DynamoDB is available only through a single provider. Unlike Amazon’s RDS, which is essentially compatible with MySQL, DynamoDB does not allow users to migrate off of the service seamlessly. The featureset can be replicated using externally available code – via those projects that were originally inspired by the Dynamo paper, for example – but you cannot at this time download, install and run DynamoDB locally.

It’s true that the practical implications of this lack of availability are uncertain. Netflix’s Adrian Cockcroft, for example, asserts that migration between NoSQL stores is less problematic than between equivalent relational alternatives, because of the lower complexity of the storage, saying “it doesn’t take a year to move between NoSQL, takes a week or so.” It remains true, however, that some customers postpone upgrades to newer versions of the same database because of the complexity involved. And that’s without considering the skills question. Given the uncertainty involved, then, it seems fair to conclude that the proprietary nature of DynamoDB and the potential switching costs will be – at least in some contexts – a barrier to entry.

The question for users, then, is similar to that facing would-be adopters of first generation PaaS solutions: is the featureset compelling enough to justify sacrificing later substitutability? Amazon clearly believes that it is; its competitors, less so. EMC’s Mark Chmarny, additionally, notes that Amazon’s pricing model may favor adoption at the expense of migration.

Competition

DynamoDB clearly has the attention of competitive projects. Basho – the primary authors of Riak – welcomed DynamoDB in this post while pointing out its primary limitation, and DataStax wasted little time spinning up a favorable comparison table. One interesting aside: the Hacker News discussion of the launch mentioned Riak 23 times to Cassandra’s three.

Basho and DataStax are right to be concerned, because the combination of Amazon’s increasingly powerful brand and the managed nature of the product makes it formidable competition indeed. The question facing both Amazon and its competitors is to what extent substitutability matters within the database space. Proprietary databases have had a role in throttling the adoption of PaaS services like Force.com and Google App Engine in the past, but we have very few market examples of standalone, proprietary Database-as-a-Service (DaaS) offerings from which to forecast. Will DaaS, or more properly NoSQL-as-a-Service, be amenable to single-vendor products, or will it favor, as the PaaS space has, standardized platforms that permit vendor choice?

The answer to that is unclear at present, but in the meantime expect Amazon to highlight the ease of adoption and vendors like Basho and DataStax to emphasize the potential difficulties in exiting, while aggressively exploring deeper cloud partnerships.

NoSQL Significance

It’s being argued in some quarters that DynamoDB is the final, necessary validation of the NoSQL market. I do not subscribe to this viewpoint. By our metrics, the relevance of distinctly non-relational datastores has been apparent for some years now. Hadoop’s recent commercial surge alone should have been sufficient to convince even the most skeptical adherents of relational orthodoxy that traditional databases will be complemented, or in limited circumstances replaced, by non-relational alternatives in a growing number of enterprises.

Throughput Reservation

Perhaps the most compelling new feature of Amazon’s new offering isn’t, technically speaking, a feature. Functionally, the product is (yet) another implementation of the ideas in the Dynamo paper; Alex Popescu has comprehensive notes on the feature list. What is receiving the most attention is not technical capabilities like range queries but rather the concept of provisioned throughput, a reserved capacity level that can be dynamically adjusted up or down.

This type of atomic service level provisioning is both differentiating and compelling for certain customer types. Promising single-digit millisecond latency at a selected throughput level, with zero customer effort required, is likely to be attractive for customers that require – or think they require – a particular service level. And by requiring customers to determine their required provisioning level manually, Amazon stands to benefit from customer overprovisioning: customers will feel pain and react if they’re under-provisioned, but may fail to notice when they’re over-provisioned. Much like mobile carriers, Amazon wins in both scenarios.
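Amazon has not detailed how provisioned throughput is enforced internally. As a generic illustration of what a reserved request rate means, here is a token bucket, a standard rate-limiting technique. It is not a description of DynamoDB’s implementation, and the rate and burst values are hypothetical.

```python
class TokenBucket:
    """Generic token-bucket limiter: admits `rate` requests per second on average,
    with bursts of up to `capacity`. Illustrative only; not DynamoDB's mechanism."""

    def __init__(self, rate, capacity):
        self.rate = float(rate)          # tokens added per second (the "provisioned" rate)
        self.capacity = float(capacity)  # maximum tokens that can accumulate
        self.tokens = float(capacity)    # start full, so an initial burst is allowed
        self.last = 0.0                  # time of the last refill, in seconds

    def allow(self, now):
        """Return True if a request at time `now` (seconds) is admitted, False if throttled."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical provisioning: 8 requests/second with a burst of 8.
bucket = TokenBucket(rate=8, capacity=8)
# 32 requests arriving every 1/16 s (16/second, double the provisioned rate).
admitted = sum(bucket.allow(now=i / 16) for i in range(32))
print(admitted)  # 23: the initial burst of 8, plus roughly one refill token per 1/8 s
```

The asymmetry described above shows up directly. Traffic above the provisioned rate is visibly throttled (9 of 32 requests here). A bucket provisioned well above actual demand simply never empties, so the over-provisioning stays invisible.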

Timing

With DynamoDB having been extant in some form since at least 2007, one logical question is: why now? Amazon did not detail their intent with respect to timing when they prebriefed us last week, but their track record demonstrates a willingness to be first to market balanced with an understanding of timing.

In 2006, Amazon launched EC2 and S3, effectively creating the cloud market. This entrance, however, was built in part from the success of the Software-as-a-Service (SaaS) market that preceded it; Salesforce, remember, went public in 2004. With enterprises now acclimated to renting software via the network, the market could be considered primed for similar consumption models oriented around hardware and storage.

Three years after the debut of EC2 and S3, and one year after MySQL had achieved ubiquity sufficient to realize a billion-dollar valuation from Sun [coverage], Amazon launched the first cloud-based MySQL-as-a-Service offering [coverage]. That same year, the first in which Hadoop was mainstream enough to justify its own HadoopWorld conference, Amazon launched Elastic MapReduce.

The pattern is clear: Amazon is unafraid to create a market, but attempts to temper the introductions with market readiness. Logic suggests that the same tactic is at work here.

NoSQL has, as a category, crossed the chasm from interesting science project to alternative data persistence mechanism. But while NoSQL tools like Cassandra and Riak are available in managed form via providers like Joyent and Heroku, DynamoDB is, in Popescu’s words: “the first managed NoSQL databases that auto-shards.”
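In the Dynamo design, automatic sharding rests on consistent hashing. Keys and nodes are hashed onto the same ring, and each key is stored on the first N distinct nodes found walking clockwise from the key’s position (the paper’s “preference list”). The simplified sketch below uses MD5 for hashing, as the paper describes. It omits the virtual nodes the paper uses for load balancing, the node names are hypothetical, and it is not DynamoDB’s actual code.

```python
import bisect
import hashlib


def ring_position(value):
    """Map a string to a position on a 128-bit ring using MD5."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Simplified consistent-hash ring: one position per node, no virtual nodes."""

    def __init__(self, nodes):
        self.positions = sorted((ring_position(node), node) for node in nodes)
        self.keys = [pos for pos, _ in self.positions]

    def preference_list(self, key, n):
        """First n distinct nodes found walking clockwise from the key's position."""
        start = bisect.bisect_right(self.keys, ring_position(key))
        count = min(n, len(self.positions))
        return [self.positions[(start + i) % len(self.positions)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
print(ring.preference_list("customer:42", n=3))

# Adding a node only moves keys in the ring segment the new node takes over;
# every key whose primary node changes now lands on the new node.
bigger = Ring(["node-a", "node-b", "node-c", "node-d", "node-e", "node-f"])
keys = [f"key-{i}" for i in range(1000)]
moved = [k for k in keys if ring.preference_list(k, 1) != bigger.preference_list(k, 1)]
print(f"{len(moved)} of {len(keys)} keys changed primary node")
```

The property that makes this “auto-sharding” is visible in the last lines. Growing the cluster relocates only the keys adjacent to the new node instead of reshuffling everything.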

It is also possible that SSD pricing contributed directly to the launch timing, with pricing for the drive type down to levels where the economics of a low cost shared service finally make sense.

SSDs

One underdiscussed aspect of the DynamoDB launch is the underlying physical infrastructure, which consists solely of SSDs. This is likely one of the major contributors to the system’s performance, and in some cases will be another incentive to use Amazon’s platform, as many traditional datacenters will not have equivalent SSD hardware available to them.

The Net

While discussion of the DynamoDB offering will necessarily focus on functional differentiation between it and competitive projects, it is likely that initial adoption and uptake will be primarily a function of attitudes regarding lock-in. For customers that want to run the same NoSQL store on premises and in the cloud, DynamoDB will be a poor fit. Those who are optimizing for convenience and cost predictability, however, may well prefer Amazon’s offering.

Amazon would clearly prefer the latter outcome, but both are likely acceptable. Amazon’s history is built on releasing products early and often, adjusting both offerings and pricing based on adoption and usage.

In any event, this is a notable launch and one that will continue to drive competition on and off the cloud in the months ahead.

Disclosure: Basho is a RedMonk client, while Amazon and DataStax are not.

Categories: AltDB, Cloud.
