I spent the latter part of last week in San Diego at the inaugural CloudOpen conference, along with LinuxCon, which are both co-located with the Linux Plumbers Conference and the Linux Kernel Summit. As a pleasant change from the norm, I actually had an opportunity to attend a few talks. And one great thing about this event in particular is that the talks were full of new material, not just people making the rounds from one conference to another.
Logistically, the event was quite well run. The Linux Foundation folks put on a ton of events, and they’ve got much of it down to a science. Even when the keynotes ran late, they had a PA system hooked up and announced schedule changes throughout the conference. One particularly interesting aspect of the conference was that they brought in food trucks for lunch rather than having the event purely catered by the hotel. Considering the conference hotel (the Sheraton Hotel & Marina) is adjacent to the airport and fairly isolated from the rest of the city, it was a pleasant surprise to get a bit of the flavor of San Diego coming to us. Here are brief summaries of, and thoughts on, the talks I found particularly interesting.
The anatomy of a Tweet
Twitter’s open-source manager, Chris Aniszczyk, gave one of the keynote talks on what happens on the back-end infrastructure every time you tweet. He started out with their philosophy, which is “Open-source almost all the things.” When they think about what’s sacred at Twitter, it’s not the software — it’s the data, particularly the historical data. Once they made that realization, they started opening tons of code. On GitHub, Twitter’s Bootstrap repository alone has more than 37,000 people who have bookmarked it and more than 8,000 forks.
After that, Chris walked through what happens when you send a tweet. Turns out it passes through a ton of open-source code along the way, as well as a few proprietary bits:
- Snowflake (unique IDs)
- Talon to get a short URL
- t-bird and Gizzard for scalable MySQL storage
- Search w/ early-bird, including lots of indexing with Lucene
- Fanout to all followers of that person, done with FlockDB
They’re also heavy users of analytics, with a Mesos cluster, a Hadoop cluster running 10K+ jobs per day, and tens of thousands of machines mostly running Linux 2.6.39. They digest more than 100 TB of data daily.
Like Facebook and DataSift, Twitter is another great example of a web-scale company that had to write its own solutions when none existed. Fortunately for us, they’ve all chosen to open-source the code, and in Facebook’s case the hardware itself, so everyone can benefit from their efforts.
Randy Bias on clouds, elasticity and open source
Randy’s the CTO and co-founder over at Cloudscaling. He started out talking about the importance of elastic infrastructure: scaling out to tons of white boxes rather than scaling up to huge, beefy servers. The point he made was that scale-out is linear or better in hardware cost, while buying a server that’s 4x more powerful could be 8x the cost.
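To make that arithmetic concrete, here’s a rough sketch in Python. The $2,000 price for a white box and the cost exponent of 1.5 are my own assumptions, picked only so that 4x the power works out to 8x the cost as in Randy’s example; they aren’t figures from the talk.

```python
# Hypothetical numbers for illustration only (not from the talk).
WHITE_BOX_PRICE = 2000.0  # assumed price of one commodity server, in dollars


def scale_out_cost(units: int, unit_price: float = WHITE_BOX_PRICE) -> float:
    """Cost of buying `units` identical white boxes: linear in capacity."""
    return units * unit_price


def scale_up_cost(power_multiple: float,
                  unit_price: float = WHITE_BOX_PRICE,
                  cost_exponent: float = 1.5) -> float:
    """Cost of one server `power_multiple` times as powerful as a white box.

    The exponent is an assumption: 1.5 makes 4x the power cost 8x as much.
    """
    return unit_price * power_multiple ** cost_exponent


out_cost = scale_out_cost(4)   # four white boxes
up_cost = scale_up_cost(4)     # one server with 4x the power
print(f"scale-out: ${out_cost:,.0f}  scale-up: ${up_cost:,.0f}")
```

With these assumed numbers, the same 4x capacity costs twice as much when bought as a single big server, and the gap widens as the multiple grows.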
One major problem with enterprises, he said, was that they tend to build FrankenClouds: part enterprise virtualization, part elastic infrastructure, and not enough of either.
He sees open-source developers as Prometheus bringing fire down from the mountain, disrupting across the IT stack, with examples of MapReduce vs Hadoop, BigTable vs Cassandra/HBase, Google App Engine vs Cloud Foundry, and even hardware with Open Compute and the Open Data Center Alliance vs typical designs.
Randy sees open clouds as meeting these requirements:
- Reduce lock-in
- Provide control & flexibility
- Enable building at scale
In my view, to provide true control, the cloud implementation must actually be open source, rather than just having open APIs, as some have argued. In that link, Joe Brockmeier notes a definition from Red Hat’s Scott Crenshaw that generally meets the above criteria and adds a final one: there must be a viable, independent community around it. In other words, it must behave as open source rather than merely meeting the licensing criteria.
Collaborative configuration management
Short summary: The CFEngine folks have created a way to share libraries of common functions called “sketches.” Good, and necessary, for a successful community around this configuration-management framework. The trick will be getting the barrier to entry right by packaging it well.
Another interesting point was that they’ve decided the line between the open-source and enterprise versions will not be random features, but instead will be the difference between managing single machines and easily orchestrating multiple machines together.
One “feature” that may cause contention among potential users, which I hadn’t recalled about CFEngine, is its convergence engine. There’s no strict ordering of operations, so things aren’t deterministic in the same way as with some other options; rather, if something breaks, it will skip that step and continue on, then keep re-trying steps that failed until they work or it has nothing else to do. While potentially frustrating when it hits issues, this is sometimes necessary when you’re managing physical, heterogeneous environments.
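Here’s a minimal sketch of that convergence idea in Python. It isn’t CFEngine’s actual implementation or policy language. The step names, the `converge` function, and the rule of stopping once a full pass makes no progress are all assumptions of mine for illustration.

```python
from typing import Callable, Dict, List


def converge(steps: Dict[str, Callable[[], bool]],
             max_passes: int = 10) -> List[str]:
    """Run steps in no guaranteed order, retrying failures on later passes.

    Stops when every step has succeeded, when a full pass makes no
    progress, or after `max_passes`. Returns the names still failing.
    """
    pending = dict(steps)
    for _ in range(max_passes):
        if not pending:
            break
        still_failing = {}
        for name, step in pending.items():
            try:
                ok = step()
            except Exception:
                ok = False  # a broken step is skipped, not fatal
            if not ok:
                still_failing[name] = step
        made_progress = len(still_failing) < len(pending)
        pending = still_failing
        if not made_progress:
            break  # nothing else to do
    return sorted(pending)


# Hypothetical steps: the service can only start once the package exists.
state = set()


def install_package() -> bool:
    state.add("package")
    return True


def start_service() -> bool:
    if "package" not in state:
        return False
    state.add("service")
    return True


def mount_missing_disk() -> bool:
    return False  # simulates a step that can never succeed


remaining = converge({
    "start_service": start_service,  # listed first on purpose
    "install_package": install_package,
    "mount_missing_disk": mount_missing_disk,
})
print(remaining)
```

Even though the service step is listed before the package it depends on, it succeeds on the second pass, while the permanently broken step is simply left over at the end.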
Fontana on licensing
Richard Fontana is Red Hat’s chief open-source licensing and patent lawyer, and he gave an interesting and highly personal 45-minute rant on what he saw as issues with OSS licensing today. Some of the issues mentioned:
- Authoritarian control of FLOSS definitions raises questions of legitimacy, bias, and transparency. There should be greater awareness and debate of things like definitions from the Free Software Foundation, the Open Source Initiative, and Debian (which provides its own set of free-software guidelines). The OSI should replace its definition with one based on the FSF’s rather than Debian’s.
- Institutions providing definitions should also provide the rationale for those definitions.
- Consider adding “open development” criteria to the definition. Throwing code over the wall a la Android could be defined out of “open source.”
- Multiple projects should work together to develop and police the criteria. Linux distributions have done this in the past, and they also have a unique power in that they can pressure upstream developers to change the license in some cases.
I generally agree with his points, and we’re certainly starting to see shifts toward more democratic control of some definitions with the OSI’s move toward a more member-driven organization.
Software patents and the Open Invention Network
Keith Bergelt of the OIN talked about issues created by software patents, some of which the OIN is working to limit, and some of which are rather outside of its scope. For example:
- Open-source software enables innovation in a way that’s threatening to companies seeking to restrict methods and products of innovation. These are the direct open-source antagonists.
- Market dynamics have favored increases in patent trolling, which limits innovation, in part by forcing more money into legal defense. These are the indirect open-source antagonists, which involve OSS merely because it’s another community of potential infringers.
- The goal of these groups is to make the total cost of ownership prohibitive. They target the weak links in the chain. For example, in the case of Android phones, they go after contract manufacturers like Foxconn. These companies simply pass along the costs until it’s not financially feasible to use OSS-based devices.
It was interesting to hear about the tactics that go into enforcing these patents. My opinion is that software patents tend to limit innovation not because they’re evil in themselves, but because they’re overly broad, their terms are too long, and they aren’t approved by experts in the field. I further believe that being able to apply two forms of protection to a single piece of work (patent and copyright) is wrong.
All in all, CloudOpen / LinuxCon went quite well, and I was pleased to make it to some talks containing things that were new to me. As you can imagine, once you start going to a certain number of conferences, many of the talks tend to not be novel anymore — just the same people, or the same company at least, going “on tour” with its latest work. I particularly enjoyed the opportunity to hop between Linux and cloud sessions as well as notice the significant overlap between them.
Disclosure: The Linux Foundation, GitHub, Red Hat, and VMware are clients. Cloudscaling has been a client. Twitter, Facebook, DataSift, Google, CFEngine, and Foxconn are not clients.