RedMonk

Linus Torvalds: Linux and git are innovations in process, not software

From a TechCrunch interview with Linus Torvalds, creator of the Linux kernel:

In fact, to get a bit “meta” on this issue, what’s even more interesting than improving a piece of software, is to improve the *way* we write and improve software. Changing the process of making software has sometimes been some of the most painful parts of software development (because we so easily get used to certain models), but that has also often been the most rewarding parts. It is, after all, why “git” came to be, for example. And I think open source in general is obviously just another “process model” change that I think is very successful.

It’s impossible for me to overstate the importance of adopting development models from open-source software and from distributed version-control systems like git. After spending nearly 10 years working on Gentoo Linux, I’m deeply familiar with how large an advantage these processes give you. You don’t even need to use open source to learn from what the OSS world is doing; just apply the same techniques within your company.

Disclosure: Neither the Linux Foundation nor the Gentoo Foundation is a client.

by-nc-sa

Categories: open-source.

Adoption of software is a funnel

At our Monki Gras conference this February, the creator of Jenkins, Kohsuke Kawaguchi, gave an outstanding talk on building a community around open-source software.

(Aside: All the videos from Monki Gras are now online at RedMonkTV.)

One of the key concepts he talked about was lowering the barrier to entry for new users, with the great metaphor of a funnel. If you’ve ever heard of a sales funnel, it’s the same idea applied to adoption of software — a multi-step process, where failure at any of the steps kicks people out of the funnel, but success means they proceed to the next step.

In my view, the key point of this funnel metaphor is that the many small barriers to adoption are not independent minor factors. Instead, they’re cumulative and strongly interdependent, gradually ripping more and more of your potential users out of the adoption funnel.

Let’s take Red Hat’s RHEL as an example, and consider how this affects RHEL’s ability to get developers building software on its platform. Imagine that we start with 100 developers who are thinking about basing development on RHEL. I drew a picture to illustrate how many of these 100 might end up actually developing on RHEL, based on a funnel (with each stick figure representing 10 developers):

Every time users hit a problem, some percentage of them will give up and move on to an alternative (colored red, above). In the end, you might end up with only 20 out of 100 potential users actually running your software!
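To make the arithmetic concrete, here is a minimal sketch of a funnel in Python. The step names and drop-off rates are hypothetical, chosen only so the result lands near the "20 out of 100" ballpark; they are not taken from Kohsuke's talk or from any real RHEL data.

```python
# Hypothetical adoption steps and the fraction of remaining developers
# lost at each one. Neither the names nor the rates come from the talk.
STEPS = [
    ("find and download the OS", 0.50),
    ("install it", 0.20),
    ("get a first project building", 0.50),
]


def run_funnel(start, steps):
    """Return (step name, developers remaining) after each step."""
    remaining = float(start)
    history = []
    for name, drop_rate in steps:
        # Each loss applies to whoever is still in the funnel,
        # so the retention rates multiply.
        remaining *= 1.0 - drop_rate
        history.append((name, remaining))
    return history


if __name__ == "__main__":
    for name, remaining in run_funnel(100, STEPS):
        print(f"after '{name}': about {remaining:.0f} developers left")
```

With these made-up rates, 100 developers shrink to 50, then 40, then 20. No single step looks catastrophic on its own, yet only a fifth of the original group gets through.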

So when you’re thinking about the developer experience and how to get more developers running your software, think about the funnel. Here’s the 15-minute video of Kohsuke’s talk:

Disclosure: Red Hat and CloudBees (which employs Kohsuke) are both clients.

by-nc-sa

Categories: adoption, operating-systems.

Whither the GPL? Why we don’t need it anymore

During a discussion yesterday on Twitter about the implications of license choices for clouds, I said, “I think GPL played a larger mission when people weren’t educated about open source.” This got so much interest that I wanted to expand on it here.

The GPL enforced good behavior, accompanied by bureaucracy

In the early days of the GPL and copyleft software, it played an important role in forcibly training companies how free/open-source development worked. First, they had no (legal) choice but to comply with the terms of the license. This meant that, like it or not, they had to use at least a somewhat open development model. Even throwing code over the wall meant the code was there for others to pick up and use. If enough people did so, it could result in a fork — unless your company was out there in the community, too, rather than hiding behind a wall.

Over time, more and more tech companies were forced to learn about the merits of transparent, community-based software development as they used, incorporated, and complied with the requirements of GPL software. At least in the tech world, the benefits are fairly well understood at this point. Even companies that aren’t doing anything with open source still want to bring in the techniques pioneered by its distributed development models, which I spent the past 10 years using in Gentoo Linux.

But it’s the compliance where things get frustrating, as my colleague Stephen mentioned yesterday. Dealing with GPL compliance is a major effort for companies, one they’d rather not worry about from both time and legal perspectives.

Not to mention that copyleft licenses make it much harder to build proprietary products. An exceedingly popular business model is opening the infrastructure while building a closed layer on top as a product, often some sort of administration or management interface in the case of cloud.

Enter permissive licenses.

Apache rose as GPL-mandated “education” was no longer required

I won’t bother rehashing the broader evidence on the decline of the GPL and rise of the Apache license because Stephen’s already talked about it. What’s interesting is who is choosing the Apache license — projects like Hadoop, the Cassandra NoSQL database, and most recently Citrix’s CloudStack. That’s why I said a few days ago that the Apache Software Foundation is beginning to become the center of the open cloud ecosystem. The uptake of the Apache license and ASF governance, particularly among cloud providers and infrastructure software, has been notable over the past few years.

Many years ago, those who successfully used the Apache license tended to be highly technical people immersed in the open-source world. For example, look at the Apache web server. It used the Apache license way back when, but it worked out great because OSS-knowledgeable sysadmins wrote it — people who understood the benefits of truly open-source development.

But today, open-source software is much more familiar, and tech companies generally understand how the model works, if not always when to open their own software. So the initial mission served by the GPL of force-feeding open-source development isn’t as necessary as it once was, at least among educated tech companies.

I suspect the lifecycle has now shifted. Although tech companies get it, the next step will be non-tech companies such as retailers, investment banks, etc. Maybe the GPL is the right choice for them as a teaching tool, followed in a few years by a migration toward permissively licensed software once the benefits are understood.

Disclosure: None of the companies mentioned are current RedMonk clients.

by-nc-sa

Categories: cloud, licensing, nosql.

NoSQL database popularity, based on Jaspersoft data

As my colleague Stephen said last week, everyone has interesting data. When I was at O’Reilly’s Strata conference earlier this month, I was fortunate enough to meet Karl Van den Bergh of Jaspersoft, a business intelligence company that says it makes the world’s most popular BI software. But it turns out that Jaspersoft has some very interesting data that’s relevant far beyond its own arena — because its software can connect to any number of data sources, including a number of NoSQL databases. They’ve taken one approach to it by creating a “Big Data Index” that has some time-series data. Karl was kind enough to share this data with us, so we’ll be taking a few different approaches to looking at it over the next few weeks. At this first pass, we’ll just look at a summary of the total downloads between January 2011 and March 2012:

What’s striking about this is the broad consistency it shows with data from other sources as well as my intuitive expectation that Hadoop, Mongo, and Cassandra would show up at the top of the list. Some interesting points are the relative popularities:

  • Hadoop and Mongo are quite similar;
  • Cassandra was surprisingly low, to me, although it’s quite competitive with CouchDB;
  • Redis shows a fairly strong placement, supporting its status as an up-and-comer.

I split the Hadoop downloads into a stacked histogram showing Hive, HBase and Avro separately. Over this time span, the more SQL-like Hive beat out HBase with roughly 56% more downloads: 3,682 to 2,360. This could be a reflection of the growing popularity of Big Data applications for people new to the Hadoop ecosystem, who are looking for a familiar toolset to lower the barrier to entry. Avro, which you may not have heard of, is a serialization format for Hadoop that’s designed for data-intensive applications, so it’s no surprise that a niche use case shows less popularity than the more broadly applicable HBase and Hive methods for accessing Hadoop.
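For anyone who wants to check the Hive-versus-HBase comparison, here is a quick sketch that uses only the two download counts quoted above. The function name is my own, for illustration.

```python
hive_downloads = 3682
hbase_downloads = 2360


def percent_more(larger, smaller):
    """How much bigger `larger` is than `smaller`, as a percentage of `smaller`."""
    return (larger - smaller) / smaller * 100


print(f"Hive had {percent_more(hive_downloads, hbase_downloads):.0f}% "
      "more downloads than HBase")
```

The two counts work out to a gap of about 56%.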

The thing that’s particularly powerful about this data is that everyone has something like it. Whether it’s download statistics or web traffic, it can all provide useful insights, especially when combined with other data, as we can do with RedMonk Analytics.

Disclosure: Jaspersoft is not a client. Hadoop distributor Cloudera as well as MongoDB-based 10gen are both clients, but Cassandra support company DataStax is not.

by-nc-sa

Categories: big-data, data-science, nosql.

IBM Pulse 2012: Tivoli gets the bleeding edge of tech

I went to IBM’s Pulse conference in Las Vegas last week, which centers around its Tivoli organization. Tivoli is focused on the systems/operations side of things: think cloud and DevOps all the way up to high-level business- and government-directed initiatives like Smarter Cities and Smarter Planet. Check out James’s excellent post on the history behind IBM and Tivoli for some broader context.

My biggest take-home was this: IBM’s people really get it. They understand trends that are happening at the frontlines of tech today in startups and in open-source development. IBM is way out in front on enabling DevOps in big enterprises, and the teams working on DevOps inside Tivoli as well as Rational (which builds tools for developers) are outstanding. A lot of my experience with enterprises is that they’re slow-moving and often lagging trends by years, to the point where it’s nearly laughable, but in this case IBM is definitely a front-runner.

How many enterprise software conferences have you seen where DevOps isn’t just mentioned in passing but has a whole track of its own? Not just that, but many of the talks were immensely popular — I’m told that was a major change since last year, and presumably this was a combination of the broader perception of DevOps’s importance as well as some internal shifts in who’s building what.

On the cloud front, they’re starting to consolidate everything under a Smart Cloud umbrella. This is a great move from my point of view as a newcomer, as I’ve found it immensely confusing to get a grip on what they’re doing. Tivoli still has further to go on this front, but it’s definitely moving in the right direction. The mere existence of the tens (hundreds?) of IBM products needed to set up a “Smart Cloud” is problematic; if anything, the most complex it should be is a hierarchy of different levels, not filling a shopping basket with a myriad of products.

I was also quite impressed by some work out of IBM Research on managing mobile devices in the enterprise. My colleague Tom’s touched on this in the context of extending battery life. But it just so happens that I independently found it interesting enough to spend a good hour talking to John Ponzo and a couple of his coworkers. The idea of app-level security rather than device-level seems like a great fit with overarching IT trends like Bring Your Own Device, so your company only needs to lock down and inspect the apps it cares about, not all your personal ones too.

Other than that, the most interesting mobile news came out of the Worklight acquisition. Worklight is an enterprise suite for building mobile apps, based on the now-famous PhoneGap. It adds things like Eclipse plugins, a server for push notifications, analytics, etc. IBM just announced its acquisition at the end of January, and everyone at Pulse couldn’t wait to start building their mobile apps with it. That’s just another sign of Tivoli being on the leading edge.

I’m hoping to dive into more depth later on some of these topics, but those are my initial thoughts coming out of my first Pulse.

Disclosure: IBM is a client and covered my hotel stay at Pulse.

by-sa

Categories: cloud, devops, ibm, mobile.

Wolfram|Alpha Pro: data science made easy

I think the announcement of Wolfram|Alpha Pro will prove to be among the most important advancements in data science in 2012. It’s about bringing data science to the masses, so anyone can do basic analysis without installing and learning complex tools (or any tools). For those not yet in the know, Wolfram|Alpha is essentially a very smart calculator that takes your natural-language query, then goes and digs out all the information it has and presents it visually, using tables and graphs. It can do everything from retirement calculations to population graphics to Apple’s share price over time.

The most interesting thing the Pro version does is to automate the initial steps in exploring your data. It will show you what your data look like and what’s interesting about them, precisely the first steps any data scientist would take upon receiving them. And it’s orders of magnitude cheaper than hiring your own data scientist, at $5/month. The limitations are equally apparent: knowing the right questions to ask, and how to interpret the results, are left to you.

Another incredibly useful feature is the ability to download the numbers behind every graph W|A Pro creates. Raw access to the data enables you to do follow-up analysis, to combine these numbers with others, or to simply make a prettier graph. This turns W|A Pro into a sort of centralized data source: do your search there, and pull the data out to use however you want, rather than digging all over the Internet for it.

If you’re just getting started with data analysis, I couldn’t recommend a better place to begin. W|A Pro is about democratizing data science. But if the results are unclear, or you’re unsure what conclusions to draw — well, maybe it’s time to ask us. Don’t stop at pretty graphs when you need actionable insights, because they’re all hidden in the data you already have.

by-sa

Categories: data-science.

Why you need to come to Monki Gras (OR, a Monktoberfest redux)

When I went to Monktoberfest last October, I wasn’t sure exactly what to expect. Well, besides great beer. The main reason I went was because, quite simply, I thought RedMonk was awesome, and that would rub off on the conference. The fact that I was applying to work with them was a minor detail — just another consequence of my high opinion of their work.

I thought, “Could a one-day conference possibly be worth it?” Most of my experiences were with multi-day events like OSCON. At the time, I was on the borderline, mostly because talks I’ve attended have been a real mix in quality, from mind-altering to mind-numbing. But the speakers looked awesome (as they do for Monki Gras), so I decided to go for it. After getting in touch with Steve, I even got invited to give a talk on one of my favorite topics, how to deal with abusive jerks in online communities.

My favorite conferences are the small ones. They’ve got a real community and you get to know everybody instead of stumbling through a crowd of 10,000 strangers. Bigger confs are good for re-connecting with everybody you already know, but small ones are perfect for meeting new people. Somewhere between 50 and 150 people is just about right (Dunbar’s number, anyone? Check when Wikipedia isn’t doing the SOPA blackout). So Monktoberfest seemed like a good fit from a distance, but it remained to be seen how things worked in actuality.

Finally the day came and I arrived in Portland, Maine. Monktoberfest started with a beer reception the night before at an awesome bar called Novare Res, where I spent the night drinking top-notch beers. You just can’t find this kind of stuff on tap anywhere else, unless you’re thinking of the Lion’s Pride (where we spent the next night). Thursday morning started at a reasonable hour, for a pleasant change, and we had a full day of talks. The talks ranged from “best ever” to merely “outstanding,” so I didn’t even need coffee to keep me awake — just to fend off the caffeine headaches! Lunch featured more excellent beers with a few local selections, as well as lobster rolls since it was Maine, and more talks of an unbelievable caliber followed. At the end of the day, my biggest problem was that my brain was overstimulated.

Thank goodness we had plenty of amazing beer to come. We headed to the Lion’s Pride and had a scrumptious meal along with beer that was so good it was literally unreal. And that’s not even getting into the company; I didn’t talk to one boring or irrelevant person the whole time. Normally, I have to make the rounds just to keep myself entertained. In this case, I had to force myself to move on because I was having so much fun. In fact, the night was so good that one of us bribed the bus driver to stick around longer so we didn’t have to leave! That’s Monktoberfest in a nutshell.

And that’s exactly why you need to come to Monki Gras. Different location (London), different community, but the same RedMonk flavor with a twist of James.

by-sa

Categories: redmonk-brew.

Is the Windows desktop losing market share to mobile?

One of the big questions right now is whether mobile devices will replace or supplement existing desktop/laptop usage.

When I was browsing RedMonk Analytics recently, I found some tempting evidence that mobile devices truly are stealing desktop share. But interestingly, this seems to primarily be the case for Windows desktop users, much more so than Mac or Linux. Take a look at this graph of operating systems (OSs) of visitors to redmonk.com, a heavily developer-oriented audience:

OS usage

The first thing that jumps out at you is that Windows usage has dropped by 5.2 percentage points since mid-2010, from 61.8% to 56.6%. But where’s that loss going? Those people must be using something else instead, and the question is what. In the same time period, there’s more subtle growth in all of the mobile OSs we measure (iPhone, iPad, Android), for a combined increase of 5.8 percentage points. Mobile users now account for nearly 10% of our visitors, tripling from just a year and a half ago (so James, time to set up a mobile theme on your blog!).

To see whether this was a more general trend, I headed over to StatCounter and checked their global stats for U.S.-based web clients. As this is a more general userbase versus the early adopters we see on redmonk.com, you’d expect the trends to lag a bit. But let’s check out the data. First, here’s the graph for mobile vs desktop usage:

statcounter_usage

What we see is that mobile’s up, no surprise there — it increased from 4.0% to 7.5% since mid-2010. But the real question was whether this disproportionately affects certain desktop OSs over the same time frame:

os_usage

As you can see, Windows dropped by 4.6 percentage points (to 81.2% from 85.8%), while everything else held constant or increased (Macintosh went up by 2.8 points), again supporting the trickling flow of Windows desktop use toward mobile devices. Since Microsoft’s still struggling to get traction with a competitive mobile OS, it should be seriously concerned about this threat to its core businesses of Windows and Office.

Of course, it’s already responding in the form of a touch-friendly Windows 8 and a continued push on Windows Phone releases, but things are not looking good at this point. One of Microsoft’s biggest assets is its huge ecosystem of developers running a full Microsoft stack. It should be doing everything it can to push them toward developing for Windows-based mobile devices, because a plethora of apps is a big draw for potential mobile buyers, and the lack thereof will stop them in their tracks.
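A note on reading the share figures in this post: they are differences between two percentages, which is not the same as a relative change. The small sketch below computes both from the before-and-after numbers quoted above. The dictionary labels and function names are my own, for illustration.

```python
# (before, after) shares in percent, as quoted in this post.
SHARES = {
    "Windows on redmonk.com": (61.8, 56.6),
    "Windows on StatCounter (U.S.)": (85.8, 81.2),
    "Mobile on StatCounter (U.S.)": (4.0, 7.5),
}


def point_change(before, after):
    """Difference between two shares, in percentage points."""
    return after - before


def relative_change(before, after):
    """Change as a percentage of the starting share."""
    return (after - before) / before * 100


for label, (before, after) in SHARES.items():
    print(f"{label}: {point_change(before, after):+.1f} points, "
          f"{relative_change(before, after):+.1f}% relative")
```

Seen this way, the 5.2-point Windows drop on redmonk.com is a relative decline of about 8.4%, while mobile's 3.5-point gain on StatCounter is an 87.5% relative increase from a small base.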

Disclosure: Microsoft is a client, Apple and Google are not. Linux vendors Red Hat and Canonical are clients.

by-sa

Categories: mobile, operating-systems, windows.

What will I cover at RedMonk?

As promised, here are some of the things I want to focus on. My interests are all over the map, so this is just a partial list:

  • Data & analytics. Whether the data’s big, small, or in-between, I love working with it to extract new insights and understanding how others are doing the same. I’m very interested in the ways people cope with and learn from huge amounts of data, whether it’s structured or unstructured. But even on the small scale, there’s a lot you can learn if you apply the right methods. My experience has given me the background in experimental design, data analysis, and statistics to do precisely that.
  • IT management, virtualization, cloud, & DevOps. During my 5 years in Oregon, I was closely involved with sysadmins at the Open Source Lab. The OSL hosts servers for major open-source projects including Gentoo Linux (prompting my relationship with them), the Apache Software Foundation, numerous other Linux distributions, and Kernel.org & the Linux Foundation. Interacting with the OSL folks on a daily basis gave me a deep interest in the technologies involved in managing and scaling servers.
  • Open-source software. I’ve spent the past 8 years working in OSS projects of all sizes, from 200+ (as mentioned) down to tens (e.g. X.Org) or even just a couple of people (burrow-owl, proteingeometry). Outside of those, I have a broader view of the free- and open-source software world through my participation in the FLOSS Foundations group. I’ve got an insider perspective on how open-source projects operate and look forward to applying that here.
  • Community. My positive and negative experiences leading Gentoo Linux and working with the Google Summer of Code have given me a great respect for the power of community and culture. It can make or break companies or open-source projects. For that reason, I plan to spend some time on the cultures of companies, OSS projects, and the communities around them.
  • Mobile. It’s pretty clear at this point that there’s an inevitable flow toward open standards and technologies, even within closed app ecosystems. Furthermore, we’re seeing a continuing convergence of more and more functions onto single mobile devices and a migration of functionality to more mobile places. I’m also interested in the re-use of the same technologies in multiple scales and niches (e.g. web technologies like JavaScript and HTML5 used in apps for Windows 8, GNOME 3, and iOS).

That should be enough to give you an idea of my interests, but if you think I might want to hear more about what you’re doing, please do get in touch regardless of whether it’s on the above list.

by-sa

Categories: analyst.

The littlest Monk

Hello, world! I’m thrilled to be joining RedMonk as its newest analyst. Since I could never fill Coté’s shoes, thankfully I don’t need to; I brought some shoes of my own because RedMonk’s going in exciting new directions. I figured I’d kick off this blog by introducing myself, why I’m here, and some (but not all!) of my current interests as an analyst.

Who am I, and how did I end up doing this?

James and Steve have already provided ample (and over-the-top!) introductions. So you can understand the background I’m bringing as an analyst, I’ll fill in some of the blanks and the backstory. To sum things up, I’m a scientist and an open-source developer. I started purely in science, until one day in 2001 when I had to learn Linux to do simulations of a neurotransmitter called serotonin (PDF summary), which I was already studying with lasers. I got addicted to Linux, and from there it was just a matter of time — no matter how cool it was to work with lasers.

Lasers! (No sharks, though)

Turns out that shooting really powerful lasers at serotonin makes it glow purple. (Don't try this at home, kids.)

On the science front, I eventually jumped ship from lasers to X-rays. I worked on understanding the relationships between a protein’s structure and its function using a number of methods, but the coolest one is called X-ray crystallography. Soon my love of computers became apparent, as I switched gears from doing my own lab work to deriving new insights from large-scale studies of data that already existed (see the pic below), in the form of protein structures solved by others. We then made the tools we created freely available so everyone could benefit. This was the equivalent of so-called “data science” in the world of biochemistry. This graph from my work happens to be about proteins, but it could just as well be any multi-variable dataset:

Using a technique called kernel regression, we can create nice smooth trends even when the underlying data are pretty noisy. This graph uses color to display the occurrence of different types of protein structure (defined by Φ and Ψ).
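For the curious, here is a minimal sketch of the idea in one dimension, using one common form of kernel regression (the Nadaraya-Watson estimator with a Gaussian kernel). It is not the code behind the graph, which works in the two dimensions Φ and Ψ. The sine-wave test data, the noise level, and the bandwidth are all assumptions picked for illustration.

```python
import math
import random


def gaussian_kernel(u):
    """Unnormalized Gaussian weight for a scaled distance u."""
    return math.exp(-0.5 * u * u)


def kernel_regression(xs, ys, query, bandwidth):
    """Nadaraya-Watson estimate at `query`.

    A weighted average of the observed ys, where points closer to
    `query` (relative to `bandwidth`) get larger weights.
    """
    weights = [gaussian_kernel((query - x) / bandwidth) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total


# Synthetic noisy data: a sine wave plus Gaussian noise (made up).
random.seed(0)
xs = [i / 10 for i in range(100)]
ys = [math.sin(x) + random.gauss(0, 0.3) for x in xs]

# Smooth trend evaluated at the same x positions.
smooth = [kernel_regression(xs, ys, x, bandwidth=0.3) for x in xs]
```

The bandwidth controls the trade-off: a larger value averages over more neighbors and gives a smoother but flatter curve, while a smaller one follows the noise more closely.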

My craving to make more of a real-world impact drove me to the Mayo Clinic, where I worked in early-stage drug discovery and continued building my expertise in dealing with large quantities of data. Our favorite technique was called fragment-based screening. We would start computational screening with hundreds of thousands or even millions of compounds, only to narrow it down to ~50 we wanted to test in the lab.

Discovering my passion for technology

At the same time, I became deeply involved in using and programming open-source software (OSS), first just in my free time but later incorporating it into my research. By 2003, I’d settled on Gentoo Linux as my distribution of choice (by way of Red Hat and FreeBSD), and I jumped in with both feet. I started using Gentoo in March, and by June I earned my developer privileges. At the time, I could hardly hack my way out of a paper bag because I lacked any training in computer science. I think I spent most of that summer working on a single package. Fast forward a few years, and you’ll get an idea of how slow a start that truly was: by 2005, I’d taught myself enough to maintain a few hundred packages, and that was far from the only thing I was doing in Gentoo.

I soon became a leader in Gentoo, first as manager of its desktop project and later as one of the 7 members of its elected council, where I’m serving again after a brief hiatus to focus on science. Gentoo has 200–250 open-source contributors, so getting them all on the same page is no mean feat. A few summers back, I also took over Gentoo’s involvement in the Google Summer of Code (GSoC), an amazing program run by Google that pays college students to work on open-source projects. In GSoC, I oversee ~15 student-mentor pairs; basically, I train mentors, make sure things run smoothly, recruit students to become Gentoo developers, and put fires out.

I write, too!

Outside of science and OSS, my most relevant contribution is probably as a guest author for LWN.net — one of the best developer-targeted sources for open-source news. I was classically trained in journalism in college and spent 4 years working at various newspapers both at the college and professional levels, doing everything from writing to page design to copy editing. Having an opportunity to finally apply this training to my love for open-source development was an incredible stroke of fortune.

Convergence, and joining RedMonk

These three seemingly disparate threads of science, OSS, and journalism have grown increasingly interlinked over the years. At first, I’d wondered how I could possibly choose between my three loves, but somehow things started coming together in a way I’d almost describe as destiny. First it was science and writing, then I started bringing OSS code into my science, and finally I integrated writing into my OSS work with Gentoo and LWN.

Ever since I first discovered Steve 6 or 7 years ago as he was writing about Gentoo, I’ve thought he had the greatest job ever. I’ve closely followed his work over the years, because a lot of it applies quantitative methods to understand trends in technology and how they’re driven by developers, from the bottom up. As a scientist by training, I love quantitation so it was great to see similar methods applied to the IT industry instead of the usual hand-waving. When the opportunity opened up to join RedMonk, I couldn’t resist — between James’ and Steve’s desire to bring on a “data griot,” my admiration for their previous work, and my own diverse background in everything they wanted, it all fit together like the pieces of a jigsaw puzzle.

The hardest part? Making the decision to leave the life sciences behind, even though I love tech and the new directions I’m moving in. Between the sheer investment of time and the difficulty of breaking back into academic science once you’ve left, it was nerve-racking. But both on the macro- and micro-scales, every other factor made the decision easy. Feel free to drop me a line if you’re in the same position. Once I applied, everything went smoothly.

I’ve always felt that the best way to get a job or promotion is to be doing the work already, and James agreed:

But even during the hiring process I was blown away by the fact Donnie just dovetailed with us. If I wrote a post Donnie commented. If Stephen tweeted about some data, Donnie explained how to normalise it. He friended me on every social network I use, and engaged. More than any other candidate Donnie just became a natural part of the team… before we even made the final decision to hire him.

I’m thrilled to be here and am really looking forward to interacting with all of you. In my next post, I’ll discuss my interests as an analyst.

by-sa

Categories: analyst.