Enterprise tech: the still-hot old thing

Every time you turn around, you’re hearing about data science, DevOps, mobile-first, growth hackers, etc. But that doesn’t mean the existing footprint has disappeared — far from it, in fact. Recruiters continue to search in huge numbers to hire enterprise talent, not just for the latest generation of tech unicorns.

This week, LinkedIn released its annual report on the top skills recruiters search for. Prominent on the top 25 were enterprise stalwarts like:

  • Middleware and Integration Software
  • Storage Systems and Management
  • Business Intelligence
  • Java Development
  • SAP ERP Systems

Since recruiter interest links directly to the hiring market, it’s clear that companies continue to search for talent that you could expect to be ubiquitous at this point. This supports the more general assertion that developers as a whole are in short supply.

Consequently, I would argue that the tech industry needs to focus on training existing tech-savvy folks who aren’t yet developers. I’ve run across quite a bit of anecdata about Salesforce admins who start as administrative assistants and become developers. More recently at Splunk .conf, I came across a Splunk admin who followed the same path by transitioning into a manager of a dev team.

Democratizing development is one thing, but equally important is remembering that it’s a funnel that enables you to bring some of those proto-developers farther down the road. You can’t wait around for the “pipeline” to fix itself starting in grade school.

Disclosure: Oracle (which owns the Java trademark), Salesforce, SAP, and Splunk are clients. LinkedIn is not.


Categories: big-data, data-science, employment, marketing, salesforce.

Docker, Rocket, and bulls in a china shop

Quick backstory: Docker’s an incredibly popular container technology, and CoreOS built a cloud-native Linux distro around it.

CoreOS just announced a competing alternative to Docker called Rocket. Docker’s official response to the Rocket announcement was very telling, and surprising. It came less than 2 hours after the announcement went up, and it was packed with typos, defensiveness, and aggression.

The basic structure and meaning of the response, in my own words, is:

  • Docker has an enormous community — we own all the mindshare, implying that we’re clearly right.
  • We’re moving up the stack. Since we own the mindshare, this is the right thing to do by virtue of us doing it.
  • We love open source, we swear, although we’re definitely in the right because the majority of people are with us.
  • There’s some minuscule group of people (all vendors, apparently) who disagree with our moves. They must be wrong because we’re taking efforts to point out that they’re vendors and not users. (ad hominem, anyone?)
  • We’re going to imply that the reason Rocket exists isn’t technical or philosophical, by presenting that option as the final corner case (“of course”). The aim is to convince developers that Rocket is just some NIH thing that exists for no reason devs should care about.
  • In bold, at the very end, so as to be the take-home point of the whole post, is a line about “questionable rhetoric and timing”, followed by another implication that Docker Inc knows what’s best since it has this huge ecosystem.

The tone is particularly easy to see when you compare the initial post to the current, updated version:

Docker Rocket response

What are the key differences?

  • A host of typos disappear. Their presence indicates the original was rushed out the door very quickly. Why might that happen?
  • The update emphasizes Docker’s commitment to the ecosystem, rather than solely the ecosystem’s commitment to Docker.
  • It clearly notes that Rocket’s raison d’être appears to be true technical or philosophical differences.
  • The bolding on the final paragraph is removed, although the wording remains.

I’d interpret that as Docker’s leadership initially having a panicked knee-jerk reaction. Couple their post with Docker cofounder and CTO Solomon Hykes’ behavior on Twitter and on the Hacker News thread on the Rocket announcement (1, 2, 3, 4, 5), and you’ve got yourself a recipe for disaster.

My experiences with abusive behavior in Gentoo have led me to speak for years on the data and social-sciences research behind negative community interactions. One universally critical point is that you separate technical criticisms from emotional attacks, and Docker has failed to do so in this case. The Rocket announcement has some harsh words, no doubt about it. But taking them personally and then replying emotionally is exactly the wrong thing to do.

Community responses to Docker’s behavior throughout this process have largely been negative, with some exceptions:

This comes off as overly defensive and entitled, like “we brought you containers and you stab us in the back!?”

I don’t see why they need to view this as an opportunity to fight back and criticize another app container system, rather than enthusiasm about the continued spread of containers and expressing a desire to cooperate on building open, interoperable standards.

— themgt, December 1, 2014

In longer-form writeups, Daniel Compton had particularly insightful thoughts on the competitive landscape and moves among Docker Inc, CoreOS, Amazon, and Google that nicely complement my colleague Steve’s recent writeup on scale and integration. Matt Asay also wrote up a useful critique of Docker’s actions.

While Solomon would prefer to focus solely on the technology, unfortunately “Field of Dreams” approaches don’t work out so well in real life. Things like marketing, community management, and the barrier to entry really do matter. I’d strongly recommend to Solomon that in the future, he should stay out of any controversies like this, get himself some media training, and stick solely to technical arguments in public as long as he’s representing Docker Inc.

But he’s not alone — the formal statement from Docker was similarly out of touch with reality, in that it was very much focused on inside-out emotional reactions rather than the consequences they would have upon their existing and potential community.

Disclosure: CoreOS and Amazon Web Services are clients; Docker and Google are not.


Categories: cloud, community, devops, docker, open-source.

The reality of IoT today, not hype about 2019

There’s been a lot of hype around the Internet of Things in the past few years, with lots of people talking about wearables and so on. All kinds of fun stuff like smart watches (I own a Pebble myself) and Google Glass. But it’s all seemed very much in the early-adopter stages.

Here’s a graph of Google searches for the “Internet of Things,” and you can see the huge increase in queries over the past year in particular.

[Graph: Google search interest in the “Internet of Things” over time]

The biggest problem with IoT is understanding the difference between the hype and the reality. Everybody’s interested but it’s all lots of handwaving about how magical the future will be. However, I keep running across crazy stuff like windmills and cranes and airplanes that are all connected today, and saving millions of dollars for companies. That’s pretty serious, and by no means is it hype.

So we decided to run an event called IoT at Scale supported by SAP, which is one of those real companies doing real stuff, to help bring together the worlds of the trendy and the industrial. It’s not about SAP tech; they just happen to be really interested in the topic.

We’ll be digging into what’s actually happening today with IoT. It’s easy to hype it and talk about whatever trillion dollar markets, which can make you lose sight of the fact that there’s real and interesting tech doing real and important things today. It’s just in the business world rather than the consumer world, so we don’t normally think about it.

If you want to try out some real tech, and learn about things that have really been done in IoT at a deeply technical level, check out IoT at Scale. We’ll have a hackday and a day of talks, coming up soon on Oct 16-17 in Palo Alto. Next month, we’re gonna do a version of this in Berlin too.

Disclosure: SAP is a client.


Categories: internet-of-things.

IT must become a service provider, or die

The traditional role of IT departments is shifting, metamorphosing, even vanishing in some cases. It used to be that IT was the “department of no”. But a couple of decades ago, open source became a thing. Suddenly anyone could obtain world-class software without any license cost. Then a decade later, along came the cloud, in the form of SaaS companies like Salesforce as well as IaaS like AWS. With SaaS, anyone could sign up for a subscription-based purchase for a few bucks a month. Most people never did the math to understand what that looked like in the long term, but at least it fit within their purchase limits. With IaaS, anyone could now obtain the hardware as well, for a cost that fits within the typical developer’s expense budget for a single server.

Thus began shadow IT — people buying things that would’ve typically fallen under IT purview, but outside of its budget and control. Most ironically, in some cases shadow IT happened from within IT itself, as a rebellion against its own processes, budgets, and bureaucratic overhead. Before long SaaS and IaaS became dominant methods of procurement for new applications, and even grew existing share in so-called “brownfield” use as well.

However, most IT shops haven’t seriously considered the long-term implications. Departmental budgets coming from marketing and from lines of business are leaving IT, and over the course of a few years, this will transition to subtractions directly from IT’s budget. In other words, departmental budget dedication to IT becomes a voluntary contribution — they’ll put the money wherever it seems most useful, much like college tuition.

So IT must change. Here are two examples, from Mike Kail (Yahoo CIO formerly of Netflix) and from Facebook. In both cases, they’ve transformed the role of IT into a true service organization rather than a gatekeeper. In particular, note that they’ve moved toward approaches reminiscent of self-service (vending machines) and of Apple’s Genius Bar.

Yahoo IT, under new CIO Mike Kail (formerly Netflix CIO). Credits: Mike Kail

Facebook IT helpdesk, circa 2011. Credits: Facebook

Even in “enterprise” level purchases, the role of IT is shifting. Consider the case of SolidFire. As they told me at their analyst day, their solid-state flash arrays start around $200K, and yet they’re adding REST APIs, and their customer base is shifting increasingly toward Fortune 500 IT shops rather than purely service providers.

That’s because IT is becoming an internal service provider in its own right, with the same competitive landscape that its external competitors face. The difference is that its mission must be to provide a lower barrier to entry. From the shadow IT buyer’s point of view, internal IT has the competitive advantage of avoiding much of the purchasing, infrastructure, and billing overhead that external vendors and outsourcers have. IT can transparently monitor to get what it needs while helping users avoid the burdens of registration and payment that they’re accustomed to with public cloud. This is an opportunity, so IT must seize the day.

The next step? That’s true integration with the business, and a focus on business value. But becoming a service provider is a vital step along the way.

Disclosure: Salesforce and SolidFire are clients. AWS has been a client. Apple, Facebook, and Yahoo are not clients.


Categories: Uncategorized.

GitHub’s vanishing acceleration

In 2013, I successfully predicted GitHub’s growth from 3 million to 4 and 5 million users respectively, with sub-month accuracy.

This time around, my news is less cheerleading and much more concerning. As I began work to follow up on my growth predictions this year, the numbers stopped matching up. Using the old equation, I kept overestimating where GitHub’s user numbers would end up.

Finally I started looking into growth numbers on a monthly basis, and things got a little clearer. It looked like relative growth over previous months might have been slowing down, but the numbers jumped around so much it was hard to tell for sure. So I plotted it and used a fancy smoother called LOWESS, a nonparametric method that works well when you can’t assume the data follow any particular curve (i.e. you don’t know what’s in it but want results anyway). Then it got crystal clear:

Methods: Data were acquired from GitHub search API, then LOWESS smoothed with a fraction of 0.5 and 3 iterations.

Although individual monthly data points are very noisy, there’s a clear downward trend over the longer term. Even varying some of the inputs for the LOWESS smoother didn’t change things in a meaningful way. Since GitHub started, it’s been growing a little bit slower (on a percentage basis) every month, even though its userbase is nearing 7.5 million. More explicitly: every month in 2008 got around 10% more new users than the previous month. By late 2014, on the other hand, every month has roughly the same average number of new users.
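To make the smoothing step concrete, here is a minimal pure-Python sketch of LOWESS: a local weighted line fit at each point, followed by robustness passes that downweight outliers. It is not the code behind the plot above. The function names and the sample `growth` values are hypothetical, the x values are assumed to be distinct, and only the fraction of 0.5 and the 3 iterations come from the methods note.

```python
from statistics import median


def tricube(u):
    """Tricube kernel: weight falls smoothly from 1 at u=0 to 0 at u>=1."""
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def local_line(xs, ys, ws, x0):
    """Weighted least-squares line through (xs, ys), evaluated at x0."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    if sxx == 0:  # all weight sits on one x value: use the weighted mean
        return my
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    return my + (sxy / sxx) * (x0 - mx)


def lowess(xs, ys, frac=0.5, iterations=3):
    """Smooth ys over xs; `iterations` counts robustness passes after the first fit."""
    n = len(xs)
    k = max(2, round(frac * n))        # points in each local neighbourhood
    robust = [1.0] * n                 # per-point robustness weights
    fitted = list(ys)
    for _ in range(iterations + 1):
        for i, x0 in enumerate(xs):
            dists = [abs(x - x0) for x in xs]
            # Bandwidth: distance to the k-th nearest point (guarded against 0).
            h = max(sorted(dists)[k - 1], 1e-12)
            kernel = [tricube(d / h) for d in dists]
            ws = [kw * rw for kw, rw in zip(kernel, robust)]
            if sum(ws) == 0:           # every neighbour was flagged as an outlier
                ws = kernel
            fitted[i] = local_line(xs, ys, ws, x0)
        resid = [y - f for y, f in zip(ys, fitted)]
        s = median(abs(r) for r in resid)
        if s == 0:
            break
        # Bisquare weights: points with large residuals count less next pass.
        robust = [(1 - (r / (6 * s)) ** 2) ** 2 if abs(r) < 6 * s else 0.0
                  for r in resid]
    return fitted


# Hypothetical month-over-month growth in new users (percent), with one spike.
months = list(range(12))
growth = [10.2, 9.1, 11.5, 8.7, 9.9, 25.0, 8.1, 7.6, 8.4, 6.9, 7.2, 6.5]
trend = lowess(months, growth, frac=0.5, iterations=3)
```

The robustness passes are what keep a single noisy month from dragging the trend line around, which matters when individual data points jump as much as these do.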

GitHub has reached an inflection point

Yesterday on Twitter I was talking about inflection points because they’re surprisingly misunderstood, and Ed Saipetch pointed to this excellent visualization of what they are. One example is when your growth begins to plateau, which is indicated by a slower velocity every month. GitHub’s still growing — don’t get confused about that. But it’s not growing as fast as it used to, and if continued, this will cause its growth to trail off well before I’d predicted.

The other type of inflection point is the one that GitHub needs to target next: shifting back from neutral or deceleration into acceleration mode again.
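One way to picture both kinds of inflection point is as a sign change in acceleration, the month-to-month change in new users. The sketch below is a simplification under stated assumptions: the function names and the user totals are made up, and real data this noisy would need smoothing first.

```python
def differences(values):
    """Consecutive differences: values[i + 1] - values[i]."""
    return [b - a for a, b in zip(values, values[1:])]


def inflection_months(total_users):
    """Indices into total_users where acceleration changes sign.

    Months with exactly zero acceleration are not treated as flips.
    """
    velocity = differences(total_users)      # new users per month
    acceleration = differences(velocity)     # change in new users per month
    flips = []
    for i in range(1, len(acceleration)):
        if acceleration[i - 1] * acceleration[i] < 0:
            # acceleration[i] is centred on total_users[i + 1]
            flips.append(i + 1)
    return flips


# Hypothetical cumulative user totals: growth speeds up, then starts to plateau.
totals = [1, 2, 4, 8, 14, 19, 22, 23]
flips = inflection_months(totals)
```

A flip from positive to negative acceleration marks the start of a plateau; a flip back to positive is the second kind, the return to acceleration.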

Moving beyond the plateau, or dodging it entirely

To regain its acceleration, GitHub has many options. Although I’m not going to be exhaustive, let’s dig into a few of them.

It can provide better offerings to existing audiences, who have stopped signing up in the same “exponential growth”-style numbers that it’s become accustomed to. Increased investment in GitHub Enterprise is one way to go about this, for example through partnerships with current giants who don’t have competitive offerings, or whose customers are requesting GitHub anyway. Embedding GitHub into tooling, whether it’s developer-facing or a backend for an office suite, whether for internal or external use at a company, is another way to advance its position.

The GitHub team could also choose to focus on competitive barriers, trying to make it increasingly easy to migrate code in and increasingly difficult to migrate code out. It could take a page from Michael Porter’s five forces and move up and down the supplier stack, while simultaneously targeting competitors (largely proprietary, as well as entrenched open-source options like CVS and Subversion) and substitutes, like ignoring version control altogether.

Another turnaround strategy is outreach to entirely new audiences — e.g. turning GitHub into a platform play rather than a version-control system for developers. Take for example the movements around pulling lawyers and legal code, or data journalists, onto GitHub. Or GitBook for authoring as another.

What’s missing, in many cases, is that platforms are rarely successful without applications — and enough of them to paint a picture of the platform’s potential. GitHub needs to invest in creating more applications for non-coders to make this type of platform play a success. Perhaps GitHub’s Atom editor or Team collaboration app could prove a useful core.

As noted by Ian Bull, geography is another approach to untapped audiences: GitHub search shows around 25K users who report living in China and 23K in India. While those numbers are likely underreported, especially in China, they’re nonzero but clearly leave huge amounts of room for growth.

Regardless of the method GitHub chooses, hitting the plateau is inevitable without significant changes in direction.

Disclosure: GitHub has been a client.


Categories: adoption, github, packaging.

Reference architectures belong in code, not pointless PDFs

Every couple of weeks, I get emails about a new reference architecture for something or other, from any one of an endless list of vendors. I inevitably click through to see what they’re talking about, and it’s almost always something hidden behind a registration wall. Every once in a while I’m sufficiently curious to fill out the form, and almost universally I end up getting force-fed a PDF whitepaper.

This is completely the wrong model. We’ve been talking about the importance of the barrier to entry for many years, and a PDF writeup and illustration of a reference architecture is a perfect example of that.

The problem is that the distribution model hasn’t changed with the times. As I sit here at VMworld, I’m hearing about how we now have ubiquitous virtual machines and containers, and we have infrastructure as code a la Puppet and Chef. Yet this stuff is still shipped in the same way it’s been shipped for decades, in a form meant to be laboriously translated from illustration into infrastructure, replicated across every single consumer of the architecture.

Why aren’t we shipping reference architectures as code samples? Even dead-tree programming books have been doing this for years. We now have the technology to ship even multi-server descriptions of IT infrastructure, so let’s do it.

Disclosure: Chef is a client. Puppet Labs has been a client. Docker and Google are not clients.


Categories: devops, packaging, virtualization.

Kingmakers in the enterprise

This was such an amazing email that I had to share some of it with you. It’s from one of our end-user clients in the insurance industry, who I recently spent a day working with.

We’re seeing growing traction in what I’d call forward-leaning enterprises — companies interested in the business value of technology. They’re excited about IT as a function that collaborates with the business to drive returns rather than as a cost center, a la Phoenix Project. Here’s one example:

I’ve always expected that hearing from RedMonk would serve as a catalyst for discussion and action.  I’ve seen and heard more active discussion from our business leaders down through our dev teams and those in ops than I have heard out of any other discussions we have had on any topic.

I’m seeing excitement and moves to action.  The discussions brought together many smaller discussions and thoughts I think several have had.  You really were able to crystallize what we were seeing and where we wanted to go.

Hands down the best dollars I have ever spent.

Boom. Developers are the new kingmakers, both inside the Bay Area bubble and far beyond it.


Categories: Uncategorized.

Microsoft goes after the barrier to entry for data science with Azure ML

A month ago, I got a pre-briefing on Microsoft’s Azure Machine Learning with Roger Barga (group program manager, machine learning) and Joseph Sirosh (CVP, Machine Learning). Yesterday, Microsoft made it available to customers and partners, so now seems like the right time to talk about how it fits into the broader market.

The TL;DR is that I’m quite impressed by the story and demo Microsoft showed around machine learning. They’ve paid attention to the need for simplicity while enabling the flexibility that any serious developer or data scientist will want.

Here’s an example of a slide from their briefing, which obviously resonates with us here at RedMonk:

[Slide from Microsoft’s June 2014 machine learning briefing]

For example, we constantly hear that toolsets like Apache Mahout (for Hadoop) are more prototypes than anything you can actually put into production. You need to have a deep knowledge of machine learning to get things up and running, whereas Microsoft’s making the effort to curate solid algorithms. This makes for a nice overlap between Microsoft product and research, the latter of which has some outstanding examples of machine learning (such as the real-time translation from English to Chinese in late 2012 by Rick Rashid).

In action, Azure ML looks a lot like Yahoo Pipes for data science. You plug in sources and sinks, without thinking too much about how that all happens. The main expertise needed seems to be in two areas:

  1. (Largely glossed over) Cleaning the data before working with it
  2. Choosing an algorithm that makes sense given your data and assumptions

Both of these require expertise in machine learning, and I’m not yet sure how Microsoft plans to get around that. Their target market, as described to me, is “emerging data scientists” coming out of universities and bootcamps: somewhere between the experts and the data analysts who spend all day long doing SQL queries and data modeling. One approach would be to compare the data against various distributions to check the best fit and whether that suits the chosen algorithm; another would be to prefer nonparametric algorithms.

Here’s a screenshot of a pipeline:

[Screenshot of an Azure ML pipeline, from Microsoft’s June 2014 machine learning briefing]

From my point of view, a critical feature of any pipeline like this is flexibility. Microsoft’s never going to provide every algorithm of interest. The best they can hope for is to get the 80% of common use cases; however, there’s no guarantee that even the 80% is the same 80% across every customer and use case. That’s why flexibility is vital to tools like this, even when they’re trying to democratize a complex problem domain.

That’s why I was thrilled to hear them describe the flexibility in the platform:

  • You can create custom data ingress/egress modules
  • You can apply arbitrary R operations for data transformation
  • You can upload custom R packages
  • You can eventually productionize models through the machine-learning API

All of this, except for the one-off R operations, will rely on the machine-learning SDK:

[Slide on the machine-learning SDK, from Microsoft’s June 2014 machine learning briefing]

Much like higher-level AWS services such as Elastic Beanstalk, you don’t pay for the stack, you pay for the underlying resources consumed. In other words, you don’t pay to set up the job, you pay when you click run.

Microsoft’s got a solid product offering here. They need to figure out how to tell the right stories to the right audiences about ease of use and flexibility, build broader appeal to both forward-leaning and enterprise audiences, and continue to focus on constructing a larger data-science offering on Azure and on Windows (including partners like Hortonworks). They also need to continue reaching toward openness, as they’ve shown with things like Linux IaaS support and Node.js support. One example would be Python, an increasingly popular language for data science.

Disclosure: Microsoft and AWS have been clients. Hortonworks is not.


Categories: adoption, big-data, data-science, microsoft.

Widespread correlations across programming-language rankings

IEEE Spectrum recently came out with a very interesting interactive tool for ranking programming languages. What makes it interesting is that it incorporates 12 different sources including data from code, jobs, conversation, and searches — and you can customize the weights assigned to each source.


But the first thing that occurred to me was, this is a fantastic opportunity to look at commonalities and communities across all of these sources. That could tell us about which places could provide unique insight into what technologies developers care about and use, and which provide mainly reinforcement of others.

Before I did anything, however, I wanted to test the veracity of the rankings. So I compared RedMonk’s January rankings against an equal weighting of GitHub active repositories and StackOverflow questions. While the two aren’t perfectly correlated, since IEEE used only 2013 data and RedMonk uses all-time data, the Pearson correlation coefficient for the top 20 languages is 0.97 (where 1 would be entirely correlated).
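For readers who haven’t computed one by hand, the Pearson coefficient is the covariance of two series divided by the product of their standard deviations. Here is a minimal sketch; the function name and the sample scores are hypothetical, and the text above doesn’t say whether the comparison used raw scores or rank positions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same five languages from two sources.
source_a = [100.0, 92.5, 88.0, 71.3, 60.2]
source_b = [97.0, 95.1, 80.4, 75.0, 58.8]
r = pearson(source_a, source_b)
```

A value near 1 means the two sources order and space the languages almost identically; a value near 0 means one tells you little about the other.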

With confidence in their data, which also reinforces RedMonk’s rankings, I moved on to calculate, using the full 49 languages supplied by IEEE, correlations across every data source they provided:

  • CareerBuilder
  • Dice
  • GitHub active projects
  • GitHub created projects
  • Google search (# of results)
  • Google trends (search volume)
  • Hacker News
  • IEEE Xplore (IEEE articles mentioning a language)
  • Reddit
  • StackOverflow questions
  • StackOverflow views
  • Topsy (Twitter search results)

Here’s a spreadsheet showing the numbers, where higher correlations are in red and very weak correlations are in blue:

The strongest correlation on the chart, interestingly, is the 0.92 found between Twitter conversation and Google trends. Apparently, people talking about programming languages in real-time chat tend to also search for what they’re talking about.

The other very strong correlations (above 0.85) are:

  • Google: trends and search. Nothing surprising here.
  • Job sites: Dice and CareerBuilder. Nothing surprising.
  • Reddit and Google trends. Discussion about current topics seems to correlate with interest in finding more information about those topics.
  • Twitter and Google search. The 0.88 here is slightly below the 0.92 between Twitter and Google trends. Most interesting about this pair is that it shows a connection between conversation and amount of content (# of results), rather than just people searching for what could be a small amount of material.
  • Reddit and Twitter. Similar communities seem to participate across a wide variety of online discussion forums.
  • GitHub created and StackOverflow questions. Because it’s a correlation of open-source usage and broader conversation among forward-leaning communities, this is the one we rely upon for the RedMonk language rankings.

Midrange correlations: Hacker News and IEEE Xplore

In the middle (correlations between 0.3–0.7), I was surprised that Hacker News correlated rather weakly with all of the other sources. This implies a degree of independence for this community relative to the behavior of all global developers, and even the subset who participate on StackOverflow. It’s certainly some interesting data to support the saying that HN is for Bay Area developers (and their bleeding-edge “cousins” across the world).

IEEE Xplore, which is oriented around academic research, had similarly weak correlations with everything else (HN included). This supports a general disconnect between academia and both general trends (most other sources) as well as forward-leaning communities like HN.

Both of these seem to make sense based on my prior expectations, since both of these groups are rather unlike the rest.

StackOverflow viewers are the outliers

The weakest correlations were between StackOverflow views and almost everything else. It’s shocking how different the visitors to StackOverflow seem from every other data source. If we actually take a look at the top 20 languages based on StackOverflow views, it bears out the unusual nature that the poor correlations suggested:

  1. Arduino
  2. VHDL
  3. Visual Basic
  4. ASP.NET
  5. Verilog
  6. Shell
  7. HTML
  8. Delphi
  9. Objective-C
  10. SQL
  11. Cobol
  12. Apex Code
  13. ABAP
  14. CoffeeScript
  15. Go
  16. MATLAB
  17. Assembly
  18. C++
  19. C
  20. Scala

Three of the top 5 are hardware (Arduino, VHDL, Verilog), supporting a strong audience of embedded developers. Outside of StackOverflow views, these languages are nonexistent in the top 10 with only two exceptions: Arduino is #7 on Reddit and VHDL is #8 in IEEE Xplore. That paints a very clear contrast between this group and everyone else, and perhaps a unique source of data about trends in embedded development.

Enterprise stalwarts are also commonplace, such as Visual Basic, Cobol, Apex (Salesforce’s language), and ABAP (SAP’s language). Outside of StackOverflow views:

  • Visual Basic is only in the top 10 in Google
  • Cobol and Apex are only in the top 20 on career sites (in the high teens)
  • ABAP is only in the top 20 on career sites and Google search (in the high teens)

Again, StackOverflow views may be a unique source of information on an otherwise hard-to-find community.

Viewing correlations as a network graph reveals communities

However, this only lets us easily look at two-way correlations. If we want to see communities, it could be easier to examine this with a graph, with the connecting edges being the correlations between pairs of data sources. Here’s a visualization of that, only showing strong correlations (above 0.7), and with highly connected nodes shown in red while poorly connected nodes are increasingly blue.


Graph layout weighted by correlation across data sources, using a force-directed layout in Gephi. I used a 0.7 minimum threshold for the Pearson correlation coefficient.

It’s instantly apparent that some data sources serve as centerpieces that can broadly represent a swathe of communities while others are weakly connected and could provide more unique insight. In particular, note that IEEE Xplore and SO views are missing altogether because they had no correlations above 0.7 to anything else.
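The thresholding behind a graph like this can be sketched in a few lines: keep only pairs whose correlation clears 0.7, then see which sources end up with no edges at all. This is an illustration, not the Gephi workflow itself. Only the 0.92 and 0.88 values come from the text above; every other number is a placeholder, and the function names are hypothetical.

```python
def correlation_graph(correlations, sources, threshold=0.7):
    """Build an undirected graph with an edge for each pair above threshold."""
    neighbours = {source: set() for source in sources}
    for (a, b), r in correlations.items():
        if r > threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def isolated(graph):
    """Sources with no strong correlation to anything else."""
    return sorted(source for source, edges in graph.items() if not edges)


sources = ["Topsy", "Google trends", "Google search", "Reddit",
           "IEEE Xplore", "StackOverflow views"]
correlations = {
    ("Topsy", "Google trends"): 0.92,         # from the text
    ("Topsy", "Google search"): 0.88,         # from the text
    ("Google trends", "Google search"): 0.90,  # placeholder value
    ("Reddit", "Google trends"): 0.87,         # placeholder value
    ("Reddit", "Topsy"): 0.86,                 # placeholder value
    ("IEEE Xplore", "Google search"): 0.45,    # placeholder value
    ("StackOverflow views", "Reddit"): 0.20,   # placeholder value
}

graph = correlation_graph(correlations, sources)
degree = {source: len(edges) for source, edges in graph.items()}
dropped = isolated(graph)
```

Sources with a high degree are the centerpieces; anything in `dropped` never shows up in the picture unless you add it back explicitly.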

The most central and strongly connected node, perhaps surprisingly, is Twitter. Google is close by, however, which supports the validity of the oft-maligned TIOBE rankings to represent many communities. However, it could be a better choice on their part to use Google trends over search results, based on the strength and number of connections shown above.

On the opposite side, being nearly unrepresented without explicitly adding them in, are the two that didn’t appear (StackOverflow views and IEEE Xplore). In addition, largely disconnected sources would be well worth considering to provide additional diversity. On this graph, they’re weakly connected (more blue) and less strongly correlated with their connections (thinner edges) — sources like GitHub active projects and Hacker News.


Based on that, I thought I’d recalculate a new set of rankings that accounted for these connections. I decided to include Topsy (weight 100), StackOverflow views (weight 100), Hacker News (weight 50), and IEEE Xplore (weight 50) to represent the diversity across these communities. These communities are vastly different sizes, so this truly reflects source diversity rather than population-level interest. But it’s interesting to see interest scaled by community rather than by pure population:

  1. C
  2. C++
  3. Python
  4. Java
  5. SQL
  6. Arduino
  7. C#
  8. Go
  9. Visual Basic
  10. Ruby
  11. Assembly
  12. R
  13. Shell
  14. HTML
  15. MATLAB
  16. Objective-C
  17. PHP
  18. Scala
  19. Perl
  20. JavaScript
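A reweighting like the one above can be sketched as a weighted average of per-source scores. The weights are the ones listed earlier; everything else is an assumption for illustration. The text doesn’t describe how the IEEE tool normalizes or combines sources, so the 0 to 100 scores, the language names, and the function names here are all hypothetical.

```python
# Weights from the text: two full-weight sources and two half-weight sources.
WEIGHTS = {
    "Topsy": 100,
    "StackOverflow views": 100,
    "Hacker News": 50,
    "IEEE Xplore": 50,
}


def combined_score(scores, weights):
    """Weighted average of one language's per-source scores (missing = 0)."""
    total_weight = sum(weights.values())
    weighted = sum(weights[src] * scores.get(src, 0.0) for src in weights)
    return weighted / total_weight


def rank(languages, weights):
    """Language names ordered from highest to lowest combined score."""
    return sorted(languages,
                  key=lambda name: combined_score(languages[name], weights),
                  reverse=True)


# Hypothetical normalized scores (0-100) for three made-up languages.
languages = {
    "LangA": {"Topsy": 90, "StackOverflow views": 20,
              "Hacker News": 80, "IEEE Xplore": 40},
    "LangB": {"Topsy": 60, "StackOverflow views": 95,
              "Hacker News": 30, "IEEE Xplore": 70},
    "LangC": {"Topsy": 40, "StackOverflow views": 10,
              "Hacker News": 95, "IEEE Xplore": 20},
}
ordering = rank(languages, WEIGHTS)
```

Because each source counts by its weight rather than by its audience size, a language that dominates one small community can outrank one with broad but shallow interest.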

In comparison to the RedMonk top 20, the changes are about what you’d expect based on the earlier results. Languages more popular in niche communities tend to move up (e.g. Arduino, Go) because of how I weighted the outlier sources, while languages that aren’t popular across all those audience types (e.g. JavaScript, PHP) shifted downwards.

This work revealed a widespread network of communities spread across a wide variety of forums, including code, discussion, jobs, and searches. Some of the most interesting results were the exceptions from the norm — in particular, StackOverflow views could provide a unique window into embedded and enterprise audiences, while Hacker News and IEEE Xplore are other sources with quite disparate data relative to the majority of the group. Finally, the connection between real-time conversation on Twitter and existing content on Google was a newly interesting correlation between discussion and resources that actually exist, rather than purely discussion and interest.

Disclosure: SAP and are clients. Microsoft has been a client.


Categories: adoption, community, programming-languages.

Microservices and the migrating Unix philosophy

A core Unix tenet pioneered by Ken Thompson was its philosophy of one tool, one job. As described by Wikipedia:

The Unix philosophy emphasizes building short, simple, clear, modular, and extendable code that can be easily maintained and repurposed by developers other than its creators. The philosophy is based on composable (as opposed to contextual) design.

This philosophy was most clearly visible through the existence of a substantial set of small tools designed to accept input and emit output such that they could be chained together in a series using pipes (|) a la `cat file | sed | tail`. Other instantiations include the “everything is a file” mentality and the near-universal use of plain text as a communication format. Both of these encouraged the sharing of a common toolset for accessing and processing data of all types, regardless of its source.
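
The same chaining idea can be sketched outside the shell. The snippet below is a toy Python analogue, assuming each “tool” is a function that takes a stream of text lines and returns another one. The names `cat`, `sed`, `tail`, and `pipeline` are hypothetical stand-ins that only loosely mimic the real utilities.

```python
import re
from collections import deque
from functools import reduce


def cat(lines):
    """Emit lines unchanged (stand-in for reading a file)."""
    yield from lines


def sed(pattern, replacement):
    """Build a stage that applies a regex substitution to every line."""
    def stage(lines):
        for line in lines:
            yield re.sub(pattern, replacement, line)
    return stage


def tail(count):
    """Build a stage that keeps only the last `count` lines."""
    def stage(lines):
        return iter(deque(lines, maxlen=count))
    return stage


def pipeline(source, *stages):
    """Feed the source through each stage in order, like `a | b | c`."""
    return reduce(lambda stream, stage: stage(stream), stages, source)


log = ["foo 1", "foo 2", "foo 3"]
result = list(pipeline(cat(log), sed("foo", "bar"), tail(2)))
print(result)
```

Each stage knows nothing about its neighbors beyond “lines in, lines out”, which is what lets you swap or reorder them freely.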

Following up on Steve’s writeup on microservices last week, I figured I’d better get this post out the door. I’ve had the ideas on the back burner for a year or so, but the burgeoning interest in microservices means now is the right time to tell this story.

The “new” composable world, same as the old one

Composability has made a resurgence in the past couple of years, inspired in part by the now-infamous 2011 post by Steve Yegge. It described Amazon’s move to a service-oriented organization where all data between teams must be transferred via API rather than emailing around Excel spreadsheets.

We’ve seen this pervade the design of AWS itself, and it shows in Amazon’s ability to keep up an astonishing pace of feature releases in AWS. More recently, the PaaS community, incited by the cries of Warner’s Jonathan Murray for a composable enterprise, has begun talking specifically about the virtues of composability (although it’s enabled it implicitly for much longer).

Another area where composability’s had a huge impact is IT monitoring. The ELK stack of Elasticsearch, Logstash, and Kibana as well as the #monitoringsucks/#monitoringlove movements (see Jason Dixon’s Monitorama conferences) serve to define the new composable monitoring infrastructure. It exists as a refutation of the old-style monolithic approach best embodied by the Big Four of HP, BMC, IBM, and CA. This movement further refutes the last revolution led by the still-dominant open-source alternative, Nagios, and the kingmakers-style bottom-up approach that enabled Splunk’s success.

Composability embodies the Unix philosophy that I began this piece by describing, and we’re now seeing it move up the software stack from its advent in Unix 40+ years ago at Bell Labs.

Granularity collapses when unneeded

The key point I want to make in this piece, however, is that composability does not and cannot exist everywhere simultaneously. It just won’t scale. Although the flexibility that a composable infrastructure provides is vital during times of rapid innovation, such that pieces can be mixed and matched as desired, it also sticks users with a heavy burden when it’s unneeded.

As developer momentum and interest continue to concentrate up the stack toward cloud, containers like Docker, and PaaS, and away from concerns about the underlying system, that lower-level system tends to re-congeal into a monolith rather than remaining composable.

We’ve seen this happen in a number of instances. One prime example is in the Linux base system today, where systemd is gradually taking over an increasing level of responsibility across jobs formerly owned by init systems, device managers, cron daemons, and loggers. Only the first of those has seen significant reinvention in the last decade or so, with alternatives to the old-school SysV init system cropping up including Upstart, OpenRC, and systemd. But with systemd’s gradual integration both horizontally and vertically into the kernel and the GNOME desktop environment, it’s quickly becoming mandatory if you want one option that works everywhere.

Even beyond that, the advent of container technologies and distributions like CoreOS means that users care less and less about the underlying system and just want it served to them as a working blob they can ignore. This is a similar driver to what Red Hat’s doing with CentOS, by attempting to provide a stable underlying firmament that you treat essentially as a large blob to build applications upon.

Another example is in X.Org, the primary Unix window system. Ten years ago, it was undergoing a period of rapid innovation driven in part by its recent fork from XFree86. The entire monolithic codebase was modularized into hundreds of separate applications, libraries, drivers, and the server. But now that community has realized it’s difficult to maintain so many stable APIs and the cost is no longer worth the benefit, so it’s considering bringing drivers and server back together into a mini-monolith of sorts.

Think of it as an accordion. Parts of it expand when there’s rapid innovation underway, often driven by external forces like the advent of the cloud, and then contract again when consensus is generally reached on the right solution and the market’s settled down. Then another part of the accordion expands, and so on, ad infinitum.

Disclosure: Amazon, IBM, Pivotal, Splunk, and Red Hat are clients. Microsoft, HP, and CA have been clients. Docker, Elasticsearch, CoreOS, Nagios, BMC, and Warner Music Group are not clients.


Categories: api, services.