“Applied science is a discipline of science that applies existing scientific knowledge to develop more practical applications, like technology or inventions.” – Wikipedia.
I was really pleased last week when Fintan dropped this post The Welcome Return of Research Papers to Software Craft. At RedMonk we’ve been talking about the trend for a while so it’s good to see the idea captured as a post.
“Over the last two to three years a small, but very important and noticeable trend, has started around the world – a growing appreciation of the importance of primary research and academic papers among software practitioners. Those that are crafting software are spending more and more time understanding, learning from, and reflecting on research from the past and present.”
The post is really good. You should read it in full. But here’s a bit more before I jump in:
The level of practitioner interest in research papers has risen to a point that the opening keynote at QCon London this week was delivered by Adrian Colyer, author of The Morning Paper and a venture partner at Accel. Adrian walked people through a number of his favourite papers and challenged people to think a little differently about what is coming in the future.
It is hard to pinpoint quite what caused this renewed interest, but it is safe to say that the emergence of Papers We Love, with the associated meetup groups, frequent discussions on forums such as Hacker News and blogs such as Adrian’s has created a wonderfully curated entry point to research papers for the curious. When people such as Werner Vogels at AWS remind us of the importance of papers, people sit up and take notice. As an industry we have had, at times, a tendency to forget to look at problems in detail, and instead focus on the quickest time to getting a product out the door.
One of the most recent Papers We Love talks came from Bryan Cantrill, CTO of Joyent, where he talked about BSD Jails and Solaris Zones, and as he noted at the start of his talk, while reminiscing about soda at the journal club his Dad, a physician, hosted:
“I always felt that it was really interesting that medical practitioners did this, and I always try to draw lessons from other domains, and medicine is very good about this, and we are not very good about this. We in computer science and software engineering are not nearly as good about reading work, about discussing it.”
This. I ran a conference on exactly this theme a couple of years ago with Monki Gras 2014: Sharing Craft. My thesis is that as an industry we’re actually improving in how we learn and share across disciplines but there is a lot of work to be done. When explaining the current state of tech Netflix pretty much always features because of its leadership in multiple areas. It pays above market rate to technical staff as a matter of course, for example, in order that it only attracts top talent. Netflix has crystallized the new way of working at scale, a way of working with intellectual property not in theory but in open practice. Netflix is in effect applied science. It carries out experiments in computing at scale in order to drive the business forward. It then open sources the code it used, rinses and repeats. Netflix doesn’t theorise, file patents and then ring fence its work. On the contrary it open sources code in order to drive forward the state of the art.
[BONUS UPDATE: following an @acolyler link this morning I discovered that Netflix had made all of this utterly explicit in a talk at QCon recently: Monkeys in Lab Coats: Applying failure testing research @Netflix. I don’t believe the video is up yet but can’t wait to see it.]
Industry and academia need each other. Far from the tire fires of production, university researchers have the time to ask big questions. Sometimes they get lucky and obtain answers that change how we think about large-scale systems! But detached from real world constraints, systems research in academia risks irrelevance: inventing and solving imaginary problems. Industry owns the data, the workloads and the know-how to realize large-scale infrastructures. They want answers to the big questions, but often fear the risks associated with research. Academics, for their part, seek real-world validation of their ideas, but are often unwilling to adapt their “beautiful” models to the gritty realities of production deployments. Collaborations between industry and academia — despite their deep interdependence — are rare.
In this talk, we present our experience: a fruitful industry/academic collaboration. We describe how a “big idea” — lineage-driven fault injection — evolved from a theoretical model into an automated failure testing system that leverages Netflix’s state-of-the-art fault injection and tracing infrastructure.
Netflix isn’t alone in this approach, of course. It’s how smart companies get things done. Stephen has written extensively about the Rise and Fall of the Commercial Software Market, and the stages our industry has gone through. Software is no longer the product, but increasingly a by-product.
We’ve made significant progress since Google, rather than open sourcing its own code, published the MapReduce Paper in 2004. Yahoo got Doug Cutting to build its own implementation, Hadoop, which it did open source, and the rest is history. Here are 5 Google Projects that Changed Big Data forever. The Research at Google site is a thing of beauty.
Ever since Google was born in Stanford’s Computer Science department, the company has valued and maintained strong relations with universities and research institutes. In order to foster these relationships, we run a variety of programs that provide funding and resources to the academic and external research community. Google, in turn, learns from the community through exposure to a broader set of ideas and approaches.
But by 2015 Google realised that open sourcing the code itself, rather than just publishing papers about its approaches, made sense. Why watch somebody else create another Hadoop or Mesos when Google could build a community around stuff it actually built – and so Kubernetes was born. Things got really interesting when Google’s engineers met engineers at Red Hat they deeply respected. When we write the history of Google this will be seen as a seminal moment, when the appliance of science became properly a community-based activity. The decision to open source some of Google’s core machine learning technology – TensorFlow – followed naturally on the obvious and growing success of a better, more collaborative model for applied science.
So Netflix and Google do it. Twitter definitely does it. Apple got the memo and open sourced Swift. Facebook crowed about the success of React in 2015. Uber and Lyft are both adopting the model. Pivotal is picking up code like NetflixOSS and OpenZipkin from Twitter for distributed tracing. You can bet someone at one of the tech giants is currently reading this paper, Message-Passing Concurrency for Scalable, Stateful, Reconfigurable Middleware, and considering its implications. Oh look – it’s science as code, check out Kompics on GitHub. Maybe we should check it out in production. Let’s not forget that Linux began life in academia. And oh yeah Walmart… is making distributed systems contributions too.
GitHub, just mentioned, is a fundamental building block of the new applied science. The combination of open source and social coding (with a little Git thrown in for forking, testing and recombining) has utterly changed the game in software and distributed systems. There is no advantage in proprietary approaches – only in advancing the state of the art. Well, Amazon might argue, but we’ll see.
Open, practical innovation isn’t just a software phenomenon – check out Facebook’s Open Compute project, which implements some computer science fundamentals. There is a reason Peter Winzer, Ph.D and Head of the Optical Transmissions Systems and Networks Research Department at Bell Labs gave a talk at its most recent meeting.
Obviously I need to be a little bit careful about Golden Era thinking, but the applied science approach of cloud technology, with associated information sharing, is so very different from other spaces, in which science seems to become ever more commodified, but not commoditised. Pharma, for example, wants government to fund all the research, while it keeps all the profits. Companies are trying quite successfully to make genetics private science – patenting genes that occur in the wild, with terrible implications if you have a marker for, say, breast cancer. The very foundations of science are being privatised. Researchers try to prevent others from replicating their work, rather than hoping for replicated experiments. It makes no sense. Tech, however, is showing us something important about how to advance the state of the art, and that’s good for all of us. Not everything is perfect in tech, and the industry finds ways to harvest data that should be public (or should that be private?) but at least in distributed systems something very, very interesting is happening. The Appliance of Science.