I wrote the other day that the talk from Jeffconf that had really stuck with me was by Guy Podjarny. That’s actually wrong. I should have said it was one of the talks that really stuck with me, because the pitch for Less is More by Ben Fletcher, Principal Engineer at the Financial Times, was fantastic. Ben gave the talk in sign language, something new for me at a tech conference, and very beautiful. The talk was clever and funny (I have never been called a “hearie” before), and contained the key serverless conference trope – saying you rely on Amazon Web Services (AWS) Lambda, but then criticising at least one core aspect of the AWS stack. Related – Ben’s talk was the first time I have heard of any organisation using Splunk to manage serverless functions, so that was certainly notable. Given my quick writeup of Guy’s talk was essentially about better code hygiene, this post makes a good companion piece.
The FT has undertaken a major reorg, and brought all of its web operations in-house, with one team owning all aspects of the user request chain. 35 Node.js developers develop and manage the site. One pleasing aspect of the team Ben has built is the gender mix – while it isn’t 50/50, the gender ratio is 2:1. Key technologies the FT relies on include AWS Lambda, Fastly for caching, Splunk and Grafana for logging and monitoring, and CircleCI for test automation.
Multivariate testing is built into the architecture – Ben laid out a nice experiment the team had built, in which it offered 3 options for its 50 curated section pages – chronological, editorially curated, and curated using AWS machine learning APIs.
“Which got the most viewers – machine or editorial? It was a draw”
I suspect Ben didn’t want to hurt the editorial team’s feelings. The chance of a draw across millions of pageviews seems pretty remote.
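Ben didn’t go into how visitors get split between the three options, so what follows is my own minimal sketch of one common approach rather than the FT’s implementation: hash a user ID together with an experiment name so the same reader always lands in the same variant. The variant names, the `assignVariant` function and the experiment name are all assumptions for illustration.

```javascript
const crypto = require("crypto");

// Hypothetical variant names, based on the three options described in the talk.
const VARIANTS = ["chronological", "editorial", "machine"];

// Map a (user, experiment) pair to a stable variant.
// The same inputs always give the same answer, so no assignment table is needed.
function assignVariant(userId, experiment, variants = VARIANTS) {
  const digest = crypto
    .createHash("sha256")
    .update(`${experiment}:${userId}`)
    .digest();
  // Read the first 4 bytes as an unsigned integer and reduce it to an index.
  const bucket = digest.readUInt32BE(0) % variants.length;
  return variants[bucket];
}

// Example: "section-page-order" is a made-up experiment name.
console.log(assignVariant("user-123", "section-page-order"));
```

Including the experiment name in the hash means a reader who lands in one bucket for this test is not automatically in the matching bucket for the next one.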
So what about core services? Ben is not a huge fan of AWS CloudWatch.
“When errors happen it’s a nightmare. Trapping an error through CloudWatch isn’t worth it.”
Instead the FT outputs CloudWatch logs to Splunk and Grafana for metrics and real time analysis. Ben gave an example of losing a page, where the team is 99% certain it’s a validation problem. With Splunk they have a confirmation in real time. The FT uses Fastly for failover – if there is an error the user just goes through to cached content. It can also purge cache instantly.
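The failover idea is easy to picture in code. This is only an in-process illustration of the behaviour described above, assuming a plain `Map` as a stand-in for the edge cache; in the FT’s setup that logic lives in Fastly, not in application code, and `serveWithFallback`, `purge` and `renderPage` are names I have made up.

```javascript
// In-memory stand-in for an edge cache (hypothetical; a CDN holds this state in practice).
const cache = new Map();

// Try to render the page; on failure, fall back to the last good copy.
function serveWithFallback(path, renderPage) {
  try {
    const body = renderPage(path);
    cache.set(path, body); // refresh the cached copy on success
    return { body, source: "origin" };
  } catch (err) {
    if (cache.has(path)) {
      return { body: cache.get(path), source: "cache" };
    }
    throw err; // nothing cached yet, so the error surfaces
  }
}

// Purge a single path so the next request has to hit the origin again.
function purge(path) {
  return cache.delete(path);
}

// Usage: the first call succeeds and fills the cache, the second fails over to it.
const ok = serveWithFallback("/world", () => "<h1>World</h1>");
const failedOver = serveWithFallback("/world", () => {
  throw new Error("validation failed");
});
console.log(ok.source, failedOver.source);
```

The reader never sees the error as long as a good copy exists, which is the property that makes the rest of the stack more forgiving.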
Development doesn’t use a pre-production environment but relies on localhost.
“AWS is very rigid, delicate, hard to develop further once in production.”
Smoke tests are set up with CircleCI and AWS CloudFormation, with testing taking no longer than 5 minutes. Unique IDs allow for multiple builds from the same repo. CircleCI is also used for automatic nightly builds, though not automated deploys. The idea is to “make hygiene fun”. This is really critical – developers have to buy into testing approaches, and if you can make flossing fun people are far more likely to do it.
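Ben didn’t show how the unique IDs are generated, so here is one way it could work, assuming each build gets its own CloudFormation stack named from the branch and the CircleCI build number. `CIRCLE_BRANCH` and `CIRCLE_BUILD_NUM` are environment variables CircleCI sets; the `stackName` helper and the `ft-smoke` prefix are hypothetical.

```javascript
// Build a per-build CloudFormation stack name.
// Stack names may only contain letters, digits and hyphens, must start with
// a letter, and can be at most 128 characters long.
function stackName(project, branch, buildNum) {
  const raw = `${project}-${branch}-${buildNum}`;
  const cleaned = raw
    .replace(/[^A-Za-z0-9-]+/g, "-") // e.g. "feature/new_nav" -> "feature-new-nav"
    .replace(/-+/g, "-")             // collapse runs of hyphens
    .replace(/^-|-$/g, "");          // trim a leading or trailing hyphen
  // Prefix with a letter if the cleaned name does not start with one.
  const named = /^[A-Za-z]/.test(cleaned) ? cleaned : `s-${cleaned}`;
  return named.slice(0, 128);
}

const name = stackName(
  "ft-smoke", // hypothetical project prefix
  process.env.CIRCLE_BRANCH || "local",
  process.env.CIRCLE_BUILD_NUM || "0"
);
console.log(name);
```

Because the build number changes on every run, two builds from the same repo and branch get separate stacks and can be smoke tested and torn down independently.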
The FT is managing what Ben claims is the fastest news site on the web. It’s a small team, doing high scale work enabled by AWS Lambda. It’s a solid case study. It’s always interesting to see the different ways people add value to their Lambda architectures. AWS has plenty of opportunity to backfill and improve its monitoring and management tools. We’ll see more at re:Invent.
AWS is a client.