As newsletter subscribers are aware, Episode 1 of my new podcast Hark, “Election Night,” is here and ready for your listening pleasure. In this month’s episode, Jeremy Bowers, a developer on the New York Times Interactive News desk (which is hiring, incidentally), takes us inside the Times’ newsroom on Election Night. From the technical stack to the schedule that night to what the favorite catering choices are, Jeremy provides a behind-the-scenes look into what life is like for the developers on the other side of the website you try to bring to its knees by hitting refresh over and over and over. And possibly swearing at. Jeremy and I also talk programming-as-journalism, AWS, WebSockets, how to get into elections as a technologist and more, so give it a listen.
For those of you who don’t do podcasts, however, we offer a transcription of the episode below. Enjoy.
Stephen: Well, excellent. Jeremy Bowers, welcome to Hark. If it’s okay with you, I’d like to start with a couple of basics. Who are you, and what do you do?
Jeremy: Sure. It’s really good to be here. My name is Jeremy Bowers, and I work for the interactive news team at The New York Times. We’re a weird little collective of programmers that sit in the newsroom and work really closely with reporters and editors on projects that sort of fall in the gaps between our graphics desk that does lots of charts and maps and the paper’s standard IT staff that would build a lot of platforms and our CMS, for example.
There’s a lot of these sort of Big Data projects, Olympics, elections, World Cup, that sort of fall in between the skill sets of these other groups. That’s where our team comes in. We jokingly refer to ourselves as Navy SEALS. Someone’s got to fix these things. Someone’s got to make them work, and that’s what we do.
Stephen: Nice. With getting into political news, elections seem to be something of a specialty for you. Was that a conscious plan, or is that something you kind of fell into? If I go through your background, beginning at the St. Petersburg Times to NPR and now with The Times, from the outside looking in, it seems as if your career followed a plan. Is that an accurate statement?
Jeremy: No, not at all. The best part about it is it’s a demonstration of just how the incentives for a lot of newsroom programming teams work. Elections are like a particularly perfect case where it’s just a little too much data for most people’s graphics desks to handle all alone. Though, honestly, The Times’ graphics desk could probably do this in a heartbeat. They just need a janitor to help them clean up. It’s one of those projects where, if you have programming skills and a little bit of journalism sensibility, you will end up working on that project everywhere that you work.
This will be the fourth news organization at which I’ve worked on a general election. I did one at the St. Pete Times, one at the Washington Post, one at NPR, and then 2016 here.
Stephen: Nice. So the one at St. Pete, was that the first election you worked on?
Jeremy: It was, 2008, with a young gentleman named Matthew Waite. I don’t remember very much about it. I was working on it like part time, because I was also working on blogs. My first job was as a blog administrator writing PHP and Perl templates for trashy old Movable Type and pre-WordPress blogs. It was not good. There was a better world out there. There was a world in which we were pulling down CSVs and running Python scripts against them. It just looked so cool, and I really wanted to do that pretty badly.
Stephen: One of the things that comes up in your background as we look through it is that you’ve sort of been immersed in blending data-driven analysis and news coverage. The idea of programming as journalism has trended in recent years. What’s your take on that? Programming as journalism, is that a real thing? Where do you think we are in that process?
Jeremy: There’s really two parallel thoughts that I have on this. The first thought is that there’s definitely some beats where if you’re not a programmer, you’re going to have difficulty keeping up with places that have a programmer helping you cover that beat. A good example of this would be campaign finance. We’ve had campaign finance laws since Watergate, which forced PACs and campaign committees to release their data either quarterly or every month. But if you were a reporter working on campaign finance in the ’80s or ’90s, you had days to write your story.
You could just leaf through the filings. It would take you really a long time, and you may not be able to find really useful stories, because people are really bad at finding small changes in data over time. Computers are super good at that.
Campaign finance is one of these examples where if you are a programmer, you can write scripts that will do aggregates, that will do comparisons, and it will let you write a story that’s just better than the story that you would have written before. We’re not even talking about something like a chart or a graph that’s a nice visual presentation. That’s just the standard story that you are going to write. You can just write that story better if you have programming skills.
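[Ed note: a minimal sketch of the kind of aggregate-and-compare script described above. The committee names, quarters, and amounts are invented for illustration, and the row layout is an assumption, not the format of any real filing.]

```python
from collections import defaultdict

# Hypothetical filing rows: (committee, quarter, amount_raised). Not real data.
FILINGS = [
    ("Committee A", "Q1", 120_000),
    ("Committee A", "Q1", 30_000),
    ("Committee A", "Q2", 90_000),
    ("Committee B", "Q1", 50_000),
    ("Committee B", "Q2", 75_000),
]


def totals_by_quarter(rows):
    """Aggregate: sum the amounts for each (committee, quarter) pair."""
    totals = defaultdict(int)
    for committee, quarter, amount in rows:
        totals[(committee, quarter)] += amount
    return dict(totals)


def quarter_change(totals, committee, earlier, later):
    """Comparison: change in a committee's total between two quarters."""
    return totals.get((committee, later), 0) - totals.get((committee, earlier), 0)


totals = totals_by_quarter(FILINGS)
for name in ("Committee A", "Committee B"):
    print(name, quarter_change(totals, name, "Q1", "Q2"))
```

The point is less the arithmetic than the repeatability: once the comparison is a function, it can be rerun on every new filing.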
That’s one part, is beats that require a programmer. Then, there are other things where it’s an older beat, but programmers could make it better. I like to think about things like police reporting or places where there aren’t data sets already, things like drone strikes or Guantanamo detainees, things where having a more structured understanding of how the world works, accumulating data sets that maybe don’t already exist can be a form of reporting in their own right. In particular, I really enjoy those.
Our team at The Times maintains this database of people who’ve been detained at Guantanamo, and I just don’t know of anything else that’s quite like it. It’s a fascinating data set and a really neat project. It only exists because someone bothered to sit down. Marco Williams and a team of interactive news developers sat down and decided to start tracking this and make a website out of it.
Stephen: That’s interesting. Certainly as someone in the industry, I’ve always found this fascinating in going back to some of the early days with sites like chicagocrime.org by Adrian Holovaty and basically taking raw crime dumps, putting them on a map, making them interactive, and making them useful in ways that we haven’t seen before.
I’m curious, though, because you hear different things about the traction behind sites like FiveThirtyEight in terms of trying to cover things other than elections. As somebody who does this work, do you think it’s something that’s more outlet driven, or do you think it’s something that is in demand from the audience? Is this something that is actually in…is it a pull thing, or is it a push thing, do you think?
Jeremy: Yeah, that’s actually a really good question. My personal bias on this feeling is that people aren’t in it for any particular form or presentation, but they’re in it for a story or some information, right? Our audience has always wanted to know things about the election, and they would love to have a better take on it. If the better take happens to be more visual or if we can tell them a story that we only can tell well in audio, then we probably ought to tell it on audio. If we have a story that comes out better as a chart, then we probably ought to tell them that story as a chart.
The thing I think that we miss if we don’t do data journalism particularly well is that we miss out on the stories that are told better when they’re told less anecdotally and with more rigor. That doesn’t mean that every story that we tell ought to be that way. There are many stories that are just much better as anecdotes. Some stories are good as a good mix. I really am fascinated at trying to find places where there’s an intersection of good story and anecdote but also lots of good structured data.
Stephen: Yeah, it’s interesting, because it’s one of those things that I…for example, with baseball. I’m a big baseball fan. I consume a lot of the analytical content and so on. The interesting thing, though, is that some of the writers, I think, get too far removed from the actual reader. Even as somebody of an analytical bent and somebody who’s technical, when you begin just throwing up charts because you have charts, you kind of lose the plot at times. Is that something you guys worry about? How much input do you have from the editorial side in terms of trying to make sure, “Hey, look. Let’s not just throw numbers up. Let’s try to make sure this is a story”?
Jeremy: Entirely. This is sort of the problem, right? Is that it’s so easy if you’re doing data journalism to stumble into a pile of data and then say, “I will put this up on the Internet for someone to explore.” The truth is, that’s the reporter’s job, is to explore and find a story and then to tell the story, not just to put it up and let people struggle through it.
The other thing that strikes me about that is that I don’t think there’s a competition between anecdotal and structured story telling. There are really great stories, like listening to Vin Scully talk about baseball. I don’t want to hear Vin Scully talk about someone’s wins above replacement. I’m sure he would be great, but he has a strength, and he should stick to his strength.
There are other people like Harry Pavlidis working on whatever random model he’s working on. I love to read about that too, but all of it should tell me a story. That stuff about catcher framing I feel like it’s some of the best analysis that I’ve seen lately, because it basically told you more about baseball. You learned all the stuff that you already sort of suspected, right, is that catcher is really valuable. Game calling is valuable. Turning balls into strikes is valuable.
But there’s something more than just that. It’s being quantitative about it, being able to say, “This is how much more valuable that player is than they used to be.” It opens up a ton of room for a standard reporter to just go out and ask a bunch of questions. Nobody was going to ask Matt LeCroy about how he framed pitches beforehand, because no one realized that it was super important, you know?
Stephen: Yeah, yeah. All right, let’s get back to The New York Times then.
Stephen: When you look at elections, what is the agenda? How do you drive evidence or facts based reporting into coverage of an election? Where do you get the raw data? How does the process work?
Jeremy: Yeah, absolutely. We get our real time data from the Associated Press, who maintain a huge stack of reporters and stringers who go to precincts all around the country and will report back data that they are watching officials hand enter. As a result, we get the data from the AP much more quickly than we would get it from the Secretary of State for that state, which is where the ultimate election results end up. But the real election results don’t show up for several months. The primaries that we’re watching right now, those are largely unofficial. They’re a project of the state’s party.
So the AP really becomes the de facto standard for a lot of this data especially in real time. We pay them for access to that data. Up until 2014, that data was provided through an FTP dump, a series of oddly delimited files in an FTP folder where you had to know how to find the correct file and load it up. They would update it every couple of minutes, every five to seven minutes or so. You could hit it once every minute looking for an update.
Well, in 2014 the AP released an HTTP API. For the 2014 general election, we didn’t use it. But for 2016, we decided we wanted to, because it’s a little faster. It gets the data to us as soon as three minutes, and race calls are instantaneous. An AP editor will hit a button to call a race for Bernie Sanders, and it will almost immediately show up in the API. So we want that speed more than we want almost anything.
That meant that we had to rebuild our election rig this year. It’s an old app, actually I think the oldest continuously updated application in our portfolio. It’s a 2006 era Ruby on Rails app that is not modular. It’s a very large app that parses all the data, loads it into a database, does sanity checks, does difference calculation, bakes out HTML and JSON of all the pages. I think it was like 200 total files/URLs for every election, which is a lot of things for a single application to do.
This year, we decided that we were going to break that down into a series of more modular little pieces, which was very exciting to make a big change like that in advance of such a big election cycle. We decided that that was really important. It would also give us a chance to rewrite it in Python and some old magic style Bash scripts, make it a little easier for us to maintain, and make it a lot easier for other people on our team to be involved in it as well.
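[Ed note: a rough sketch of what "a series of more modular little pieces" could look like, with parsing, sanity checking, and baking as separate functions that can be tested on their own. The field names and the single sanity rule are assumptions made for illustration; this is not The Times' actual code or data format.]

```python
import json


def parse(raw_rows):
    """Turn raw rows (dicts of strings) into typed result records."""
    return [
        {"race": r["race"], "candidate": r["candidate"], "votes": int(r["votes"])}
        for r in raw_rows
    ]


def sanity_check(results):
    """Reject obviously bad data before it is published."""
    for r in results:
        if r["votes"] < 0:
            raise ValueError(f"negative vote count for {r['candidate']}")
    return results


def bake(results):
    """Group results by race and serialize each race to a JSON string."""
    by_race = {}
    for r in results:
        by_race.setdefault(r["race"], []).append(
            {"candidate": r["candidate"], "votes": r["votes"]}
        )
    return {race: json.dumps(rows, sort_keys=True) for race, rows in by_race.items()}


# Placeholder input; a real feed would carry many more fields.
RAW = [
    {"race": "example-race", "candidate": "Candidate X", "votes": "10"},
    {"race": "example-race", "candidate": "Candidate Y", "votes": "7"},
]
pages = bake(sanity_check(parse(RAW)))
print(pages["example-race"])
```

Because each stage only takes and returns plain data, any one of them can be swapped out or rerun without touching the others.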
Stephen: Yeah. That’s a great segue. You mentioned Python. You mentioned Bash. What are the technologies The Times uses? What’s the stack look like?
Jeremy: This year we’ve decided that long polling for results is sort of an anti-pattern. It’s slow, especially on mobile devices, to have to pull down a ton of JSON every 30 seconds. This year, we’ve been using WebSockets. We’ll open a WebSocket connection when the client connects. You’ll get the initial data from the page, then we can push you fractional updates whenever an update comes through. Because you’re not polling, the browser doesn’t have to do nearly as much. We don’t rewrite the DOM as often. The result is it feels a lot better. A user can scroll up and down, and that doesn’t feel laggy. The updates are really small, so even over a slow connection they work pretty well.
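[Ed note: a small sketch of the "fractional update" idea: send only what changed since the last push, and let the client merge it into what it already has. The county keys and flat dictionary shape are assumptions, the socket transport itself is left out, and removed keys are not handled.]

```python
def make_delta(previous, current):
    """Server side: keep only the keys that are new or whose values changed."""
    return {k: v for k, v in current.items() if previous.get(k) != v}


def apply_delta(state, delta):
    """Client side: merge a small update into a copy of the full local state."""
    merged = dict(state)
    merged.update(delta)
    return merged


# Made-up vote totals before and after one update from the feed.
before = {"county-1": 100, "county-2": 250, "county-3": 0}
after = {"county-1": 100, "county-2": 300, "county-3": 40}

delta = make_delta(before, after)
print(delta)
print(apply_delta(before, delta) == after)
```

In this example the unchanged county never crosses the wire, which is where the savings on a slow mobile connection would come from.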
Stephen: Do you have issues with a browser in terms of their relative support levels for WebSockets?
Jeremy: Oh my, yes, although, truthfully, most…We did not use WebSockets in 2014 for this reason. We tried. Asterisk: we wrote a library that will fall back to long polling in the event that your browser doesn’t support WebSockets. As a result, it’ll just be a little slower, a little laggier. In 2014, I think we defaulted to long polling in a lot of cases. This year, almost an overwhelming number of the clients that have visited us have been using the Sockets. It’s better for everybody that way. It’s better for us, because we’re moving less data over the wire. It’s better for the client. That’s actually been a really good thing this year.
Stephen: Okay. Let’s fast forward to election night. This could be primaries or general. Without getting into sort of raw numbers, which I’m sure The Times would prefer that you keep to yourself, what kinds of spikes in traffic do you expect to see? In other words, versus a normal day. Is it 2X, is it 10X? What are you looking at in terms of the relative demand?
Jeremy: I can tell you that on an election night our results page is like having a second home page. We’re getting as much traffic to that results page on average as our home page does even on that night, which is already a multiple of a normal night.
I can tell you a pair of anecdotes that I think you’ll find amusing. The general election in 2012 set traffic records that we didn’t break until 2014. One of the things that happened in 2012 is that we, as a result of a cache misconfiguration, pointed the fire hose of the home page, plus all the people looking at our results pages at an unwarmed Amazon elastic load balancer, which immediately crumpled under the load. It was the first time I’d ever heard of that happening. I didn’t even know that that was something that could happen.
Stephen: That’s funny.
Jeremy: This year, we got a phone call and an email from Amazon, because we had done a very similar thing. We’d pointed about 14,000 requests a second at an Amazon S3 bucket that had not previously taken any traffic. As a result, we were returning about 1 in 5, 1 in 6 pages were a 500, something I’d never seen before. So we got a nice phone call from them about that as well.
Stephen: There you go.
Jeremy: So we’ve gotten to know our Amazon service reps, so it’s been nice.
Stephen: I was going to say. Amazon must be crucial just in terms of being able to spin up and spin down, depending on the load.
Jeremy: Yeah. There’s actually a handful of confounding factors about a general election that make programming against it a little difficult. We have geographic issues going on, right? We can’t not serve election results, just because there’s an outage in U.S. East 1. So we have to maintain duplicate versions of our whole infrastructure, our whole stack in East and in a West zone, availability zone. We have some issues with scale, which I kind of alluded to. It’s just thousands and thousands and thousands of requests per second on just a normal primary night. For the general, we’re pretty much expecting to set traffic records that we may not break for four more years.
There are a large number of contributors to our software. I’m working on sort of the core election stuff. But we also have three or four developers on our graphics desk who are working on a Node app that handles all the HTML and maps and stuff like that. We have two or three other coworkers of mine who are working on sort of back end pieces, and then a handful like site engineers. It’s like when you’ve got 10 or 12 people all contributing code to the same thing. That’s a confounding factor that you almost never run into on a normal project, especially a small one like this.
One thing that’s particularly hard about the election like this is that we have the staff like The Upshot, or we have other smart people inside the company who would like to do something new this year, or they want to do something they haven’t done before. A great example of this would be those Upshot live models. In previous years, we would have had to write special software to get them a route that would produce the data that they need. Then we would have had to help them scale their app up. It really would have been very difficult to do what we’re doing this year in a previous year.
Because of the way we set this up, very modularly, The Upshot has access to all the same election data that everybody else has. So they can test. They have access to data to do tests on. As a result, they can build these live models, they can deploy them, and it just runs. No one has any questions. It makes it a lot easier to do things that you might say are “innovative” or things that are just new, things that are different to us and that normally we would have had to put a lot of development time into.
Stephen: Yeah, indeed. In terms of preparing for a major event, whether it’s Super Tuesday or a general or what have you, what are the kinds of things that you preemptively do, either on the people side or the hardware side? Obviously, you pre-warm hardware. Do you work at shifts? What can you tell me about the process rolling up to an election?
Jeremy: The one really great thing that I enjoy is that, because it’s a newspaper, we’re already people who are used to the idea that we’re all hands on deck for breaking news. We put together a little war room of the core folks that need to be around.
A great example of this would have been on Super Tuesday. Big election night, so we have two or three of the core committers from the graphics desk who work on the maps and who built the charts sitting in the room. We’ve got me, who handles the election data in the room. We’ve got two editors in there. We have a politics editor available to make race calls as necessary. A Masthead editor occasionally drops by. It’s nice to have everybody in one place. That actually solves most of the problems that we have.
Stephen: Interesting. Okay.
Jeremy: I’m sure it won’t shock you that most of the problems that we have are not technological in nature, but human, right?
Stephen: Yeah, of course.
Jeremy: We’ll often have this case where something weird will happen, or it’ll look weird, but it’s actually completely legitimate, like the AP may call a race, and then we may not have any votes that come in for 30 minutes. It looks weird, but it’s totally legitimate. The AP can call. They have models for race calls that involve things like exit polls, votes that they have access to that haven’t been assigned to precincts. They can know in advance with some regularity that this is going to be a race that X candidate wins.
So they’ll call it, but we won’t have any votes in the system for a good 20 or 30 minutes while the secretaries of state prepare to deliver the first batch. It’s nice to have everybody in the same room so we can calm everyone down and let folks know this is totally legit, nothing’s broken, and this is correct. So it’s good. Other than that, we get a lot of sleep the day before.
Stephen: That’s right, yeah.
Jeremy: Try not to deploy anything new. We test like hell beforehand. That’s one thing I can say I really enjoy. My editor’s given me months in advance of these primaries to write software and then write tests and then write an entire rig for simulating an election that we can run our software as if we were in the middle of an election night. We went so far as to build in something that will simulate 500s or 403s from the AP, like that they’re down or that we’ve run out of API requests.
Stephen: Okay, so kind of like a chaos monkey for The Times?
Jeremy: You got it, exactly. Because we need to know that if something really horrible happens, we’ll be able to stay up and continue providing results.
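[Ed note: a minimal sketch of the failure-simulation idea, assuming a fake feed that returns 500s or 403s at a configurable rate and a poller that keeps serving the last good payload. The class, function names, and payload are hypothetical and do not reflect the AP's API or The Times' test rig.]

```python
import random


class SimulatedFeed:
    """Hypothetical stand-in for an upstream results API that sometimes fails."""

    def __init__(self, failure_rate, seed=0):
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)  # seeded so test runs are repeatable

    def fetch(self):
        """Return (status_code, payload), failing at the configured rate."""
        if self.rng.random() < self.failure_rate:
            return self.rng.choice([500, 403]), None
        return 200, {"votes": 123}


def poll_once(feed, last_good):
    """Return fresh data on success, otherwise fall back to the last good payload."""
    status, payload = feed.fetch()
    if status == 200:
        return payload
    return last_good


last_good = {"votes": 0}
print(poll_once(SimulatedFeed(failure_rate=1.0), last_good))  # feed always down
print(poll_once(SimulatedFeed(failure_rate=0.0), last_good))  # feed always up
```

Setting the failure rate to 1.0 is the interesting case: the results page should keep showing the previous numbers rather than going blank.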
Stephen: Right. What is a typical shift then? In other words, when do you come in that day? What time do people leave?
Jeremy: On a normal election night, I’ll usually take a train up to New York the day before and work like a half day on a Monday. Those elections are usually Tuesdays, the big ones. Tuesday morning, I’ll get in around 10:00 or so. The AP will push what they call live zeros around 11:00. This is to say they’ll have all the reporting units, the geographical areas that they expect to have results from, in and with zeros as their vote totals. This lets us warm up the database with all the candidate information, all the names of all the counties that we expect to see, and it gives us an opportunity to go and edit names. Times style would be Donald J. Trump instead of Donald Trump, for example. So we have a handful of overrides like that that we need to do.
Between 11:00 and noon, we’re loading that initialization data and baking out our first result pages that are all empty. Then we basically go eat lunch, and then all get back together about 5:00. First results usually start coming in somewhere between 6:00 p.m. and 9:00 p.m. Then it’s basically just all hands on deck until about 1:00 a.m. when the last handful of results come in, sometimes even later, if it’s like Hawaii or Alaska. But yeah, that’s what the night looks like.
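[Ed note: a tiny sketch of the name-override step applied to a "live zeros" load. The single override mirrors the example given in the interview; the record shape and the second candidate are assumptions for illustration.]

```python
# Style overrides keyed by the name as delivered in the feed.
NAME_OVERRIDES = {"Donald Trump": "Donald J. Trump"}


def apply_overrides(candidates, overrides):
    """Return new records with house-style names; the input list is left untouched."""
    return [dict(c, name=overrides.get(c["name"], c["name"])) for c in candidates]


# Hypothetical initialization rows: every vote total starts at zero.
live_zeros = [
    {"name": "Donald Trump", "votes": 0},
    {"name": "Candidate Y", "votes": 0},
]

styled = apply_overrides(live_zeros, NAME_OVERRIDES)
for row in styled:
    print(row["name"], row["votes"])
```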
Really, the big push is for the next morning when we have to write the day after story. We’ll have a politics editor asking us what county did Donald Trump perform the best in Massachusetts or where did Hillary outperform her 2008 in South Carolina. We go pull data for that. It’s nice. Actually, I really enjoy the day after almost more than I enjoy actual election nights.
Stephen: That’s funny. Well, actually, that kind of does make sense. What does The Times do? Do they cater for you? Do they bring food in?
Jeremy: Oh, yeah, absolutely. As a matter of fact, one of The Wall Street Journal reporters, Byron Tau, makes a note of tweeting what everybody is having for dinner that night. He’ll have CNN, The Journal, the National Journal. The Times is pretty good. We normally have some combination of Szechuan or Thai. On Tuesday I was in New York for the New York primary. Of course, we had deli food. It was wonderful.
Stephen: Nice, nice, there you go. In terms of just wrapping things up then, for folks in the audience that might want to follow in your footsteps who are technologists and might want to sort of get into the news side or the politics side or the election side, what are the suggestions that you would have? How did you…I mean, obviously, we talked a little bit about how you got into it. What would your recommendations be for somebody who’s just breaking in?
Jeremy: I would say if you’d like to follow directly in my footsteps, you should be a failure at the first two things that you try, and then fall back to programming as your third shot at it.
Stephen: There you go.
Jeremy: I was a political science student at first and was going to teach, but my grades were terrible. I took the LSAT, because I thought I wanted to be a lawyer, and then did poorly on the LSAT. Then, in a fit of displeasure, I took a job working late night tech support at the St. Petersburg Times and just got started.
I’d say really the best thing is to listen for people’s problems. Almost all of the best software I’ve written has come from talking to a reporter and hearing a problem that a reporter has on their beat or somewhere in the news gathering process. We’ll have a researcher who will say, “Man, I just know that there are these people who die in Afghanistan and Iraq,” this happened at The Washington Post. There’s folks that die in Afghanistan and Iraq, and we get a fax every week with the names of every service member who dies.
But it’s really hard for us to answer any questions like, “How many people died in Afghanistan and Iraq this week?” Because we’re not sitting down and writing those together. You can do something as simple as set up a spreadsheet for someone or a little simple CRUD admin. It’s little problems like that that often turn into really cool stories, eventually.
I also say that your first project doesn’t have to be a big, award-winning, amazing data thing. There are lots of really easy low hanging fruit. I think Chicago Crime is a great example of that, because it wasn’t necessarily, on its face, supposed to be a journalistic enterprise. It was just a good civic data project. It was just as a citizen you need to know about this.
I feel like some of our best recruits have come from the civic data world, people who are just personally interested in the workings of the government or in our society and worked on a data project around that. Those people almost always have got the same sort of built-in incentives and ethics that we’re looking for here in the Fourth Estate.
Stephen: Yeah. In other words, what you’re saying, to some degree then, is that you’re not just looking for the technical skill set. You’re looking for somebody who is really able to work, whether it’s a reporter, whether it’s somebody sort of in a civic situation, but is able to listen and translate that into an actual requirement, as opposed to, “Hey, here’s some data. Let me show you something fancy I can do with it.”
Jeremy: Yeah, absolutely. Truth be told, so many of the projects that we work on, you would think of them as boring technologically. It’s not as much fun. We’re not moving around millions of rows of data, although some of our projects, we get lucky and get to do things like that. A lot of it is just solving what I would consider to be fairly simple technical problems but that take a lot of empathy to figure out that they even exist.
Yeah, there’s a world of easy low-hanging civic fruit to get started on if you’re really interested in this sort of thing.
Anybody can be a journalist, man. You can be a journalist if you want to keep track of what’s happening to airplane tail numbers, and you want to see what that plane is that keeps flying over your house. This is like a great story. One of these civic data groups was watching tail numbers and figured out that there are lots of fixed wing aircraft flying over the city that were all rented out by the same company in Virginia. It was really weird. It turns out that company is owned by the FBI.
Stephen: There you go, yeah.
Jeremy: This is where good stories come from, right, is observation and tracking.
Stephen: There really is so much data. It’s just a matter of having, in many cases, the desire, I guess, or intent to put it to work, in other words, something stupid that I end up doing every year. There’s no tide charts, right?
Jeremy: Oh, yeah.
Stephen: We live on a river that opens up into the ocean about a mile and a half down. There’s no tide chart. It turns out that if you hit…what is it? I think it’s NOAA. NOAA has all that information, but it’s in some terrible format. [Ed note: credit for this process belongs to Jeff Inglis, who tipped me to it.] You pull it down. It’s not that hard to translate it into an ICal, and all of a sudden, it’s in your Google Calendar. Great, I know when the tide’s coming in. I know when the tide’s going out. Does that require any special technical skills? Not at all, but it is the kind of thing that you can take data in one format and make it useful to people. Yeah, I can definitely see that.
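[Ed note: a minimal sketch of the last step described above, turning a list of tide times into iCalendar text that a calendar app can import. The tide times are made up, the step of downloading and parsing the source data is left out, and the function assumes every timestamp is already in UTC.]

```python
from datetime import datetime, timezone


def tides_to_ical(tides, stamp):
    """Build a minimal iCalendar document from (utc_datetime, label) pairs."""
    fmt = "%Y%m%dT%H%M%SZ"  # the trailing Z marks the time as UTC
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//example//tides//EN"]
    for i, (when, label) in enumerate(tides):
        lines += [
            "BEGIN:VEVENT",
            f"UID:tide-{i}@example.invalid",
            f"DTSTAMP:{stamp.strftime(fmt)}",
            f"DTSTART:{when.strftime(fmt)}",
            f"SUMMARY:{label}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # iCalendar lines are separated by CRLF.
    return "\r\n".join(lines) + "\r\n"


# Made-up predictions; real data would come from the provider's own export.
TIDES = [
    (datetime(2016, 4, 20, 9, 15, tzinfo=timezone.utc), "High tide"),
    (datetime(2016, 4, 20, 15, 30, tzinfo=timezone.utc), "Low tide"),
]
ical = tides_to_ical(TIDES, stamp=datetime(2016, 4, 19, tzinfo=timezone.utc))
print(ical)
```

Saving the output to a `.ics` file is usually enough for a calendar application to import it.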
In terms of working on elections, all the work that you’ve done that has gone into all the elections that we talked about, what, for you, is the most rewarding aspect? What do you enjoy most about the work you do?
Jeremy: Oh, man, I think far and away it’s getting data to the smart people who work on our graphics desk or at The Upshot so that they can put together some killer analysis that nobody else has got. I loved looking at the big precinct maps that we had for New York yesterday or when The Upshot does their live model. Those are things that would be really hard to pull off unless you have got your election software down to not having to think about it at all. Because there’s so much plumbing and so much manning the bellows that has to be done in order to keep your election rig up and going.
The thing I feel like is that if you can apply just a little bit of technical rigor, maybe not too much, but just a little bit, or maybe some engineering principles to this, then it’ll give you the free time to work on the cool and awesome projects that you’d like to be doing in the first place. I’d have to say, that’s by far the most rewarding thing. It’s like getting all the low-end plumbing stuff out of the way so we can do really cool stuff in the time that we saved.
Stephen: Yeah, no, I’m sure. All right, last and most important question on Hark. What animal are you most frightened of?
Jeremy: Yep, absolutely. They’re small. They can give you brain disease. You may never even know you have it. It’s really hard to kill them.
Stephen: That’s fair.
Stephen: All right, with that, I just want to say thanks, Jeremy. This has been fantastic. We’ll hope to get you back on soon.
Jeremy: Thanks, Stephen.
Stephen: All right, cheers.