Donnie Berkholz's Story of Data

The littlest Monk


Hello, world! I’m thrilled to be joining RedMonk as its newest analyst. I could never fill Coté’s shoes, but thankfully I don’t need to: I brought some shoes of my own, because RedMonk’s going in exciting new directions. I figured I’d kick off this blog by introducing myself, why I’m here, and some (but not all!) of my current interests as an analyst.

Who am I, and how did I end up doing this?

James and Steve have already provided ample (and over-the-top!) introductions. So you can understand the background I’m bringing as an analyst, I’ll fill in some of the blanks and the backstory. To sum things up, I’m a scientist and an open-source developer. I started purely in science, until one day in 2001 when I had to learn Linux to do simulations of a neurotransmitter called serotonin (PDF summary), which I was already studying with lasers. I got addicted to Linux, and from there it was just a matter of time — no matter how cool it was to work with lasers.

Lasers! (No sharks, though)
Turns out that shooting really powerful lasers at serotonin makes it glow purple. (Don't try this at home, kids.)

On the science front, I eventually jumped ship from lasers to X-rays. I worked on understanding the relationships between a protein’s structure and its function using a number of methods, but the coolest one is called X-ray crystallography. Soon my love of computers became apparent, as I switched gears from doing my own lab work to deriving new insights from large-scale studies of data that already existed (see the graph below), in the form of protein structures solved by others. We then made the tools we created freely available so everyone could benefit. This was the equivalent of so-called “data science” in the world of biochemistry.

This graph from my work happens to be about proteins, but it could just as well be any multi-variable dataset:

Using a technique called kernel regression, we can create nice smooth trends even when the underlying data are pretty noisy. This graph uses color to display the occurrence of different types of protein structure (defined by Φ and Ψ).
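For anyone curious what that smoothing looks like in code, here is a minimal sketch of one common form of kernel regression, the Nadaraya-Watson estimator, in Python. It is not the code behind the graph above: the graph smooths over two angles (Φ and Ψ), while this example is one-dimensional, and the synthetic data, the Gaussian kernel, the bandwidth value, and the function names are all assumptions made up for illustration.

```python
import math
import random


def gaussian_kernel(u):
    """Gaussian kernel; the normalizing constant cancels in the weighted average."""
    return math.exp(-0.5 * u * u)


def kernel_regression(xs, ys, x0, bandwidth):
    """Nadaraya-Watson estimate at x0: a kernel-weighted average of the ys.

    Points whose x is close to x0 (relative to the bandwidth) get large
    weights; distant points contribute almost nothing.
    """
    weights = [gaussian_kernel((x0 - x) / bandwidth) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total


# Synthetic noisy data (hypothetical): a sine trend plus Gaussian noise.
rng = random.Random(0)
xs = [i * 2 * math.pi / 199 for i in range(200)]
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]

# Evaluate the smoothed trend on a coarser grid inside the data range.
grid = [i * 2 * math.pi / 49 for i in range(50)]
smooth = [kernel_regression(xs, ys, g, bandwidth=0.4) for g in grid]
```

The bandwidth is the knob that matters most here: a larger value gives a smoother but flatter trend, and a smaller one follows the noise more closely.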

My craving to make more of a real-world impact drove me to the Mayo Clinic, where I worked in early-stage drug discovery and continued building my expertise in dealing with large quantities of data. Our favorite technique was called fragment-based screening. We would start computational screening with hundreds of thousands or even millions of compounds, only to narrow it down to ~50 we wanted to test in the lab.

Discovering my passion for technology

At the same time, I became deeply involved in using and programming open-source software (OSS), first just in my free time but later incorporating it into my research. By 2003, I’d settled on Gentoo Linux as my distribution of choice (by way of Red Hat and FreeBSD), and I jumped in with both feet. I started using Gentoo in March, and by June I earned my developer privileges. At the time, I could hardly hack my way out of a paper bag because I lacked any training in computer science. I think I spent most of that summer working on a single package. Fast forward a few years, and you’ll get an idea of how slow a start that truly was: by 2005, I’d taught myself enough to maintain a few hundred packages, and that was far from the only thing I was doing in Gentoo.

I soon became a leader in Gentoo, first as manager of its desktop project and later as one of the 7 members of its elected council, where I’m serving again after a brief hiatus to focus on science. Gentoo has 200–250 open-source contributors, so getting them all on the same page is no mean feat. A few summers back, I also took over Gentoo’s involvement in the Google Summer of Code (GSoC), an amazing program run by Google that pays college students to work on open-source projects. In GSoC, I oversee ~15 student-mentor pairs; basically, I train mentors, make sure things run smoothly, recruit students to become Gentoo developers, and put fires out.

I write, too!

Outside of science and OSS, my most relevant contribution is probably as a guest author for LWN, one of the best developer-targeted sources for open-source news. I was classically trained in journalism in college and spent 4 years working at various newspapers at both the college and professional levels, doing everything from writing to page design to copy editing. Having an opportunity to finally apply this training to my love for open-source development was an incredible stroke of fortune.

Convergence, and joining RedMonk

These three seemingly disparate threads of science, OSS, and journalism have grown increasingly interlinked over the years. At first, I’d wondered how I could possibly choose between my three loves, but somehow things started coming together in a way I’d almost describe as destiny. First it was science and writing, then I started bringing OSS code into my science, and finally I integrated writing into my OSS work with Gentoo and LWN.

Ever since I first discovered Steve 6 or 7 years ago as he was writing about Gentoo, I’ve thought he had the greatest job ever. I’ve closely followed his work over the years, because a lot of it applies quantitative methods to understand trends in technology and how they’re driven by developers, from the bottom up. As a scientist by training, I love quantitation, so it was great to see similar methods applied to the IT industry instead of the usual hand-waving. When the opportunity opened up to join RedMonk, I couldn’t resist: between James’ and Steve’s desire to bring on a “data griot,” my admiration for their previous work, and my own diverse background in everything they wanted, it all fit together like the pieces of a jigsaw puzzle.

The hardest part? Making the decision to leave the life sciences behind, even though I love tech and the new directions I’m moving in. Between the sheer investment of time and the difficulty of breaking back into academic science once you’ve left, it was nerve-racking. But on both the macro and micro scales, every other factor made the decision easy. Feel free to drop me a line if you’re in the same position. Once I applied, everything went smoothly.

I’ve always felt that the best way to get a job or promotion is to be doing the work already, and James agreed:

But even during the hiring process I was blown away by the fact Donnie just dovetailed with us. If I wrote a post Donnie commented. If Stephen tweeted about some data, Donnie explained how to normalise it. He friended me on every social network I use, and engaged. More than any other candidate Donnie just became a natural part of the team… before we even made the final decision to hire him.

I’m thrilled to be here and am really looking forward to interacting with all of you. In my next post, I’ll discuss my interests as an analyst.


