Introducing Project Arcturus, Part 2: The Infrastructure Behind the Curtain


Thanks are in order, before we begin. The response to RedMonk Analytics has easily exceeded my expectations. Thousands of you read our launch post, and dozens of you have reached out to express interest, schedule demos, or just say “exciting,” or “very promising,” or, my favorite, “great idea.” The folks at ReadWriteWeb, meanwhile, were kind enough to interview us on the project and its place in the landscape.

As we said Monday, we’re excited to launch the product, but we’re even more excited about where we’re going – not least because of some of the organizations that have contacted us since Monday’s news. But we’ll have more on all of that in time. For today, I wanted to take a few minutes to explain how we built RedMonk Analytics and why we built it the way we did. And for those of you who were upset that I didn’t launch the product with a Q&A – and you know who you are – this one’s for you.

Q: For those who haven’t seen RedMonk Analytics yet, let’s start with the basics: can you describe the product quickly?
A: Sure. RedMonk Analytics is a subscription-based SaaS tool that mines our content and various third-party data sources for patterns, trends and other insights about developer behaviors.

Q: How does it work, conceptually?
A: The basic premise is straightforward: we write content aimed at developers, where developers also includes sysadmins, DBAs, architects, designers, and so on. Developers read and search for this content, generating some metrics, and occasionally comment on it or link to it externally, generating other metrics. We capture these metrics, clean them up a bit, analyze them, and serve up the result to customers. In the very near future, we will be layering in additional developer-related data sources from a variety of third parties (and as a reminder, please do contact us if you’re interested).
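
For readers who think better in code, here is a minimal sketch of that capture, clean, analyze, serve flow. It is purely illustrative and not the production code: the `Event` fields, the event kinds, and the function names are all assumptions, and a simple per-topic count stands in for the real analysis.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One raw metric event; both fields are hypothetical."""
    kind: str   # e.g. "view", "search", "comment", "inbound_link"
    topic: str  # e.g. "Linux", "cloud"


def clean(events):
    """Normalize topic names and drop events that have no topic."""
    for e in events:
        topic = e.topic.strip().lower()
        if topic:
            yield Event(e.kind, topic)


def analyze(events):
    """Count events per topic, as a stand-in for real analysis."""
    return Counter(e.topic for e in events)


def top_topics(raw_events, n=3):
    """Run clean -> analyze, then serve the n most active topics."""
    return analyze(clean(raw_events)).most_common(n)


raw = [
    Event("view", "Linux"),
    Event("search", " linux "),
    Event("comment", "Cloud"),
    Event("view", ""),  # no topic, dropped during cleaning
]
print(top_topics(raw))  # [('linux', 2), ('cloud', 1)]
```

The point of the shape is that each stage only depends on the one before it, so a new third party data source would plug in at the capture end without touching the rest.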

Q: Who is the intended user? Do I need to be an analytics specialist?
A: Not at all. The system was built for non-technical users, and is oriented not around data, but questions. Rather than presenting you with charts, we start you with a dashboard that tries to help answer simple questions about what developers want and who they are, using our data. No statistics or compsci degrees required.

Q: What subjects does the system address?
A: Because it’s based off our data, the system addresses anything we cover. From application development to big data to cloud to mobile to IT management to open source and so on, it’s all in there. Want your reports to be based off of custom keywords, like Linux, for example? Drop us a line.

Q: What is the system built on, from a front end perspective?
A: Update: My original understanding was that the web front end for RedMonk Analytics was built on top of WordPress. Turns out we actually bypassed it, given our minimal needs. The application front end is actually based on CodeIgniter and Crowd Favorite’s Oxygen framework. WordPress is still core to the application, however.

Q: Why WordPress?
A: WordPress has been the foundation of our business for years, with both our blogs – the source of much of our data – and our homepage, WordPress based. We’ve been able to extend WordPress in some interesting ways, and the result powers our product.

Q: How did you extend WordPress?
A: We’re using a custom WordPress plugin written for us by Crowd Favorite to feed the system. This was important because it allowed us to link the sites to the application easily, but also because it will permit us to grow the system in future to other WordPress properties.

Q: What’s the infrastructure that powers RedMonk Analytics?
A: Our software stack – the custom WordPress implementation aside – is fairly generic. We’re using a stock Ubuntu image, MySQL, and of course Apache and PHP.

Q: Why LAMP generally, and why Ubuntu specifically?
A: We’re using LAMP for the same reasons most people use LAMP: the cost and friction of acquisition is zero, the support – both commercial and non – is excellent, it’s something we know fairly intimately after all of these years, but most importantly: it just works. Most of the time, anyway. As for Ubuntu specifically, I was given the option during the construction process to pick my distribution, and I selected Ubuntu in part because of its volume acceptance in cloud settings but more because it’s the distribution I run personally. There were lots of other viable choices, but if or when I need to work on the system, I’m most familiar with Ubuntu. What’s interesting to consider is whether or not I would have had the same choice, say, three years ago: most web shops at that point were building off of Fedora, CentOS, or RHEL. That I was offered Ubuntu here is something of a change.

Q: Where are you hosting the application?
A: RedMonk Analytics is currently run off of a single large EC2 instance at Amazon.

Q: Why Amazon?
A: Well, everyone else was doing it…I kid, I kid.

Anyway, several reasons. First, a cloud or at least managed hosting provider was a given for me, because RedMonk is at the stage in our lifecycle where I need the operational responsibilities for the infrastructure to be someone else’s problem. Much as I love tinkering with machines, I absolutely do not have time to play sysadmin. And because we’ve never built anything like this before, we need the ability to flexibly and elastically add – or subtract – both compute power and storage. Meaning that cloud infrastructure was clearly in order here.

With that understanding, we looked at our options and Amazon’s combination of pricing, tooling and yes, popularity, made them a winner. There are a variety of cloud infrastructure providers, of course, but not too many of them have an Ubuntu integration like this. It’s all about barriers to entry, remember.

Q: Are you building your own data warehouse?
A: Yes and no. We are storing data locally for performance reasons, among others, and we do therefore have a portion of the local infrastructure devoted to storage. A lot of the heavy lifting at present, however, is outsourced to Google Analytics. We’re calling out to it for substantial portions of the data displayed in the tool. Our future plans will require more local data, so we’ll be fleshing out the storage and warehouse-like elements of the system moving forward.
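
To make the "store locally, call out for the rest" idea concrete, here is a rough sketch of a local cache sitting in front of a remote analytics call. It is an illustration under assumptions, not a description of our code: `CachedSource`, `fake_remote`, and the one-hour lifetime are hypothetical, and the remote fetch is a stand-in function rather than a real Google Analytics client.

```python
import time


class CachedSource:
    """Serve metrics from a local store, falling back to a remote fetch."""

    def __init__(self, fetch_remote, ttl_seconds=3600, clock=time.time):
        self._fetch_remote = fetch_remote  # hypothetical callable: key -> value
        self._ttl = ttl_seconds            # how long a local copy stays fresh
        self._clock = clock
        self._store = {}                   # key -> (stored_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]                  # fresh local copy, no remote call
        value = self._fetch_remote(key)    # slow path: ask the remote service
        self._store[key] = (now, value)
        return value


calls = []


def fake_remote(key):
    """Stand-in for the remote analytics service; records each call."""
    calls.append(key)
    return {"pageviews": 42}


source = CachedSource(fake_remote, ttl_seconds=60)
source.get("linux")
source.get("linux")  # served locally, so the remote is only hit once
print(len(calls))    # 1
```

The more data that has to live locally, the more this in-memory dictionary would need to become a real database table, which is the warehouse-like growth described above.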

Q: How are you handling backup?
A: We’re using BackupMoxie to snapshot the instance in the event of failure.

Q: Any last thoughts?
A: Just that we’re grateful for the response we’ve gotten thus far, and hugely appreciative of the time and consideration the project’s been given. We look forward to continuously improving the system in the months ahead, and we’ll try to keep you posted here on our approach with respect to infrastructure, both as a means of documenting our decision process for the benefit of others and in the event that some of you spot mistakes that we’re making.


  1. Good luck; interesting that your web traffic is enough to give you worthwhile results!
