In the true RedMonk spirit, you heard it here last. I attended O’Reilly’s Strata conference earlier this spring and have finally gotten around to summing it up.
If Steve Ballmer had gone to Strata this spring, he would've screamed: "Analytics, analytics, analytics!" I was inundated with the term every time I turned around. Whether it was in talks, briefings, or the hallway track, analytics was clearly the order of the day. In fact, it seemed to drown out many other areas like databases: I talked with more database companies than analytics companies, yet hardly anyone mentioned databases outside of my briefings.
From my point of view, this Strata (the second ever, and my first) broke down into a few types of content and thus audiences: analytics, Hadoop newbies, and "data scientists," for some definition of the term. Attendance nearly doubled since the previous Strata, from around 1,400 to 2,300 people, a strong indicator of burgeoning interest in data and what to do with it. Many two-time attendees told me this iteration of Strata was much more business-oriented, as tends to happen to conferences over time. (Sidenote: that shift is just another piece of evidence for bottom-up, developer-led adoption; developers arrive first, and the business follows.)
I started the conference in the "Deep Data" day, intended as a deep technical dive for data scientists that ran in parallel with a business-oriented "Jumpstart" track (marketed as the missing MBA for Big Data). Although Deep Data was by far the most in-depth series of talks I saw at Strata, I felt it could, and should, have gone deeper still. Three of the talks stood out for their technical detail, enough so that I came away feeling I could understand and replicate the speakers' approaches:
- From knowing “what” to understanding “why,” by Claudia Perlich of M6D
- Corpus bootstrapping with NLTK, by Jacob Perkins of Weotta
- The importance of importance: an introduction to feature selection, by Ben Gimpert of Altos Research
I won’t go into depth here because most of the content was pretty complex, but I found it quite valuable. Many of the other talks were interesting as well, either from a philosophical perspective (for the higher-level talks) or as additional support for my worldview and methods.
The undeniable highlight of day 1, however, was the Data Science Debate (video), that great rarity of a panel that not only didn't suck but was completely enthralling. Wonderfully moderated and summarized by Mike Driscoll, it tasked the panelists with debating whether the first data-science hire at a new startup should be a domain expert or a machine-learning expert, with three panelists assigned to each side. I don't want to give away the final vote, but I will say it was a very close call, and that from an audience of experienced data scientists.
I spent most of days 2 and 3 in briefings; it was a pretty even split between analytics and database companies, with a sprinkling of Hadoop, data marketplaces, etc.
On the analytics front, what I noticed at Strata and have seen more of since is that companies are trying to bring analytics to the masses. In general, descriptive analytics of past and present data is targeted at non-technical users throughout a business, whereas predictive analytics is aimed primarily at expert users who might sit on a dedicated analytics team. You can tell not only from the kinds of choices the software demands (am I choosing algorithms or colors?) but also from the quality and polish of the interfaces.
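To make that distinction concrete, here's a trivial sketch with invented numbers: descriptive analytics summarizes what already happened, while predictive analytics fits a model to extrapolate, and picking and validating that model is where the expertise comes in.

```python
import numpy as np

# Hypothetical monthly sales figures, purely for illustration
sales = np.array([120.0, 135.0, 150.0, 160.0])

# Descriptive analytics: summarize the past -- the kind of question
# a polished dashboard answers for any business user.
print("mean:", sales.mean(), "best month:", sales.max())

# Predictive analytics: fit a model and extrapolate -- here a simple
# least-squares trend line, where an expert has to choose the model,
# the features, and a validation strategy.
months = np.arange(len(sales))
slope, intercept = np.polyfit(months, sales, 1)
print("forecast for next month:", slope * len(sales) + intercept)
```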
In the past few months, we've begun to see more companies like SAP, IBM, and SAS attempting to bring true predictive analytics to non-technical users. This is another reflection of the technology lifecycle gradually shifting from technical experts to business roles as the technology itself matures and we discover, and invest the time to build, robust, user-friendly wrappers around it.
On a different note, the common theme among the database and Hadoop companies I spoke with was the gradual merging of their roles for many use cases. Databases are growing ever more capable of storing all types of data, as traditional SQL databases repurpose themselves as universal data stores that can handle Big Data, fast. Simultaneously, others are developing technology to access data in Hadoop and similar stores more quickly and in more standardized (read: SQL-like) ways.
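As one concrete example of that second trend, SQL-on-Hadoop layers like Hive let you query data sitting in HDFS with a familiar SQL dialect. The sketch below is hypothetical: it assumes a reachable Hive server and uses the third-party PyHive package, and the host name and weblogs table are invented for illustration.

```python
from pyhive import hive  # third-party package: pip install pyhive

# Connect to a (hypothetical) Hive server fronting a Hadoop cluster
conn = hive.connect(host="hadoop-gateway.example.com", port=10000)
cursor = conn.cursor()

# A plain SQL aggregation, even though the data lives in HDFS and the
# query is compiled down to distributed jobs under the hood.
cursor.execute("""
    SELECT page, COUNT(*) AS hits
    FROM weblogs
    GROUP BY page
    ORDER BY hits DESC
    LIMIT 10
""")
for page, hits in cursor.fetchall():
    print(page, hits)
```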
The final point from Strata I’d like to touch on is data marketplaces. Infochimps started out a few years back, but announced at Strata that it was essentially pivoting into an infrastructure provider for Big Data solutions. Companies like DataSift and DataMarket, in addition to Microsoft with its Azure Marketplace, have more recently begun their forays into the data-marketplace space (which my colleague Stephen has written about many times).
As in many markets, timing is everything, and companies that enter too early can burn out before potential customers realize they have a true need. It remains to be seen whether we've reached the right time yet. So far my impression is that the time is right to sell insights based on data, as companies like Sonatype and New Relic are doing, but valuing data itself is still difficult, and convincing people of its value even more so.
Disclosure: Microsoft, IBM, SAP, Sonatype, and New Relic are clients; SAS, DataSift, and DataMarket are not.