When John Paulson bet against the real estate markets, he knew something that other people didn’t. By applying his models against purchased real estate databases, he perceived an opportunity where others saw folly. Fifteen billion in profit later, people are understandably a bit curious as to how he pulled it off. Gladwell’s explanation (subscription required) focuses on the man. Personally, I’m more interested in the technology, because I think it’s probable that we’re going to see a lot more Paulsons in the days ahead.
Similarly outsized profits will, presumably, still be rare, since the number of people entrusted to bet hundreds of millions of dollars of other people’s money is never likely to be large. But between the accelerating democratization of the tools of large scale data processing, the attendant trend toward ever more frictionless access to data, and dramatically lower compute costs, we’re going to see a lot more profit from data-derived intelligence. In other words, I’d bet long on enterprises whose primary product is analytics-driven insight.
We often speak of data as if it is the ends rather than the means. But the raw data often isn’t much help. We need intelligence, by which I mean insights derived from the data source. Or sources. To make data usable, we need to make sense of it. Which means sorting it, visualizing it, comparing it to predictive or historical models, and – increasingly – recombining it with other data.
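To make “recombining” concrete, here’s a minimal sketch of what that step might look like; the file names and column layouts are purely illustrative, and pandas is just one of many tools that could do the job.

```python
import pandas as pd

# Hypothetical inputs: flight arrivals and hourly weather observations.
# Column names are illustrative, not any real schema.
flights = pd.read_csv("flights.csv", parse_dates=["scheduled_arrival"])
weather = pd.read_csv("weather.csv", parse_dates=["observed_at"])

# Recombine the two sources: attach the most recent weather observation
# at each airport to every arrival.
flights = flights.sort_values("scheduled_arrival")
weather = weather.sort_values("observed_at")
merged = pd.merge_asof(
    flights, weather,
    left_on="scheduled_arrival", right_on="observed_at",
    by="airport", direction="backward",
)

# A first pass at "intelligence": average delay in wet vs. dry conditions.
print(merged.groupby(merged["precip_inches"] > 0)["arrival_delay_min"].mean())
```

None of this is sophisticated, which is rather the point: the value isn’t in the code, it’s in what the combined data can tell you.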
The Boston Red Sox subscribe to a private weather forecasting service, Meteorlogix. Why? When there is so much free weather data available, why would the Red Sox pay a private service? Presumably because they feel that the private firm offers something the public services do not, and because, for a weather-sensitive business, the economic impact of even marginally better forecasting could be material.
Businesses that monetize data aren’t new, of course. What’s different today are the capital costs. Large scale data processing software can be obtained for free. The same is often true for operational data. Storage and compute costs, meanwhile, are now both pay-as-you-go and accessible even to the smallest businesses, thanks in part to the cloud.
Consider the case of Flightcaster. After initially dismissing them as little more than a yet-to-be-acquired feature of TripIt, I’m beginning to wonder whether I’ve got it the wrong way round. TripIt’s proven that there’s money in optimizing the travel schedules of individual consumers. But isn’t it possible that Flightcaster will eventually be able to extract significantly more revenue, at higher margins, from airlines for helping optimize their operations?
Overall — 87% of flights had time added to their schedules between 1996 and 2009, while only 80% experienced longer actual elapsed times. Meanwhile, 10% had time subtracted from their schedules, but 16% of flights were faster in actuality. So airlines were certainly over-compensating in 2009.
Motives? Like Scott says, they are manifold: better operations overall, better on-time performance, better ability to plan.
It’s a game airlines play to balance their operational needs and customer service. Sometimes they win, sometimes they lose. But predictability of delays is the biggest lever to help them play this game. Over time, we hope to use FlightCaster data to help with these kinds of decisions as we gather more data and analyze it in different ways.
Emphasis mine. How did Flightcaster, a onetime Y Combinator startup, put itself in a position to know more about the state of airline operations than the airlines themselves? By building itself a highly differentiated dataset amalgamated from sources like the Bureau of Transportation Statistics, FAA Air Traffic Control System Command Center, FlightStats and the National Weather Service. From an interview with their head of research, Bradford Cross:
The public data set that we use is the “on-time database” published by the FAA. The data set is tricky to get all in one place since the FAA does not provide any decent API to it. The biggest issue is that we make real time predictions, so we needed a historical set of captured real time data, which we had to create ourselves.
Having a more amalgamated real time dataset going back historically for a decade would be a big help. Having more modernized ways of accessing the data would be helpful.
Until then, if anyone wants to buy it, we will sell it to them for a very high price.
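What building that historical set of captured real-time data might look like is not mysterious, even if it is tedious: poll the live source on a schedule, timestamp every snapshot, and archive it. The sketch below uses a placeholder feed URL, since FlightCaster’s actual sources and formats aren’t public.

```python
import json
import time
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

# Placeholder URL; the real feeds and formats are assumptions here.
FEED_URL = "https://example.com/flight-status.json"
ARCHIVE = Path("snapshots")
ARCHIVE.mkdir(exist_ok=True)

def capture_snapshot() -> Path:
    """Fetch the live feed once and write it to a timestamped file.

    Real-time data is perishable: the only way to have a historical
    record of it later is to have saved it at the time.
    """
    with urllib.request.urlopen(FEED_URL, timeout=30) as resp:
        payload = json.load(resp)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out = ARCHIVE / f"status-{stamp}.json"
    out.write_text(json.dumps({"captured_at": stamp, "data": payload}))
    return out

if __name__ == "__main__":
    # Poll every ten minutes; after a few years, this is the kind of
    # historical real-time dataset the interview describes.
    while True:
        capture_snapshot()
        time.sleep(600)
```

Unglamorous, but it is exactly this kind of accumulated capture that is hard to recreate after the fact, and therefore worth “a very high price.”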
Is Flightcaster’s iPhone app the important product for the firm, then? I doubt it. It’s useful as a marketing tool, I’m sure, but ultimately the value of the firm lies in its data. By combining public datasets, Flightcaster can answer the easy questions – who is the most delayed airline? the most delayed airports? – as well as take on more complicated analyses such as “how our political system is causing flight delays,” “whether or not winter weather is causing delays” and so on. Like Google’s, Flightcaster’s real value is underappreciated, because it is monetized indirectly.
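The easy questions really are easy once the on-time data is in hand. A sketch, assuming a CSV export from the BTS on-time database with illustrative column names:

```python
import pandas as pd

# Illustrative column names; the real BTS on-time export uses its own schema.
ontime = pd.read_csv("ontime.csv")  # carrier, origin, dest, arr_delay_min

# Who is the most delayed airline?
print(ontime.groupby("carrier")["arr_delay_min"].mean()
            .sort_values(ascending=False).head())

# Which are the most delayed airports, by average arrival delay?
print(ontime.groupby("dest")["arr_delay_min"].mean()
            .sort_values(ascending=False).head())
```

The harder, more valuable questions come from layering in the other sources – weather, air traffic control, political geography – which is precisely where the differentiated dataset earns its keep.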
How many businesses like Flightcaster are poised to emerge over the next few years, with data easier to get, the tools to work on it cheaper, and the financial incentives better understood? Tough to say. But it’s safe to assume that there are thousands of similar, as-yet-undiscovered asymmetries between publicly available data and the intelligence it contains.
Which is why it doesn’t take much of a model to predict more of them.