RedMonk

The Luckiest Day

Me and my wife been married thirty-one years. No children. We lost a girl but I wont talk about that. I served two terms and then we moved to Denton Texas. Jack used to say that bein sheriff was one of the best jobs you could have and bein a ex-sheriff one of the worst. Maybe lots of things is like that. We stayed gone and stayed gone. I done different things. Was a detective on the railroad for a while. By that time my wife wasnt all that sure about us coming back here. About me runnin. But she seen I wanted to so that’s what we done. She’s a better person than me, which I will admit to anyone that cares to listen. Not that that’s sayin a whole lot. She’s a better person than anybody I know. Period.

People think they know what they want but they generally dont. Sometimes if they’re lucky they’ll get it anyways. Me I was always lucky. My whole life. I wouldnt be here otherwise. Scrapes I been in. But the day I seen her come out of Kerr’s Mercantile and cross the street and she passed me and I tipped my hat to her and got just almost a smile back, that was the luckiest.

People complain about the bad things that happen to em that they dont deserve but they seldom mention the good. About what they done to deserve them things. I dont recall that I ever give the good Lord all that much cause to smile on me. But he did.”
- Cormac McCarthy

I just got married.

by-nc-sa

Categories: Personal.

Oracle v Google: Why?

When Android debuted in 2007, I couldn’t figure out how Google had managed to apply an Apache license to the project. Java, like Linux, was governed by the GPL and thus incompatible with the more permissive license Android was sporting. Stefano Mazzocchi subsequently answered the Java related questions: Google wasn’t using Sun’s VM, they’d built their own. As had Danger before them, from whence many of the Android team arrived. Called Dalvik, Google’s cleanroom reimplementation was, if not “Sun’s worst nightmare” as Mazzocchi put it, a clear fork-in-the-eye to the Java license holders. However brave a face they put on it at the time.

Whether Google decided to reimplement the JVM for financial reasons, technical reasons, or both, is unclear. Whatever the motivation, Dalvik allowed Google to bypass Sun en route to market. What Dalvik never did – never could have done – was protect Google from patent litigation.

In estimating the risks of such action, Google could have reasonably assumed that the probability of Sun suing them was near zero. Sun may have been unhappy, and may even have suspected that Google’s cleanroom reimplementation was anything but. The Vegas line would still have been decidedly tilted against Sun turning to legal action.

Maybe Sun’s reluctance to sue was financial. I’m personally skeptical of this claim – companies with failing financial fortunes in my experience are generally more inclined to seek legal remedies to their problems, not less – but Shankland’s sources are always good. Even if this were the case, however, Google couldn’t have assumed that would suffice as a shield. Instead, the search giant would have expected Sun to behave consistently with its past behavior and future interests. Besides the fact that Sun had effectively zero history using its patents offensively, as James Gosling acknowledges, there was the fact that attacking Google over patents would irreparably damage Sun’s then nascent efforts to repair its fractured relationships with developers. It would be tough to reconcile a suit with the public positions of its chief executives. Worse, it would have injected fear, uncertainty and doubt into the dominant enterprise software ecosystem at a time when Sun could ill afford it. While its pursuit of patents remained high even then – for what former chief counsel for Sun Mike Dillon characterized as defensive reasons – offensive use of its intellectual property was more or less unheard of, the Microsoft settlement notwithstanding.

Sun, in other words, was not going to sue Google. And Google knew it.

It’s safe to assume, however, that Google also knew that Sun was unlikely to be the permanent owner of its intellectual property given the firm’s financial trajectory. Which is why Google’s legal team probably started preparing for a suit the day the Oracle transaction closed. If not earlier. And it also explains why Google was prepared enough to fire back with something more than a “no comment,” dismissing the claims as baseless and cleverly reframing this as not just an attack on Google, but on open source Java as well. Oracle shares neither Sun’s old qualms nor its conscience, and Google knew that too. Finances are not an obstacle, and Oracle does not care about perceptions of the company, developer or otherwise. After yesterday, it would appear that they are similarly unconcerned about what the ecosystem thinks of their stewardship of Java.

The latter point is perhaps the most important. It’s the only real clue we have to answer the only real question here: what does Oracle want?

Because the answer to that is: not what they’ve asked for in the complaint. Oracle may indeed request recompense for “the damages sustained and will sustain” as well as “any gains, profits, and advantages obtained by Google as a result of Google’s acts of infringement and Google’s use and publication of the copied materials.” But you can be sure that that’s not all they want.

As Andy Updegrove covers, the obvious motivation is financial. Specifically, maximizing the return on the six and a half billion capital expense that bought Sun’s assets, the patents in question included. If Oracle realized the same return as Sun from the Microsoft settlement concerning Java, for example, the cost of Sun becomes four billion. Remaking what was arguably a bargain into a steal.

Purely financial justifications for this suit are less than satisfying, however.

To begin with, Oracle would effectively be trading long term ecosystem health for a short term cash windfall. Unless the settlement is historically immense – a difficult outcome to rely on from a planning perspective – it’s not clear that this would be a net win. For all of its sustained success in the application and database markets, Oracle remains as fundamentally dependent on the Java ecosystem as Sun was before it. Even for a company that’s sought and found growth through stack ownership and category dominance, the health of the ecosystem is and must remain a concern. While the original technology was technically groundbreaking and differentiated, the key to Java’s success lay outside its feature list. What drove its ascension within enterprises was the reality that Java offered at least the potential for independence from vendors. That will not be surrendered lightly, whatever Oracle may believe. A Java ecosystem dominated by Oracle is a doomed ecosystem. While it’s far from clear that this action by itself would create that perception amongst current Java ecosystem participants, it, coupled with Oracle’s own aggressive history, would be unlikely to be beneficial from a participation standpoint. As Andy put it, “it’s less clear to me what the strategic value would be to Oracle to prevent Google for incorporating Java into Android, or to impede the marketplace generally from relying on Java.”

Nor is the outcome of this suit certain. Granted, Oracle’s lawyers will be belligerent and numerous. Given the nature of both the patent system generally and the patents at issue specifically, it’s not only possible but likely that Android infringes in some fashion. This is perhaps not surprising. News that the patent system as it pertains to software is broken is not, in fact, news. It is also true that the fact that Oracle is proceeding in the face of obvious, substantial costs both financial and not suggests a level of confidence in the merits of their suit. What will be interesting to observe, however, is how the suit is or is not complicated by the post-Bilski patent landscape as well as the public promises of senior Sun executives not to assert their intellectual property rights in this fashion. Groklaw discusses the Bilski implications in their coverage. The Sun executive blog entries, meanwhile, cannot supersede inherent intellectual property rights, of course. But might they be used to build a case challenging the intent of the rights holders? Perhaps. Can patents once regarded to be “open” be “closed?” Barring a settlement, we’re likely to find out. If they cannot be closed, and Oracle’s suit fails, the costs, in both dollars and damage, will be exorbitant. Even before we get to the inevitable countersuit which will result from the asinine mutually assured destruction game the current system forces upon us.

It can be argued, then, that this is a high risk exercise for Oracle. The only satisfactory return for a high risk exercise is high reward. Based on past software settlements, it’s difficult to project this being material to Oracle financially over a multi-year timeframe. Which is why I suspect there’s more at stake here than royalties.

What that is is non-obvious. All that we know about what Oracle wants, realistically, is what they are prepared to surrender. Aside from bearing the hard costs of litigation, Oracle is willing to absorb soft costs in risk to reputation and participation rates in the Java ecosystem. We must expect then that Oracle’s expected return will be commensurate with these costs. Oracle is many things, but stupid generally isn’t one of them.

Perhaps, as Forbes speculates, this is a prelude to a cross-licensing arrangement. Though if that’s the case, I’m far less certain that this suit actually has anything to do with Android; might patents like this “Large-scale data processing in a distributed and parallel processing environment” or this “Information extraction from a database” be relevant to Oracle’s core businesses? Perhaps Google is already or plans to compete directly with Oracle in ways we are not aware of yet. Or maybe Oracle just wants Google to buy a bunch of database licenses.

Whatever the real reason, this is a surprising decision even for a firm as aggressive as Oracle. The only thing more surprising is how quickly it turned Google – excoriated around the web for their questionable net neutrality proposal with Verizon – back into the good guys. Even if you speculate about differences in Oracle’s evaluations of its own assets – that Oracle believes that Java has peaked in popularity, for example, and that this is a one time opportunity to cash in on an asset that must, inevitably, decline – the calculus of this move fails. Nothing in Oracle’s product roadmaps hints at such a realization. Nor would a one time windfall, however large, be sufficient to offset the costs of a significant decline in Oracle’s Java related products.

As for predictions, I’ll make only one: whoever wins will also lose. This suit is going to negatively impact – probably substantially – Java adoption. The enterprise technology landscape is more fragmented by the day, as it transitions from .NET or Java orthodoxy to multi-language heterogeneity. Oracle’s suit will accelerate this process as it introduces for the first time legal uncertainty around the Java platform. Apple and Microsoft will be thrilled by this development, and scores of competitive languages and platforms are likely to see improved traction as a result of Java defections.

Add up these costs, and the only supportable conclusion is that Oracle’s ambitions here are big.

Disclosure: Oracle has in the past been a RedMonk customer; Google has not. Of the other mentioned firms, Apple is not a RedMonk customer while Microsoft is.

Categories: patents.

AWS: Forget the Revenue, Did You See the Margins?

Two days ago, two analysts from UBS – Brian Pitz and Brian Fitzgerald – projected Amazon Web Services revenues at $500 million. Many were disappointed, expecting more from the widely acknowledged market leader: half a billion dollars is approximately what Microsoft spent per datacenter pre-2010.

Those who would focus on the actual revenue figure, however, are likely to miss the more important margin numbers.

It has been long assumed that cloud computing – at least as currently practiced by the lower value add Infrastructure-as-a-Service practitioners – is a low margin business by enterprise infrastructure standards. Larger systems players such as HP and IBM have shown little appetite for the public cloud market in part because of this, depending on who you talk to. One senior executive I spoke with from a large systems vendor two years ago was blunt in his assessment of the prospects for a public cloud offering: “I don’t want to be in the hosting business.” The implication being that hosting offered insufficient margins.

Certainly nothing in Amazon’s pricing, from launch through today, has seemed to contradict this conventional wisdom. True, the actual cost of full-time EC2 instances exceeded competitive offerings from traditional hosts, but the dynamic consumption of AWS servers would presumably mitigate the moderate margin Amazon could realize. A half month of a $72 server is worth less than a full month of a $40 server and so on.

Except that it apparently isn’t.

At the OSCON Cloud Summit, I delivered a presentation on cloud lock-in. The concept as it relates to margins was simple: margins at the foundational layers were slimmer, which was spurring the development of various platform services which besides creating the potential for lock-in would theoretically provide higher margins. It’s standard technology value-add thinking: if I can charge $10 for a basic, bare bones server, I should be able to charge $20 for a platform in which you don’t worry about servers, capacity planning and such any longer. That the market has largely rejected platform services in favor of more elemental infrastructure building blocks doesn’t change the basic economic assumptions being made.

My presentation was, I believe, generally well received. The reactions, both on Twitter and in person following my talk, were positive and question oriented. With one notable exception.

James Watters argued vigorously that I was underestimating, substantially, Amazon’s margins.

WOW @sogrady couldn’t have been more wrong when comparing the margins of HP to EC2; EC2 much higher than HP average 25%.

Partially the disconnect is that I hadn’t meant to imply a comparison of cloud margins to that of HP generally. The intent was rather to contrast typical cloud margins to enterprise technology businesses, where ideal margins generally begin at 40%. But as it turns out, James was right and I was wrong, irrespective of that framing error.

According to UBS, Amazon Web Services gross margins for the years 2006 through 2014 are 47%, 48%, 48%, 49%, 49%, 50%, 50.5%, 51%, 53%. Granted, this is an analyst projection. And the inherent risk of projecting four years out in a volatile market is acknowledged.

But even should we trim the figures liberally, the fact is that the margins that Amazon is realizing on basic infrastructure services are substantial. For context, look at the income statements for a Cisco, an IBM or an HP.
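As a rough check on that claim, here is a minimal sketch that lines the UBS figures quoted above up against the 40% level cited earlier as the starting point for ideal enterprise margins. The five point haircut is an arbitrary value chosen for illustration, not something UBS or Amazon has published, and the function names are hypothetical.

```python
# UBS-projected AWS gross margins (percent) for 2006 through 2014, as quoted above.
YEARS = range(2006, 2015)
MARGINS = [47, 48, 48, 49, 49, 50, 50.5, 51, 53]
projected_margins = dict(zip(YEARS, MARGINS))

# The "ideal margins generally begin at 40%" level mentioned earlier in this post.
ENTERPRISE_FLOOR = 40.0


def trim(margins_by_year, points):
    """Subtract a flat number of percentage points from every year's margin."""
    return {year: margin - points for year, margin in margins_by_year.items()}


def years_below(threshold, margins_by_year):
    """Return the years whose margin falls under the given threshold."""
    return [year for year, margin in margins_by_year.items() if margin < threshold]


# Assumption: a 5 point haircut stands in for "trimming the figures liberally".
trimmed = trim(projected_margins, 5)

print("Years under 40% as projected:", years_below(ENTERPRISE_FLOOR, projected_margins))
print("Years under 40% after a 5 point trim:", years_below(ENTERPRISE_FLOOR, trimmed))
```

Raising the haircut shows how much of the projection would have to be wrong before the low margin assumption holds again.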

If this is true, most of what we’ve believed about Amazon’s business – that it was in fact a high volume over low margin business – is wrong. And if that’s wrong, it changes the way we must evaluate the cloud industry and the attendant economic opportunities. Revenue is a function of volume and margin. The volume, with respect to the cloud, is not a concern for me. The margin always has been. If that concern can be erased through combinations of automation, efficiencies and scale, then the economics of the cloud look even brighter than they did before. The current market size may portend less upside than we’ve historically seen from technology sectors because it’s more significantly driven by volume than in years past, but I have few concerns about the market potential long term.

Which is good news for the industry as a whole, I think. Sometimes it’s good to be wrong.

Categories: Cloud, Economics.

Even with Big Data, It’s Hard to Ask the Right Question

“The most significant findings of our preliminary review are: The U.S. Government had sufficient information prior to the attempted December 25 attack to have potentially disrupted the AQAP plot.

Though all of that information was available to all-source analysts at the CIA and the NCTC prior to the attempted attack, the dots were never connected, and as a result, the problem appears to be more about a component failure to ‘connect the dots,’ rather than a lack of information sharing. The information that was available to analysts, as is usually the case, was fragmentary and embedded in a large volume of other data.”

- Summary of the White House Review of the December 25, 2009 Attempted Terrorist Attack, from Whitehouse.gov [PDF]

Assume for a moment that you had full, unconditional access to Google’s dataset: what would you ask of it? Google uses it to predict the flu. What can you find? If you’re like most of us, you will lock up almost immediately. The paradox of choice overwhelms the rational mind with its virtually infinite possibilities.

The question of what to ask is important because the big data space, at present, is focused on data collection, storage and processing. Which is why things like Cloudera’s Flume catalog and S3 sink are (rightfully) the subject of intense interest. Not every problem in large scale data processing is solved. Far from it, in fact.

But as Twitter’s Kevin Weil eloquently put it during his OSCON talk, “asking the right question is hard.” Which is the best explanation of why people like Kevin are so important.

At the recent Hadoop Summit, the reported consensus was that the stuff that used to be hard – collecting, storing and working on large volumes of data – is getting if not easy, easier. Even for individuals, thanks in equal parts to cloud computing and open source software. A conclusion we subscribe to. The challenges that remain, however, may prove to be even more formidable. Because as intelligence agency failures like the December 25 attack prove quite adequately, asking the right question is hard, no matter how much we spend on tools and infrastructure.

The success of Google and the other web firms has led to the central belief that more data is always better. And statistically speaking, that tends to be true, particularly when you’re making predictions, as models for inference perform better with higher volumes of data. As Google’s Chief Scientist Peter Norvig put it, “We don’t have better algorithms than anyone else. We just have more data.”

Higher data volumes are far from a universal positive, however. First, there’s the fact that data may offer diminishing returns. As the Chief Economist of Google, Hal Varian once observed:

There’s a kind of natural diminishing returns to scale just because of statistics: you have to have four times as big a sample to get twice as good an estimate.
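Varian’s rule of thumb follows from the standard error of a sample mean, which shrinks with the square root of the sample size. The sketch below is only an illustration of that relationship; the standard deviation and the starting sample size are assumed values, not figures from Google.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of a sample mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)


SIGMA = 1.0     # assumed population standard deviation (illustrative only)
BASE_N = 1_000  # assumed starting sample size (illustrative only)

# Each step quadruples the sample; the standard error only halves.
for multiple in (1, 4, 16):
    n = BASE_N * multiple
    print(f"n = {n:>6}: standard error = {standard_error(SIGMA, n):.5f}")
```

Put differently, each successive doubling of precision costs four times as much data as the one before it.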

The bigger problem is the volume of data itself. Even if you can process it analytically, it’s difficult to know what to look for. What to ask. How to ask it. And how to keep asking different variations of the question to get not the answer you expect, but the correct answer.

Consider a piece from this week’s USA Today, entitled “Methods for detecting test bias flawed, research suggests.” Wherein lies the flaw? The question, not the data.

“A major new research project — led by a scholar who favors standardized testing — has just concluded that the methods used by the College Board (and just about every other testing entity for either admissions or employment testing) are seriously flawed…

In the common approach, individual questions are analyzed. What the new paper suggests is another way to look for bias. The scholars created a database with literally trillions of questions and scores on a range of tests, including all the major standardized tests used in college admissions. And this database featured trillions of questions that had been determined to have bias. But when samples were pulled out for analysis of a given question on a given test, the results came back negative for bias.

The conclusion, Aguinis said, is that question-by-question analysis doesn’t detect bias.

“Given our research, the conclusion that tests are unbiased should be revisited,” he said. “We need a much bigger question.”

So it’s hard to attack data with the right questions. Got it. What can be done? If Facebook and Twitter are any indication, the answer is to resource for it. Here’s Jeff Hammerbacher, ex-Facebook, on the origins of the data team there:

Around 4Q05 they decided to establish a ‘reporting and analytics’ function that was more like traditional DW/BI, and to hire a ‘research scientist’ to do things like identify and evaluate algorithms for news feed ranking. They hired one person into each role. Unfortunately, the person hired into the latter role passed away due to a tragic biking accident. I was hired in 1Q06 with the same title (‘research scientist’), but my role quickly evolved into supporting the functions of the reporting and analytics group. Some time in 3Q06, Adam D’Angelo returned and we discussed changing the two groups to be focused on ‘Data Infrastructure’ and ‘Data Analytics’, and in 4Q06 (I think) we merged them into the ‘Data Team’…

Dustin Moskovitz can talk more about the motivations in 2005. From asking him directly while at Facebook, the goal seemed to revolve around 1) building a historical repository which could be queried offline without impacting the live site and 2) figuring out if changes made to the site impacted user behavior in a positive or negative fashion. For 2, the controversial change near the end of 05 was adding high school networks.

Of course not everybody has Facebook’s resources, but there are a variety of resources that can help you identify the right questions to be asking. Including, yes, your friendly industry analysts. This problem of what question to ask is one of the reasons I do not personally subscribe to the idea that our profession is made up of surplus middlemen, though you should obviously consider the source. Even if you have perfect data, you almost certainly do not have perfect questions.

The fact is that even when the boundaries of a dataset are narrowly defined – as with, say, the Netflix data visualized by the New York Times – it’s easy to get lost in it. The trick is no longer merely being able to aggregate and operate on data; it’s knowing what to do with it.

Find the people that can do that, whether they’re FTEs or consultants, and you’ll have your competitive advantage. To answer the right questions, you need the right people.

Categories: Analytics, Big Data, Data.
