James Governor's Monkchips

Breaking the glass – a way forward for Google Cloud



Google had no shortage of significant feature function announcements at Cloud Next 2017 in San Francisco last week but is still looking for a clean narrative to sum up why it should be a preferred enterprise cloud provider.

Cloud is far from “done” in terms of enterprise spend and adoption. The great majority of enterprise workloads are still running on prem, and there are hundreds of billions of dollars of spend that are still there for the taking. Digital transformation efforts at Fortune 500 companies make for good magazine copy, but it is, as they say in English football, “early doors”.

So Google needs to position itself as a platform for what comes next, rather than what just happened. With a focus on platform level services, machine learning and AI, and an increasingly broad and deep commitment to open source, it’s getting there. Most of all though Google needs to be seen as open for business, open to collaboration, more humble, more ready to teach and to learn. Enterprises place great stock on having partners that have an opinion, but also want partners that don’t seem too clever by half. Google needs to smash the glass separating it from the outside world to be more engaging.

A good example of the kind of leapfrog win Google needs is HSBC, a recent customer win. The massive “systemically important” bank, having unsuccessfully spent years and tens of millions of dollars working on Hadoop on prem for anti money laundering (they didn’t say this in the keynote), finally decided to try a new approach, and it chose Google to load and analyse financial data as part of its antifraud efforts. After its well publicised problems with a Mexican subsidiary the firm needs to be more careful than ever. Also note HSBC is using AWS for dev/test, a standard pattern for new platform adoption, but chose Google for a data intensive compliance workload that is literally existential to the company – it is still under the terms of a deferred prosecution from the DoJ.

A spanner in the works

Google’s data chops are undeniably solid. The company announced Cloud Spanner, a globally distributed, fully consistent SQL-compliant database a couple of weeks before the conference. Two Phase Commit is back! Spanner is particularly noteworthy in that it goes against the grain of CAP Theorem (in distributed systems design you need to make tradeoffs between consistency, availability and partition tolerance), a seminal piece of work by Eric Brewer, who is now VP of Infrastructure at Google. Scaling databases horizontally is really hard, scaling them geographically even more so. At NEXT the company talked up running atomic clocks at every data center location to enable data consistency, but according to Brewer’s paper on Spanner and CAP, consistency is achieved through locking, supported by the availability and performance of Google’s wide area network. Google’s TrueTime does however play a crucial role in data snapshotting and synchronisation, which obviously has implications for consistency.
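To make the role of the clocks a little more concrete, here is a minimal sketch of the "commit wait" idea described in Google's published Spanner work: the clock reports an uncertainty interval rather than a single instant, and a writer holds off acknowledging a commit until its timestamp is guaranteed to be in the past. Everything here is an illustration under stated assumptions. `SimulatedTrueTime`, `commit_with_wait` and the 4 ms error bound are hypothetical names and numbers, the clock is simulated rather than real, and none of this is Cloud Spanner's actual API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TTInterval:
    """An uncertainty interval: the true time lies in [earliest, latest]."""
    earliest: float
    latest: float


class SimulatedTrueTime:
    """Hypothetical stand-in for a TrueTime-like clock with bounded error."""

    def __init__(self, epsilon: float, start: float = 0.0) -> None:
        self.epsilon = epsilon      # assumed worst-case clock uncertainty (seconds)
        self._true_time = start     # simulated "real" time, advanced manually

    def now(self) -> TTInterval:
        # The caller never sees the true time, only bounds around it.
        return TTInterval(self._true_time - self.epsilon,
                          self._true_time + self.epsilon)

    def advance(self, seconds: float) -> None:
        self._true_time += seconds


def commit_with_wait(clock: SimulatedTrueTime,
                     tick: float = 0.001) -> Tuple[float, float]:
    """Pick a commit timestamp, then wait until it is definitely in the past.

    Returns (commit_timestamp, simulated_seconds_waited).
    """
    # Choose the latest possible "now" as the commit timestamp.
    commit_ts = clock.now().latest
    waited = 0.0
    # Commit wait: do not acknowledge until even the earliest possible
    # current time has moved past the chosen timestamp.
    while clock.now().earliest <= commit_ts:
        clock.advance(tick)
        waited += tick
    return commit_ts, waited


if __name__ == "__main__":
    clock = SimulatedTrueTime(epsilon=0.004, start=100.0)
    ts, waited = commit_with_wait(clock)
    print(f"commit timestamp {ts:.3f}, waited about {waited * 1000:.0f} ms")
```

The wait is roughly twice the clock uncertainty, which is why tighter clocks matter: the smaller the error bound, the less time every write spends waiting.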

It will be interesting to track Spanner’s progress. While NoSQL has been essentially nibbling at the edges of the Oracle and IBM mainframe data franchises, Google is going directly after the high scale transactional database. Spanner won’t be cheap, but then neither is Oracle. From a timing perspective Spanner is on point – Oracle’s dominance was assured by being the database of choice not just of the customer, but also the ISV. Today however ISVs are moving into the cloud, and Oracle acquired its own application stack to compete directly with them. I assume that Google put in a call to Salesforce about rehosting on Spanner. Enterprises are showing a solid appetite for building rather than buying their apps again after a long winter of purchasing-first IT. It may be time for a major disruption in the market for transactional databases. Spanner, by its nature, can’t or won’t be open sourced, though, and will by necessity involve lock-in.

Falling in love with open source

When it comes to open source there has been a quiet revolution at Google over the past couple of years. I wrote about this last March:

“But by 2015 Google realised that open sourcing the code itself, rather than just publishing papers about its approaches, made sense. Why watch somebody else create another Hadoop or Mesos when Google could build a community around stuff it actually built – and so Kubernetes was born. Things got really interesting when Google’s engineers met engineers at Red Hat they deeply respected. When we write the history of Google this will be seen as a seminal moment, when the appliance of science became properly a community-based activity. The decision to open source some of Google’s core machine learning technology – TensorFlow – followed naturally on the obvious and growing success of a better, more collaborative model for applied science.”

TensorFlow is a runaway success – effectively the de facto standard open source library toolkit for machine learning – with more than 50k stars on GitHub and more than 23.5k forks (when GitHub published its State of the Octoverse in September 2016 TensorFlow was at about 14k forks).

Kubernetes meanwhile is trailing along with a mere 25k stars. It’s a project on a roll, adopted by everyone that matters in cloud.

Other Google open source infrastructure projects include Apache Beam – for defining batch and streaming data-parallel processing pipelines, based on its work with Cloud Dataflow. Google’s Go language has quickly become a favourite of systems programmers. One of the company’s most recent open source contributions is gRPC, a high performance RPC framework for HTTP/2 that has been well received by the microservices crowd.

Google has found that not only does it enjoy making open source contributions, but these contributions are generally well received, and open up new potential market dynamics – see for example Kubernetes and the Anyone but Amazon Club.

Talking of Amazon – there is some clear differentiation here. Amazon Web Services has definitely not taken the open source pill. It is not a major contributor to open source projects, nor has it taken a lead in open sourcing its own code. In many respects the company’s culture feels more akin to Microsoft in an earlier era. Unlike almost any other modern tech company AWS holds its IP very close to its chest. And while AWS made it incredibly easy for commercial open source vendors to go to market, it now increasingly competes with them – by offering for example managed hosting for open source databases.

As Stephen explains here, to commercial open source the biggest competitor is Amazon. Open source remains something of a chink in Amazon’s armour, although the company is doing such a great job of packaging it for now that it is a runaway economic winner.

Being different

So how does Google need to be different? Its commitment needs to go beyond consuming open source software. It needs to contribute code, support outside projects, and find a business model that balances its own offerings based on open source code and platforms with a level playing field for third parties. Sam Ramji, VP Product Management for Google Cloud Platform, has the job of managing that transition. He has been at Google just over 100 days. He spent years at Microsoft working on open source and standards, before working at Apigee, and latterly running the Cloud Foundry Foundation. He is thoughtful, smart, ambitious, ethical, and perhaps I should add for full disclosure reasons a personal friend of mine.

Current Status

Amazon Web Services dominated, and dominates, the Infrastructure as a Service cloud build out, but all tech markets have natural inflection points, and the trick is to catch the next wave as it comes in.

Microsoft has done a great job of getting back in the game with Azure. The platform is solid, and Microsoft’s go to market partnering is strong, particularly in open source. In conversations over the last 18 months Microsoft has regularly come up as the easiest company to do business with for commercial open source vendors.  The Microsoft enterprise sales machine is heavily skewed to Azure in terms of compensation, and is selling capacity accordingly. Now it’s a question of driving workloads to the platform.

IBM acquired SoftLayer, but IaaS was always going to be an awkward fit for a firm that was on a path to divest low margin businesses (Thinkpad, x86 servers, etc). So IBM has been focusing its attention on platform level services through Bluemix, and Watson for machine learning and artificial intelligence.

Rackspace was unable to invest enough capital to keep up as an infrastructure player and is now essentially a third party services company, supporting other companies’ clouds. One of the announcements at Next is that Rackspace will become a Google cloud partner.

Engineer to Engineer

One of Google’s greatest strengths is engineering. But that strength has also been a weakness because of perceived arrogance in dealing with outsiders – those outsiders often being customers and developers. Google can certainly be high-handed in dealing with customers – “oh, you’re doing it wrong! RTFM!” – and that’s one of the sharp edges the company certainly needs to smooth off.  Google’s founding culture is to focus on the platform, rather than the customer. This is the stuff that folks like Diane Greene, Sam Ramji and Brian Stevens have been brought in to fix.

One of the interesting moves for Google in turning its engineering chops into an asset is a program it calls Customer Reliability Engineering (CRE), a spin on the company’s Site Reliability Engineering (SRE) culture. Google is crazy good at ops. I had an interesting chat with Alexis Richardson of Weaveworks when I got home from NEXT and he made a really interesting point about two of his engineers – Tom Wilkie and Jonathan Lange – both ex-Googlers. When the company’s services go down, Tom and Jonno fall naturally into a fix it rhythm, where they don’t say much, but get their heads down and work together seamlessly at their own command lines to solve the problem.

When Pokemon Go launched the traffic spike was literally crazy. While there were some complaints about downtime, they were actually pretty minor in the scheme of the services that were being delivered to tens, then hundreds, of millions of customers. Within weeks it was 500m. And Niantic had 16 people working on the team at that point. 16! Google provided SRE services, working alongside the Niantic folks to scale the service, which is where the CRE program comes from. 

Enterprises today are hungry to learn from Web companies and startups. In the application development space Pivotal has basically nailed the business of retraining enterprises to work more like startups. The Pivotal Way, based on pair programming and test-driven development, is now a religion for companies like Allstate, Comcast, Ford and Home Depot. However, nobody has yet captured the tribal knowledge around SRE and devops and made it consumable. It’s no accident that Pivotal is Google’s first outside partner as it looks to build an ecosystem of companies offering CRE services and education.

As part of its CRE offerings Google now offers engineer to engineer support. The customer will have Google engineers they can call on directly when they hit reliability issues. The team is going to be stellar. Some of Google’s best internal SREs are being seconded to the team.

Breaking the glass in PaaS

Google was very early to PaaS with App Engine. Arguably too early, and missteps with the platform around pricing strategy – don’t surprise the engineers! – and product management – but do pay attention to their requirements! – cost Google dearly in customer trust. App Engine is a flexible PaaS environment with native container support, and it supports Java 8 and Servlet 3.1. It should be under consideration as a platform, but trust is holding adoption back.

The new industry take on PaaS is Function as a Service, otherwise known as serverless. Amazon’s Lambda is now exploding, with enterprises and startups both making extensive use of Lambdas for new application development, but also as general event-driven glue code to augment application development and script operations between all of the different services in AWS Cloud. You should read Fintan on serverless and devops here.
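For readers who haven't written one, this is roughly what "glue code" looks like in practice: a small function that wakes up on an event, pulls out the details it cares about, and hands them on. The sketch below is in Python and assumes an event shaped like an S3 upload notification (a `Records` list with `s3.bucket.name` and `s3.object.key` fields). The function name `handler`, the sample bucket and key, and the idea of simply returning the object paths are all illustrative assumptions, not anyone's production code.

```python
from typing import Any, Dict, List


def handler(event: Dict[str, Any], context: Any = None) -> List[str]:
    """Collect a 'bucket/key' path for each uploaded object named in the event.

    A real function would pass these on to another service; returning them
    keeps this sketch self-contained.
    """
    uploaded: List[str] = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        # Skip malformed records rather than failing the whole invocation.
        if bucket and key:
            uploaded.append(f"{bucket}/{key}")
    return uploaded


if __name__ == "__main__":
    # Hypothetical sample event for local experimentation.
    sample_event = {
        "Records": [
            {"s3": {"bucket": {"name": "reports"},
                    "object": {"key": "2017/03/summary.csv"}}}
        ]
    }
    print(handler(sample_event))
```

The appeal is that there is no server, queue consumer or scheduler to run: the platform invokes the function when the event arrives and bills for the execution.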

With Google Cloud Functions, the issue is still live because the product is still labeled as a beta. Google may have invented the notion of a perpetual beta, but it’s not something enterprises are ever going to love. Note to Google – production mode soon please. Cloud Functions aside, Google has one back end as a service – or perhaps that should be backend as a serverless – that developers have bought into: Firebase. The platform is a really good beachhead for Google, especially now Parse is gone. If Google acquired Auth0 it would be in a position to start building a suite of best of breed serverless platforms that even Joe Emison might approve of.

On day one at NEXT there was a throwaway line that crystallized the narrative of NEXT for me. With App Engine and Cloud Functions you have the ability, said Brian Stevens, to “break the glass” – with monitoring and tracing tools available to drill into the service. I felt there was a clear parallel here with the evolution of the firm itself, between Google classic, where if you had a problem you were kind of on your own, and the new Google, where if you have a problem, it would be escalated, people would get back to you, you could maybe get access to the engineers building a service. During the community event Sam Ramji made a public commitment to deal with the backlog of issues on the Google Cloud Insider list. Kin Lane, the API Evangelist, talks to the support issue in this thoughtful post.

“I saw an interesting chasm emerge while at a Google Community Summit this last week, while I heard their support team talk, as well as their developer relations team discuss what they were up to. During the discussion, one of the companies presents discussed how their overall experience with the developer relations team has been amazing, their experience with support has widely been a pretty bad experience–revealing a potential gap between the two teams.”

One key part of breaking the glass is communicating with outside people. On that score Google is now engaging to a fault. It is *all* over Hacker News and Stack Overflow.

https://twitter.com/johnsheehan/status/842195783181062145

The new pragmatism, a little birdie told me

One of Google’s most enterprisey announcements at NEXT was support for SAP HANA. So far so good. There is no louder enterprise dog whistle than being certified to run SAP. But how is Google doing this exactly, given HANA doesn’t currently run on Kubernetes, and requires a significant amount of local RAM to run? AWS, for example, created a special instance, the EC2 X1, for the purpose. So about the rumour – what I heard is that Google had acquired Cisco UCS boxes to run HANA, and had plugged them into its network as non standard infrastructure. It could be I got the wrong end of the stick, but it’s still not clear to me exactly how Google Cloud Platform is supporting HANA, so I will give the idea credence on that basis. A couple of related announcements were that Google was going to offer HANA Express on Google Cloud Launcher as a developer onramp, and that HANA Cloud Identity was being integrated with Google Cloud IAM.

More pragmatism, and commitment to being an IaaS player? Google’s stated intention to be the “best platform for Windows”. It’s hard to see that happening, certainly in terms of financial commitment in comparison with AWS, which has made huge investments in Windows support. But even the claim is a good marker in terms of pragmatism and commitment to the enterprise market.

In sum

I could go on, and there is certainly a great deal to see in terms of machine learning and AI, but in the spirit of narrative this seems a reasonable place to sign off.  For Google to achieve number two status in the market it needs to be the best partner in the space for developers and enterprises, which would be a huge revolution. Microsoft is very good at both customer focus and ecosystem development. But there were many signs at NEXT that Google understands the challenge ahead, and wants to become a more human, engaging, pragmatic presence in the market. The best platform doesn’t win – the most accessible one does.

You should also check out the RedMonk team’s Slack Chat recap of the event.

 

disclosure: Google is a client and paid T&E for the event. AWS, IBM, Microsoft, and Pivotal are all clients.

Thanks to Francesc and Mark from the Google Cloud podcast team. The image seemed appropriate.
