James Governor's Monkchips

On Hortonworks, data management and mind blown by GDPR. First notes from Dataworks 2018


At its DataWorks Summit 2018 in Berlin this week Hortonworks set out its stall for the next stage in its evolution.

A year ago it was looking like the Cloud would simply eat the Big Data market, but geopolitical concerns, and enterprise inertia, have combined to create a somewhat more nuanced picture. Hortonworks argues the changes will work in its favour.

While hardly scientifically rigorous, Hortonworks’ spot poll of customers during the first day’s keynote was telling – 34% said they didn’t plan to move any data into the cloud. Not using the cloud for data seems faintly ludicrous in 2018, but never bet against enterprise conservatism. It seems clear there will be a solid market for software that allows for hybrid and multi-cloud deployment and management.

Hortonworks has sometimes bridled at comparison with its competitor Cloudera, which launched first and IPO’d later, but has traded at a higher multiple. After recent market guidance disappointed financial analysts, though, Cloudera’s share price took a significant hit, narrowing the P/E gap somewhat. Stephen here asks 5 questions for Cloudera after its recent conference. We’ll have a far clearer picture of comparative financials when Hortonworks reports in early May.

Product differentiation between Hortonworks and Cloudera is becoming clearer as we move beyond the product being a “Hadoop distribution”. Hortonworks has doubled down on data governance and management, while Cloudera is aiming to be about the AI and machine learning side of Big Data. Hortonworks is selling the sausage, while Cloudera is selling the sizzle.

The sausage factory of enterprise data management may not be sexy, but then neither is going to court, paying massive fines, or being hauled before politicians to answer questions about slack controls, process and methods.

Hortonworks’ focus on governance is likely to pay dividends in Europe, which has a stronger focus on privacy than the US, stronger regulatory frameworks, and the imminent arrival of the General Data Protection Regulation (GDPR).

Hortonworks’ messaging around GDPR is crisp and on point, identifying specific issues and business controls that need to be managed, for example the Right to be Forgotten. It is mapping these issues to its relevant platform components. Rather than simply saying GDPR is a problem, it’s providing guidance on what enterprises need to do to comply. Given most enterprises are delinquent in their preparation for GDPR, they will be looking for how-to guides over the next 18 months.

GDPR is a big deal. While we shouldn’t expect the EU to start levying massive fines in the near term – up to the maximum 4% of global turnover – companies that do business in Europe will be expected to make a defensible effort to comply.

Compliance is hard. Some of the core principles of GDPR, while not new for privacy advocates, will feel new to businesses. The rules may not be new, but the context has changed with the massively increased severity of the potential penalties. Take the right to be forgotten – given data’s tendency to replicate and be replicated, it can be very difficult to be sure that you’ve deleted all personally identifiable information (PII).

Or how about the right to portability – the idea you can take PII data with you when you leave to a new service. That kind of thing is hard enough with social network data – how exactly can I move my Facebook timeline to Twitter, for example?

But Abhas Ricky, Hortonworks’ director of strategy and innovation, really blew my mind when he described a retail scenario.

“Look at retail. Take an organisation like Tesco. They’ve done a lot of customer loyalty analysis, using different models and data sets. Now the customer can say to Tesco, I want all my data back, and I want to port it to Marks & Spencer.”

Mind blown. How is that going to work in practice? The answer is of course it might not, but companies are going to need to show they’re making good faith efforts. According to Ricky, 90% of organisations believe they still won’t be ready a year after the regulation comes in on May 25th.

If the pendulum is swinging towards better information governance, Hortonworks’ focus on metadata management and automation makes a lot of sense. The company is collaborating with IBM and ING Bank on ODPi, a Linux Foundation-hosted project, to standardise metadata management across multiple domains.

Hortonworks DataPlane Services allows management of on prem and cloud data in the Hadoop and Spark stacks for security, ops and governance. What happens if data is copied into the cloud, what are the right controls on prem and off? A customer with 53 Hadoop clusters needs to be able to track data as it moves between them.
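To make the tracking problem concrete, here is a minimal sketch of the underlying idea: if every copy of a dataset is recorded as an event, finding everywhere a piece of data has ended up becomes a graph traversal. This is purely illustrative and is not how DataPlane Services works internally; the cluster names, paths and the `copy_events` format are all hypothetical.

```python
from collections import defaultdict, deque


def build_copy_graph(copy_events):
    """Map each dataset location to the locations it was copied to."""
    graph = defaultdict(set)
    for source, destination in copy_events:
        graph[source].add(destination)
    return graph


def find_all_copies(graph, origin):
    """Return every location reachable from origin by following copy events."""
    seen = {origin}
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for next_location in graph.get(current, ()):
            if next_location not in seen:
                seen.add(next_location)
                queue.append(next_location)
    return seen


# Hypothetical copy log: (source location, destination location).
copy_events = [
    ("cluster-a:/loyalty/customers", "cluster-b:/analytics/customers"),
    ("cluster-b:/analytics/customers", "s3://backup-bucket/customers"),
    ("cluster-a:/sales/orders", "cluster-c:/reports/orders"),
]

graph = build_copy_graph(copy_events)
locations = find_all_copies(graph, "cluster-a:/loyalty/customers")
print(sorted(locations))
```

The sketch only works if every copy is actually logged. That is the argument for centralised metadata management: data copied outside the tracked paths never appears in the graph, so it never gets deleted.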

The first “app” based on DataPlane Services was Data Lifecycle Management, for moving data between clusters, or to an AWS S3 bucket for backup, released late last year. Yesterday Hortonworks announced its second app, Data Steward Studio, to help users discover, catalog and manage sensitive data including PII in their data lakes.

One interesting question about Data Steward Studio is who will use it. Roles and responsibilities are changing fast in data management, with the rise of Chief Data Officers, data engineers, and data scientists and the changing status of DBAs. We’ll be looking at that landscape in our next post.

2 comments

  1. Why do vendors continue these scare tactics regarding GDPR?

    The right to be forgotten does not apply if the business collected the data appropriately and requires its retention for any number of reasons.

    The right to transportability means that a citizen can have their data transmitted to them and/or transferred to another controller where technically feasible.

  2. Colin – GDPR is going to take a lot of work given how laissez faire businesses have been.

    “technical feasibility” requires a strategy.
