Do you collect metrics on how people use your applications and platforms by measuring actual usage behaviours? If so, chances are high that the General Data Protection Regulation (GDPR), a new EU regulation for handling user data with frankly terrifying fines for breaches, is going to be a big problem for you. One reason I wanted to raise the issue of GDPR again is that it concerns advice RedMonk has been giving folks over the last 10 years – namely, that telemetry is a great source of value as the price of software falls.
“Network services, for the most part, do not suffer from these failings (though they may, of course, suffer from others). If customers share some data and telemetry back with providers, both parties may benefit. And that service will prove to be more compelling, I believe, for customers skeptical of the value to traditional support and service.”
Taking advantage of user telemetry was undoubtedly good and straightforward advice at the time, but when the potential fine for not handling customer data effectively is up to 4% of annual global turnover, it's very important – existential even – to be sure your security controls are extremely effective. Any gains made since we started giving that advice could be lost. Your business might rely on collecting data – not in the Facebook sense, but simply to offer better experiences.
There will be “black swan” events under GDPR.
GDPR will be an issue for telemetry data at startups and the industry's biggest companies alike. Suppose, for example, you think you're in the clear because all of the data you collect is scrubbed of personally identifiable information (PII). Yeah, that's right, data in the aggregate is OK, of course. But what happens when you hit an exception and ask the user to send a crash report? Suddenly the scrubbed data potentially becomes identifiable, and you're likely storing it without knowing it. Triangulation of data, and the almost comical availability of third-party datasets, make it increasingly hard to assume any data is not personally identifiable. A famous example was the Netflix Prize dataset, whose supposedly anonymised ratings were quickly re-identified using public IMDb data.
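To make the triangulation point concrete, here is a minimal sketch of a linkage attack. Both datasets and all field names (`postcode`, `birth_year`, `device`, `feature_used`) are hypothetical and invented for illustration: one is "scrubbed" telemetry with direct identifiers removed, the other is a third-party dataset that happens to share a few quasi-identifiers with it. A plain join on those shared fields is enough to put a name back on a supposedly anonymous row.

```python
from typing import Dict, List, Tuple

# Hypothetical "scrubbed" telemetry: names and emails removed, but
# quasi-identifiers (postcode, birth year, device model) remain.
telemetry: List[Dict[str, str]] = [
    {"postcode": "SW1A", "birth_year": "1984", "device": "Pixel 2", "feature_used": "export"},
    {"postcode": "EC2A", "birth_year": "1990", "device": "iPhone 7", "feature_used": "share"},
]

# Hypothetical third-party dataset (e.g. public profiles) that carries
# real names alongside the same quasi-identifiers.
public_profiles: List[Dict[str, str]] = [
    {"name": "Alice Example", "postcode": "SW1A", "birth_year": "1984", "device": "Pixel 2"},
    {"name": "Bob Example", "postcode": "N1", "birth_year": "1975", "device": "iPhone 7"},
]

QUASI_IDENTIFIERS = ("postcode", "birth_year", "device")


def quasi_key(record: Dict[str, str]) -> Tuple[str, ...]:
    """Build the join key from the fields both datasets share."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(scrubbed: List[Dict[str, str]],
         external: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return (name, behaviour) pairs for rows that match exactly one person."""
    index: Dict[Tuple[str, ...], List[str]] = {}
    for person in external:
        index.setdefault(quasi_key(person), []).append(person["name"])

    matches: List[Tuple[str, str]] = []
    for row in scrubbed:
        names = index.get(quasi_key(row), [])
        if len(names) == 1:  # a unique match means the row is re-identified
            matches.append((names[0], row["feature_used"]))
    return matches


print(link(telemetry, public_profiles))  # [('Alice Example', 'export')]
```

Note that nothing in the telemetry table looks like PII on its own; it is the combination of otherwise harmless fields that does the damage, which is why "we strip names and emails" is a weak guarantee.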
The buzz phrase that's currently bubbling up is "differential privacy". Apple is very confident in the approach:
“Starting with iOS 10, Apple is using Differential Privacy technology to help discover the usage patterns of a large number of users without compromising individual privacy. To obscure an individual’s identity, Differential Privacy adds mathematical noise to a small sample of the individual’s usage pattern. As more people share the same pattern, general patterns begin to emerge, which can inform and enhance the user experience.”
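The core idea in Apple's description, where each device perturbs its own data before sharing it and the noise only averages out across many users, can be illustrated with classic randomized response, the textbook local differential privacy mechanism. This is a rough sketch, not Apple's implementation, which the quote does not describe; the function names and the `p_truth` parameter are our own illustrative choices.

```python
import math
import random
from typing import List


def randomized_response(true_value: bool, p_truth: float, rng: random.Random) -> bool:
    """On-device step: report the true bit with probability p_truth,
    otherwise report the result of a fair coin flip."""
    if rng.random() < p_truth:
        return true_value
    return rng.random() < 0.5


def estimate_rate(reports: List[bool], p_truth: float) -> float:
    """Server-side step: invert the known noise.
    E[observed yes rate] = p_truth * true_rate + (1 - p_truth) / 2."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) / 2) / p_truth


def epsilon(p_truth: float) -> float:
    """Privacy loss: log of P(report yes | true yes) / P(report yes | true no)."""
    p_yes_given_yes = p_truth + (1 - p_truth) / 2
    p_yes_given_no = (1 - p_truth) / 2
    return math.log(p_yes_given_yes / p_yes_given_no)


if __name__ == "__main__":
    rng = random.Random(42)
    true_rate = 0.3  # hypothetical share of users who use some feature
    users = [rng.random() < true_rate for _ in range(100_000)]
    reports = [randomized_response(u, 0.5, rng) for u in users]
    print(f"estimated rate: {estimate_rate(reports, 0.5):.3f}")
    print(f"epsilon: {epsilon(0.5):.3f}")  # ln(3), about 1.099
```

No single report can be trusted, so any individual has plausible deniability, yet the aggregate estimate lands close to the true rate. The trade-off sits in `p_truth`: lowering it adds more noise and gives a smaller epsilon (stronger privacy), but you then need many more users before the estimate becomes useful.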
But very few organisations have the kind of resources Apple does to work on things like differential privacy. My point in all of this is not to create a scare story, but to point out that the regulatory environment for privacy is about to undergo a profound shift. I would certainly caution against relying on the recently agreed EU-US Privacy Shield, which replaced the previous Safe Harbor agreement, especially considering that the Trump administration has not yet appointed an ombudsman. If you touch data on EU citizens, you need to dramatically improve your data governance game and become a lot more familiar with data sovereignty issues globally.
I will be reaching out to companies over the next couple of months to hear what they’re doing about this issue. If you’d like to be proactive about it, I’d certainly love to hear what you’re doing about telemetry, privacy and PII.