Data transformation is the new digital transformation. It affects everyone.
Ocado is an interesting company to learn from because it was a “Full Stack Startup” before they were a thing: a vertically integrated business that builds all its own technology and takes a new approach to a market. It runs everything from logistics to website through warehouse routing and optimisation, with a team of more than 1000 software engineers.
Ocado Technology also provides fulfilment services to third parties, which makes the ability to scale critical. It was an early Google Cloud Platform (GCP) reference, and adopts Google technology as aggressively as it can, given the limitation that it has a corporate rule against using beta technology in production. The company now has half a petabyte of data in its Cloud Dataproc managed Spark cluster, 20x what it previously had in its Oracle-based data warehouse, growing by 250M events per day.
So that’s the infrastructure – but what about the people, and the transformation that led the company to start hiring data scientists for every development team, rather than keeping them as a separate group?
Daniel Nelson, Ocado head of data, said:
“Traditionally we got data, gave it to a data scientist, who put it on their laptop, they went off for a few months, wrote some code in Scala or Erlang or whatever and then gave it to a dev team and said could you optimise it, any problems and it had to go back in backlog.”
Not exactly agile then.
Ocado tried a different approach when it started a project to identify ways to improve customer contact response times. Originally agents responded to emails, about 2000 per day, in first-in, first-out fashion, which was fine on an ordinary day but not if volume was much higher, for example because of bad weather.
What I like about the story is that the new approach was driven by engineers. While “at the water cooler” (probably the pub!), some developers from the call center team asked their peers from the then-separate data science team: can you think of a way to solve this?
Ocado ran with it, and put some of its data engineers into the contact center to learn the job, colocating them with its software engineering team. The solution was automatic tagging of incoming emails for sentiment, pushing complaints to the top of the work list. Ocado got early access to the Google Natural Language API, but at the time it wasn’t sophisticated enough to do all the tagging.
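To make the "push complaints to the top of the work list" idea concrete, here is a minimal sketch in Python. It is not Ocado's implementation: the `WorkList` class, the tag names and the priority mapping are all hypothetical, and it assumes each email has already been tagged by some upstream model. Complaints jump the queue, while emails with the same tag are still handled first in, first out.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Hypothetical tag-to-priority mapping; lower numbers are handled first.
PRIORITY = {"complaint": 0, "neutral": 1}


@dataclass(order=True)
class QueuedEmail:
    priority: int
    arrival: int                      # tie-breaker: keeps FIFO order within a tag
    subject: str = field(compare=False)


class WorkList:
    """Priority work list: complaints first, otherwise first in, first out."""

    def __init__(self):
        self._heap = []
        self._arrivals = count()

    def add(self, subject, tag):
        # Unknown tags are treated as neutral (an assumption of this sketch).
        priority = PRIORITY.get(tag, PRIORITY["neutral"])
        heapq.heappush(
            self._heap, QueuedEmail(priority, next(self._arrivals), subject)
        )

    def next_email(self):
        # Pops the most urgent email; raises IndexError if the list is empty.
        return heapq.heappop(self._heap).subject

    def __len__(self):
        return len(self._heap)


work_list = WorkList()
work_list.add("Where is my receipt?", "neutral")
work_list.add("My order never arrived", "complaint")
work_list.add("Change of address", "neutral")
handled = [work_list.next_email() for _ in range(len(work_list))]
```

In this sketch `handled` starts with the complaint even though it arrived second, and the two neutral emails keep their arrival order. On a bad-weather day with far more than 2000 emails, that is the difference between an angry customer waiting behind the whole backlog and being answered next.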
To train its own model, Ocado took around 1M emails, 3 years’ worth, and pushed them into the cloud. The six-month project was in 3 phases:
1 month for domain knowledge
3 months for prep
2 months to build and industrialise
Ocado took a software engineering approach, making sure everything was reusable, with a full test suite. The teams used Git, and used Ocado’s normal delivery pipelines.
“Data Science is Software – it’s reproducible and maintainable”
Ocado spent a month training the model and took a pragmatic approach: get it working, deploy and test. It therefore built its own solution using TensorFlow running on Google Compute Engine. As noted above, Ocado couldn’t use Google’s managed data services because they were still in alpha at the time. Today’s alternative would be Google Dataflow and ML APIs.
The results: Ocado now deals with urgent negative sentiment emails four times more quickly. It was also able to move 4 people to more creative customer support roles. It is also seeing fewer refunds.
The key insight culturally for me is that Ocado now hires data scientists directly into its development teams in different parts of the business, rather than keeping them as a separate group. Well that is, when it can onboard. It is of course hiring…
It’s also worth mentioning that Ocado Technology isn’t just hiring. It’s working on the future pipeline with its Code for Life kids coding initiative.
Google is a client.