I’m sitting in the San Francisco airport trying to get home from a week of client visits. It’s been an inauspicious end to the trip, with multi-hour delays, lackluster amenities, and that moment when my phone’s power dipped below my “PANIC!” level before I could find a seat by an outlet. So naturally now is a perfect time to reflect on gratitude and machine learning model training.
I originally read this anecdote on Twitter and was delighted enough that I went to find the source:
“For example, for Jetpac we wanted to find good photos to show in automated travel guides for cities. We started off asking raters to label a photo if they considered it “Good”, but we ended up with lots of pictures of smiling people, since that’s how they interpreted the question. We put these into a mockup of the product to see how test users reacted, and they weren’t impressed, they weren’t inspirational. To tackle that, we refined the question to “Would this photo make you want to travel to the place it shows?”. This got us content that was a lot better, but it turned out that we were using workers in south-east asia who thought that conference photos looked amazing, full of people with suits and glasses of wine in large hotels. This mismatch was a sobering reminder of the bubble we live in, but it was also a practical problem because our target audience in the US saw conference photos as depressing and non-aspirational. In the end, the six of us on the Jetpac team manually rated over two million photos ourselves, since we knew the criteria better than anyone we could train.”
– Pete Warden, Why you need to improve your training data, and how to do it
I love this story because it captures so many challenges to successful data modeling. As Warden notes, training a model is an iterative process of learning to ask the right question and refining your approach to the problem. In particular, what makes this such a fantastic anecdote is that it also touches on the need to explicitly communicate and codify value judgments in a classification system.
I originally took the ‘bubble’ he referenced to have a geographic connotation, but I’ve realized that it’s much more than that. Ostensibly, I am part of the Jetpac travel guide’s target market, but until very recently in my career, I would have seen business travel in the same light as the data labelers enamored with conferences.
It can be hard to remember that now. It doesn’t take many trips to become worn out by (or jaded about) work travel, because it’s hard. Missing your family, long days, long nights spent waking at random hours when your body fails to jump time zones, long flights with ever-vanishing legroom: it all takes a toll. But it’s helpful to remember that paying this toll is such a privilege.
I spent much of my early career in a position where it was a (seemingly unattainable) dream to check into a flight instead of clocking into an office. It was a huge effort to collaborate with other departments, let alone share best practices across an industry. An expensed meal was a pipe dream, never mind my company picking up a night in a hotel.
In those years, if you had asked me to help label a data set of desirable places to go, I probably would have classified images of conferences as amazing places I’d like to visit someday.
I’ve had an incredibly fortunate career path as a Fortune 500 knowledge worker in a mid-sized U.S. city. I won’t presume that business travel is perceived as glamorous across the board, but if it looked that way from where I stood, imagine how much further this perception extends across a wider variety of careers and industries, or into more rural geographies.
The context required to train a model is not universally shared or easily communicated. By most measures, my labels should easily have aligned with Jetpac’s image classification requirements (and if I were to participate today, they probably would), but it’s distinctly possible that earlier in my career I would have misclassified images based on my different life experiences and the value judgments that came with them. This isn’t to say that my judgment would have been wrong in a moral sense, but rather that it would have produced suboptimal results for the model.
If you’re training a model, and particularly if you’re training a model using human inputs, it’s worth remembering that the scope of possible misalignment is vast. You probably won’t predict every way that people bring their own context and understanding to the situation at hand, but you should be prepared for the value communication process to be iterative. And we should expect the need to communicate these values to become increasingly important.
Our industry faces forthcoming challenges in aligning our data models with shared societal values. As the role of our software grows, the way we train these models becomes increasingly relevant. Jetpac’s story is a lighthearted example; there is no harm in their opting to label the data themselves rather than attempting to build a collectively understood value structure around the desirability of work trips. But this will not always be the case. There will be times when we shouldn’t iterate until the model matches our bubble, because we need to take into account the context that others bring. Metaphorically speaking, there will be times when we have to let the model reflect the fact that some people are excited about attending conferences.
And since I promised this was about data modeling and gratitude, I’ll close with some thoughts that are exclusively directed to myself: you are indescribably lucky to be stuck in this airport. Don’t forget it.