Classification and Its Discontents | Lisa Kamm | Monktoberfest 2023


This discussion brings to light the inherent biases in widely used classification systems like the Dewey Decimal system, revealing its racism, sexism, homophobia, and Western-centric nature. This revelation, stemming from the field of library science, underscores the profound impact of categorization on societal perceptions, particularly in areas like gender identity and medical treatment. The importance of understanding and critiquing these embedded systems, especially in the age of machine learning where classification becomes more opaque, is paramount. This talk – born out of a dinner conversation at a prior Monktoberfest – aims to dissect the specific ways the Dewey Decimal system contributes to racial biases and its consequences in information discovery and research, highlighting the critical need to reevaluate and reform the frameworks that shape our world understanding.

Transcript

OK, so as Steve said, last year at dinner, I don’t even remember how, we started talking about the Dewey Decimal System. And I made a very snarky comment about how horribly racist it was, and why are we talking about it anyway, and then he quickly dismissed the topic and moved on, and everybody at the table said, what? And that led me to scramble to remember that, over 20 years ago, I did a library science degree. Somebody at the table recommended that I submit it as a talk this year, so here we are. Classification and its discontents: or yes, the Dewey Decimal System is very, very biased. I’m going to start by talking a little bit about what a classification system is. These are ways of grouping and organizing data or knowledge. They can be hierarchical, they can be flat, they can be more or less documented and defined. We deal with them every single day. Anyone who’s ever tried to go over an insurance claim and had it rejected because it’s been coded wrong has had to deal with the implications of classification systems that guide our entire world.

    By definition these systems focus on the commonalities. So they flatten out any differences, any nuances, any things around the edges, because literally they’re meant to break things down into the elements of their commonality. Which undermines all differences.

    Classification systems are also very much tied to the time and place they were created. It’s hard for us to understand in the present day how much bias is built in, so here is an example, which is the bills of mortality from London in 1720-1721. These are the top causes of death. This really baffles me. These things, to our modern mind, don’t make sense. So now let’s compare to US deaths in 2021. This list is much more comprehensible to us. We understand the groupings, we understand what’s being shown, what’s seen. We get it. But, you know, think of how many Covid deniers are out there, which immediately shows that something that seems like a really straightforward and understandable modern system will be viewed through people’s own political and personal lenses as biased or distorted.

    So with that quick introduction into classification systems and some of the inherent bias that just comes from it, let’s talk a little bit about the problem with Dewey.

    To understand why the Dewey Decimal System would even matter, it’s important to understand the role of libraries in society. The ALA definition talks about access, it talks about organized information, and it talks about individual learning and advancing society as a whole. Libraries are one of the few places where you can really go and get information and get it without being tracked, and that really provides a huge value to the public. On the left here I have something that many libraries will provide, tough topics for teens, which shows the places to go and look for information if you’re a teenager, you need information, and you’re not comfortable talking to adults about these topics. This is hugely valuable. And I’ll also point out that if this weren’t hugely valuable as access to information, we wouldn’t be facing so many issues around people trying to get books banned these days, because this is where you can get information that’s otherwise tracked or locked down.

    So with that quick pitch of why libraries are amazing, let’s move on to Melvil Dewey. He was born in 1851 and created his library classification system in 1873. He founded the ALA, he created Columbia’s library school, he served as New York State librarian, and he helped organize the first Olympics. OCLC is the global consortium that owns the copyright on the Dewey Decimal Classification system and sells software to implement it. On their website, they describe his legacy as complex.

    That’s quite the euphemism. Even for his own time period, he was problematic. He was obsessed with the number 10. Whatever.

    His social club, the one he founded, only admitted white Christians. People of color and Jews were absolutely not allowed. He was a notorious sexual harasser, which managed to get him fired in 1905 from the organization that he founded. As we go through this, you always hear people saying that maybe you need to understand things with respect to their time and place. In fact, I’ve talked about it already.

    But even with respect to his time and place, he lost jobs, he was censured, he was pretty awful.

    That said, his system is the single largest library categorization system in the world. It’s used in 95% of the Public Libraries in the US. It’s used in 135 different countries and we’ll get back to that one later, and in more than 200,000 libraries.

    One of the reasons it’s so preeminent is that the Library of Congress issues Dewey numbers for books as pre-classification. So if you’re a small library and you don’t have enough resources and you don’t want to recatalog every book you get, this is easy, this is the default.

    And back in the days of actual physical card catalogs, the Library of Congress would sell the physical cards with the Dewey numbers on it for classification already, so it was just the path of least resistance.

    Even ignoring all of the bias issues, which we’ll go into in great detail next, it’s also a complicated and hard-to-learn system. So it’s not even necessarily all that effective in the environment it’s meant to be effective in.

    So let’s quickly look through how the Dewey Decimal System works. It breaks all of the world’s knowledge into segments of 10.

    So let’s talk about literature. Lit is the 800s, 800 to 899, so all of literature is divided into that. Not surprisingly, being Dewey, it’s then divided into groupings of 10 underneath that. So you can see here what his breakdown of the world’s literature is.

    And then 860, which is Spanish, Portuguese and Galician literature, breaks down further, and this keeps going. It could theoretically go to an infinite number of decimal places.
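
    As a rough sketch of the structure just described (this is not something shown in the talk), the nesting by tens can be read straight off the digits of a call number. The function name `dewey_path` and the sample number 863.64 are hypothetical, chosen only for illustration; the two labels are the only ones the talk itself gives.

```python
# Labels mentioned in the talk; every other code is deliberately left unlabeled.
LABELS = {
    "800": "Literature",
    "860": "Spanish, Portuguese and Galician literature",
}


def dewey_path(number: str) -> list[str]:
    """Return the chain of ever-narrower codes that contain a Dewey-style number."""
    whole, _, fraction = number.partition(".")
    if len(whole) != 3 or not whole.isdigit() or (fraction and not fraction.isdigit()):
        raise ValueError(f"not a Dewey-style number: {number!r}")
    # Hundreds, then tens, then the full three-digit code.
    levels = [whole[0] + "00", whole[:2] + "0", whole]
    # Each extra decimal digit narrows the topic by another factor of ten.
    levels += [f"{whole}.{fraction[:i]}" for i in range(1, len(fraction) + 1)]
    # Drop consecutive repeats such as 800 -> 800 -> 800.
    path = []
    for code in levels:
        if not path or path[-1] != code:
            path.append(code)
    return path


# 863.64 is a made-up number used only to show the nesting.
for code in dewey_path("863.64"):
    print(code, LABELS.get(code, "(label not given in the talk)"))
```

    The point of the sketch is that every book gets exactly one path through a fixed tree, so whatever the top levels privilege is inherited by everything beneath them.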

    So what are some of the problems we see here? We saw in literature that everything from 800 to 889 is European literature. That’s it. The rest of the world is literally called other literature. It’s hard to think of something more specifically othering than saying, oh, your whole area is “other.”

    And, you know, in religion, Christianity is 31% of the world’s population and 90% of the Dewey Decimal System’s religion section.

    LGBTQ has been moved all over the map, and as a group of people it is now under sexual orientation, transgenderism and intersexuality, where you’ll find it grouped with sex work, child trafficking and kink.

    So this is

    Books on Barack Obama’s life are in the 300s, because he’s Black. All of the other presidents are in the 900s under history. And the last example I’ll give here, and I’ve got so many more of these

    So everything about Native Americans is treated as history. Nothing is treated as modern content or relevant today.

    So despite these flaws, it’s used worldwide. Thailand, a country that was never colonized and is 95% Buddhist, still relies on the Dewey Decimal System to categorize its books. There are ways to customize Dewey for local relevance, but that doesn’t happen a lot. And there are a lot of studies that demonstrate that it’s a theoretical possibility, but it’s so rarely used that it hardly matters.

    It amazes me that the structure of Dewey has managed to successfully serve as a colonizing force of what knowledge should be in countries that were never colonized. It’s had its own ability to adapt how people think about knowledge on a much broader scale.

    It can be changed. The system can be changed. There are processes in place to change the Dewey Decimal System. It seems obvious: go in and change it so it’s not so horrible. A couple of things there. One, you’re literally changing where physical books sit on physical shelves, so it’s actually a bigger project than, say, rearranging files on a disk. But that’s the least of the problems. It’s slow. I found myself, while researching this, reading through all of the recent changes to the Dewey Decimal System, and discovered that it was only this past July that climate change was added as a topic. This is obviously one of the biggest things impacting us in our world going forward, and we’re still using structures created in the 1800s to categorize it. In a particularly fast-moving science, it should not have taken ten years of effort to get this added as a category so people can use it in their research.

    The political one I will mention, because it does get very political, is that there was a desire to move away from the term illegal alien, so the Library of Congress, and it trickled down to Dewey, updated to noncitizens. Ted Cruz protested this change. When do you think was the last time Ted Cruz thought about a library classification system when it wasn’t about making a political point? I don’t think he spends his days learning how to classify knowledge from a broader holistic perspective, just a guess.

    But these things happen. It’s really hard. And, you know, even the folks who are making the decisions on how to change it: well, yes, it is done by an international board, but it’s an international board made up almost entirely of people from former British colonies, with the exception of someone from Norway.

    So what are our options? What do we do? People have been complaining about this, raising this issue, for a very long time. This is not new. Dr. Dorothy Porter Wesley is one example. She helped found the collections at Howard University and, as she put it in an interview, everything about Black Americans or Africans lived under 325 or 326, slavery or colonization. She was having none of that. So she went through and pulled out all of the works by Black authors and categorized them as she thought was appropriate. She redid everything and hand-catalogued it all and had an amazing, amazing impact. She was one of the first people who had a major impact in doing these things, but she is far from the only one.

    Different cultural systems have very different needs. The Xwi7xwa people in British Columbia have their own library. One of the things I noticed was that they start by letter, and residential schools, ER, is at the second level, so it’s very, very high up in their hierarchy, which, given the impact, the intergenerational trauma and issues of some of those schools and that experience, makes huge amounts of sense in their culture and society. It would be at level 7 in Dewey, interspersed with anything about education of students by ethnic and national origin. But this is an example of something that works for a group of people and doesn’t fall back on a sort of historic view of what the world might have looked like in 1870.

    As for alternative systems, they all have issues, problems, limitations. Library of Congress is mostly used in academic libraries. It’s not as bad as Dewey, but it has similar problems. The Brian Deer classification system is focused on Indigenous communities and lets the individual communities do a lot of customization, so books can be filed in different places in different libraries, because it’s not trying to fit everything into a single system. The Book Industry Study Group wants to sell books.

    Metis is aimed at children. Bliss is mostly used in the UK, which does allow books to be filed in more than one place, but they still all work under the idea that you need a single classification system and a single view to look at knowledge across the board.

    And going back to something I mentioned earlier, all of these systems have inherent time and place biases. When humans are making classification systems, when they’re doing these divisions, there’s no way of avoiding it. Everything we do today will be looked back on in 10, 15 years as, wait, what?

    So I think it’s really important that as we look at all the different ways we can classify information, think about systems, we keep in mind just how biased they are, even if it’s our own biases and even if it’s just inherent.

    So a couple of people I mentioned this talk to, said yeah, but we now just Google everything, right? How does technology matter?

    Search engines are different, but have similar problems. First of all, at least with Dewey, it’s really, really horrible, but you can look at it and see how the decisions are made, and it’s visible how it’s horrible.

    Search engines, we don’t know. We don’t know what’s not coming up and this is not a big conspiracy theory, this is a monetary thing. Search engines are driven by making money and how things will make money and get more clicks, so things can fall out of those top search result pages and not be available without us ever realizing it. And then of course there’s the eternal spammers versus search engines, which keeps changing everything. Lastly, search engines aren’t as good at cross-referencing and showing other areas and how things come together.

    As we move into AI and ML, and I’m sure there are people in this room who know way more about AI and ML than I do, we do face a concern about how we start building these things into our base models and into our systems. When you’re looking at very large language models, generally a ton of data is used to train a base model. It can be tuned after that, but you’ve got this base model that’s a big thing that’s massive amounts of work to rebuild, and there’s no way to back something out of it. So there’s something built into this base model and you’re like, wait a moment, we really can’t have that. That’s really problematic. You can’t just back it out. You can’t tell if it’s been used to make decisions, and there isn’t even a lot of clarity in most of these systems about what it is that is feeding in to train these models. On a much, much smaller scale, you look at things like Amazon’s attempt to classify job applicants, which just proceeded to reinforce the lack of diversity in their culture by recommending people who looked like all the other people who already worked there.

    And I’m also concerned, as I look forward, about some of the impacts of the fact that AI and ML are being used today to implement classification systems, as in they’ll take a huge amount of human-classified information, train an ML model on it, and then it will get used to classify future documents, future insights, future whatever.

    At least when humans are doing that review process and doing that categorization process, it’s a scenario where, if things start getting really out of whack, there are eyes on the system.

    There aren’t necessarily eyes on the system here. How is this happening?

    Are we going to get stuck in a point in time where we’re classifying based on material that was classified earlier without the ability to step back and ask the question, is this relevant? Does this still matter?
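
    The lock-in worry above can be made concrete with a toy sketch (not from the talk). The four catalog entries, their labels, and the function names `train` and `classify` are all hypothetical, and the word-vote "model" is a deliberately crude stand-in for a real ML classifier; the only thing borrowed from the talk is the observation that Native American material gets filed as history.

```python
from collections import Counter, defaultdict

# Hypothetical legacy catalog: (title, label chosen by past human catalogers).
LEGACY_CATALOG = [
    ("native american treaties", "history"),
    ("native american nations", "history"),
    ("european painting", "art"),
    ("modern european sculpture", "art"),
]


def train(catalog):
    """Count, for each word, how often it appeared under each legacy label."""
    votes = defaultdict(Counter)
    for title, label in catalog:
        for word in title.split():
            votes[word][label] += 1
    return votes


def classify(votes, title):
    """Pick the label with the most word votes, or None if no word was seen before."""
    tally = Counter()
    for word in title.split():
        tally.update(votes.get(word, {}))
    return tally.most_common(1)[0][0] if tally else None


model = train(LEGACY_CATALOG)
# A new, present-day title is pulled into the old category by its vocabulary.
print(classify(model, "contemporary native american painting"))
```

    Nothing in the sketch ever asks whether the old labels were right; it can only reproduce them, which is the point in time the passage above worries about getting stuck in.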

    So I wanted to think of a great way to wrap this up with a clear call to action, but I don’t really have that clear call to action. I think the things we can start with are understanding and recognizing the impacts these systems have throughout our work and throughout our lives, by asking questions: how are these categorized? What are the biases built in? Traceability: you can look for traceability in AI. Yes, it’s a very hard problem, but also AI is a very hard problem, so can we look at both of those things? And we just need to pay attention, because, as with Dewey, so many of these things fly under the radar and we don’t even realize how they’re impacting what is coming at us day to day. With that, I will just put in yet another pitch on how public libraries are amazing, please support them.

    [applause]

