As AI continues to reshape our industry, it’s important to understand how it intersects with open source. AI is dependent on and composed of open source software, for one, and it’s also triggering new questions on how to define open source in the context of AI. To explore these and other questions, Julia Ferraioli of AWS joined Stephen O’Grady of RedMonk for a lively discussion of all things AI and open source.
This was a RedMonk production, sponsored by Amazon Web Services.
Transcript
Stephen O’Grady: Good morning, good afternoon, good evening. This is Stephen O’Grady from RedMonk. I am here today with Julia. Julia, would you care to introduce yourself?
Julia Ferraioli: Hi, everyone. My name is Julia Ferraioli, and I am an Open Source AI/ML Strategist with Amazon Web Services.
Steve: So Julia has been doing, well, Julia’s background is in open source going back a long way, but particularly in recent years, has been doing quite a bit with open source and AI. And obviously, as we’ll discuss, that’s a particularly fraught topic these days. So, Julia, we’re going to talk — AI is obviously the topic of the moment, open source and all that. But I thought it might be useful to just — what do we mean when we talk about AI? Because in other words, so many of the conversations I’m having at least are basically just LLMs. So you want to talk about that for just a minute?
Julia: Sure. I mean, AI is such a rich field. Its history spans decades, right? We first started talking about AI in like the 1950s, 1960s. So this is a field with a lot of history. And if you’re a history geek like I am, I highly recommend checking some of it out. But we have a few different camps, actually, that’s probably underselling it, a few different camps in AI. So when we’re talking about AI, we could talk about everything from trying to understand how humans think by modeling it with machines to what we tend to mean these days, which is machine learning, trying to approximate human-like capabilities with machines. Not trying to do it the same way, but getting to kind of the same end result. And we do this with math, we do this with statistics, and a lot of times it’s very good at what it does, and a lot of times it’s hilarious at what it does. But we’re getting some really creative applications of AI in industry these days, and by and large, they’re focused on, like you said, LLMs or neural network based systems. So there’s a joke in AI that as soon as something becomes mainstream, it’s no longer considered AI. I wonder if we’ll see that with LLMs. I don’t think so, but…
Steve: Yeah, we’ll see if we get there. Okay, so obviously AI is more than just LLMs, right? We’ll just take that as the baseline. Okay. So one of the things that when we talk about AI, obviously there’s tons of interest and tons of chatter around AI and open source and so on. One of the things I think that gets lost in the shuffle because there’s so much focus on just a few components here, what is a license for a model and so on. But I think really what we need to remember is that, and you have talked to me about this before, which is the reality. If you take apart any major AI project, there’s a ton of open source under the hood, as it were. What are your thoughts? What can you tell us about that?
Julia: Well, open source has really always been a key element in AI. We have benefited tremendously from the open collaboration model of researchers from the past… I don’t want to name a number, but many decades. And because so much of AI is heavily math dependent, we have so many open source libraries that help us implement workloads that need to be incredibly performant. We wouldn’t be here today talking about AI if it weren’t for the decades of effort that had been poured into open source libraries like LAPACK, scikit-learn, NumPy, OpenCV, PyTorch and so many more. If you peel back any machine learning workload, you’re going to find a plethora of open source for a very good reason. It’s been the enabler of modern AI.
Steve: When we talk about open source and AI, we have to talk about open source and definitions. And what does that mean? Obviously, there’s a big debate within the OSI. You’re playing a critical role in these discussions. So for those who aren’t familiar with the background here, basically we’re trying to, we have an accepted definition of open source as it applies to software. We’re trying to, as an industry and the OSI is leading this effort, come to a definition that applies to AI. So what’s your take? So where are we at and what are your thoughts?
Julia: Yeah, I mean, so when we look at what makes anything open source, we look at the four freedoms to use, inspect, modify and distribute. There are additional layers on top of that, and the OSI stewards this definition for software, as you mentioned. But AI looks different, it’s more complex to build, it involves heterogeneous dependencies. It’s harder to reason about than most, not all software, but most software. So how do we get the same benefits out of open source AI as we do from open source software? Can I inspect the data upon which the AI system was trained? Can I make different choices about what dimensions of that data to prioritize or deprioritize? Are there restrictions on where and how I can use the system or its output? These are all open questions that need to be refined, carefully considered and evaluated. As we’ve talked about already, there’s plenty of AI and ML that’s unquestionably open source already because we’ve been using these software libraries for decades. The difficulty comes into play when we’re talking about the systems, and because they involve so many different components, the composition of them introduces complexity and a ton of possible combinations and states.
So if we try to reduce the dimensionality, we can use a pretty simple litmus test. If a part of the system was removed and you can’t recreate or reproduce the system, then it’s probably not going to pass the four freedoms test. If I can’t retrain, not fine-tune, but actually go ahead and retrain from scratch the model, since the dialog is about LLMs these days, then I don’t necessarily see how it can be considered open source.
Steve: Yeah, yeah. And it’s obviously this is a debate that’s raging, right? And I’m not going to put you on the hot seat for this, but just for me, one of the things that’s been obvious for a long time is that while we’re still arguing about what open source is, it’s easy to tell what is not, right? And we have many, many projects out there. I won’t name names here, but there are many, many projects out there that are masquerading as open source, that are, objectively speaking, not, because they are essentially imposing artificial use restrictions on what can and can’t be done and who can and can’t use it. And again, while we can’t tell what open source is, we definitely know what it’s not. And that is very much a not. Which brings us to the closing question here, which is I have my own hopes and dreams and so on, as well as my own concerns as I just outlined. But as you sort of think about, we have open source, and open source can be this accelerant and this dramatic enabler of progress and has all these wonderful benefits. What’s your hope as it applies to AI in any dimension that you want to take that?
Julia: I love all the puns. I’m really hopeful that open source brings more people into AI as well as the other way around. The fresh ideas, the enthusiasm and different perspectives that people have brought to open source software are a fantastic way to break out of any sort of local maxima that we see in the field. And when we think about how open source has accelerated development, empowered people around the world, I’m really hopeful that when we think about the intersection of open source and AI, we see that same effect. We see better transparency, better systems, and really fun and creative uses of the technology. I’m really excited for how it can work to make our lives easier and to give people the empowerment to create, to innovate, to improve that open source software did.
Steve: Yeah. Yeah. I always come back to a quote. I won’t get the exact quote right, but Matt Mullenweg, who’s the creator of the WordPress project, said years ago that open source is a hack that allows competitors to work together in an open fashion. I think that’s one of the things that certainly I hope to see from open source and AI moving forward. And obviously, for that to work, we have to have a definition of open source that we all agree on. Hopefully we’ll get there. But this has been great. Julia, I really appreciate you stopping by to talk to us.
Julia: Thank you. Thanks for having me.
Steve: Awesome. Thanks, everybody.