A RedMonk Conversation: AI and Trust (Transparency and Security) with GitLab


AI is changing how software is written, so it’s no surprise that GitLab is investing accordingly. In this conversation with chief product officer David DeSanto, we examine the company’s strategies and approach to large language models (LLMs) and AI, with a particular focus on trust, security and transparency. GitLab sells to many customers in regulated industries, which put a premium on responsible governance.

The company’s journey began with ModelOps to build, train, deploy, and version AI models alongside software. More recently GitLab has moved into the generative AI space by launching GitLab Duo, an AI platform that touches many facets of the software delivery workflow, for example automatically generating detailed issue reports. GitLab’s focus on bringing AI to self-hosted customers in air-gapped environments underscores its commitment to privacy and transparency, and is a significant differentiator in a market which has been defined by the You Only Live Once (YOLO), just-ship-it approaches of the likes of OpenAI. We talk to DeSanto about GitLab’s pledge not to use customer intellectual property for model fine-tuning. We also discuss the company’s collaboration with Google Cloud on the security side of the DevSecOps lifecycle. It’s a good show. Let us know what you think.

This was a RedMonk video, sponsored by GitLab.

Rather listen to this conversation as a podcast?

Transcript

James Governor: Hi, it’s James Governor, co-founder of RedMonk. And I’m super pleased to have somebody on the show who I think is making a big difference in the industry. I think anybody in software delivery is familiar with their products and their platform. So I’d like to welcome David DeSanto, Chief Product Officer from GitLab today. Hey, David, how are you?

David DeSanto: I’m doing well, thanks for having me, James.

James: Great, great. Well, thank you for coming. And needless to say, it being, well, a year in — almost a year to the day since ChatGPT was launched, and everything changed in this industry. We realized we needed to get serious about AI and machine learning. I think at the beginning of the year, we were certainly at RedMonk seeing some firms leaping in with a little bit of AI washing. But we’ve had, you know, not a huge amount of time, but a year for everybody to assess what they can build, how they can build it, how they can improve their products. And pretty clearly GitLab is one of those firms. So where are we at the end of 2023, beginning of 2024, in terms of GitLab and what it’s doing with AI in its product sets?

David: Yeah, that’s a great question. So we’ve actually approached AI from two different angles. As you shared at the beginning, GitLab is a DevSecOps platform… more than 50% of the Fortune 100 trust GitLab to secure their intellectual property. And so when we approached AI, we knew we needed to do two different things. The first was, and we started this in about 2020, 2021, adding ModelOps to GitLab to allow companies to build, train, deploy, roll back, and version AI models using GitLab, so they’re building their AI models next to the software that’s going to use them. And then in 2021, we started adding AI to GitLab to make it smarter and make it more powerful.

And so what we’ve done is we’ve launched, early this year, GitLab Duo. It’s our AI-powered suite of workflows within GitLab that help you with everything from getting through planning, coding, code review, security, deployment, and monitoring, all leveraging AI. And as of this point, coming at the end of the year, we’ve now launched 14 features in that suite, including Duo Chat, which can be your AI partner as you work through delivering software.

And so what we’re seeing is that companies are embracing both parts of that because they know to be competitive, they have to add AI to their products and those products abilities in GitLab. And then we’re able to help them be more effective leveraging AI. The way I look at it is, you know, we had a study done about two years ago at this point about GitLab Ultimate and how it helps people deliver software faster. And out of that came a 7X boost in efficiency. They were seven times faster at delivering software, but they also got their ROI in less than six months. And so with AI, we want to get that to 10X where our company is today at delivering software effectively and securely.

James: Okay, okay. Well, I mean, 10x is a lot.

David: It is a lot.

James: So we’ll get to that in a minute. But okay, so that’s somewhat of a high-level take. Let’s talk a bit about, I mean, I think there are a couple of things that interest me. Obviously GitLab is touching, you know, not just a bunch of different parts of the software delivery lifecycle, but different personas. And if we think about GitLab as an application, put it this way, you’re not just saying, oh, hey, we’re another company that has fancy code completion. So it’s not just that kind of code assistant, you’re helping with some of those other roles. What are some of the functions that you thought you could help with in terms of making your customers more effective?

David: Yeah, so when we decided to add AI to GitLab, we really set ourselves three core tenets, and the first one is what you’re touching on. We wanted to help everyone involved in delivering software to be more efficient. And to do that, you can’t just focus on developer efficiency. Developers are core to GitLab, but in our annual DevSecOps survey this year, we found that only 25% of a company’s time in their SDLC is actually writing or patching software. The rest of the time is everything else that goes around that. And so what we did is we first focused on code review and that was helping people get through code review more effectively. The next thing we actually focused on was security because we found that…

James: By the way, before we move on, I’m just going to… Having looked at the code review functionality in automating that, making it easier, that was one of the things that in the demos I’ve seen and looking at your platform, that was one of the things that really came through as like that’s clearly going to provide some value there. Because what are we doing here? Code reviews are super important, but anything that can help us put them together more quickly and to summarize things more effectively for a broader range of folks, I think that’s one of the things that GitLab is very intentional about, internal communication. I thought that was a good piece of work.

David: Thank you. So I’ll tell you one of the things that gets called out the most is that code review functionality. When people are talking about their experience with GitLab Duo, they talk about, I got through my code review faster. It did a better job summarizing the changes, helping me explain what I did, understanding people’s feedback and why I should consider it or make the adjustment.

But to your point, you talked about collaboration, and that was actually one of the other things we added in: GitLab Duo can not only help you in that development time, it can also help you in the planning component. I personally, as Chief Product Officer at GitLab, get pinged on GitLab issues. Our issue tracker’s public. Our issues can get long, and this is not an exaggeration: I had one that I was asked to review that, when I exported it to PDF, was over a hundred pages long.

And someone’s like, Hey, what are your thoughts on this conversation? Like, how am I going to do that as a human? Right. And so I can actually ask Duo, hey, summarize this issue. And it’ll summarize all those conversations. And what we’re seeing is that if you’re helping everyone, the product managers, product owners, helping the developers, the security team, operations team, companies are going to accelerate how fast they can deliver software. And to your point, I like to set ambitious goals for us. That’s, that’s why I chose 10X, but I truly believe if you apply AI in the right way and not just treat it as a single way to accelerate one part, you’re going to get a bigger boost because now you’re more effective in planning, coding, reviewing, securing, deploying, and so forth.

James: Okay. And I mean, that’s one of the things, because it’s not enough to just like write a bunch more software. I mean, that’s not going to take the state of the art forward. You know, writing a bunch of new applications, well, they need to be documented, they need to be managed. They need to be switched off. We need to have a better sense of like, you know, what’s the application kill switch? Because if it’s a situational application, that’s something we need to think about. I mean, you’re talking about 10x efficiency. That’s not going to come from just writing code 10 times faster.

David: Right. No, and actually the assertion that we made with this is that even if you took that 25% and made it a hundred times more effective, everything else around it’s going to break. Like your CI/CD pipelines are not going to be able to consume that. Your security teams are going to be overloaded. And you just, you can’t do that, right? So you have to actually spread that efficiency across the entire organization. And that’s why, by the way, I give that GitLab Ultimate example.

Those are organizations that are choosing GitLab for their entire company. And they’re seeing that big of a boost because it’s not just the development part. They’re accelerating everyone in that process. And now we just want to level that up another level.
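The arithmetic behind that point can be sketched with Amdahl’s law, which David does not name but which matches his reasoning: if only the coding share of the lifecycle gets faster, the untouched remainder caps the overall gain. The snippet below is an illustrative sketch and not anything from GitLab; the function name `overall_speedup` is hypothetical, and the 25% figure is the survey number quoted earlier in the conversation.

```python
def overall_speedup(fraction: float, factor: float) -> float:
    """Amdahl's law: overall speedup when `fraction` of total time
    is made `factor` times faster and the rest is left unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Assumption: 25% of SDLC time is writing or patching code (survey figure cited above).
coding_share = 0.25

# Only the coding step gets 100x faster; everything else is unchanged.
print(round(overall_speedup(coding_share, 100), 2))

# Every stage of the lifecycle gets 10x faster.
print(round(overall_speedup(1.0, 10), 2))
```

Under these assumptions, a 100x faster coding step makes the whole lifecycle only about 1.33x faster, which is why a 10x goal depends on speeding up planning, review, security, and deployment as well.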

James: Okay, okay. And let’s actually, let’s talk a bit about models because we don’t want to necessarily obsess about that. We’ve certainly seen some keynotes recently, which seemed almost as if they were just a long list of models. And it’s like, I’m not really sure how that helps anyone. But you’ve had to make some decisions about which models you are going to use to support your customers.

And indeed, I think your customers have some strong opinions there. Certainly some of them do: folks in regulated industries, perhaps in geographies like Europe, where there’s a lot of consideration about privacy and security. I’d love it if you could talk me a little bit through your decision-making in terms of which models you felt you could and should adopt and use in your products.

And also what is the value of open source here? Because I think as an industry, there seems to be quite a high tolerance at the moment for calling things open that are not really open, whether that be company names or whether that be projects. How important is open source in machine learning and large language models? And how did that define your thinking and the choices that you’ve made in terms of delivering customer value?

David: Yeah, so it’s a great question. So to kind of get to the answer to your question, I’ll share the last two tenets we actually set because they fit directly into this. So that first one was, you know, across the entire software development life cycle. The next was to be transparency- and privacy-first with how we apply AI in GitLab Duo. And the last one was best-in-class AI, selecting the right model for that use case. And so when GitLab approached how to apply AI again through something like GitLab Duo, we said we could choose a single LLM and call that our hammer, but then every use case is going to look like a nail. And for us, that was not going to allow us to give the best user experience and give that acceleration.

And so one of the things I’m proud of is that we’re using many models to power GitLab Duo. We’re choosing ones that are great at code completion and code generation. We’re choosing ones that are better for chat, security, and so forth. And that’s allowing us to have a very unique approach to how we’re applying AI. And by the way, that includes open source models and models that we’ve written in-house that we, of course, open source once we write, because we’re an open core company. And so we see that that’s providing that efficiency. It’s reducing the confabulations or hallucinations that you hear about, because we’re not asking one really large, large language model. And to your point, that could be a proprietary model that you don’t have visibility into how it’s actually working and will it work for what you’re trying to do.

And so when you go back to the beginning where I shared about the, you know, over 50% of the Fortune 100 trust GitLab to secure their intellectual property. That’s where that privacy and transparency comes into your question as well. So the first part of that is that, you know, we’re transparent about what models we’re using. You can go to our doc site, docs.gitlab.com, and in the left navigation is GitLab Duo. You can click on that and it’s going to list every model we’re using. It’s not only going to list the models so you know what we’re using, but it also tells you how they were trained.

And so we’re holding our partners accountable to share that information. So we can then share with our customers that this was trained in a way that they feel comfortable adopting it. And I think that’s kind of how GitLab brings it all back together to your question. It’s really about doing the right thing for that use case in a way that your customers can understand it and trust it and thus by extension, be able to trust you and get that acceleration that AI can provide.

James: Okay, okay. I mean, I’ve been talking about like an AI bill of materials, which is like a software bill of materials, but for these models. And we’re not there yet, certainly not, automated or whatever, but I think one of the first things you need is some transparency, and certainly to know what the models are trained on. So I think anything that you can do there is gonna make customers more comfortable.

And what about not having leakage? We don’t want to, you know, the famous Samsung example where they were using, well, OpenAI. And then it turned out that some of their trade secrets were leaked because that became part of the model. Your customers don’t really want that to happen now, do they?

David: No, that’s correct. And so part of that privacy is that we’re not using their intellectual property to train or fine tune the models we’re using. And I think that for GitLab is a big differentiator. We are an open core company, we’re source available, but companies who have intellectual property they don’t want leaked on the internet, come and use GitLab. And so I think it’s always been a very interesting story to have the most transparent publicly traded company with source available, be trusted with people’s intellectual property they want private.

And so to your point, that’s part of that documentation. If we are using anything that’s part of your code or our knowledge of your project, that’s only going to be used on a model that’s only for you. So like our suggested reviewers feature, you’re running your own inference. That way we can actually train it on your merge requests and your commits so we can find you the right code reviewer, but that’s now unique to you. That’s not a model that’s shared like our code suggestions model. And there, we’re not taking the prompts and the output, storing them, and using them to fine-tune. And that’s what’s happening when you see that leakage. It’s because someone’s personal information or a company’s intellectual property is being used to train or fine-tune the model. And there’s always that argument over, is the output yours or is it the company who generated it, right? And in our case, that output is yours. And so it now becomes part of your intellectual property and thus we’re not gonna store it and use it for fine-tuning.

James: Okay. So I think one of the questions that I’ve got is, I guess, about partners and models that are out there. You as an organization partner pretty closely with Google. You did some interesting DevOps stroke security stuff with them this year. That’s a partnership that seems to be bearing fruit. And in this large language model space, they are one of the partners that you’re using.

I mean, how confident are you with them as your sort of chosen partner? We noticed yesterday, or I shouldn’t be too specific about dates because people will be watching this, but Google has recently launched Gemini. How confident are you that they’re going to be the right or one of the right partners in terms of, yeah, certainly the kinds of functionality in and around generating code and the reviews and so on that you’ve been mentioning?

David: Yeah. So we feel very confident in our partnership with Google Cloud and same thing with our partnership with Anthropic. We see them as organizations that meet those core tenets we have around privacy and transparency. And in the case of Google Cloud, we’ve been a partner of theirs even before the AI boom that’s happened over the last 12 months. And so that relationship is rooted in all of that trust we’ve had over the years. And this is just a way to extend that.

I’m very proud that we work with industry leaders like Google Cloud and Anthropic, who talk about privacy and transparency and the importance of that. And I always look at it as like GitLab is a leader and always the leader in the DevSecOps space. And we’re influencing large companies like Google Cloud to make sure that they’re applying AI in a way that’s safe and secure. And in the case of things like Gemini, we’ve not been quiet about this. We were able to launch things like our Explain this vulnerability early in the year, because we get early access to those models and then we help give them feedback and validation to make sure that model is going to do exactly what they’re saying it’s going to do. But yeah, part of that partnership is they don’t have access to that same data, and they’re not storing it and retaining it.

James: Okay, so, but if Google gets better, then you get better. Or if Anthropic gets better, then you get better.

David: And the other way is true too, right? As GitLab becomes more effective at applying AI, it’s making their AI solutions better. And that’s why we enjoy having that two-way partnership with both companies.

James: Okay, okay. So yeah, I mean, I guess just to finish up a bit on that sort of privacy and security angle and indeed data sovereignty, what sorts of organizations are most concerned about this? Does this reinforce your existing customer base? I mean, you mentioned the Fortune 100. Who’s the most concerned about AI? Who are the people that are most concerned also using it? What insights can you give about your customers and their interest in taking advantage of this tooling or their fear of it?

David: You’re spot on, James. Our customers happen to be heavily regulated. They’re in healthcare, fintech, financial services and so forth, or they’re government agencies. We’re one of the most popular solutions for governments around the world to use. And the first thing they ask is, how are you handling my data? Are you being privacy forward with AI? Because they want to use AI, but they’re seeing the stuff you’re referencing, the articles about companies like Samsung having their intellectual property leak online.

And so when you look at how to apply AI in those industries, you’ve got to think okay, I’m handling patient data potentially. I’m handling government secrets. I’m handling source code that could potentially be used to make a weapon or take down a nation state, right? And so you can’t have that stuff be not handled with the level of privacy it needs. And so today we’re very proud of the fact that we’re, you know, privacy-centric. We’ve been for a very long time. And that is allowing our customers to see not only our documentation, but they can look at our source code and see what we’re actually doing. And between those two, it’s allowed them to actually begin to adopt AI.

One of our more outspoken financial services customers, NatWest, shared a stat with us and a quote that we shared on our earnings call around how they’re seeing AI with GitLab be applied in a way that’s accelerating them as an organization, and even their best developers and best team members, the more senior, are getting a big boost using AI. And so that’s an example of a fintech or financial services customer who has to be careful with what they’re doing, but they’re able to adopt AI because of how GitLab is approaching it.

The last thing I’ll share with you too is that, our goal, we’re a big self-hosted company. That means that the majority of revenue is coming from customers who self-host GitLab. We’re also focused next year on how to bring AI to them in their air-gapped environments. And that’ll be another way GitLab can differentiate on our AI approach. Cause now not only are we being privacy and transparency forward, we’re also giving you the ability to run it yourself in your own environment. So that way you don’t have to worry about poking a hole in your air-gapped data center where you’re running GitLab to be able to use AI.

James: That sounds like a fascinating conversation. I look forward to talking with you further. Cause one of the things about these models is it’s amazing that they can run locally. I know that we have a lot of talk about like GPU chasing and all these sorts of issues, but by the same token some of the newer open source models you can run them on a phone. And there’s gonna be a lot of really interesting opportunities. So yeah, it’ll be interesting to see how that plays out.

David: As a side note, just to give you an example of what you’re touching on, some of our models are that small too. So like the suggested reviewer model can actually run locally. So we’re figuring out how to package that up and deliver that. It doesn’t require GPUs. And so that’s why we have that AI team internally, you know, and why we leverage open source models, we’re also trying to make sure we can do something that is portable, not just for ourselves, but for our customers.

James: That reminds me, tell me a bit, you, cause you’ve got like a, a model assessment team, right? And given sort of GitLab and their general approaches to transparency, one assumes that this would be valuable for third parties because if you’ve got a model assessment team. Is that right? I mean, I know it’s right. They told me about, about that.

David: We do. Yeah, we do. Yeah, so the AI model validation team, their direction page, everything with GitLab’s roadmap is public. And so the direction page is public. You can read what they’re doing, what they’re looking at. But they’ve built, I think it’s like 65,000 prompts for testing models. And so we’re testing each model that we’re going to adopt and see how it’s doing. And then we’re providing that feedback back to the open source community, making the improvements ourselves, sending it back to the partners we were talking about earlier. So that way we’re getting a better model.

The thing that we’re trying to figure out is, you know, how do we begin to make that even more transparent? And so the team’s looking at that for next year on how we could actually begin to share all the metrics and how we’re doing that. But all those prompts are available today. All of the outputs from the tests are available in their, in their project on GitLab.com. And we’re just trying to figure out how to continue to be that transparency first company.

James: Awesome. So I guess the final question, you know, we at RedMonk, we do a little top down, but we try and focus as much as we can on the practitioner, on the developer, on the DevOps folks. What’s your call to action? Like what do you think that developers should be doing in terms of their career in terms of where they are now, either from a GitLab specific point of view, or more broadly, like, you know, there are some people that are worried about the future of their jobs, frankly. There are others that are just very, very comfortable and happy with where we are. What should folks be doing in order to thrive in this new era?

David: Yeah, the one thing that I like to share with people and one thing that I end up getting is pings on like LinkedIn saying like, Hey, I have this question about my career. Like, what are your thoughts? And one of the things is, do I now have to become an expert in AI based on what’s happening? Will my job be replaced? Like what, what should I be doing to make sure I have a job in five years? And my feedback’s always very similar in that, you know, two years ago when GitLab started doing AI, we acquired an AI company to start our AI journey on GitLab Duo. We bought a company named Unreview.

That became our suggested reviewers feature and that team is still here today. And two years ago, that was the case. Like to become AI savvy, you had to know a lot about AI. Today, that’s actually changed. Not only is it the explosion in the last 10 to 12 months, but it’s also the accessibility of it. And so what I tell people is that the best thing you can do is get really good at prompt engineering. You may not have a job where you have to do that on a day-to-day basis, but you may run into a situation where you say, hey team, like I got this idea. I think it might make us more effective. And here’s this thing I did. And between, as you said, open source models and projects like Ollama, where you can run things on your laptop, you can actually proof of concept something and come back and say like, Hey, I’m using, I don’t know, say, the new Phi model from Microsoft, and I was able to accelerate this thing in our application. Should we go forward with it? And I think that’s really the power of it, really understanding that it’s a different paradigm. And that’s where the world’s going is that I think there’s gonna be a lot more prompt engineering than there is like classical engineering development over the coming years, and GitLab’s engineering is even structured that way too.

James: Well it’s a long time since I was writing any code, but I have generated some code and it actually works. So, you know, the machine works, that’s good news. Anyway, here we go. I really appreciate your time today. That was a great roundup. Looking forward to hearing more about what and how you deliver in terms of running models locally. I’m pretty sure for your customers, that’s going to continue to be sort of the sorts of things they’re looking for.

I’d just like to say a great RedMonk conversation. Thank you so much, David. Thanks for joining us and everyone out there, I hope you enjoy the conversation. Leave some comments, subscribe, you know, join in the RedMonk conversation. There’s a lot of good videos to watch and we’d love to have you as part of our community. So get on board. Thanks everyone. Thanks again, David. And, uh, that’s a wrap.

David: Thank you for having me.
