A RedMonk Conversation: LLMs and Secrets Management



Join RedMonk’s James Governor and GitGuardian CEO Eric Fourrier for a discussion on the role of large language models (LLMs) in secrets management. DevOps processes and microservice architectures continue to drive application development towards a landscape that is more complex, distributed, and fragmented. This increases the prevalence of secrets such as database credentials, API keys, and tokens, which are used to connect all these pieces together, and increases the chance that such secrets may be inadvertently leaked, especially through code. This is where GitGuardian comes in, specializing in helping developers and their organizations with secrets detection and remediation at scale. This conversation focuses on the potential advantages and challenges that the rise of generative AI technologies such as LLMs has introduced into the code security space, and what this all means for the future of secrets detection and management.

This was a RedMonk video, sponsored by GitGuardian.


Transcript

James Governor: Hi, this is James Governor. I’m the Co-founder of RedMonk, and I’m lucky today to have a great guest, Eric Fourrier, the CEO of GitGuardian. We’re here to talk about LLMs in secrets detection. Now, I know there’s been a lot of hype about LLMs, but here we are in 2024. We had 2023, a year of working out what to do with them, and now we’re seeing some execution. So yeah, I’m keen to have this conversation. Eric, welcome.

Eric Fourrier: Yeah, great to be here and talk about it, James. I’m looking forward to it.

James: Great! So I had a discussion with you beforehand, obviously, a little bit of preparation. And I think there are some interesting things here, because there’s been so much hype around the use of LLMs. I think for a software company, hype aside, it certainly creates some interesting challenges, some of them possibly even existential. One of the real questions for me is… in fact, we’ve got two things. First of all, why don’t you say a bit about GitGuardian, what it is you do? Secrets detection, why is that important? Who are you as a company? What are your bona fides?

Eric: Yeah, so great question. GitGuardian is a code security company, really known for detecting secrets in code. We actually have the most-installed security application on the GitHub Marketplace, with more than 200,000 installs. So it’s used by individual developers, small companies, mid-market companies, and large companies all around the world. Over the past 10 years, with the DevOps and microservices revolution, the number of interconnected applications has grown so much. You definitely need what we call secrets, database credentials, API keys, tokens, so machines can talk to each other. And over the past 10 to 15 years we have seen what we call secrets sprawl: this idea of secrets spreading everywhere in your infrastructure, and especially in code. GitGuardian specializes in helping developers not only find the secrets in code, but also helping companies do the remediation at scale, meaning getting secrets out of code.

James: Okay, because obviously, we live in an era where software gets shared a lot, and it certainly gets cut and pasted, and secrets will find their way into production code, unfortunately, unless you have a decent solution to that. Okay, so that’s a good background. Let’s talk a bit about LLMs. When this technology emerged, obviously, as a company, you’ve got to think about, Well, what are the implications for us? What was your journey? And what have you learned about the implications of LLMs in code security as an application?

Eric: Yeah, so it’s a great question. I think for every software company, 2023 definitely marked the explosion of generative AI. Every company has looked at the different applications and implications for its own business, and cybersecurity is not an exception. As a software company, you have to actually take these new technologies, benchmark them, and test them, to see if they can improve your own technology and improve the product for your customers. And that’s actually what we did first. As a code security and secrets detection company, when GPT-4, and even 3.5, was released, the first question we asked ourselves was: can GPT-4, with a good prompt, outperform the secrets detection algorithm we had built over the last five years? And a lot of companies, when you think about it, should ask the same questions and do the same testing on their own products.

James: It is a scary one. Does this put us out of a job? That’s an existential question, as I said.

Eric: Exactly, an existential question. And we come from… I’m an engineer myself, we have a strong engineering background, so it’s just what we do: trying to find the best solution for the right problem. So we did this testing, trying in a really scientific way to compare ChatGPT against our own secrets detection module. And it really was exactly the type of problem we’re trying to solve. We are scanning the full real-time GitHub pipeline; we’re scanning everything in real time. Just to give some orders of magnitude, we scan around 10 million documents per day on the public GitHub pipeline, so an average of around 100 documents per second. If you do some small math with a small approximation, a document is about 500 tokens, and because GPT pricing is based on tokens, it’s a per-token price, that comes to around three million tokens per minute that you have to scan. So the first thing we wanted to see was: okay, what’s the cost going to be? Because at the end of the day, you have to pay a certain price for GPT. So if we had to replace our technology and use this one, we had to quantify three things.
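
To make the arithmetic concrete, here is a minimal sketch of the estimate Eric walks through; the figures are the ones he quotes in the conversation, and the rounding is his.

```python
# Back-of-envelope throughput check, using the figures from the conversation.
DOCS_PER_DAY = 10_000_000      # ~10M documents scanned per day on public GitHub
TOKENS_PER_DOC = 500           # Eric's rough approximation of document size

docs_per_second = DOCS_PER_DAY / 86_400               # seconds in a day
tokens_per_minute = docs_per_second * TOKENS_PER_DOC * 60

print(f"{docs_per_second:.0f} docs/s")                # ~116, i.e. "around 100"
print(f"{tokens_per_minute / 1e6:.1f}M tokens/min")   # ~3.5M, "around three million"
```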

James: You don’t want it to cost more than the GDP of France.

Eric: Exactly. Is it expensive? Is it fast? Does it perform well? For our use case, and really for any code security use case, when we think about performance I’m not talking about the speed of the algorithm; I’m talking about what we call recall and precision. Recall is the ability to not miss any secret, and precision quantifies the amount of false positives. It’s the classic trade-off of a detection algorithm: not missing anything, but at the same time not raising too many false positives. On the cost, it was a big realization, and pretty easy to calculate: with GPT-4, the cost for us would be around $200K per day, which was definitely impossible to sustain compared to our current architecture, which costs maybe a couple of hundred bucks per day. So on the cost side, it was definitely not doable. There was an alternative: you could use GPT-3.5 Turbo, which was less performant, but also less costly. For GPT-3.5 Turbo, the cost is way less than GPT-4, and it comes down to 4 to 5K a day.
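
For reference, a sketch of the cost math behind those figures, plus the two benchmark metrics Eric defines. The per-token prices here are illustrative assumptions, roughly in line with OpenAI’s public list prices of the period; they are not quoted in the conversation.

```python
TOKENS_PER_DAY = 10_000_000 * 500     # 5B tokens/day, from the volumes above

# Assumed, illustrative input prices per 1K tokens (roughly OpenAI's public
# list prices at the time; not figures from the conversation).
PRICE_GPT4 = 0.03
PRICE_GPT35_TURBO = 0.001

print(f"GPT-4:         ${TOKENS_PER_DAY / 1000 * PRICE_GPT4:,.0f}/day")         # ~$150K
print(f"GPT-3.5 Turbo: ${TOKENS_PER_DAY / 1000 * PRICE_GPT35_TURBO:,.0f}/day")  # ~$5K

# The two detection metrics Eric names:
def recall(true_positives: int, false_negatives: int) -> float:
    """Share of real secrets that were found: 'not missing any secret'."""
    return true_positives / (true_positives + false_negatives)

def precision(true_positives: int, false_positives: int) -> float:
    """Share of alerts that are real secrets: 'not too many false positives'."""
    return true_positives / (true_positives + false_positives)
```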

So the first issue was the cost. The second one was the speed. These LLMs are huge models. Everybody knows, and it’s a big trend for OpenAI and any company building a large language model, that the training part of the model is insanely long and costly, and you need tons of GPUs to do it. You can see right now in the news, especially with Sam Altman, that people are looking to buy and even build their own hardware to be able to train even larger models. But people also–

James: I think on valuations now, NVIDIA is beginning to look like a cloud hyperscaler. Certainly at the moment. GPU markets have gone mad.

Eric: It’s insane. I think what’s really interesting is that there’s this trend in the market where NVIDIA’s valuation could become higher than Google’s. For the first time in history, the valuation of a hardware company would be higher than that of a software company, which, when you think about it, is really amazing compared to what has happened in the last 20 years. The second thing I was mentioning is inference: we’re still talking about a huge model with billions of parameters. The inference time is pretty long, especially for a large file or a large document; usually, inference time is linear in the number of tokens. So the speed was not there either. Fortunately, our model was faster and cheaper. And the last step for us was checking the performance: even if we had infinite money and infinite time, would GPT-4 perform better than our actual secrets detection? And we realized on the benchmark that our model was still beating GPT-4. Its performance was not bad, to be honest, but our model was still beating it. And I think a lot of people forget that LLMs, by their very design, are non-deterministic, stochastic models.

If you run them multiple times on the same input, the output can be different. That’s a tough one for any detection problem in the security space, because as a company, for your customers, you usually perform regular, periodic scans, and you don’t want two scans of the same input data, the same code at two different points in time, to output different results.
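
To illustrate the determinism point, here is a toy example. The single regex stands in for a full detection engine and implies nothing about GitGuardian’s actual rules; the key used is AWS’s documented dummy key.

```python
import re

# One deliberately simplified rule: AWS-style access key IDs. A real engine
# layers hundreds of specific detectors; this is only to show determinism.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

def scan(code: str) -> list[str]:
    return AWS_KEY_ID.findall(code)

snippet = 'aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"'  # AWS's documented dummy key

# Two scans of the same input always agree, which is what periodic scans
# rely on; a sampled LLM completion can flag a secret on one run and miss
# it on the next.
assert scan(snippet) == scan(snippet) == ["AKIAIOSFODNN7EXAMPLE"]
```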

James: Yeah, no, that’s not ideal when you’re trying to manage security.

Eric: Definitely. It’s a huge thing that’s really important to consider. A lot of people talk about hallucination, but they forget that there is also the stochastic side of these models that needs to be taken care of. That was, I would say, our first realization, but it was really interesting for us. And don’t get me wrong: I think LLMs and AI are a game changer for our business, for secrets detection and for code security in general. But detection is not where they add the most value. They do add value on detection, they can help on some things, but from a technological standpoint we deeply believe that detection will always be a hybrid model: a first pass that’s really fast, deterministic, and performs really well, with LLM models on top of it to solve other use cases, such as filtering false positives. And the big use case we see, and I think this is true for code security in general, not only secrets detection, is contextualization: helping people look at the context around the code, around the secret, across the different applications, and being able to provide some tags and some context, so the security user or the developers can better understand the issue and do the remediation.
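
The hybrid shape Eric describes might look something like the sketch below: a deterministic pass runs on everything, and the LLM only sees its candidates. Both helpers are simplified stand-ins; nothing here reflects GitGuardian’s actual engine or false-positive filter.

```python
import re

# Deterministic first pass: fast, cheap, repeatable. A generic high-entropy
# string rule stands in for the hundreds of detectors a real engine uses.
CANDIDATE = re.compile(r"['\"]([A-Za-z0-9+/=_-]{20,})['\"]")

def detect_candidates(code: str) -> list[str]:
    return CANDIDATE.findall(code)

def llm_is_false_positive(code: str, candidate: str) -> bool:
    # Hypothetical LLM pass: in practice you would prompt a model with the
    # candidate plus its surrounding context and ask whether it looks like
    # a dummy or test value. Stubbed with a trivial heuristic here.
    return "example" in candidate.lower()

def scan(code: str) -> list[str]:
    # The slow, costly LLM only runs on the deterministic pass's findings.
    return [c for c in detect_candidates(code)
            if not llm_is_false_positive(code, c)]
```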

James: By the way, we’ll talk about the remediation in a second, but I thought the prompt itself that you were using was interesting, so it’s worth talking a bit about that. We’ll share the prompt in the video, but the first line of it, I think, is a classic, an absolutely classic first line of a prompt.

Eric: The first line of the prompt, as you said, was just giving GPT the outline and the context to do this work. It was something like: you’re an expert in secrets detection in source code; you will receive a code snippet, and your role is to detect secrets in the code. We really benchmarked multiple prompts, and that was usually the first line.

James: You always have to tell it it’s an expert, right? That’s the thing that I love about ChatGPT. It’s like, this is the way you need to think about it: you’re an expert in this. Don’t commit it like an amateur.

Eric: So yeah, the prompt always has a huge influence on the quality of the result, and a good prompt is definitely key to getting great results. That’s what the industry now calls prompt engineering; you even have job titles for prompt engineers now. So yeah, definitely an area to master in the LLM and AI world, I would say.
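
The opening Eric quotes might be assembled into a prompt along these lines. This is a paraphrase of what he describes, not the verbatim prompt GitGuardian benchmarked, and the snippet placeholder is illustrative.

```python
# A paraphrase of the prompt opening Eric describes. The exact wording and
# the {snippet} placeholder are illustrative; this is not GitGuardian's
# verbatim benchmark prompt.
DETECTION_PROMPT = """\
You are an expert in secrets detection in source code.
You will receive a code snippet, and your role is to detect the secrets
(API keys, tokens, credentials) it contains.

Code snippet:
{snippet}
"""

# Filled in per document, e.g.: DETECTION_PROMPT.format(snippet=code)
```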

James: Absolutely. Let’s talk a bit then, as you said, about remediation. That’s one of the areas where… it turns out that whilst code gen may be part of the problem, it’s also part of the solution. So for remediation, you see a real opportunity for using LLMs in and around code security.

Eric: Yeah. So that’s the other part of the problem beyond detection: as you said, you have more and more code being generated by AI. When you think about it, if you have an AI reviewing code generated by AI, it could create a bit of an infinite loop. Conceptually you can see it’s a bit of an issue, and you will always need–

James: I don’t know how auditors are going to feel about that.

Eric: Exactly. So you will always need some deterministic, well-defined rules to secure your code. You can think about it like grammar in a language: you have a set of rules that you need to respect to write and to speak, and it’s a bit the same for code security. So I don’t think this is going away. But definitely, on the remediation side, it has been a big challenge we are solving at GitGuardian. And a big challenge for me in the security industry in general is that you have more and more vulnerabilities, and a lot of companies are really focused on: okay, I’m going to find every vulnerability for you, whether it’s secrets, SAST issues, vulnerabilities in dependencies, or cloud security issues.

But at the end of the day, as a company, you know you have all of these problems. You don’t just want to know all your issues and problems; you want a vendor that can actually help you fix them. When you think about it, for us on the secrets detection part, our mission is not only detecting secrets and having a great secrets detection engine. It’s being able to help our customers remove all the secrets from their code and IT systems so they don’t get hacked. At the end of the day, that’s really the mission of our company, and I think the mission of every security company is not only detection, but really trying to fix the issues. Where GPT is great is that, especially for a large enterprise, we find 10,000 and sometimes 100,000 secrets. And the remediation is hard, especially because we scan the whole history of the code, and you have companies that have been writing code for 10 or 20 years. So it’s billions of lines of code to analyze, across thousands of applications.

So on the remediation, we find a lot of issues, and you want to be able to automate the fix. And to do that, you really need to be able to suggest code fixes to developers. That’s where I think LLMs will shine; it’s their big strength, because they have been trained on the whole of open source code. They understand every language. So for them, it’s easy to solve the issue of: okay, there is a secret, or even a code security issue, what’s the correct way to fix it? You just give some context to the LLM: okay, as a company I’m using, I don’t know, a CyberArk vault; here is an example of how to use a CyberArk vault in a given language; this is a piece of code where I have a secret; how do I write the correct code to use a secrets manager, so the secret is not hard-coded directly in the code? That’s really where LLMs shine, because as a software company you don’t want to build a deterministic rules engine with multiple rules per language and per framework.

It’s a lot of engineering time, and languages are always changing. You need a model that deeply understands every language it has been trained on, one that really uses all that code to understand all the syntax, rather than trying to implement it yourself.
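
As an illustration of the kind of fix that gets suggested, the hard-coded credential moves out of the source and is fetched from a secrets manager at runtime. The `vault_sdk` module, `VaultClient` class, and `get_secret` method below are hypothetical stand-ins, not CyberArk’s or any vendor’s real API.

```python
# Before: the credential is hard-coded, so it lives in the repo and its
# entire git history from the moment it is committed.
DB_PASSWORD = "s3cr3t-prod-password"

# After: the code asks a secrets manager at runtime instead. vault_sdk,
# VaultClient, and get_secret are hypothetical stand-ins for whatever
# manager a team uses (CyberArk, HashiCorp Vault, a cloud provider's).
import os
from vault_sdk import VaultClient  # hypothetical SDK, for illustration only

vault = VaultClient(url=os.environ["VAULT_URL"])    # configuration, not a secret
DB_PASSWORD = vault.get_secret("prod/db/password")  # fetched, never committed
```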

James: No, absolutely. Okay. Well, I liked a lot of the engineering approach you took to it: trying to understand the cost, trying to understand the performance, trying to understand the quality. You’ve got these benchmarks now, and obviously, that’s something, again, that we can share. But overall, look, I think everyone is very interested in the implications of LLMs for their work, and code security, secrets detection, this is a great use case. So thank you very much for joining us, Eric. If anyone has any questions, stick them in the chat and I’ll share them back with GitGuardian. Obviously, that’s a solution that you should look at; as you say, it’s popular, it’s used on GitHub. Any questions, let us know. And don’t forget, if you’ve got any comments, go ahead, share this with your friends, like, subscribe, all that good YouTube stuff. Thanks for joining us, Eric, and thanks all of you.

Eric: Thanks, James, for having me.

 
