Juan Cruz Viotti on JSON Schema, the Invisible Infrastructure Powering APIs and LLMs

JSON Schema might be the most important technology you’ve never thought about. In this MonkCast, Rachel Stephens sits down with Juan Cruz Viotti, founder of SourceMeta and member of the JSON Schema Technical Steering Committee. They discuss how JSON Schema is the backbone of OpenAPI specs and just might be the language of AI.

SourceMeta
Juan Cruz Viotti: https://www.jviotti.com/

00:40 – What is JSON Schema?
03:00 – JSON Schema in OpenAPI and AI
05:30 – Challenges in API Ecosystem Management
07:00 – Siloed API Specs and Governance Issues
08:00 – Benefits of Schema Layer Governance
09:30 – JSON Schema as a Data Dictionary
11:00 – JSON Schema in AI and Code Generation
13:30 – SourceMeta’s Ecosystem and Tools
16:00 – Small Teams and Infrastructure Innovation
16:50 – AI, Documentation, and Data Semantics
17:30 – Vision of the Future

Transcript

Rachel Stephens (00:04)
Hello everyone and welcome to RedMonk Conversations. My name is Rachel Stephens, and I am deeply excited today to have Juan Cruz Viotti with me. One of the best parts about my job is that I get to talk with people who are deeply passionate about their technology of choice. And when I first talked with Juan, I was so excited to have him come on and share his excitement about a technology that I personally had not given a ton of thought to before. So we're going to dive a lot into JSON Schema today. But before that, Juan, can you please give us a quick introduction to who you are and what you're doing?

Juan Cruz Viotti (00:35)
Yeah, sure. Well, hello everybody, and thank you again for having me, Rachel. I'm Juan. I'm the founder of SourceMeta, a company fully dedicated to JSON Schema. I'm a member of the JSON Schema Technical Steering Committee, which is the organization that manages the standard. And I have a lot of experience using JSON Schema at scale, and even in research when I was at the University of Oxford. So I've gone very deep down the rabbit hole, I think, and I found some really cool stuff.

Rachel Stephens (01:00)
Very cool, and we're excited to dive into all of that. But before we do: when you first talked to me, I didn't even realize that there was a JSON Schema Technical Steering Committee. What is this committee? What do you do? Do you meet in a secret lair somewhere?

Juan Cruz Viotti (01:14)
Yeah, so JSON Schema is a standard schema language. It's used to describe the structure, the meaning, and the constraints of data, of information. And it's one of those things that is so important, so pervasive everywhere in society (we can get into some of that later), while at the same time being quite unknown. What you're expressing is what we tend to hear quite a lot: it covers so many things, but nobody knows about it, hopefully because it just works. But yes, there is a JSON Schema Technical Steering Committee. We even have a yearly conference in Paris in December, so if you want to come hang out there, we have an entire track of talks around JSON Schema. We also have a pretty big online community, with a Slack channel, monthly open community working meetings, office hours, the whole thing.
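To make "structure, meaning, and constraints" concrete, here is a hypothetical JSON Schema for a customer record, plus a toy checker for a handful of keywords (type, enum, required, properties, minLength, minimum). This is a minimal sketch, not a conforming validator: the schema and all names are invented for illustration, and real projects would use a maintained implementation such as the Python jsonschema package.

```python
import json

# Hypothetical schema, for illustration only. It carries structure
# (type, properties), constraints (required, minimum, minLength, enum),
# and meaning (title, description). "$schema" names the 2020-12 dialect.
CUSTOMER_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "Customer",
    "description": "A customer record (illustrative only)",
    "type": "object",
    "required": ["id", "email"],
    "properties": {
        "id": {"type": "integer", "minimum": 1},
        "email": {"type": "string", "minLength": 3},
        "tier": {"enum": ["free", "pro", "enterprise"]},
    },
}

# JSON Schema type names mapped to checks on decoded Python values.
# bool is excluded from the numeric types because bool subclasses int.
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    # Simplification: real JSON Schema also treats 1.0 as an integer.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "null": lambda v: v is None,
}


def validate(instance, schema, path="$"):
    """Toy validator: returns a list of error strings (empty means valid).

    Handles only a single-string "type", "enum", "required", "properties",
    "minLength", and "minimum"; every other keyword is ignored.
    """
    expected = schema.get("type")
    if expected is not None and not TYPE_CHECKS[expected](instance):
        # Stop early: the remaining checks assume the expected type.
        return [f"{path}: expected {expected}"]
    errors = []
    if "enum" in schema and instance not in schema["enum"]:
        errors.append(f"{path}: not one of {schema['enum']}")
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property {key!r}")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    if isinstance(instance, str) and "minLength" in schema:
        if len(instance) < schema["minLength"]:
            errors.append(f"{path}: shorter than {schema['minLength']} characters")
    is_number = isinstance(instance, (int, float)) and not isinstance(instance, bool)
    if is_number and "minimum" in schema and instance < schema["minimum"]:
        errors.append(f"{path}: below minimum {schema['minimum']}")
    return errors


good = json.loads('{"id": 42, "email": "ada@example.com", "tier": "pro"}')
bad = json.loads('{"id": 0, "tier": "gold"}')
print(validate(good, CUSTOMER_SCHEMA))  # → []
print(validate(bad, CUSTOMER_SCHEMA))   # missing email, id below 1, unknown tier
```

The same schema document serves as both documentation (title, description) and an executable contract, which is the dual role the conversation keeps returning to.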

Rachel Stephens (01:43)
I'm always game for a trip to Paris, so mark me down for December. All right, but as you mentioned, this is a technology that has largely flown under the radar for a lot of people who haven't paid attention to it, even though it is, I don't know, roughly 20 years old at this point. Where could people be seeing this technology if they knew to look for it?

Juan Cruz Viotti (02:09)
Yeah, so if you're working in the API space, it's extremely hard not to come across JSON Schema, though, funnily enough, you might not even realize that you're coming across it. I think that's the irony of all this. If you're working with OpenAPI, for example, the OpenAPI specification, which is so widely used in the world, builds on top of JSON Schema. So when you're writing an OpenAPI spec, and you're defining the endpoints, the HTTP headers, and the query parameters, you're actually using JSON Schema within the OpenAPI specification. The same usually goes if you're using AsyncAPI, a similar API specification for the event-driven world. And more prominently now, if you're using MCP, as a lot of people are in the AI world, MCP is also built entirely on JSON Schema: when you define tools in MCP, you're actually defining them with JSON Schema. So it's actually everywhere. And AI is one of the most interesting aspects of it. I think AI is one of the major adopters of JSON Schema, it turns out.

Rachel Stephens (03:27)
Very interesting. So it's a major driver of the technology's use, but people may not realize that the technology is under the covers.

Juan Cruz Viotti (03:41)
Yes, exactly. And I think that tends to happen with infrastructure tooling, right? And infrastructure languages in general. They're usually powering all of the cool things that we do in modern society, but we never actually realize they're there. And I think JSON Schema is a perfect example of that.

Rachel Stephens (03:56)
Yeah, so is it fair to call OpenAPI a wrapper around JSON Schema, or is it more like an abstraction layer?

Juan Cruz Viotti (04:01)
Yeah, and I think you're touching on the right thing. We like to think about it that way: OpenAPI is a wrapper format that lets you use JSON Schema to describe APIs. So when you're defining your API spec, you define the endpoints using OpenAPI as a language, but the actual content and the data, which is the important part of the API spec, you define with JSON Schema. And interestingly enough, we did an analysis at SourceMeta at some point. We analyzed a big data set of open-source OpenAPI specifications, and we found that, I think, over 70% of the content of a big OpenAPI spec is the JSON Schema part. So that's what tends to dominate in any production use case, right? The ratio is pretty high.
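The "wrapper" relationship is easy to see in a small document. Below is a hypothetical OpenAPI 3.1 document written as a Python dict (the Customers API and its fields are invented for illustration): paths, operations, parameters, and responses are OpenAPI, while every value under a schema key, and every entry under components/schemas, is JSON Schema. The resolve_pointer helper is a minimal sketch of following a same-document $ref using JSON Pointer syntax (RFC 6901); it does not handle references to other files or URLs.

```python
from urllib.parse import unquote

# A hypothetical OpenAPI 3.1 document. "paths", operations, "parameters",
# and "responses" are OpenAPI; each value under a "schema" key and each
# entry in components/schemas is a JSON Schema.
API_DOC = {
    "openapi": "3.1.0",
    "info": {"title": "Customers API", "version": "1.0.0"},
    "paths": {
        "/customers/{id}": {
            "get": {
                "parameters": [
                    {
                        "name": "id",
                        "in": "path",
                        "required": True,
                        "schema": {"type": "integer", "minimum": 1},
                    }
                ],
                "responses": {
                    "200": {
                        "description": "The requested customer",
                        "content": {
                            "application/json": {
                                "schema": {"$ref": "#/components/schemas/Customer"}
                            }
                        },
                    }
                },
            }
        }
    },
    "components": {
        "schemas": {
            "Customer": {
                "title": "Customer",
                "type": "object",
                "required": ["id", "email"],
                "properties": {
                    "id": {"type": "integer", "minimum": 1},
                    "email": {"type": "string"},
                },
            }
        }
    },
}


def resolve_pointer(document, ref):
    """Resolve a same-document reference such as '#/components/schemas/Customer'.

    The fragment is percent-decoded, then split into JSON Pointer (RFC 6901)
    tokens, where '~1' stands for '/' and '~0' stands for '~'.
    """
    if not ref.startswith("#"):
        raise ValueError(f"only same-document references are supported: {ref}")
    pointer = unquote(ref[1:])
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer}")
    node = document
    for token in pointer[1:].split("/"):
        # Order matters: decode '~1' before '~0', as RFC 6901 specifies.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


operation = API_DOC["paths"]["/customers/{id}"]["get"]
response_schema = operation["responses"]["200"]["content"]["application/json"]["schema"]
customer = resolve_pointer(API_DOC, response_schema["$ref"])
print(sorted(customer["properties"]))  # → ['email', 'id']
```

Even in this tiny example, most of the lines that describe the data itself are JSON Schema, which is the pattern behind the ratio Juan mentions.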

Rachel Stephens (04:52)
Gotcha. And I think when we talked previously, your general view of the world was that we as an industry are sometimes operating at the wrong layer: rather than adjusting the JSON Schema, we're adjusting things in OpenAPI specs. Talk to me about what you see as the differences between those layers, and what is the structural thing that is preventing engineering organizations from having a schema-first workflow?

Juan Cruz Viotti (05:20)
Yeah, awesome. I think an example would make that even clearer. Take the average big enterprise, or, for example, your favorite public-sector or government body; most of them actually use JSON Schema quite a lot. These kinds of big institutions are going to be developing and managing a lot of APIs, usually in the thousands, right? I think there was a study very recently that found that over 50% of companies with 10,000 employees or more have at least 1,000 APIs, with a lot of endpoints to manage. So in that context, you might have many APIs developed by totally different teams, potentially in different places and different offices, with different managers and different objectives. You have this entire ecosystem of APIs. What we see is that, to manage that potential chaos and tame the complexity of such a thing, people try to apply API governance, and we've probably all seen how much of a fuss API governance has been. But what we see at SourceMeta, and the problem that we're trying to solve, is that when people approach API governance, they usually apply it at the OpenAPI level and get stuck there. They never go one level below. And what tends to happen as a consequence is that you might have, again, 1,000 OpenAPI specs. In isolation, they are perfectly fine, right? But collectively, they are a bit of a mess.

Why? Because if we take for granted that most of an API spec is JSON Schema, you're actually not governing at the right layer. So again, you might have 1,000 OpenAPI specs, but your schema layer, which is supposed to be the substrate behind all of them, still gets siloed at the OpenAPI level. So you get into issues where each API defines roughly the same thing over and over again, in slightly different ways, right? Then you pay the integration cost and the coordination cost.
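The siloing problem can be sketched in a few lines. Below, two hypothetical teams each define their own customer schema, and a toy check reports how their property names have drifted apart. The alternative is one canonical schema identified by a URI that both API specs reference with $ref. The URI https://schemas.example.com/customer.json is a placeholder, and the dict-based REGISTRY and resolve function are assumptions standing in for a real schema registry and a spec-compliant $ref resolver; they are not SourceMeta's product API.

```python
# Siloed: two hypothetical teams each define "a customer" on their own.
BILLING_CUSTOMER = {
    "type": "object",
    "properties": {"id": {"type": "integer"}, "email": {"type": "string"}},
}
SUPPORT_CUSTOMER = {
    "type": "object",
    "properties": {"customerId": {"type": "string"}, "emailAddress": {"type": "string"}},
}


def property_drift(a, b):
    """Property names present in one object schema but not the other."""
    return sorted(set(a.get("properties", {})) ^ set(b.get("properties", {})))


# Shared: one canonical schema published under a placeholder $id URI.
CUSTOMER_URI = "https://schemas.example.com/customer.json"
REGISTRY = {
    CUSTOMER_URI: {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "$id": CUSTOMER_URI,
        "type": "object",
        "required": ["id", "email"],
        "properties": {"id": {"type": "integer"}, "email": {"type": "string"}},
    }
}

# Both API specs point at the same schema instead of redefining it.
billing_api = {"components": {"schemas": {"Customer": {"$ref": CUSTOMER_URI}}}}
support_api = {"components": {"schemas": {"Customer": {"$ref": CUSTOMER_URI}}}}


def resolve(schema, registry):
    """Toy resolver: look up an absolute $ref (no fragment) in a local dict.

    A real setup would fetch from a schema registry with a compliant resolver.
    """
    ref = schema.get("$ref")
    return registry[ref] if ref is not None else schema


print(property_drift(BILLING_CUSTOMER, SUPPORT_CUSTOMER))
# → ['customerId', 'email', 'emailAddress', 'id']
billing = resolve(billing_api["components"]["schemas"]["Customer"], REGISTRY)
support = resolve(support_api["components"]["schemas"]["Customer"], REGISTRY)
print(billing is support, property_drift(billing, support))  # → True []
```

The design choice is simply to move the definition one level down: the OpenAPI documents stay thin and point at the schema layer, so a change to the canonical customer schema reaches every API that references it.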

Juan Cruz Viotti (07:15)
There is no unified, single source of truth for what something means in the context of an organization, for example. So you get all of those problems. They kind of get put on the shelf and ignored, because people don't usually think about the schema layer as a thing. However, that's actually changing.

Rachel Stephens (07:31)
Gotcha. So what would be the benefits of having better governance at the schema layer? It sounds like less duplication, but what else? How should people be thinking about that?

Juan Cruz Viotti (07:41)
And again, I think the core of it is coordination. The coordination cost that you pay for operating in silos, and not on a common, shared vocabulary, is huge. In fact, I was recently reading The Use of Knowledge in Society, and it argues that knowledge coordination has been a main bottleneck for innovation in civilizations throughout history. I think that's the same today, and it's getting even more explosive given AI.

So again, if you had a single source of truth defining, for example, the data models that your organization operates on, then everything taps into that, whether that's your OpenAPI specs, your MCPs, your databases, or your datasets. If everybody talks about the same thing in the exact same way, everything that you build on top gets deduplicated. So it's less work wasted doing the same things over and over again, and everything connects better.

And when things connect better, you can get a lot more out of that information, right, with or without AI.

Rachel Stephens (08:47)
Got it. And this perhaps ages me, but when you talk about it that way, it makes me think of my ye olden days of somebody wanting a data dictionary at work. What is the difference between what JSON Schema can provide and that exercise of a single source of truth that we've written down in a document somewhere?

Juan Cruz Viotti (09:08)
So I think JSON Schema has come to prominence as the most popular schema language out there. Again, look at AI, or at APIs, even in governments: we did a study where we found that all of the G7 governments were heavy users of JSON Schema. I mean, there are two parts to that question. One, I think JSON Schema clearly won the schema war, if there ever was one, because it's very expressive. It allows you not only to define a structure, but to define advanced constraints and semantics on top of the data. So I think that's what made it, in a way, the technology that could satisfy that vision of a data dictionary at scale, to some extent. It's the same idea; it's just the current expression of it. And maybe XML and XML Schema used to play that role in the past.

Rachel Stephens (09:57)
When you mentioned the word semantics there, it made me pause. One of the things that I have seen in conversations of late is that as developers and engineers continue to adopt AI for generating their code, they've had to pay less attention to specific semantics and more attention to their intent, their architecture, and how they're thinking about things.

I'm curious how you're thinking about JSON Schema as a main driver for one of the layers of AI with MCP, while at the same time seeing developers move in a different direction in how they think about semantics. How are you seeing that in your world?

Juan Cruz Viotti (10:39)
Yeah, and there are two things to that. Even beyond MCP, I think people don't realize how pervasive JSON Schema is. As we call it, I think it's the language of AI, the one and only. If you look at all of the major AI providers in the world, for example, when they ship structured outputs, which is an API for interacting with LLMs based on well-defined data structures, the only language they actually support is JSON Schema. All of them, literally all of the major LLM providers: it's the one and only schema language they support. So as a consequence, a lot of AI pipelines are JSON Schema based almost by definition. And what that brings from the AI point of view is this: LLMs are now being used to write code, and I think they're getting quite good at it.

You might argue that it's not at the level of a senior engineer yet, but I think all the arrows point to a trend where AI keeps getting better at writing code. I think that's inevitable; whether the timescale is long or not is a different question. But assuming that writing code, writing functionality, is getting commoditized, that you can describe something and the AI will just do it, and do it really well, then writing code becomes the solved part of the problem.

What that means in turn is that the interface, the contract, the blueprint of what you're trying to create becomes the most important thing. So I think we could envision a future where we as software engineers operate purely at the blueprint layer, right? For example, using things like JSON Schema or MCP or OpenAPI to define what we really mean, to have a sort of executable blueprint of what we want to accomplish, and then hand it over to the AI, press a button, and say, hey, now generate this for me. So if anything, it seems the world is moving toward this interface-definition approach, which is where JSON Schema really shines. And it's already deeply embedded in the AI space, so I think it's a natural progression.

Rachel Stephens (12:45)
Very interesting. So talk to me about what you're doing at SourceMeta. What is the company trying to tackle?

Juan Cruz Viotti (12:52)
We are developing an entire ecosystem of both open-source and commercial tooling for using JSON Schema at scale. And by at scale I mean, for example, that Italy and France are now inaugurating nationwide schema registries, right? That's a thing in its own right, and they are publishing schemas and data models in JSON Schema directly as a public good. To give you a sense of the scale, Italy's API catalog has something like 14,000, I think, actual APIs, or e-services, as they call them. So think about that massive scale. When you try to use JSON Schema at that level of scale, you run into many blockers, right? Both in terms of tooling, for example development tooling for building all of these data models, and in terms of how to actually govern them, and so forth. So one of our flagship products is a JSON Schema registry that you can deploy internally or externally at your company or public-sector institution. That's the brain that knows about all of the schemas and the interconnections between them, offering APIs for discovering them, for connecting them into your OpenAPI specs or MCPs, and for reasoning about that single source of truth in your organization.

Rachel Stephens (14:10)
Very cool. Well, how big is your team?

Juan Cruz Viotti (14:13)
It's pretty much just me. Yeah, it's all small, bootstrapped kind of stuff. And I think that raises an interesting point. People ask me how I can approach something this big with a setup like that. Funnily enough, I believe the model of a small, bootstrapped company can yield better results for core infrastructure products, because you don't have the wrong incentives, for example VC involvement and grow-or-die pressure, right? Or multiple people having to coordinate on the vision and execution of something cohesive at scale. Interestingly enough, I think this model, which I've chosen by design, is better suited to building and pulling off these foundational technologies, and I think there are plenty of examples of that out there. It's what can get you a cohesive thing in the end.

Rachel Stephens (15:07)
Agreed. And I think there are plenty of times when small by design is the correct approach. And I think we're probably going to see that more and more as AI helps augment people's ability to work on their own. So yes, very cool.

Juan Cruz Viotti (15:21)
Yeah, exactly. I mean, maybe something interesting to touch on here, which I find very, very fun, is the idea of representing data well, right? The semantics of data, the structure of data, and so forth. Documentation of your APIs, too, is something that we in the tech industry have been repeating for ages. I think everybody knew, even before AI, that they should have good documentation for their APIs, or good API definitions. But still, I think a large portion of the industry was not paying attention. It was always a secondary concern. And what makes it pretty interesting is that, with the rise of AI, we might completely ignore the humans and documentation for humans, but when it comes to producing documentation so that AI can perform better, then we're all in.

Juan Cruz Viotti (16:14)
Right? It's like we completely ignore the humans, but the AI gets the VIP treatment. And I think that's driving a massive trend of asking: how can we define our data so well that the AI can navigate everything completely transparently? Again, the need was the same before, but now that it's for AI and not humans, we care, and it seems we neglect human beings. I think that's the lesson.

Rachel Stephens (16:20)
I will not argue with you on that one. I think you have pointed out a very ironic truth in the way that we tend to talk, right? Well, Juan, what else would you like me to know? Any final thoughts that you'd like to share?

Juan Cruz Viotti (16:52)
I would like to paint a vision of that future. What does it look like if you're actually managing so many APIs with a very cohesive single source of truth? For example, imagine that you have an API catalog that lists all of your APIs, they all tie into the same data definitions, you have consolidated those very well, and all of these data definitions are rich.

You don't only know the structure; you know very specific constraints. You're mapping to international standards when possible, and you're augmenting all the operations that you do across APIs or MCPs with plenty of semantic information. It turns out that when you achieve that, you can point an AI at your 3,000-API catalog, ask it to do something, and it will do it completely smoothly, while actually understanding everything, navigating everything, knowing how to join information, right? How to go through many hops.

And I think that's not the case at the moment. People are realizing that and acting on it, and many enterprises and public-sector institutions are proof of that. Some are actually working with us at SourceMeta, so that's kind of cool. I think we're going in that direction and finally unleashing that vision, and I'm excited about that.

Rachel Stephens (18:13)
Very cool. Well, Juan, thank you so much for carving time out of your day to talk with me. This has been so much fun, and I love learning about fun nerdy things with people who are excited about them. So thank you.

Juan Cruz Viotti (18:24)
Yeah, thank you very much. It was a pleasure being here.