Sometime today, at Mix, Microsoft is announcing project “Astoria.” While it may not be exactly pitched as this (I’ll have to see what the final announcement has), to me, this is Microsoft’s entry into the REST server and (more importantly) service area. As the new site says:
The goal of Microsoft Codename Astoria is to enable applications to expose data as a data service that can be consumed by web clients within a corporate network and across the internet. The data service is reachable over HTTP, and URIs are used to identify the various pieces of information available through the service. Interactions with the data service happens in terms of HTTP verbs such as GET, POST, PUT and DELETE, and the data exchanged in those interactions is represented in simple formats such as XML and JSON.
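To make the XML/JSON angle concrete, here's a sketch of the kind of JSON payload a GET against an entity URL might return. The payload shape is my own guess for illustration, not Astoria's actual wire format:

```python
import json

# Hypothetical JSON body for something like
# GET /northwind.ashx/Customers[ALFI] -- the field names are invented.
response_body = """
{
  "ID": "ALFI",
  "CompanyName": "Alfreds Futterkiste",
  "City": "Berlin"
}
"""

# A web client just parses the body into a native structure and goes.
customer = json.loads(response_body)
```

The appeal of the "simple formats" bit is exactly this: any client with an HTTP stack and a JSON or XML parser can play.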
This is still all at the “project” status, so, like another project I find fascinating, Rational “Jazz,” don’t expect a delivered product anytime soon.
That said, conversation and “playing around” with Astoria is possible now. Long-time readers know that I’m a fan of releasing an alpha/beta/whatever early over big bangs. There’s a CTP available (early access to the code), a website, and a running instance to play around with at http://astoria.mslivelabs.com/OnlineService.aspx.
Also, check out this post from one of the Astoria leads, the impressive Pablo Castro.
Background on the SDR
While I’m not at Mix (I had to decline the invitation for other travel: tragic!), earlier this month (10 April 2007) I had the pleasure of sitting in on a day-long pre-briefing (an SDR) on the topic with several other bloggers, consultants, and coders. I actually enjoyed this session quite a lot: it was highly interactive, lively, and fun. Meeting the other non-Microsoft folks in attendance was great, as was meeting the Microsoft tech leads, architects, community manager, and other folks around Astoria.
As with the Blogger’s Corner at SAP, this kind of in-depth, get down and dirty stuff is just the right complement for the higher-level, strategy talk that I do in the more “big A” Analyst meetings. Don’t get me wrong, one side isn’t “better” than the other: but having both vantage points is key to a solid understanding of the given topic.
Enough context, on to the fun stuff.
When I say “REST server and service,” I mean software that provides for the storage, access, and searching over “resources,” or “entities” as Microsoft calls them. Put another way, it’s CRUD and the equivalent of “O/R mapping” (like Hibernate) for a REST-minded design. The service part implies that (a.) you access these operations over a network, and, more importantly, (b.) there’s a hosted, SaaS option (at http://astoria.mslivelabs.com). Now, point (b.) is just an experiment in the case of Astoria: they’re trying it out and seeing how it works. Think something like Amazon S3, but with semantics layered on top.
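To make the “CRUD over HTTP verbs” idea concrete, here’s a toy sketch of the verb-to-operation mapping such a service boils down to. The in-memory store and every name here are my own invention, not Astoria’s actual API:

```python
# Toy in-memory "entity" store with HTTP-verb-to-CRUD dispatch.
# Purely illustrative -- nothing here is Astoria's real machinery.

entities = {}   # URL path -> entity data
next_id = 1

def handle(verb, path, body=None):
    """Map the four HTTP verbs onto create/read/update/delete."""
    global next_id
    if verb == "POST":                  # create a new entity under a set
        path = f"{path}/{next_id}"
        next_id += 1
        entities[path] = body
        return 201, path                # 201 Created, plus the new URL
    if verb == "GET":                   # read
        return (200, entities[path]) if path in entities else (404, None)
    if verb == "PUT":                   # update (or create at a known URL)
        entities[path] = body
        return 200, path
    if verb == "DELETE":                # delete
        entities.pop(path, None)
        return 204, None
    return 405, None                    # Method Not Allowed
```

The point being: once you commit to this mapping, the URL does the work of identifying the entity, and the verb does the work of saying what to do with it.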
Now, the funny thing about Astoria, as with most big vendor approaches to technologies unfolding in The Wilds of The Web, is that Astoria uses its own vocabulary for what we’d call REST or “public web SOA” terms. This is actually fine, as that field is in flux at the moment anyhow. Besides, locking down names and terminology is a slippery path to WS-Deathstar 2.0, this time starting with REST as a base.
The core data-model for Astoria is the ADO.NET Entity Framework. An “entity” in this world is, again, similar to the “object instance” you’d deal with in code once an O/R mapper had sucked the data out of your database and constructed the object model for it. The difference here is that entities are more “prime things” unto themselves rather than just instances of objects. That is, the hope is that they’re more than just serialized object graphs.
As you can see, an “entity” here is essentially what we’d call a “resource” in REST-land.
The other exciting thing about Astoria is that it’s a URL-driven design. That is, entities and the methods of interacting with those entities can be represented as URLs. Each entity can be uniquely identified with a URL (sometimes more than one URL, it seemed), and you can do the old “URL as command line” trick where you search via URLs.
Now, the Astoria team has been noodling on the exact syntax to do this identifying and searching. The examples we saw were quite similar to XPath, but by no means exactly the same. As you can imagine, there’s all sorts of horky things to stress out about when figuring out the syntax to use in URLs: there are encoding issues, characters already used, and many different understandings of what “feels right” when it comes to URLs.
As an example, here are some Astoria URLs:
http://localhost:50604/northwind.ashx/Customers/[City eq 'London']/Orders[Freight lt 1] – searching for customers in London whose orders have a freight charge of less than 1.
http://localhost:50604/northwind.ashx/Customers[ALFI] – searching for the customer with the ID of ALFI.
http://localhost:50604/northwind.ashx/Customers[ALFI]?$expands=Orders – searching for the customer with the ID of ALFI, and pulling back all the Orders for that customer.
http://myserver/data.svc/Customers?$skip=30&$take=10 – paging through results, from this paper.
As the URLs imply, these entities are from the venerable Northwind database.
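Those encoding issues are real: a predicate like [City eq 'London'] carries spaces and quotes that have to be escaped before it can live in a URL. Here’s a quick sketch of what that looks like; the bracket syntax is lifted from the examples above, and the helper function is hypothetical, not part of any Astoria client library:

```python
from urllib.parse import quote

# Base URL taken from the example URLs above.
BASE = "http://localhost:50604/northwind.ashx"

def entity_url(entity_set, predicate=None):
    """Compose /EntitySet[predicate], percent-encoding the predicate.

    Note the simplification: the brackets themselves are left literal
    here, which is one of the exact "what feels right in a URL"
    questions the Astoria folks were still noodling on.
    """
    url = f"{BASE}/{entity_set}"
    if predicate:
        url += "[" + quote(predicate) + "]"
    return url
```

Run through an encoder, the friendly-looking `[City eq 'London']` turns into `[City%20eq%20%27London%27]` on the wire, which is part of why URL syntax design is harder than it looks.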
The URLs are used as identifiers for entities, but also for queries over the entities. As such, there are several “operations” you can pass through the URL:
$orderby – as in SQL
$take – just pull in some # of entities, as for paging through result sets
$keyset – just return the URL IDs for the entities instead of all the data, for performance
$expand – “drill down” the data in the entity graph
$format – return either XML, JSON, or XML RDF
$callback – use a JSONP callback
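As a sketch of how those $-operations compose, here are two little helpers: one that builds a query string from the options, and one showing what a $callback (JSONP) response would look like on the wire. The option names come from the list above; the helpers themselves are my own scaffolding, not Astoria’s client library:

```python
import json
from urllib.parse import urlencode

def query_url(base, **options):
    """Turn skip=30, take=10 into ?$skip=30&$take=10 on a base URL."""
    pairs = {f"${name}": value for name, value in options.items()}
    # safe="$" keeps the leading $ literal rather than percent-encoded
    return base + "?" + urlencode(pairs, safe="$")

def jsonp(payload, callback):
    """Wrap a JSON payload in a callback, as $callback implies."""
    return f"{callback}({json.dumps(payload)});"
```

So paging through Customers looks like the `$skip`/`$take` example URL above, and a `$callback=cb` request would hand a browser script something it can evaluate directly.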
Now, we had quite a lengthy discussion about all of this syntax. Like I said, there are still discussions and decisions to be had around it. Part of getting the conversation started up now is getting help honing this URL syntax. While there’s some of it that I don’t like, I always thought XPath seemed weird until I got to know it, and then I liked it. Granted, XPath statements are not URLs, but they’re sort of cousins to my loosey-goosey coder’s mind-set.
We had a lengthy discussion about using the POST contents of an HTTP request to encode queries and other info. My take was: while this sort of breaks the ideas of REST and “disses” a purer URL approach, sometimes it will be a more pragmatic, even elegant, alternative to a gorked-up URL that makes less sense than old Amazon URLs. But using POST could well be a slippery slope into subverting the potential of the URL as command-line and identifier.
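To show the trade-off concretely, here’s a sketch contrasting the two styles: the query living in the URL (GET) versus the query tucked into a POST body. Neither request is actually sent, and the endpoint and payload shape are invented for illustration:

```python
import json
from urllib.request import Request

query = "Customers[City eq 'London']"

# Style 1: the query IS the URL -- linkable, cacheable, bookmarkable.
get_req = Request("http://myserver/data.svc/" + query.replace(" ", "%20"))

# Style 2: the query rides in the POST body -- no encoding headaches,
# but the URL no longer identifies what you asked for.
post_req = Request(
    "http://myserver/data.svc/query",
    data=json.dumps({"query": query}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
```

The second style keeps the URL clean at the cost of the URL meaning anything: you can’t email someone the POST body.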
The important point here is that so much thought is being put into using URLs in the first place. And, yes, I realize I’m probably slinging “URL” around when I should be using “URI” sometimes. <shrug>
The point all of us at the SDR hammered home was the need to host this as a service on the public web. While there’s tremendous value in having Astoria behind the firewall, it’s sort of the very nature of this technology that it be hosted online as well. The whole point of Astoria for me is to be a thin, REST-y data-layer on-top of the cloud. The web is The Cloud, of course.
Not only is it “the whole point” of Astoria to be on the public web, but that angle would actually alleviate a significant part of the biggest problem for Astoria: it’s from Microsoft. The early adopters and promoters who could help sling-shot Astoria into wider success are not the biggest fans of Microsoft. They’re especially not the kind of people who want to run Microsoft middleware and databases (such as Astoria itself and the SQL Server instance required).
Hosting it on the web, where what’s running in the back-end (theoretically) doesn’t matter is a nice way around the deployment problems. Long-term, I’d also hope it would help keep the interoperable design as honest as possible: if traditionally non-Microsoft developers are using Astoria, the Astoria team will find the incompatibilities between the Microsoft and non-Microsoft worlds faster.
Data Security and Privacy
The other elephant in the room, long-term, is one of data security. This is the same issue that keeps some people from using Google. That is, the fear that the provider of a SaaS data-service will snoop around in your data and use it against you or, at least, to annoy you with ads and other “targeted” money-extraction schemes. Here, the road is difficult and tedious all around for everyone.
Ultimately, I’d like to see an approach where the data is somehow encrypted such that the provider could never see it. Really, that’s sort of impossible to guarantee, as the service needs to see the plain text at some point. No matter what, you have to trust your SaaS provider, and human-to-human trust isn’t something that can be coded away.
I could go on about the technical parts of Astoria. We had a great, day-long dive into it. But, Astoria is out there for you to go poke at and play around with.
If you’re interested in REST, SOA (hopefully, of the REST type), SaaS, or (though I shudder to type it) “the semantic web,” take a look at Astoria yourself. Like I said, it’s all at the “labs”/”project” stage now. But, that means there’s a chance to get in there and influence what the final release turns out to be. Does the interface and use model “work”? Do you like it?
If it weren’t for the hosting angle, I wouldn’t recommend that people outside of the Microsoft ecosystem look into it beyond the theoretical angle. But, since the Astoria team has been thinking about and experimenting with hosting it, I can see that it’d have wide applicability, esp. for people making web applications, web/desktop hybrids, and other network-enabled, SaaS software.
As my heavy SaaS angle implies, indeed, my prime advice for Microsoft here is to go at the hosted angle whole hog: just bite into that and hold on. That’s the lightning rod to grab onto for this kind of thing, and what I feel the industry as a whole needs to move towards more aggressively.
Here, the primary issue beyond the interface and use model is scaling and performance. Going forward in a SaaSy world, services that can scale and perform are the highest-value items that customers will pay for. “Anyone” can and will start up an HTTP server, do some crude resource/relational database mapping, and start providing data. What few people have the skill and capital to do, though, is make sure that stuff runs at an acceptable speed with near 100% uptime. The Astoria model seems far from “crude,” of course: the point is that dependability and “just works” are of incredible value, and ignoring those angles would detract from any other positive aspects of a project like Astoria.
Dare I invoke the “e-word”? Enterprise.
Disclaimer: parts of Microsoft are clients. Microsoft paid for my trip to the SDR, putting me up in a swanky hotel (it had a fire-place!). IBM is a client as well.