Don Box's Spoutlet

This year was considerably harder than last year. Last week, I was heads-down coding every day on new stuff. Couple the two-day hiccup in my work with the fact that the weather in Seattle has been convertible-friendly, and it took a lot of will to get on the plane and go to Texas.

2) Yasser and I did a short one-hour Q&A session as part of a larger pre-conference event.

3) I hosted a panel with these guys. No Sells, but Yasser (the latest "legend") was in attendance. One thing that was clear from #2 and #3 is that people want deep XML support from VS.NET - it's nice to know I'm not alone.

1) I gave a broad WS talk in the arena. This was the big talk for me, as it was in the big room and there were expectations. I did most of the talk in raw XML, toggling between IE, Emacs, and Office 2003. I'll post the main message of the talk later.

2) I hosted a small panel of WS friends (SteveSw, YasserS, and Clemens). The highlight was easily Steve's answer to the "is COM dead" question. To paraphrase Steve: "Like humans, technologies stop growing in size as they age. Also like humans, once a technology reaches the age where growth stops, the culture cares less about them - they're somehow less interesting."

I also had the treat of seeing fellow blogger Fumiaki Yoshimatsu (a.k.a., Centaur's Identity). In case you hadn't noticed, his blog moved recently. Re-subscribed.

I had forgotten how hard it is to do pre-recorded talks. Hopefully the next one will be easier.

This may have already made the rounds, but it really captures (for me) why I stopped reading lots of weblogs.

Chris Brumme's weblog has been nice, however. I especially love the notion of marshal-by-bleed.

I've been spelunking through URLMON land lately. Pretty interesting stuff (we'll see how well COM interop and IE get along next week).

If you are an IE user, type "view-source:http://www.gotdotnet.com/team/dbox/default.aspx" into the address bar to see my site's HTML representation.

Sam just launched a wiki to capture what exactly we should be slinging around in the N-way web.

It really drives home how complex XML development can be once you try to adopt the appropriate standard for the task at hand.

I recently quipped during a keynote that XML in its entirety had become more complex than COM ever was. Even though by the end of the 1990s pretty much any API Microsoft produced was COM-based, there was a very tiny kernel of interfaces that you could build your own personal empire with while safely ignoring the rest. Here's the list as I remember it:

IUnknown	Needed to make object references work.
IClassFactory	Needed to give the loader a hook into your code for object creation.
IDispatch	Needed to integrate with scripting/late-binding.
IMarshal	Needed to play games with parameter marshaling.
IMoniker	Needed to play games with object naming (e.g., VB's GetObject).

Note that of these five interfaces, only the first two were strictly necessary (although IDispatch was damn near mandatory given classic ASP).

Even if you take the transitive closure over these interfaces (which picks up system-implemented interfaces such as IBindCtx, ITypeInfo, ITypeLib, and IStream), the list of interfaces you were expected to implement was actually quite small. Where COM got nasty for most developers was in the details - because COM sat outside of your programming environment, you had to do a non-trivial amount of manual labor, all of which was extremely low-level (either memory management or thread management).
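To make the "manual labor" point concrete, here's a toy model of the IUnknown reference-counting contract. It's a Python sketch, not real COM. The interface IDs are made-up strings rather than GUIDs. QueryInterface hands back the same object instead of a distinct interface pointer. "destroyed" is just a flag standing in for freeing memory. What it does show is the bookkeeping every COM client had to get right by hand: each successful QueryInterface or AddRef has to be paired with a Release.

```python
class ToyUnknown:
    """Toy model of the IUnknown contract (QueryInterface/AddRef/Release).

    Not real COM: interface IDs are made-up strings, QueryInterface returns
    the same Python object, and `destroyed` stands in for freeing memory.
    """

    def __init__(self, supported_iids):
        self._refs = 1  # the creator starts out holding one reference
        self._supported = set(supported_iids) | {"IID_IUnknown"}
        self.destroyed = False

    def add_ref(self):
        self._refs += 1
        return self._refs

    def release(self):
        self._refs -= 1
        if self._refs == 0:
            self.destroyed = True  # real COM would delete the object here
        return self._refs

    def query_interface(self, iid):
        # A successful QueryInterface hands out a new reference that the
        # caller is responsible for releasing.
        if iid in self._supported:
            self.add_ref()
            return self
        return None  # real COM reports E_NOINTERFACE


obj = ToyUnknown({"IID_IDispatch"})
disp = obj.query_interface("IID_IDispatch")  # two references now
disp.release()  # forget this and the object leaks
obj.release()   # last reference gone
print(obj.destroyed)  # True
```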

And so now, we have XML. Here's my rough pass at the "kernel" I couldn't imagine living without (and why).

URI	XML relies on the URI value space for identifiers/references/queries. It's the lone hard dependency on the old-world web (OWW). It's also the bridge from the old world of centralized protocol standardization (try to register a new scheme with IANA) to the new world of ad hoc decentralized authorities (that DNS name you squatted on in the 1990s makes you a first-class banana republic).
UTF-8	XML did a great job of delegating the octet->character mapping to Unicode, which allowed XML to work exclusively in terms of abstract Unicode code points.
XML 1.0 + Namespaces in XML	These two specs are pretty much indivisible at this point - between the two of them we get a syntax over Unicode character sequences that gets people out of the lex business.
XML Schema Part II	Reaching agreement on type definitions is incredibly hard (and often impossible). XSD Part II gives the world a set of common datatypes that, despite their warts, people seem to feel are good enough. To paraphrase KeithBa, if God wanted to write down an int in Unicode, God would use the rules from XSD Part II.
SOAP/1.1 Section 4 or SOAP/1.2 Part I Sections 2 and 5.	Strip away the idiosyncratic encoding rules of SOAP/1.1 Section 5 and the quasi-formalisms of SOAP/1.2 Sections 3 and 4 and you get a pretty nice data model for augmenting an XML element with additional XML information that has a fighting chance of not melting down in the face of intermediaries.

XML Base

The dependency on URI brings with it the use of relative URI. Relative URI are nasty in that you can't interpret them without context, specifically, what base URI they are relative to. XML Base tells you how to figure this out (and control it) - these rules are picked up by the xs:anyURI type from XML Schema Part II.
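The following is a minimal sketch of the XML Base rules, assuming ElementTree for parsing and urllib.parse.urljoin for the actual resolution step. Each xml:base is itself resolved against its parent's base, and any href is then resolved against the base in scope. The feed, its URIs, and the helper name resolve_hrefs are hypothetical. The document's own URI is passed in as the starting base.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def resolve_hrefs(elem, base, out=None):
    """Collect absolute URIs for every href, honoring nested xml:base."""
    if out is None:
        out = []
    # xml:base may itself be relative, so resolve it against the parent's base.
    base = urljoin(base, elem.get(XML_BASE, ""))
    href = elem.get("href")
    if href is not None:
        out.append(urljoin(base, href))
    for child in elem:
        resolve_hrefs(child, base, out)
    return out


FEED = """<feed xml:base="http://bob.blog/archive/">
  <entry xml:base="2003/"><link href="28.html"/></entry>
  <entry><link href="../about.html"/></entry>
</feed>"""

# The document's own URI (hypothetical here) is the starting base.
print(resolve_hrefs(ET.fromstring(FEED), "http://example.org/feed.xml"))
# ['http://bob.blog/archive/2003/28.html', 'http://bob.blog/about.html']
```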

OK, so that is the good news. The bad news is that XML Base was added after XML 1.0 was out the door, which means some legacy technologies don't handle it correctly (even Namespaces in XML left this undefined, which resulted in quite the brouhaha over Microsoft's use of relative URI as namespace identifiers).

Another relative URI nit is that because context is needed, when you sign an XML fragment that contains a relative URI, you aren't actually signing the intended value but instead are signing only the relative portion. This means that when you use XML Exclusive C14N to calculate digests, you aren't necessarily getting the whole story (in fact, this is called out in the spec as a known characteristic).
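Here's a sketch of the signing problem, with one substitution called out loudly: Python's standard library doesn't ship Exclusive C14N. This uses ElementTree's canonicalize (C14N 2.0, Python 3.8+) on the signed element serialized on its own. The effect being illustrated is the same: the ancestor's xml:base doesn't travel with the signed element. As a result, two documents whose links resolve to different absolute URIs produce identical digests. The helper name sign_link and the hosts are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def sign_link(doc):
    """Return (digest of the <link> subtree, absolute URI the link resolves to).

    Stand-in for a real signature: C14N 2.0 via ET.canonicalize (Python 3.8+),
    not Exclusive C14N, applied to the <link> element serialized on its own.
    """
    root = ET.fromstring(doc)
    link = root.find("link")
    # Only the signed subtree is serialized; the ancestor's xml:base stays behind.
    canonical = ET.canonicalize(xml_data=ET.tostring(link, encoding="unicode"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    absolute = urljoin(root.get(XML_BASE, ""), link.get("href"))
    return digest, absolute


a = sign_link('<entry xml:base="http://bob.blog/"><link href="28.html"/></entry>')
b = sign_link('<entry xml:base="http://mirror.example/"><link href="28.html"/></entry>')
print(a[0] == b[0])  # True: same digest...
print(a[1], b[1])    # ...but the links point at different resources
```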

XML Information Set (Infoset)

There are actually two dependencies on the Infoset. XML Schema Part II relies on it to provide the [Base URI] property needed for xs:anyURI and the [in-scope namespaces] property for xs:QName. SOAP/1.2 relies on the Infoset for any number of reasons.
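The [in-scope namespaces] dependency is easy to see in code. ElementTree's finished tree keeps expanded element names but discards prefix bindings. Because of that, a QName-valued attribute like type="t:Widget" can't be resolved from the tree alone. This sketch rebuilds the in-scope namespaces from iterparse's start-ns events. The function name, the attribute being resolved, and the namespace names are all hypothetical.

```python
import io
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"


def resolve_qname_attr(xml_text, attr):
    """Return (element tag, expanded QName) for each element carrying `attr`.

    The finished ElementTree keeps expanded element names but drops prefix
    bindings, so the in-scope namespaces are rebuilt from start-ns events.
    """
    scopes = [{"xml": XML_NS}]  # the xml prefix is bound everywhere
    pending = {}  # declarations that belong to the element about to start
    results = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start", "end")):
        if event == "start-ns":
            prefix, uri = item  # prefix is "" for a default namespace
            pending[prefix] = uri
        elif event == "start":
            scope = {**scopes[-1], **pending}
            pending = {}
            scopes.append(scope)
            value = item.get(attr)
            if value is not None:
                prefix, _, local = value.strip().rpartition(":")
                uri = scope.get(prefix)  # "" looks up the default namespace
                if prefix and uri is None:
                    raise ValueError(f"unbound prefix in QName {value!r}")
                results.append((item.tag, f"{{{uri}}}{local}" if uri else local))
        else:  # "end": this element's declarations go out of scope
            scopes.pop()
    return results


SCHEMA = """<schema xmlns="urn:example:s" xmlns:t="urn:example:types">
  <element name="a" type="t:Widget"/>
  <element name="b" type="Gadget" xmlns="urn:example:other"/>
</schema>"""

for tag, qname in resolve_qname_attr(SCHEMA, "type"):
    print(tag, "->", qname)
```

The unprefixed "Gadget" resolves against whatever default namespace is in scope on its element. That value is exactly the information a bare tree throws away.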

I deliberately did not include these two specs in my "kernel." Specifically, the Infoset became much less useful towards the end of its development by adding things like [encoding] and [prefix] that are mired in XML 1.0 details. My favorite draft is this one from 2000. Honestly, if I had to pick a standards-based data model to build on, I'd prefer the XPath 1.0 data model over the Infoset any day. That's one reason (amongst many) why I prefer XPathNavigator in the .NET XML stack over any other API out there.

I also did not include any number of other technologies, including XPath, XSLT, XML Schema Part I, XML Query, SOAP/1.2 Part II, RDF, Relax NG, HTTP, HTML, XHTML, RSS, XInclude, XPointer, XLink, WSDL, UDDI, ebXML, WS-*, etc.

That isn't meant to imply that any of these technologies are poor quality or useless (although there are examples of both in that list). It simply means that I think you could get a hell of a lot of work done with excellent implementations of the five specs from the kernel and not much else - that's why I think of them as a "kernel."

The Echo folks are trying to sort out a way to allow real and escaped markup to coexist. I think many of us realized that you cannot abandon escaped markup given the massive amount of classic HTML content out there. You also need to make sure that new apps don't have to dumb-down their content and trigger a second pass over the data just to pick up the markup.
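A small sketch of the "second pass" cost, using ElementTree. With real markup, the <b> element is in the tree after one parse. With escaped markup, the parser only sees characters, so getting at the markup means feeding the string value back through a parser. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

REAL = "<content>Hello, <b>weblog</b> world!</content>"
ESCAPED = "<content>Hello, &lt;b&gt;weblog&lt;/b&gt; world!</content>"


def child_tags(xml_text):
    """One parse: which child elements did the XML parser actually see?"""
    return [child.tag for child in ET.fromstring(xml_text)]


def reparse_escaped(xml_text):
    """Second pass: take the element's string value and parse it as markup."""
    text = "".join(ET.fromstring(xml_text).itertext())
    # Escaped markup is usually a fragment, so give it a single root first.
    return [child.tag for child in ET.fromstring(f"<wrapper>{text}</wrapper>")]


print(child_tags(REAL))          # ['b']
print(child_tags(ESCAPED))       # []
print(reparse_escaped(ESCAPED))  # ['b']
```

This only works because the escaped text happens to be well-formed XML. A lot of classic HTML (think <br> or unquoted attributes) isn't, and would need an HTML parser for the second pass.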

I share the concern of Sam and others about selecting real vs. escaped markup based solely on the MIME type. That approach makes it difficult for schema languages like Relax NG or XSD to properly process and/or validate the data.

I'd prefer that any escaped markup (such as HTML) appear in a pre-defined XML element whose type is xsd:string. The use of this element would indicate that the string value of the element is in fact escaped markup.

Honestly, given that there are no other commonly deployed markup languages that aren't based on XML, I'd be comfortable/happy calling that element <echo:rawhtml>.

<entry xmlns="uri/of/echo/namespace#">
  <title>My First Entry</title>
  <author>
    <name>Bob B. Bobbington</name>
    <homepage>http://bob.name/</homepage>
    <weblog>http://bob.blog/</weblog>
  </author>
  <link>http://bob.blog/28</link>
  <id>http://bob.blog/28</id>

  <created>2003-02-05T12:29:29Z</created>
  <issued>2003-02-05T12:29:29Z</issued>
  <modified>2003-02-05T12:29:29Z</modified>

  <content type="application/xhtml+xml" xml:lang="en-us">
    Hello, weblog world! 2 &lt; 4!
  </content>

  <content type="text/plain" xml:lang="en-us">
    <![CDATA[ Hello, weblog world! 2 < 4! ]]>
  </content>

  <content type="text/html" xml:lang="en-us">
    <rawhtml><![CDATA[ Hello, weblog world! 2 < 4! ]]></rawhtml>
  </content>

  <content type="image/png" xml:lang="en-us" href="http://bob.blog/helloworld.png" />
</entry>
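Here's one way a consumer might process the entry above, sketched in Python. It uses the presence of the rawhtml element, not the type attribute, to decide whether a content element carries escaped markup. That is the selection rule proposed earlier. The sample is trimmed to the content elements, and the classification labels and the function name are hypothetical. This isn't how any particular Echo implementation works.

```python
import xml.etree.ElementTree as ET

ECHO_NS = "uri/of/echo/namespace#"  # placeholder namespace name from the example
CONTENT = f"{{{ECHO_NS}}}content"
RAWHTML = f"{{{ECHO_NS}}}rawhtml"

# Trimmed copy of the example entry: just the <content> elements.
ENTRY = """<entry xmlns="uri/of/echo/namespace#">
  <content type="application/xhtml+xml">Hello, weblog world! 2 &lt; 4!</content>
  <content type="text/plain"><![CDATA[ Hello, weblog world! 2 < 4! ]]></content>
  <content type="text/html"><rawhtml><![CDATA[ Hello, weblog world! 2 < 4! ]]></rawhtml></content>
  <content type="image/png" href="http://bob.blog/helloworld.png"/>
</entry>"""


def classify_contents(xml_text):
    """Return (type, kind, value) per <content>, flagging escaped markup by element."""
    results = []
    for content in ET.fromstring(xml_text).iter(CONTENT):
        mime = content.get("type")
        raw = content.find(RAWHTML)
        if raw is not None:
            # Escaped markup: the string value still needs an HTML parser.
            results.append((mime, "escaped", (raw.text or "").strip()))
        elif content.get("href") is not None:
            results.append((mime, "external", content.get("href")))
        else:
            # Inline content: any real XML markup is already in the tree.
            results.append((mime, "inline", "".join(content.itertext()).strip()))
    return results


for row in classify_contents(ENTRY):
    print(row)
```

Because the escaped case is marked by an element, a schema can declare rawhtml as xsd:string and validate the rest of the entry without caring about MIME types.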

A friend just called me at home to ask if I've been tracking the PIE work that Sam Ruby kicked off. Honestly, I've been so heads down working on getting our product milestone off the ground that I've been completely absent from the blogosphere and will likely stay that way for the next two months or so.

While I'm sorry I haven't tracked what's happening, I must say I'm impressed with the Echo format that Sam and his cast of thousands have converged upon. I think

I hope that Sam can rein in the unbounded design churn that wikis can foster and start making hard choices, so that the world can move on to building apps and stop arguing about bad uses of XML or insanely insane personality wars.


June, 2003

Done in Dallas

MSDN TV

Wow, this nails why I've stopped reading most weblogs

Jan Gray's Performance Piece is on MSDN

Cool IE feature

A data model for log entries

If your Funky your Valid, clap your hands

XML eclipses COM

Escaped vs. Unescaped Markup

Echo

I started a new book today

Recent rants