September 12, 2004

Where'd all the XML go?

Was anyone developing websites around the turn of the century who recalls all the furor over XML? With WML it could put content on your cell phones and your PDAs; with SVG it could put interactive vector content on your web pages. It's one format that any parser can read and write, friendly with all languages and character encodings, and it can be parsed lightning fast with the beautiful Expat engine, so that applications can forever be separated from presentation, and so that data can be stored, transferred, and then presented in a fashion that is always easy to manipulate and always has room for expansion.

So this weekend I had the time to work on a project I've been gunning for since 2001: making an XML-driven templating engine.

I put together a prototype in Perl that employs selective SAX filtering (one filter multiplexing to other filters). My parser is Expat. I'm using SAX2, subclassing off of XML::SAX::Base, and I've got all my i's dotted and my t's crossed. Probably only 10 lines of dead simple code actually accomplish anything: one SAX filter looks at tags to decide which other SAX filter to pass events to, while the other puts parentheses around text. My test XML file weighs 4 kilobytes, and is <P>Hello <B>world</B></P> repeated a lot.
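For anyone who wants to poke at the same design without the Perl stack, here is a rough sketch of the two-filter arrangement using Python's standard xml.sax module (which also sits on top of Expat). It is not the prototype's code. The class names TagSwitch and ParenthesizeText, the transform helper, and the choice to route only <B> elements to the parenthesizing filter are all assumptions made for illustration.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class ParenthesizeText(XMLFilterBase):
    """Hypothetical downstream filter: wraps non-blank text in parentheses."""

    def characters(self, content):
        if content.strip():
            content = "(" + content + ")"
        # XMLFilterBase forwards the event to the next handler in the chain.
        super().characters(content)


class TagSwitch(xml.sax.ContentHandler):
    """Hypothetical multiplexer: picks a downstream handler per element."""

    def __init__(self, default, routes):
        super().__init__()
        self.routes = routes      # tag name -> handler for that subtree
        self.stack = [default]    # handler in effect at each nesting depth

    def startElement(self, name, attrs):
        # An unrouted tag inherits whichever handler its parent is using.
        target = self.routes.get(name, self.stack[-1])
        self.stack.append(target)
        target.startElement(name, attrs)

    def endElement(self, name):
        self.stack.pop().endElement(name)

    def characters(self, content):
        self.stack[-1].characters(content)


def transform(xml_bytes):
    """Parse xml_bytes and return the re-serialized, filtered markup."""
    out = io.StringIO()
    plain = XMLGenerator(out, encoding="utf-8")   # writes events back out
    paren = ParenthesizeText()
    paren.setContentHandler(plain)                # paren -> plain -> out
    # Assumption: only <B> subtrees go through the parenthesizing filter.
    xml.sax.parseString(xml_bytes, TagSwitch(plain, {"B": paren}))
    return out.getvalue()


if __name__ == "__main__":
    print(transform(b"<P>Hello <B>world</B></P>"))
```

The switch keeps a stack of handlers so that nested elements stay with whichever filter their ancestor was routed to. That is one simple way to do the multiplexing, not necessarily how the Perl version does it.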

Processing the file takes over one second.
Processing a 40k file takes 10 seconds, so the time scales with the size of the input, and I know this is not the overhead of loading up the libraries or initializing memory structures. My project involves a daemon that does all of this, so I would be leaping for joy if it took ten minutes to load the libraries, so long as it could process the data at more than a hundred kilobytes per second on a standard 1 GHz machine. But no, I am using dead simple XML::SAX on an unloaded machine and it is processing data slower than a 33.6k modem.
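The modem comparison is easy to check by hand: 4 kilobytes in about one second and 40 kilobytes in about ten both come out near 4 KB/s, while a 33.6 kbit/s modem moves roughly 4.2 KB/s. Below is a small sketch of that arithmetic plus a timing harness for trying the same measurement with Python's Expat-backed xml.sax. The function names and the <doc> wrapper element (added so the repeated snippet is well-formed) are assumptions, and a do-nothing handler is not the same workload as the Perl filter chain, so any number it produces is only a baseline.

```python
import time
import xml.sax

SNIPPET = b"<P>Hello <B>world</B></P>"
MODEM_KB_PER_S = 33_600 / 8 / 1000   # 33.6 kbit/s is about 4.2 KB/s


def make_document(target_bytes):
    """Repeat the snippet inside one root element, near the target size."""
    repeats = max(1, target_bytes // len(SNIPPET))
    # Assumption: a <doc> root is added so the input is well-formed XML.
    return b"<doc>" + SNIPPET * repeats + b"</doc>"


def throughput_kb_per_s(size_bytes, seconds):
    """Kilobytes (1000 bytes) processed per second."""
    return size_bytes / 1000 / seconds


def time_parse(document):
    """Seconds spent pushing the document through a do-nothing handler."""
    start = time.perf_counter()
    xml.sax.parseString(document, xml.sax.ContentHandler())
    return time.perf_counter() - start


if __name__ == "__main__":
    # The figures quoted in the post, checked against the modem rate.
    for size, seconds in ((4_000, 1.0), (40_000, 10.0)):
        rate = throughput_kb_per_s(size, seconds)
        print(size, "bytes:", rate, "KB/s, slower than modem:",
              rate < MODEM_KB_PER_S)

    # A local baseline measurement; results will vary by machine.
    doc = make_document(40_000)
    elapsed = time_parse(doc)
    print("local parse:", throughput_kb_per_s(len(doc), elapsed), "KB/s")
```

Timing the bare parser first and then adding filters one at a time is one way to see which layer is eating the time.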

I can't find much documentation on the matter, but is that why XML page delivery was such hot shit two or three years ago and nobody talks about it anymore, and why to this day people are still molesting XML with regular expressions?

Posted by jesse at September 12, 2004 04:17 PM
Comments

I'm completely lost by all that you said, which really just proves my idiocy.

Posted by: DeAnn at September 13, 2004 12:22 AM

Er... no. :)

I blog about whatever I feel passionate about. At times, I feel passionate about the most horrifically involved technicalities you can imagine. I'm a computer programmer, so that happens :)

This blog entry is a fine example. I probably haven't gotten the hang of warning my audience about the nature of my posts yet, or categorizing them, or something, but folks like Jon Abernathy or maybe.. uh.. Tim Bray would know what I'm talking about here and could probably show me where I'd plugged something in backwards. But one of them has better things to do than do my homework for me, and the other one drafted the XML specification and probably doesn't even read my blog.

Posted by: Jesse Thompson at September 13, 2004 12:40 AM