Tuesday, February 22, 2005

An old XML friend

A coworker of mine noted that we had this bug: our system would collapse any run of multiple embedded spaces in a filename into a single space. The culprit? XML. We pass the filenames to a backend server via XML, and whitespace in ordinary parsed character data (PCDATA) inside elements (between start and end tags) is not reliably preserved along the way. There are three solutions:

  1. Escape the spaces
  2. Use CDATA
  3. Write a space-preserving DTD

Of these, I prefer the second, using CDATA. It is the simplest to implement and produces the most readable XML, since one does not need to mentally translate escaped characters. However, it does require that the receiving XML parser understand more than just tags and attributes (and some hand-written parsers do not, in fact, handle anything beyond those). The first, escaping the spaces, is probably the most portable, but it takes work to properly escape everything in the data that might cause trouble.
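To make the first two options concrete, here is a minimal sketch in Python. It is not our production code: the helper names `escape_spaces` and `wrap_cdata`, the `<filename>` element, and the sample filename are all made up for illustration, and Python's standard ElementTree parser merely stands in for the backend. Whether a particular receiver collapses spaces in plain PCDATA depends on that receiver.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escape_spaces(text):
    """Option 1: escape markup characters, then write each space as &#32;."""
    return escape(text).replace(" ", "&#32;")


def wrap_cdata(text):
    """Option 2: wrap text in a CDATA section, splitting any literal ']]>'."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


filename = "my   report  & notes.txt"  # hypothetical filename

escaped_doc = "<filename>" + escape_spaces(filename) + "</filename>"
cdata_doc = "<filename>" + wrap_cdata(filename) + "</filename>"

print(escaped_doc)
print(cdata_doc)

# Both forms decode back to the original string, spaces included.
assert ET.fromstring(escaped_doc).text == filename
assert ET.fromstring(cdata_doc).text == filename
```

The printed output shows the readability trade-off: the CDATA form keeps the filename legible as typed, while the escaped form replaces every space and the ampersand with references.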

The third, writing a space-preserving DTD, is the most interesting. In fact, it is probably the most correct solution of all from the perspective of elegance and clarity, but it requires the most support from the receiving end. Caveat emptor.
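One way to read "space-preserving DTD" is a DTD that gives the element a default `xml:space="preserve"` attribute. The sketch below assumes that reading; the `request` and `filename` element names are hypothetical, and Python's ElementTree again stands in for the receiver. Note that `xml:space` only signals intent: the parser hands the attribute to the application, and it is up to the receiving code to honor it.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the xml: prefix using its reserved namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

# Internal DTD subset that defaults xml:space to "preserve" on <filename>.
doc = """<!DOCTYPE request [
  <!ELEMENT request (filename)>
  <!ELEMENT filename (#PCDATA)>
  <!ATTLIST filename xml:space (default|preserve) "preserve">
]>
<request><filename>my   report.txt</filename></request>"""

element = ET.fromstring(doc).find("filename")

# The attribute never appears in the instance document; the parser
# supplies it from the DTD default.
print(element.get(XML_SPACE))
print(repr(element.text))
```

The appeal is that the sender's markup stays untouched. The cost is the one already noted: the receiver has to read the DTD and act on the attribute.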
