Tuesday, February 07, 2006

ASCII Control Characters and Web Service Strings

The Problem
We had an interesting situation come up recently with one of the web services we were writing. This web service returned a string value back to its caller, but occasionally the code Visual Studio .NET generated to handle the web service call would raise the following error message: “There is an error in XML document (1, 285).” The underlying error message was: System.Xml.XmlException: ' ', hexadecimal value 0x11, is an invalid character. Line 1, position 351. This caused us to do a little investigation and we discovered that the returned string would sometimes contain a 0x11 (17) control character. We determined that this character should not have been there in the first place, but we were surprised that it should cause an error. We did a little experiment and discovered that any ASCII character in the range of 0x00 to 0x1F (0 to 31), with the exception of 0x09 (TAB), 0x0A (LF), and 0x0D (CR), would cause this error. This error did not occur in the web service itself, but in the code that processed the XML response returned by the web service. This code is generated automatically by Visual Studio, so there is not much we can do to control it.

XML String Limitation
We did a little more digging and discovered that string values in XML documents are not allowed to contain characters in the range 0x00 – 0x1F (0 – 31), except for 0x09 (TAB), 0x0A (LF), and 0x0D (CR). Details about this restriction are documented in the following locations:
https://www.w3.org/TR/xmlschema-2/#string
https://www.w3.org/TR/xml/#dt-character

.NET Inconsistency
After discovering this restriction on XML string values, I was no longer surprised the code that parsed the XML response from the web service raised an error. However, I was surprised that the web service method did not raise an error when we tried to return a string containing an invalid character. I would have though that the web service method would not be allowed to return data that would be invalid when it was converted into XML.

Things to Remember
  • XML strings cannot contain characters in the range 0x00 – 0x1F (0 – 31), except for 0x09 (TAB), 0x0A (LF), and 0x0D (CR).
  • Since XML strings cannot contain ASCII control characters, web service methods cannot send or receive strings that contain ASCII control characters.
  • If you do need to send a string containing ASCII control characters, you can pass it as a Char or Byte array to get around this restriction.

No comments: