Thursday, April 07, 2011

Text Encoding and WCF

For the most part, setting the encoding (UTF-8, UTF-16, etc.) on a WCF service is pretty simple: set textEncoding on basicHttpBinding/wsHttpBinding, or encoding on a textMessageEncoding in a customBinding. That works in almost all cases, but there’s a surprising number of ways to indicate the encoding of a message:
  • the content type of the message
  • an XML declaration at the start of the message
  • a BOM (byte order mark) that is the first 2 or 3 bytes (depending on encoding) of the message
Next, an encoding of UTF-16 doesn’t specify a single byte layout. It could be either big endian or little endian.
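To make the BOM and endianness points concrete, here is a minimal sketch of BOM sniffing. The class and method names (BomSniffer, DetectBom) are made up for illustration and are not part of WCF or of the sample discussed later; it only knows about UTF-8 and UTF-16, which are the encodings this post cares about.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of WCF: maps a leading BOM to an encoding.
public static class BomSniffer
{
    // Returns the encoding implied by a leading byte order mark,
    // or null when the bytes do not start with a UTF-8 or UTF-16 BOM.
    // UTF-32 BOMs are deliberately ignored in this sketch.
    public static Encoding DetectBom(byte[] bytes)
    {
        if (bytes == null) throw new ArgumentNullException("bytes");

        // UTF-8 BOM is 3 bytes: EF BB BF
        if (StartsWith(bytes, 0xEF, 0xBB, 0xBF)) return new UTF8Encoding(true);
        // UTF-16 little endian BOM is 2 bytes: FF FE
        if (StartsWith(bytes, 0xFF, 0xFE)) return new UnicodeEncoding(false, true);
        // UTF-16 big endian BOM is 2 bytes: FE FF
        if (StartsWith(bytes, 0xFE, 0xFF)) return new UnicodeEncoding(true, true);

        return null;
    }

    private static bool StartsWith(byte[] bytes, params byte[] prefix)
    {
        if (bytes.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (bytes[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

The endianness difference shows up in every character, not just the BOM: Encoding.Unicode.GetBytes("<") gives the bytes 3C 00, while Encoding.BigEndianUnicode.GetBytes("<") gives 00 3C.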
We can glean a few rules from RFC 3023 - XML Media Types:
  1. The encoding in the content type should take precedence over any other encoding indications.  If the content type says UTF-8 but the XML declaration says UTF-16, the message should be treated as UTF-8.  For example, a router could receive a UTF-16 message and reencode it as UTF-8 to save bandwidth.  The router would change the content type but would leave the text of the message, XML declaration included, unchanged.
  2. When the encoding is specified as UTF-16, there must be a BOM.
  3. When the encoding is specified as UTF-16LE or UTF-16BE, there must not be a BOM.
  4. An XML declaration is optional in the presence of other indications of encoding.
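Rules 1 through 3 can be sketched as a small resolver that trusts the content type first and only looks at the BOM to pick an endianness for plain "utf-16". This is my own illustration, not code from WCF or the RFC: CharsetResolver and Resolve are hypothetical names, the choice to throw when the BOM is missing is just one way to surface rule 2 (a lenient encoder might fall back to a default instead), and rule 3 is noted in a comment rather than enforced.

```csharp
using System;
using System.Net.Mime;
using System.Text;

// Hypothetical helper illustrating the precedence rules; not part of WCF.
public static class CharsetResolver
{
    // Returns the encoding named by the content type's charset parameter,
    // or null when the content type carries no charset at all (the caller
    // would then fall back to the BOM or the XML declaration).
    public static Encoding Resolve(string contentType, byte[] body)
    {
        string charset = new ContentType(contentType).CharSet;
        if (string.IsNullOrEmpty(charset)) return null;

        string name = charset.Trim('"', ' ');
        switch (name.ToLowerInvariant())
        {
            case "utf-8":
                return new UTF8Encoding(false);
            case "utf-16le":
                // Rule 3: no BOM expected (not enforced in this sketch).
                return new UnicodeEncoding(false, false);
            case "utf-16be":
                // Rule 3: no BOM expected (not enforced in this sketch).
                return new UnicodeEncoding(true, false);
            case "utf-16":
                // Rule 2: the BOM is what tells us the byte order.
                if (body.Length >= 2 && body[0] == 0xFE && body[1] == 0xFF)
                    return new UnicodeEncoding(true, true);
                if (body.Length >= 2 && body[0] == 0xFF && body[1] == 0xFE)
                    return new UnicodeEncoding(false, true);
                throw new FormatException("charset=utf-16 requires a BOM");
            default:
                // Any other name .NET recognizes; throws if it does not.
                return Encoding.GetEncoding(name);
        }
    }
}
```

Note that the XML declaration is never consulted here, which is the point of rule 1: whatever the declaration says, the content type wins.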
The TextMessageEncoder that ships with WCF is a little too strict in a couple of ways:
  • When encoding is specified as UTF-16, it requires an XML declaration.
  • The content type and the encoding specified in the XML declaration must be consistent.
This can lead to problems in interoperability.  Other web service clients can produce messages violating those WCF restrictions and the result is a 400 Bad Request with no additional information.  To work around this we will need a custom message encoder.
First we can start with the CustomTextMessageEncoder sample.  This sample allows you to specify any .NET supported encoding for an endpoint instead of just UTF-8 and UTF-16, but it doesn’t do anything about the restrictions above.  It also requires you to choose a single encoding when you deploy your endpoint.
The new text message encoder I’ll show will accept UTF-8 and UTF-16 encoded messages and work around those restrictions.  I’ll focus on the message encoder since that is where the action happens, but there is additional boilerplate necessary.  You will also need a MessageEncoderFactory (called by the framework to create instances of the MessageEncoder), a MessageEncodingBindingElement (used to plug this into the channel stack and allow the framework to create instances of the MessageEncoderFactory) and a BindingElementExtensionElement (allows it to be used in config; this one is optional).  The CustomTextMessageEncoder sample has excellent examples of each of these.
Let’s start with ReadMessage:

    public override Message ReadMessage(ArraySegment<byte> buffer,
                        BufferManager bufferManager, string contentType)
    {
        this.contentType = contentType;

        // copy the payload out so the pooled buffer can be returned right away
        byte[] msgContents = new byte[buffer.Count];
        Array.Copy(buffer.Array,
                    buffer.Offset, msgContents, 0, msgContents.Length);
        bufferManager.ReturnBuffer(buffer.Array);
        // most interoperable to include the xml declaration
        this.writerSettings.OmitXmlDeclaration = false;
        // save the encoding for when we write the response
        this.writerSettings.Encoding = GetEncoding(contentType, msgContents);

        // encoding named in the xml declaration (null when nothing to compare)
        Encoding xmlDeclEncoding = GetXmlDeclEncoding(
                                    this.writerSettings.Encoding, msgContents);

        // xml declaration encoding doesn't match, need to reencode the bytes
        // so that they agree with what the declaration claims
        if (xmlDeclEncoding != null &&
            xmlDeclEncoding.WebName != this.writerSettings.Encoding.WebName)
        {
            msgContents = Encoding.Convert(
                            this.writerSettings.Encoding,
                            xmlDeclEncoding, msgContents);
        }

        MemoryStream stream = new MemoryStream(msgContents);
        XmlReader reader = XmlReader.Create(stream);
        return Message.CreateMessage(reader, maxSizeOfHeaders, MessageVersion);
    }
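
ReadMessage leans on a helper that reads the encoding out of the XML declaration. As a rough idea of what that kind of sniffing involves, here is a sketch under my own assumptions: XmlDeclSniffer and DeclaredEncodingName are hypothetical names, it returns the declared name as a string rather than an Encoding, and it only inspects the first 256 bytes. It is not the helper called above.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: pulls the encoding name out of an XML declaration.
public static class XmlDeclSniffer
{
    // Matches an optional BOM character, then <?xml ... encoding="name"
    private static readonly Regex EncodingAttr = new Regex(
        @"^\uFEFF?<\?xml\s[^>]*?encoding\s*=\s*[""']([A-Za-z][A-Za-z0-9._\-]*)[""']",
        RegexOptions.CultureInvariant);

    // Decodes the start of the body with the encoding the bytes are
    // actually in, then returns the name from the XML declaration,
    // or null when there is no declaration or no encoding attribute.
    public static string DeclaredEncodingName(Encoding actual, byte[] body)
    {
        if (actual == null) throw new ArgumentNullException("actual");
        if (body == null) throw new ArgumentNullException("body");

        // The declaration has to come first, so a short prefix is enough.
        int count = Math.Min(body.Length, 256);
        string head = actual.GetString(body, 0, count);

        Match m = EncodingAttr.Match(head);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

The bytes have to be decoded with the real encoding before the declaration can be read at all, which is why the content type and BOM are resolved first and the declaration is only checked afterwards.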