Thursday, January 08, 2009

How to validate Unicode non-XML document instances containing the byte order mark (byte order marker or BOM) 0xFFFE with BizTalk 2000 or 2002

The documentation for both BizTalk 2000 (1) and BizTalk 2002 (2) clearly states that it is not possible: << Non-XML document instances saved in Unicode will not validate correctly unless you remove the byte order marker at the beginning of the file. >>

(1) http://msdn.microsoft.com/en-us/library/ms957204.aspx
(2) http://msdn.microsoft.com/en-us/library/ms865274.aspx

I was really pissed off by this [beeping] limitation, and a few spots had already appeared on my face just thinking about the custom piece of code a Microsoft consultant would propose to me in order to fix this [beeping] "bug". Googling this limitation made my spots even bigger!

Then I decided to do some research myself to see whether a decent workaround could be found, and... I am a GENIUS - I mean, I found a decent workaround. By the way, as I mentioned to my colleague, I will probably not be recognised as a genius, as no one is using BizTalk 2000/2002 anymore ;-) Anyway, I decided to write this post, so here is my solution!

When you parse a Unicode non-XML document instance containing the byte order mark 0xFFFE, the problem is that you get the Unicode character 0xFEFF as the first character of the value of your first field. You get 0xFEFF rather than 0xFFFE because the two bytes FF FE on disk, read as little-endian UTF-16, decode to the code point U+FEFF. And there is an easy technique to remove it... make it the "Pad Character", I am a genius ;-) Here are more details:

  1. In BizTalk Editor, select the first field of your specification
  2. Click on the "Parse" tab
  3. Select "Right" for the "Justification" property
  4. Type 0xFEFF for the "Pad Character" property. When you click in the "Value" cell of the "Pad Character" property, the text "TAB (0x9 )" automatically appears; carefully replace the "9" with "feff" and press "Enter"
  5. Save your specification
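
To see where that stray character comes from, here is a minimal Python sketch. It is not BizTalk code and says nothing about how the BizTalk parser is implemented; the sample record `ABC,123`, the comma delimiter, and the helper name `first_field` are all made up for illustration. It only shows that a file starting with the bytes FF FE hands U+FEFF to whoever reads the first field.

```python
import codecs

def first_field(raw: bytes, delimiter: str = ",") -> str:
    """Decode little-endian UTF-16 bytes and return the first delimited field.

    Hypothetical helper: "utf-16-le" is used on purpose because it does NOT
    strip a leading BOM, so the BOM survives as the character U+FEFF.
    """
    text = raw.decode("utf-16-le")
    return text.split(delimiter)[0]

# A made-up flat-file record saved as "Unicode" with a byte order mark.
raw = codecs.BOM_UTF16_LE + "ABC,123".encode("utf-16-le")

# The first two bytes on disk are FF FE...
print(raw[:2].hex())

# ...but the first field starts with the single character U+FEFF.
field = first_field(raw)
print([hex(ord(c)) for c in field])
```

The first character of the field is what the "Pad Character" setting has to match, which is why the value to type is feff and not fffe.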

Your BizTalk Document Specification should now parse Unicode non-XML document instances containing the byte order mark 0xFFFE without any problem... you’re welcome ;-) By the way, your BizTalk Document Specification should continue to parse Unicode non-XML document instances without the byte order mark too!
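
Why both cases keep working can be modelled in a few lines of Python. This is only a sketch under an assumption: that a right-justified field has its padding on the left, so the parser drops leading pad characters. The function name `strip_right_justified` and the sample values are hypothetical, not anything from BizTalk.

```python
PAD = "\ufeff"  # the character entered as the "Pad Character" (0xFEFF)

def strip_right_justified(value: str, pad: str = PAD) -> str:
    """Model a right-justified field: padding sits on the left,
    so every leading pad character is dropped (assumed behaviour)."""
    return value.lstrip(pad)

with_bom = "\ufeffABC"   # first field of a file saved with a BOM
without_bom = "ABC"      # first field of a file saved without one

# Both inputs end up as the same clean value.
print(strip_right_justified(with_bom) == strip_right_justified(without_bom))
```

The one thing to keep in mind is that the first field can no longer carry any other pad character, so this fits best when that field is not padded in your documents anyway.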
