Sunday, June 9, 2013

CSI: Visual Studio - Unable to translate Unicode character at index X to specified code page

A crazy internal error from Visual Studio
A customer emailed me a weird one. I tend to have a sense for when something is up and when an obscure thing will turn into something interesting.
The person says:
...mysteriously most of my projects refuse to build.  "The build stopped unexpectedly because of an internal failure... something about unicode... blah blah"
There are a few messages out there on the web about it -- even a really old hot fix.  What's the best way to proceed with the VS team / MS?  Is there anyone actively interested in glitches like this?
My spidey-sense is tingling. First, when something says "internal failure" it means some fundamental expectation wasn't met. Garbage in perhaps? He says "most of my projects" which implies it's not a specific project. There's also the sense that this is a "suddenly things stopped working" type thing. Presumably it worked before.
I say:
"Have you checked all the source files to make sure one isn't filled with Unicode nulls or something?"
He says no, but sends a call stack (which is always nice when it's sent FIRST, but still):
Error    1    The build stopped unexpectedly because of an internal failure.
System.Text.EncoderFallbackException: Unable to translate Unicode character \uD97C at index 1321 to specified code page.
   at System.Text.EncoderExceptionFallbackBuffer.Fallback(Char charUnknown, Int32 index)
   at System.Text.EncoderFallbackBuffer.InternalFallback(Char ch, Char*& chars)
   at System.Text.UTF8Encoding.GetByteCount(Char* chars, Int32 count, EncoderNLS baseEncoder)
   at System.Text.UTF8Encoding.GetByteCount(String chars)
   at System.IO.BinaryWriter.Write(String value)
   at Microsoft.Build.BackEnd.NodePacketTranslator.NodePacketWriteTranslator.TranslateDictionary(Dictionary`2& dictionary, IEqualityComparer`1 comparer)
   at Microsoft.Build.Execution.BuildParameters.Microsoft.Build.BackEnd.INodePacketTranslatable.Translate(INodePacketTranslator translator)
   at Microsoft.Build.BackEnd.NodePacketTranslator.NodePacketWriteTranslator.Translate[T](T& value, NodePacketValueFactory`1 factory)
   at Microsoft.Build.BackEnd.NodeConfiguration.Translate(INodePacketTranslator translator)
   at Microsoft.Build.BackEnd.NodeProviderOutOfProcBase.NodeContext.SendData(INodePacket packet)
   ...
OK, so it doesn't like a character. But a character in WHAT? Well, we'd assume a source file, but it's important to remember that there are other pieces of input to a compiler, like path names, environment variables, commands passed to the compiler as switches, etc.
It says Index 1321 which seems pretty far into a string before it gets mad. I asked a few people inside and Sara Joiner says:
We aren’t doing substrings or anything – just transferring a dictionary, which involves writing first the size, then each key and value.  So if the data is bad, I don’t think it’s due to anything MSBuild has done to it.  That said, it looks like the only place in BuildParameters that we call TranslateDictionary is when transferring the state of the environment [variables] across the wire. 
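To picture what Sara is describing, here's a minimal Python sketch of "write the size, then each key and value" with strict UTF-8 encoding. It is not MSBuild's code: the function names and the fixed 4-byte length prefix are my own simplifications for illustration. The point is only that a serializer like this does nothing to the data, so one bad character in one value is enough to blow up the whole transfer.

```python
import io
import struct


def write_string(stream, text):
    # Strict UTF-8 encoding raises UnicodeEncodeError on an unpaired surrogate.
    data = text.encode("utf-8")
    # Hypothetical framing: 4-byte little-endian length, then the bytes.
    stream.write(struct.pack("<I", len(data)))
    stream.write(data)


def write_dictionary(stream, mapping):
    # First the number of entries, then each key and value in turn.
    stream.write(struct.pack("<I", len(mapping)))
    for key, value in mapping.items():
        write_string(stream, key)
        write_string(stream, value)


def try_serialize(mapping):
    """Return the serialized bytes, or the encoding error if one occurs."""
    buffer = io.BytesIO()
    try:
        write_dictionary(buffer, mapping)
    except UnicodeEncodeError as error:
        return error
    return buffer.getvalue()
```

Calling `try_serialize` on a dictionary whose value contains `"\ud97c"` hands back a `UnicodeEncodeError` whose `start` attribute is the index of the offending character, which is the same kind of "at index N" clue the call stack gave us.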
Ah, so this is splitting up name-value pairs that are the environment variables! David Kean says "ask him what his PATH looks like." I ask and I get almost 2000 bytes of PATH! It's a HUGE path, it looks like it may even have been duplicated and appended to itself a few times.
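If you suspect your own PATH has been appended to itself, a quick check is easy to sketch. This is a rough Python helper of my own (the name `duplicate_path_entries` is made up), and it assumes Windows-style semicolon separators and case-insensitive folder names.

```python
def duplicate_path_entries(path_value, separator=";"):
    """Return PATH entries that repeat an earlier entry.

    Comparison ignores case and trailing backslashes, which is an
    assumption that suits Windows paths.
    """
    seen = set()
    duplicates = []
    for entry in path_value.split(separator):
        normalized = entry.strip().rstrip("\\").lower()
        if not normalized:
            continue  # skip empty segments such as ";;"
        if normalized in seen:
            duplicates.append(entry)
        else:
            seen.add(normalized)
    return duplicates
```

On a real machine you would pass in `os.environ["PATH"]`. A long list of duplicates doesn't prove corruption, but it's a hint that something has been rewriting the variable.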
Here's just a bit of the PATH in question. See anything?
\;C:\PROGRA~1\DISKEE~1\DISKEE~1\;C:\Program Files (x86)\Windows Kits\8.0\Windows
Performance Toolkit\;C:\Program Files\Microsoft SQL
Server\110\Tools\Binn\;C:\Program Files\Microsoft\Web Platform
Installer\;C:\Program Files\TortoiseSVN\binVN\???p??;C:\Program
Files\TortoiseSVN\bin;C:\PHP\;C:\progra~1\NVIDIA
Corporation\PhysX\Common;C:\progra~2\Common Files\Microsoft Shared\Windows
Live;C:\progra~1\Common Files\Microsoft Shared\Windows
Live;C:\q\w32;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;
C:\Windows\System32\WindowsPowerShell\v1.0\;C:\progra~2\WIDCOMM\Bluetooth
Software\;C:\progra~2\WIDCOMM\Bluetooth 
See those ??? marks? Those don't feel like real question marks to me. I open the result of "SET > env.txt" as a binary file in Visual Studio and the bytes are 3Fs, which are ? marks.
I think the text file was converted to ANSI
This makes me think that there's Unicode goo in the PATH that was converted to ANSI when it was piped. Phrased differently, this text file isn't reality.
However, elsewhere in the Windows UI his PATH variable looks different.
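Here's a small Python sketch of that kind of lossy conversion. I'm assuming the "ANSI" code page is Windows-1252 (`cp1252`); the customer's actual code page isn't known, and the sample string is the garbage as the Windows UI displayed it, not necessarily the exact characters stored in the variable.

```python
def to_ansi_lossy(text, code_page="cp1252"):
    """Encode text to a single-byte code page, replacing anything
    the code page cannot represent with '?' (byte 0x3F)."""
    return text.encode(code_page, errors="replace")


# The garbage as it appeared in the Windows UI.
shown_in_ui = "�侱ᤣp䥠؉"

lossy = to_ansi_lossy(shown_in_ui)
print(lossy)                    # the bytes that would land in the text file
print([hex(b) for b in lossy])  # every replaced character shows up as 0x3f
```

Every character the code page can't represent collapses to the same 0x3F byte, so once the text has been through this conversion there's no way to recover what was originally there.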
C:\Program Files\TortoiseSVN\binVN\�侱ᤣp䥠؉;
Corruption in a path sometimes looks like this, and you might assume it's Chinese. No, it's garbage getting interpreted as Unicode. Interestingly, the error said the naughty character was 0xD97C, which renders as �. That value sits in the UTF-16 high-surrogate range (0xD800 to 0xDBFF), so on its own it isn't a complete character, and that implies to me that something got stripped out at some point in processing and left behind the Unicode equivalent of 'uh...' Regardless, it's wrong and it needs to be removed.
I ask him if cleaning his PATH worked and the customer just sent me a one-line response via email...the best kind of response:
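If you want to hunt for this kind of thing yourself, here's a hedged Python sketch that scans a set of name-value pairs for surrogate code points (0xD800 to 0xDFFF), which strict UTF-8 refuses to encode. The function name is my own invention, and how a stray surrogate surfaces in `os.environ` depends on the platform and the Python version, so treat this as a starting point, not a diagnostic tool.

```python
def find_surrogates(variables):
    """Return (name, index, code) for every surrogate code point
    found in the values of a name-to-value mapping."""
    hits = []
    for name, value in variables.items():
        for index, ch in enumerate(value):
            if 0xD800 <= ord(ch) <= 0xDFFF:
                hits.append((name, index, f"\\u{ord(ch):04X}"))
    return hits
```

To run it against your real environment you could call `find_surrogates(dict(os.environ))` after `import os`. Any hit tells you which variable to clean up and roughly where in the string to look.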
========== Build: 12 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
Yay! I hope this helps the next person who goes aGoogling for the answer and thought they were alone. Thanks to David Kean, Sara Joiner and Srinivas Nadimpalli for looking at the call stack and guessing at solutions with me!
Any insights, Dear Reader?


© 2013 Scott Hanselman. All rights reserved.



     
