
Resolved

CSharpCompilation.Emit() throws error

description

Whilst using the latest NuGet package of Roslyn (0.7.4091001-beta), I get a strange error when emitting .pdb files to an in-memory stream.

Using the attached code, I run the following, which uses the MemoryStream() ctor where you pass in a byte[]:

CompileCode(script, "Test-Working", new MemoryStream(outputArray), new MemoryStream(pdbArray));

and get the following error:

Emit in-memory Success: False
  error CS0041: Unexpected error writing debug information -- 'Exception from HRESULT: 0x806D000C'
peStream position = 0 (length = 102,400), pdbStream position = 1,024 (length = 102,400)

However, if I run it with a HackedMemoryStream for the pdbStream only, it works:

CompileCode(script, "Test-Working", new MemoryStream(outputArray), new HackedMemoryStream(pdbArray));

Emit in-memory Success: True
peStream position = 2,560 (length = 102,400), pdbStream position = 2,560 (length = 0)

Finally, if I call the MemoryStream default ctor, i.e. NOT passing in a byte[], it works fine as well:

CompileCode(script, "Test-Working-Default-Ctor", new MemoryStream(), new MemoryStream());

Emit in-memory Success: True
peStream position = 2,560 (length = 2,560), pdbStream position = 2,560 (length = 11,776)

In all cases, writing the peStream/pdbStream out to disk works fine; the problem only occurs when emitting to an in-memory Stream.

HackedMemoryStream looks like this (a full repro is attached):
public class HackedMemoryStream : MemoryStream
{
    private readonly byte[] buffer;

    /// <summary>
    /// Using a regular MemoryStream causes Exceptions when emitting IL code.
    /// So this hacked-together class overrides the relevant methods to make it work.
    /// This was figured out by trial-and-error, I have no idea WHY these changes work???
    /// </summary>
    public HackedMemoryStream(byte[] buffer)
        : base(buffer)
    {
        this.buffer = buffer;
    }

    // We DELIBERATELY return 0 here, instead of base.Length
    public override long Length
    {
        get { return 0; }
    }

    // We DELIBERATELY DON'T call base.SetLength() here
    public override void SetLength(long value)
    {
    }
}

comments

Zarat wrote Nov 4, 2014 at 2:26 PM

By passing a buffer to MemoryStream you cause Roslyn to think the stream is non-empty. I assume Roslyn tries to look into the stream, but loading it as a PDB fails (obviously) because a file full of binary zeros is not a valid PDB file.

Your "hacked" stream probably works because it reports the length as zero, thus Roslyn won't think the file contains data.

Possible fixes on your side:
  • Call SetLength(0) on the MemoryStream before passing it to Roslyn. This will notify both the MemoryStream and Roslyn that the stream contains no valid data.
  • Don't pass an array to the MemoryStream in the first place. This will also fix the problem that your buffer may be too small: if you pass your own buffer, MemoryStream can't grow when Roslyn tries to write more data than the buffer can hold.
(IMHO you should never pass a backing buffer to MemoryStream if you don't know the target size beforehand. If you really need to provide a buffer you should roll your own Stream and support growing the buffer.)
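
The MemoryStream behavior behind both suggestions can be checked without Roslyn. The following is a minimal sketch; the class and method names (FixedBufferDemo and its helpers) are made up for illustration and are not part of the attached repro. It shows that a stream wrapped around a byte[] reports the whole buffer as its Length, that SetLength(0) resets it, and that such a stream cannot grow past the buffer it was given.

```csharp
using System;
using System.IO;

// Hypothetical demo class, not part of the original repro.
public static class FixedBufferDemo
{
    // Length reported right after wrapping a zero-filled buffer:
    // the stream treats the entire array as existing content.
    public static long LengthOfWrappedBuffer(int size)
    {
        using (var stream = new MemoryStream(new byte[size]))
        {
            return stream.Length;
        }
    }

    // Length reported after SetLength(0): the stream now looks empty,
    // while still being backed by the same array.
    public static long LengthAfterReset(int size)
    {
        using (var stream = new MemoryStream(new byte[size]))
        {
            stream.SetLength(0);
            return stream.Length;
        }
    }

    // Tries to write one byte more than the supplied buffer can hold.
    // A stream created over a caller-supplied array is not expandable,
    // so the write throws NotSupportedException.
    public static bool CanGrowPastBuffer(int size)
    {
        using (var stream = new MemoryStream(new byte[size]))
        {
            stream.SetLength(0);
            try
            {
                stream.Write(new byte[size + 1], 0, size + 1);
                return true;
            }
            catch (NotSupportedException)
            {
                return false;
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine("Length after wrapping:  " + LengthOfWrappedBuffer(102400));
        Console.WriteLine("Length after reset:     " + LengthAfterReset(102400));
        Console.WriteLine("Can grow past buffer:   " + CanGrowPastBuffer(16));
    }
}
```

This only illustrates the stream side of the problem; what Roslyn does with a non-empty PDB stream is the assumption stated in the comment above, not something this sketch verifies.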

Zarat wrote Nov 4, 2014 at 2:31 PM

Also note that if you pass the buffer to avoid double-allocating, that's not necessary. If you use the MemoryStream() or MemoryStream(int) constructor the backing buffer can grow and is publicly accessible via GetBuffer. So you should probably use one of these constructors and use GetBuffer instead of providing your own buffer.
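
As a rough sketch of the GetBuffer approach (GrowableStreamDemo and WriteAndExpose are hypothetical names used only for this example): the stream is created with the default constructor, grows as needed, and the written bytes are then read straight from the backing array without copying. The backing array is usually larger than the data written, so Length has to be used to find the valid portion.

```csharp
using System;
using System.IO;

// Hypothetical demo class, not part of the original repro.
public static class GrowableStreamDemo
{
    // Writes the payload into a growable MemoryStream and returns a view
    // over the valid part of its backing buffer, with no extra copy.
    public static ArraySegment<byte> WriteAndExpose(byte[] payload)
    {
        var stream = new MemoryStream();   // expandable, buffer is exposable
        stream.Write(payload, 0, payload.Length);

        // GetBuffer returns the underlying array itself; bytes beyond
        // stream.Length are unused capacity, not data.
        byte[] backing = stream.GetBuffer();
        return new ArraySegment<byte>(backing, 0, (int)stream.Length);
    }

    public static void Main()
    {
        ArraySegment<byte> view = WriteAndExpose(new byte[] { 1, 2, 3, 4, 5 });
        Console.WriteLine("valid bytes: " + view.Count);
        Console.WriteLine("backing array is at least that large: " + (view.Array.Length >= view.Count));
    }
}
```

Note that GetBuffer is not available on a stream created with the MemoryStream(byte[]) constructor; there it throws UnauthorizedAccessException, which is one more reason to prefer the default or capacity constructor here.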

MattWarren wrote Nov 4, 2014 at 3:08 PM

@Zarat

Thanks for the info.
"call SetLength(0) on the MemoryStream before passing it to roslyn. This will notify both the MemoryStream and Roslyn that the stream contains no valid data."
Yes, you're right, I added that and it works okay now. Although I think it's a bit strange that when you call Compilation.Emit(..) it tries to look inside the stream and behaves differently if it's not empty. Is it trying to append the new .pdb to existing data? I assumed that it would start from scratch each time.
"Also note that if you pass the buffer to avoid double-allocating, that's not necessary. If you use the MemoryStream() or MemoryStream(int) constructor the backing buffer can grow and is publicly accessible via GetBuffer. So you probably should use one of these constructors and use GetBuffer instead of providing your own buffer."
I was passing in the byte[] because this is part of a Roslyn diagnostic I'm trying to write and I don't want it allocating a new array each time. I put the byte[] in ThreadLocal storage and pull it out each time, but I'm going to change that so that the MemoryStream goes in there instead. That way I'll also get the auto-growing that you talked about (which I'd not even considered). I thought it would be easier to cache the byte[] instead of the MemoryStream, but I guess not.
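
One way the cached-stream idea might look, as a sketch only: StreamCache and RentPdbStream are invented names, and the actual diagnostic code is not shown in this thread. Each thread keeps one growable MemoryStream, and it is reset to empty before every use so that the consumer never sees leftover data from the previous run.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper, not taken from the original diagnostic.
public static class StreamCache
{
    // One growable stream per thread, created lazily on first access.
    private static readonly ThreadLocal<MemoryStream> CachedPdbStream =
        new ThreadLocal<MemoryStream>(() => new MemoryStream());

    // Returns the calling thread's cached stream, reset so that it
    // reports Length == 0 and Position == 0. The already-allocated
    // capacity is kept, so repeated use does not reallocate the buffer.
    public static MemoryStream RentPdbStream()
    {
        MemoryStream stream = CachedPdbStream.Value;
        stream.SetLength(0);
        stream.Position = 0;
        return stream;
    }

    public static void Main()
    {
        MemoryStream first = RentPdbStream();
        first.Write(new byte[1000], 0, 1000);

        MemoryStream second = RentPdbStream();
        Console.WriteLine("same instance reused: " + ReferenceEquals(first, second));
        Console.WriteLine("length after reset:   " + second.Length);
        Console.WriteLine("capacity retained:    " + (second.Capacity >= 1000));
    }
}
```

Callers must not dispose the rented stream (for example by wrapping it in a using block), since a disposed MemoryStream cannot be reused on the next call.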

angocke wrote Jan 6, 2015 at 9:07 PM

Fixed in changeset 27a12a39c9201890730825a6108e7229a79dc37f