Anonymous Iterators, Yielding in a Try...Catch Block

Topics: C# Language Design
Apr 22, 2014 at 6:36 PM
Currently C# does not permit anonymous methods to be iterators, or for iterators to yield a value within a try block that has any catch blocks. When iterators were finally implemented in VB.NET both of those restrictions were removed. For the sake of language feature parity I am curious as to whether or not those restrictions could also be removed for C#.

Anonymous Iterators
Func<IEnumerable<int>> numbers = () => {
    yield return 1;
    yield return 2;
    yield return 3;
};
Yielding in a Try...Catch Block
try
{
    yield return 1;
    yield return 2;
    yield return 3;
}
catch (Exception exception)
{
    // yield not legal here in C# or VB.NET
}
Honestly not something I can say that I've really needed, but I thought it was worth having a discussion.
Apr 22, 2014 at 7:19 PM
I have actually wanted to use anonymous iterators before, though the need is admittedly rare. If the C# team were to revisit iterators, something I'd be more interested in is a way to have an eagerly evaluated block at the beginning of an iterator method. Something like:
public IEnumerable<string> Foo(string bar)
{
     eager 
     {
          if(bar == null) throw new ArgumentNullException();
     }

     yield return bar;
}
This would eliminate a huge maintenance burden with iterators, namely that current C# design guidelines encourage you to write two methods, one to check the arguments and another that is the actual body of the iterator:
//Writing iterators like this is really annoying!!!
public IEnumerable<string> Foo(string bar)
{
    if(bar == null) throw new ArgumentNullException();

    return FooIterator(bar);
}

private IEnumerable<string> FooIterator(string bar)
{
    yield return bar;
}
May 13, 2014 at 10:30 PM
Edited May 13, 2014 at 11:25 PM
It was my understanding that anonymous iterators were not allowed mostly because it was too complex to implement (see Eric Lippert's post on the subject). But since Roslyn is a complete rewrite of the compiler, and VB manages to support it, I was hoping this limitation would be removed...

To be frank, I don't think I ever really needed anonymous iterators, but the no yield in try block limitation is quite annoying. It would be nice to have parity with VB.
May 13, 2014 at 11:19 PM
Edited May 13, 2014 at 11:20 PM
It would also change the meaning of existing C# code based on the existing iterator model, because C# uses a different implementation technique from Visual Basic.
May 13, 2014 at 11:20 PM
The normal expectation in a try/catch/finally block is that before code runs finally it will either have run the try block until it returned or completed, or else it will have run the catch block. If a loop which is consuming an iterator exits without having consumed everything, the iterator will find out that the loop has exited, but will have no way of knowing why. Execution will jump to finally in a manner which, had there been a catch, would not have fit the normal pattern for a try/catch/finally block.
May 14, 2014 at 12:51 AM
AdamSpeight2008 wrote:
It would also change the meaning of existing C# code based on the existing iterator model, because C# uses a different implementation technique from Visual Basic.
Sure, the implementation would be different, but how would it be functionally different? As far as I can tell, other than the relaxing of these two limitations, the behavior of iterators in C# and VB.NET is identical. MSDN mentions no other differences.
May 14, 2014 at 1:05 AM
supercat wrote:
The normal expectation in a try/catch/finally block is that before code runs finally it will either have run the try block until it returned or completed, or else it will have run the catch block. If a loop which is consuming an iterator exits without having consumed everything, the iterator will find out that the loop has exited, but will have no way of knowing why. Execution will jump to finally in a manner which, had there been a catch, would not have fit the normal pattern for a try/catch/finally block.
The same behavior is expected with try/finally blocks but nobody seems concerned that the appearance of execution jumps from the middle of the try block to the finally block when the IEnumerator<T> is disposed today. Nor is the expectation between try/catch/finally blocks any different between C# and VB.NET. Iterators, by design, don't behave like normal functions. Neither do async methods, and they don't have any imposed limitation on awaiting from a try block in either language.

As far as I can tell the limitations in iterators in C# are not due to a technical issue. At the time they were implemented they deferred those hard problems and had no reason to revisit them. The VB.NET team, since it implemented the iterator state machine at the same time that they implemented the async state machine, had the opportunity to revisit this limitation. If someone from either team wants to correct my interpretation that would be great.
Developer
May 14, 2014 at 1:22 AM
Implementing anonymous iterators in Roslyn would be quite easy. We didn't do it because it isn't in the language specification, and nobody championed adding the feature.

The reason C# doesn't allow you to yield return in a try block that has a catch clause is as follows.

You can think of an iterator method as a special kind of method that "calls" the body of the loop as a subroutine at the point of every "yield return". If we allow you to place a yield return in a try block that has a catch block, then in order to preserve that mental model of an iterator method, an exception thrown from the body of the loop should be capable of being caught inside the iterator method. But that is not how the iterator transformation actually behaves.

I don't think this is a very strong reason, but there it is.
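A minimal sketch of that behavior in C# as it compiles today. The names `YieldModelDemo`, `Numbers`, `Log`, and `Run` are made up for illustration. An exception thrown by the consuming loop body does not pass through the iterator's `yield return`; the iterator only sees the enumerator being disposed, which runs its finally block.

```csharp
using System;
using System.Collections.Generic;

public static class YieldModelDemo
{
    // Records the order of events so the behavior can be inspected.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> Numbers()
    {
        try
        {
            yield return 1;
            yield return 2; // never reached in Run() below
        }
        finally
        {
            // Runs when foreach disposes the enumerator, whatever the reason.
            Log.Add("iterator finally");
        }
    }

    public static void Run()
    {
        Log.Clear();
        try
        {
            foreach (int n in Numbers())
            {
                Log.Add("body " + n);
                throw new InvalidOperationException("thrown by the loop body");
            }
        }
        catch (InvalidOperationException)
        {
            // Only the consumer can catch this; the iterator never observes it.
            Log.Add("consumer catch");
        }
    }
}
```

After `YieldModelDemo.Run()`, `Log` holds the loop body entry, then the iterator's finally entry, then the consumer's catch entry. Nothing in `Numbers` had a chance to handle the exception.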
May 14, 2014 at 1:46 AM
nmgafter wrote:
Implementing anonymous iterators in Roslyn would be quite easy. We didn't do it because it isn't in the language specification, and nobody championed adding the feature.
Even when I started this thread it wasn't that big of a deal, honestly. Just a question of feature parity between languages. However I do see anonymous iterators being a simple solution to the issue of validating arguments within an iterator at the time that the iterator "method" is called and not the first time the consumer attempts to enumerate the first value.
The reason C# doesn't allow you to yield return in a try block that has a catch clause is as follows.

You can think of an iterator method as a special kind of method that "calls" the body of the loop as a subroutine at the point of every "yield return". If we allow you to place a yield return in a try block that has a catch block, then in order to preserve that mental model of an iterator method, an exception thrown from the body of the loop should be capable of being caught inside the iterator method. But that is not how the iterator transformation actually behaves.

I don't think this is a very strong reason, but there it is.
That's interesting. I've never heard that mental model being described and it is definitely inverted from the way I perceive iterators. Even with that mental model it appears to break down with try/finally since you rely on idiomatic consumption of the iterator, without which that finally clause might never be executed. So while I can understand that the yield can be viewed as a call into the consuming loop the metaphor isn't exactly perfect. Of course, when I mention the ability to yield within a try block with a catch block I refer specifically to the ability to catch exceptions raised from within the iterator body itself, not any exceptions raised by the consumer of the enumerable.
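For exceptions raised by the iterator's own work, one workaround that already compiles is to keep the try/catch around the step that may throw and move the `yield return` outside it. This is only a sketch: `SafeParse`, `ParseAll`, and the `-1` fallback value are hypothetical choices for illustration, not anything proposed in this thread.

```csharp
using System;
using System.Collections.Generic;

public static class SafeParse
{
    // Yields each input parsed as an int, substituting -1 when parsing fails.
    public static IEnumerable<int> ParseAll(IEnumerable<string> inputs)
    {
        foreach (string text in inputs)
        {
            int value;
            try
            {
                // The work that may throw stays inside the try block.
                value = int.Parse(text);
            }
            catch (FormatException)
            {
                value = -1;
            }
            // The yield sits outside the try/catch, so the compiler accepts it.
            yield return value;
        }
    }
}
```

This works when the fallible step and the yield can be separated. It gets clumsy when several yields share one handler, which is the case the restriction makes awkward.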
May 14, 2014 at 2:00 PM
Edited May 14, 2014 at 2:03 PM
nmgafter wrote:
Implementing anonymous iterators in Roslyn would be quite easy. We didn't do it because it isn't in the language specification, and nobody championed adding the feature.

The reason C# doesn't allow you to yield return in a try block that has a catch clause is as follows.

You can think of an iterator method as a special kind of method that "calls" the body of the loop as a subroutine at the point of every "yield return". If we allow you to place a yield return in a try block that has a catch block, then in order to preserve that mental model of an iterator method, an exception thrown from the body of the loop should be capable of being caught inside the iterator method. But that is not how the iterator transformation actually behaves.

I don't think this is a very strong reason, but there it is.
Interesting. If it's easy hopefully you guys decide to do it at some point.

I agree with Halo_Four that this might be a good solution for eager evaluation of arguments:
public IEnumerable<String> Foo(String bar)
{
    if(bar == null) throw new ArgumentNullException();

    Func<String, IEnumerable<String>> impl = b =>
    {
        yield return b; 
    };
    return impl(bar);
}
That is certainly a lot nicer than writing two distinct methods.

Now if only var would implicitly infer the type of impl...
May 15, 2014 at 1:51 AM
Ya know, after going back and fixing some of my existing iterators so that their arguments are validated up front I am beginning to think that anonymous iterators would just feel like a dirty hack and that perhaps a contextual keyword with yield may be more appropriate. Some way to inform the compiler that anything before this point should be executed immediately and anything beyond is the iterator itself. I've not thought through what the syntax might look like, but something akin to the following:
public static IEnumerable<T> SkipEvery<T>(this IEnumerable<T> source, int skip)
{
    Contract.Requires<ArgumentNullException>(source != null, "source");
    Contract.Requires<ArgumentOutOfRangeException>(skip > 0, "skip");
    yield continue;
    int index = 0;
    foreach (T value in source)
    {
        if (index == skip)
        {
            index = 0;
            continue;
        }
        yield return value;
        index++;
    }
}
I know that yield continue doesn't feel right but after scanning through the existing keywords nothing quite felt right and I didn't want to propose anything new. At least yield currently is pretty strict about what can follow so it wouldn't break any existing code.

Another thought would be to have the compiler look for code contract calls and to keep them in the iterator method rather than moving them into the state machine. That doesn't help me much as we have a set of argument validating classes that predate code contracts, but that at least would benefit others.
May 15, 2014 at 5:36 PM
When, and how many times, would the code before yield continue execute if the iterator was enumerated twice?

I would suggest allowing an IEnumerable() {...} statement to appear before any other executable code [variable declarations could precede it only if they had no initializers]. That code would be run before the IEnumerable() implementation was constructed; when it finished, a snapshot would be taken of any variables declared before that block, and the snapshot would be used to set the initial values of those variables on any subsequent enumeration.

That would help make clear that in the case of something like int[] foo; IEnumerable() { foo = new int[5]; } there would be one array associated with the IEnumerable, and each enumerator would receive a reference to that same array (IMHO, if the code had been int[] foo = new int[5]; IEnumerable() { ... } it would be unclear whether the array belonged to the enumerable, or would be created separately for each enumerator).

Even if code were to redefine IEnumerable without redefining IEnumerable<T>, such that IEnumerable(); would be a valid statement, I don't think there's any way syntactically that the statement could have any meaning in the existing language no matter how IEnumerable was defined [obviously the iterator can only be valid if IEnumerable<T> has its expected meaning].
May 15, 2014 at 6:44 PM
supercat wrote:
When, and how many times would the code before yield continue execute if the iterator was enumerated twice?
What comes before that point would not be lifted into the generated iterator class. It would remain in the method body. That way it is only ever executed once when the method is called and it wouldn't matter how often the enumerable was subsequently enumerated.

Currently C# 2.0+ turns the following:
public IEnumerable<int> Blah(string value)
{
    if (value == null)
    {
        throw new ArgumentNullException("value");
    }
    yield return 1;
    yield return 2;
}
Into the following:
public IEnumerable<int> Blah(string value)
{
    // oops, you meant to validate the 'value' argument?
    var instance = new compiler_generated_class(-2);
    instance.value = value;
    return instance;
}
If the consumer called Blah passing null for the argument, there would be no exception until the consumer attempted to enumerate over it, which could happen much later as part of a composed query.

So, the goal with this proposal is to denote which code should remain in the original method:
public IEnumerable<int> Blah(string value)
{
    if (value == null)
    {
        throw new ArgumentNullException("value");
    }
    yield continue; // or whatever syntax
    yield return 1;
    yield return 2;
}
Which would instead be turned into:
public IEnumerable<int> Blah(string value)
{
    if (value == null)
    {
        throw new ArgumentNullException("value");
    }

    // yield continue (or whatever) was here

    var instance = new compiler_generated_class(-2);
    instance.value = value;
    return instance;
}
That way the call to Blah throws the exception immediately and not at some potentially deferred point in the future.

This would effectively replace the current pattern for handling argument validation which is as follows:
public IEnumerable<int> Blah(string value)
{
    if (value == null)
    {
        throw new ArgumentNullException("value");
    }
    return BlahInternal(value);
}

private IEnumerable<int> BlahInternal(string value)
{
    yield return 1;
    yield return 2;
}
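The timing difference between the two shapes can be checked with a small sketch. The class `ValidationTiming` and the helper `Throws` are invented for illustration; `Deferred` and `Eager` mirror the single-method and two-method versions of Blah above.

```csharp
using System;
using System.Collections.Generic;

public static class ValidationTiming
{
    // Single-method iterator: the null check becomes part of the state machine.
    public static IEnumerable<int> Deferred(string value)
    {
        if (value == null)
        {
            throw new ArgumentNullException("value");
        }
        yield return 1;
        yield return 2;
    }

    // Two-method pattern: the check runs as soon as Eager is called.
    public static IEnumerable<int> Eager(string value)
    {
        if (value == null)
        {
            throw new ArgumentNullException("value");
        }
        return EagerIterator(value);
    }

    private static IEnumerable<int> EagerIterator(string value)
    {
        yield return 1;
        yield return 2;
    }

    // Illustration helper: reports whether the action throws ArgumentNullException.
    public static bool Throws(Action action)
    {
        try { action(); return false; }
        catch (ArgumentNullException) { return true; }
    }
}
```

Calling `ValidationTiming.Deferred(null)` returns without complaint and only throws once the result is enumerated, while `ValidationTiming.Eager(null)` throws at the call itself.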
May 15, 2014 at 7:38 PM
Halo_Four wrote:
What comes before that point would not be lifted into the generated iterator class. It would remain in the method body. That way it is only ever executed once when the method is called and it wouldn't matter how often the enumerable was subsequently enumerated.
What should be the expected effect of iterating
IEnumerable<int> wow()
{
  int[] arr = new int[1];
  arr[0] += 10;
  yield continue;
  yield return arr[0]++;
  yield return arr[0];
}
multiple times? Would you consider it obvious that the first time will yield the sequence {10,11}, the next time {11,12}, etc.? What if the syntax were:
IEnumerable<int> wow()
{
  readonly int[] arr; // Must be written only in the immediate following block
  IEnumerable()
  {
    arr = new int[1];
    arr[0] += 10;
  }
  yield return arr[0]++;
  yield return arr[0];
}
or better yet
IEnumerable<int> wow()
{
  IEnumerable()
  {
    export readonly int[] arr = new int[1];
    arr[0] += 10;
  }
  yield return arr[0]++;
  yield return arr[0];
}
Here export readonly would be a new construct that would, in certain contexts, cause a read-only variable to be declared at the enclosing scope (or class scope in the case of certain class constructors or "partial class constructors").
May 23, 2014 at 7:00 PM
If anonymous iterators were available, I would be tempted to write a method like this:
static T Invoke<T>(Func<T> func)
{
    return func();
}
So that I could separate the eager and deferred blocks by doing this:
public IEnumerable<String> Foo(String bar)
{
    if (bar == null) throw new ArgumentNullException();
    return Invoke(() =>
    {
        yield return bar;
    });
}
I think that would also make the evaluation clear in supercat's example:
public IEnumerable<int> Foo()
{
    // This is a normal method, so it's executed once per call to Foo()
    var arr = new int[1];
    arr[0] += 10;
    return Invoke(() =>
    {
        // This is a lambda, so it's capturing a single variable
        // no matter how many times GetEnumerator() is called
        yield return arr[0]++;
        yield return arr[0];
    });
}
And it would allow iterator blocks to be used in expressions that need enumerables instead of delegates, such as the from clause in a query expression:
var query =
    from x in list
    from y in Invoke(() =>
        {
            yield return x;
            if (x % 2 == 0)
            {
                yield return x / 2;
            }
        })
    select y;
But perhaps what I really want is syntax to create an IEnumerable directly without the overhead of creating a delegate just to invoke it. For example, consider an "iterator" keyword such that iterator <statement> means the same thing as Invoke(() => <statement>) with the above definition of "Invoke" but without the extra delegate allocation. That syntax could eliminate the need for anonymous iterator lambdas, since they could always be written as x => iterator { yield return x; }.