Declaration Expression Definite Assignment Error in LINQ

Topics: C# Language Design
May 8, 2014 at 8:42 PM
Hi,

I'm getting an unexpected error in the following code:
var q = from i in new int[100]
        where int.TryParse(i.ToString(), out var value)
        select value;  // Error: Use of unassigned local variable 'value'
Is this supposed to be a supported scenario? It's not exactly what I need, but it's the simplest repro. I tried a few other variations, including using let, but it's always the same error.

Thanks,
- Dave
May 9, 2014 at 6:01 PM
Edited May 9, 2014 at 6:03 PM
According to the C# spec and how LINQ query expressions are transformed into method calls, your query is equivalent to:
var q = (new int[100]).Where(i => int.TryParse(i.ToString(), out var value))
                      .Select(i => value);
That's why you're getting an error. But the error message is really strange. It should say that value does not exist at all. Either something changed in the way LINQ queries are transformed into method calls (and because there is no new version of the spec yet, we don't know how it works now) or the error message is incorrect.
May 9, 2014 at 6:56 PM
True, but then that would be translated further into the following:
int value;
var q = (new int[100]).Where(i => int.TryParse(i.ToString(), out value))
                      .Select(i => value);
This produces the error about value being uninitialized today in C# 5.0 because the compiler cannot guarantee that any of those lambdas are called at all, let alone in any particular order. Assigning value prior to the query does remove the error, and it would execute as expected, sorta.

The problem with doing something like this is that the semantics are likely not what the programmer expected. The expression requires that the lambdas rely on side effects to the enclosed variables from other lambdas, despite the query syntax making it look like it's entirely inline. Imagine if this were a parallel query, the value of value would be altered by every concurrent call to the predicate lambda leaving it in an indeterminate state by the time the projection lambda was called.

What is confusing is that the query isn't being translated into the following:
var q = (new int[100]).Where(i =>
                      {
                          int value;
                          return int.TryParse(i.ToString(), out value);
                      })
                      .Select(i => value);
That would result in the error message that would be more expected, that value is not defined. So it seems that somebody thought of a scenario in which the inline declared variable would be defined within a scalar lambda and lifted that variable to the calling method. I'm curious as to the thought process and intention there as given this situation it doesn't seem fully fleshed out.

In my opinion, if inline variable declaration is to be permitted within a query like that, it should be permitted only within the context of a let clause, as follows:
var q = from i in new int[100]
        let parsed = int.TryParse(i.ToString(), out var value)
        where parsed == true
        select value;
Which could be translated as follows:
var q = (new int[100])
                      .Select(i =>
                      {
                          int value;
                          bool parsed = int.TryParse(i.ToString(), out value);
                          return new { i, value, parsed };
                      })
                      .Where(i => i.parsed == true)
                      .Select(i => i.value);
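For reference, that translation can already be written by hand in C# 5.0. The sketch below is only an illustration of its shape: it uses string inputs instead of i.ToString() so that some elements actually fail to parse, and the names LetTranslationSketch and ParseAll are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LetTranslationSketch
{
    // Hand-written form of the proposed "let" translation (hypothetical helper).
    public static int[] ParseAll(IEnumerable<string> inputs)
    {
        return inputs
            .Select(s =>
            {
                // value is local to this lambda, so every element gets its own copy.
                int value;
                bool parsed = int.TryParse(s, out value);
                return new { s, value, parsed };
            })
            .Where(x => x.parsed)
            .Select(x => x.value)
            .ToArray();
    }

    static void Main()
    {
        var result = ParseAll(new[] { "10", "abc", "30" });
        Console.WriteLine(string.Join(", ", result)); // prints "10, 30"
    }
}
```

Because the parsed value travels with each element inside the anonymous type, nothing here depends on when or in what order the later lambdas run.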
May 10, 2014 at 1:10 AM
Edited May 10, 2014 at 1:10 AM
Halo_Four wrote:
int value = 0;
var q = (new int[100]).Where(i => int.TryParse(i.ToString(), out value))
                      .Select(i => value);
[…]

Imagine if this were a parallel query, the value of value would be altered by every concurrent call to the predicate lambda leaving it in an indeterminate state by the time the projection lambda was called.
You don't even need parallelism: adding .ToList() between the Where and the Select will also break this code, because it (deterministically) changes the order of evaluation.
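A small sketch of that effect, assuming plain LINQ to Objects and string inputs. The names SharedOutVariableDemo, Lazy and Buffered are invented for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SharedOutVariableDemo
{
    // Predicate and projection run back to back for each element.
    public static List<int> Lazy(string[] inputs)
    {
        int value = 0; // captured by both lambdas
        return inputs.Where(s => int.TryParse(s, out value))
                     .Select(s => value)
                     .ToList();
    }

    // ToList() runs every predicate before any projection runs.
    public static List<int> Buffered(string[] inputs)
    {
        int value = 0; // captured by both lambdas
        return inputs.Where(s => int.TryParse(s, out value))
                     .ToList()
                     .Select(s => value)
                     .ToList();
    }

    static void Main()
    {
        var inputs = new[] { "1", "2", "3" };
        Console.WriteLine(string.Join(", ", Lazy(inputs)));     // prints "1, 2, 3"
        Console.WriteLine(string.Join(", ", Buffered(inputs))); // prints "3, 3, 3"
    }
}
```

In the buffered version the projection only ever sees whatever the last predicate call left in value.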
May 10, 2014 at 6:16 AM
Edited May 10, 2014 at 6:18 AM
[Edit: Small spelling error]

At first, I agreed with Halo_Four's conclusion. It seemed correct to me that the compiler should probably disallow declaration expressions in query comprehension syntax when applied to where statements, yet allow it in let statements, which are really just select statements projecting into an anonymous type.

However, upon thinking about it further, it seems to me that Halo_Four's proposal assumes that declaration expressions can't be safely allowed in where statements simply because the compiler uses duck-typing to require a specific signature, yet it does not require specific behavior. Implementers aren't required to guarantee that the lambda is invoked once and only once for every element, synchronously and sequentially; however, hypothetically, if the compiler required all implementations to use the same behavior of where, then it would be a safe target for declaration expressions since it would be a safe assumption that the intention of a declaration expression within its lambda was to behave similarly to the proposed use of let, given that the lambda is guaranteed to execute once and only once for every element, sequentially and synchronously. (Note that this is certainly true in Rx's implementation of where as well). In that case, the compiler could apply a similar solution to the proposed let solution whenever it encounters a declaration expression in where. But as it stands currently, perhaps this isn't an assumption the compiler team is willing to make since operator implementations aren't guaranteed to behave "correctly".
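
To make the duck-typing point concrete, here is a rough sketch of a source type that satisfies the query pattern by shape alone. EagerSource<T> and QueryPatternDemo are hypothetical names, and the eager Where is a deliberate assumption chosen to show that the compiler binds to signatures, not to behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical source type: it matches the query pattern by shape only,
// and every operator runs its lambda over all elements immediately.
sealed class EagerSource<T>
{
    private readonly List<T> items;

    public EagerSource(IEnumerable<T> source) { items = source.ToList(); }

    public EagerSource<T> Where(Func<T, bool> predicate)
    {
        return new EagerSource<T>(items.Where(predicate));
    }

    public EagerSource<TResult> Select<TResult>(Func<T, TResult> selector)
    {
        return new EagerSource<TResult>(items.Select(selector));
    }

    public List<T> ToList() { return new List<T>(items); }
}

static class QueryPatternDemo
{
    public static List<int> Run(string[] inputs)
    {
        int value = 0; // shared by both lambdas the compiler generates
        var q = from s in new EagerSource<string>(inputs)
                where int.TryParse(s, out value)
                select value;
        return q.ToList();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Run(new[] { "1", "2", "3" }))); // prints "3, 3, 3"
    }
}
```

The query compiles without complaint against these Where and Select methods, and the shared variable ends up holding only the last parsed value by the time the projection runs.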

I'm led to disagree with Halo_Four's conclusion based on the idea that if we apply the same reasoning to let, then it seems to have the same flaws as where. Use of let compiles to select, to which the compiler happily applies duck-typing for its signature but not its behavior, just like where. So implementers of the select operator could easily break its "contract", just like the problem with where, and the compiler would still perform the proposed rewrite!

So how is the proposed solution any different from applying the same "fix" for the where lambda, without requiring the user to explicitly use let?

For example,
var q = from i in new int[100]
        where int.TryParse(i.ToString(), out var value)
        select value;
could compile to the same output as the proposed let solution:
var q = (new int[100])
        .Select(i =>
        {
            int value;
            bool where_result = int.TryParse(i.ToString(), out value);
            return new { i, value, where_result };
        })
        .Where(i => i.where_result == true)
        .Select(i => i.value);
They are the same, semantically and behaviorally, as long as both where and select are implemented "correctly".

I hope that the compiler team strongly considers implementing this feature (at least Halo_Four's proposal to restrict use to let, if mine is rejected) as it would really help to improve LINQ queries in general. I know that there have been many times that I could have used out parameters in my queries, which this proposal would have satisfied.

I also think it's important to consider that, as it stands currently, the bad example that was given of value being lifted out of the comprehension and into a closure is actually how I think most people manually solve this problem today. As shown, it's brittle because it's easily broken via parallelism or even a small semantic change to the query, as svick has shown with ToList. One problem is that the closure relies on specific side effects of query execution, though I think I've shown that Halo_Four's proposal to use let doesn't avoid those assumptions. However, another problem with the closure is its stateful scope, which let does avoid. The closure's scope can be a problem due to the fact that it's shared by cold sequences, though perhaps in the enumerable world it's not a problem yet in the observable world it can be. So even if the compiler doesn't require the correct behavior of the select operator, compiling to let instead of using a closure (or deferring to the user to generally make the wrong decision and use a closure) is going to improve these kinds of queries in general.

Thanks for the feedback. Please let me know if you find any flaws in my conclusion :)
May 10, 2014 at 3:38 PM
I think out and ref parameters are in danger of extinction the more functional C# becomes.

Allowing general variable declarations in any query syntax operator will make the understanding of the generated method syntax complicated.

Here only where has been considered, but what about join, group by, orderby...?

If such a feature is introduced, I would restrict it to let clauses and disallow it for Expression<T>, since it will be hard to implement in LINQ to SQL providers.
May 10, 2014 at 4:26 PM
Sure, lifting the predicate into a projection prior to the call to the Where extension method would at least be significantly more correct than what it does now. The reason I proposed limiting it to the let clause was that, while most of the other query keywords have a 1:1 relationship with their extension method brethren, let exists specifically for projection with a certain amount of syntax candy voodoo, so to me it feels natural. Olmo is correct that, while it isn't entirely out of place in a where expression or a select expression, it makes less and less sense with most of the other query operators.

The other part of this conversation is how these inline declarations play out in lambdas in general. With statement lambdas the variable scope remains confined to the scope of the lambda method body, which seems appropriate. Expression lambdas work exactly as they do with the query expressions, which does make sense currently as the query expressions are converted truthfully to their extension method counterparts. But of course that leads to the same oddness and brittleness that we see with query expressions:
Func<string, bool> f = s => int.TryParse(s, out var value);
value = 0; // required in order to avoid compiler error
if (f("foo"))
{
    Console.WriteLine(value);
}
I also tend to agree that maybe ref and out parameters are just a bad fit for these kinds of scenarios and that inline declaration is only rounding the corners of the square peg. It seems that there should be an overload of int.TryParse that, instead of accepting an out int, returns an int?, which would integrate better with a query expression.
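As a rough sketch of what that could look like today, assuming a user-written helper: ParseOrNull is hypothetical and not part of the framework, and NullableParsing and ParseAll are names invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class NullableParsing
{
    // Hypothetical helper, not a framework API: wraps the out parameter in an int?.
    public static int? ParseOrNull(string s)
    {
        int value;
        return int.TryParse(s, out value) ? value : (int?)null;
    }

    public static int[] ParseAll(IEnumerable<string> inputs)
    {
        var q = from s in inputs
                let parsed = ParseOrNull(s)
                where parsed.HasValue
                select parsed.Value;
        return q.ToArray();
    }

    static void Main()
    {
        var result = ParseAll(new[] { "7", "foo", "42" });
        Console.WriteLine(string.Join(", ", result)); // prints "7, 42"
    }
}
```

The out variable never leaves the helper, so the query needs neither a closure nor an inline declaration.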