Null-propagating operator ?. in Expression Trees: spec v1

Topics: C# Language Design
Nov 1, 2014 at 8:45 PM
This specification discusses support for ?. in expression trees and should be read as an addition
to the ones that define the new operator:

Initial Spec: https://roslyn.codeplex.com/discussions/540883
Associativity: https://roslyn.codeplex.com/discussions/543895

Associativity and interactions with other language features (nullable types, extension methods, delegate invocation, ...) are identical in expression trees to those in any other context where the operator is used.

Rationale

There have been no additions to compiler-generated expression trees since .Net 3.5, but so far there have been solid reasons for that:
  • .Net 4.0 added many new Expression classes to represent statements, but the C# compiler is unable to translate lambda expressions
    with a statement body to the equivalent expression tree using these new classes. This can be justified because:
    • There's no important use case for this feature. If or Try/Catch cannot be translated to SQL or other typical target languages.
    • It would be a complex addition to the C# compiler.
    • It would make writing LINQ providers even harder, blurring the line between the expressions that should be supported and those that should not.
  • In C# 4, dynamic was added, but with no support in expression trees. Reasons:
    • There's no important feature that it would enable. LINQ's best feature is being a strongly-typed SQL, so dynamic makes little sense there.
    • Defining lambda type inference and dynamic together is a problem of its own.
    • System.Linq.Expressions would have to be deeply changed to support general dynamic expressions.
  • C# 5 introduced async/await, but with no translation to expression trees:
    • There's no important feature that it would enable; async/await is not supported in many target languages anyway.
    • yield return is also not supported.
But now in C# 6 the new ?. operator will be introduced. Even considering the associativity complications, it is still a relatively simple operator in comparison, and it only makes local changes to the expression.

Also, ?. will be taught together with the conditional operator ? : and the coalesce operator ??, both of which are supported in expression trees. It would be surprising if ?. were not supported.

Nullable Navigation Properties

SQL is the typical target language of LINQ, and Navigation properties are one of the features that makes LINQ queries simpler than their SQL translations.

Navigation properties are usually translated by LINQ providers as LEFT OUTER JOIN, because reducing the number of results just by accessing a property is counter-intuitive, but
the side effect is that, in LINQ queries, null-propagation happens implicitly. Writing a LINQ provider that throws NullReferenceException is just too hard in general, and would lead to inefficient queries.

Knowing this behaviour, the situation is not that bad, as long as a chain of (optional) Navigation properties does not end in a ValueType.

Writing a query like this, using Entity Framework against the AdventureWorks database:
using (AdventureWorks2012Entities ctx = new AdventureWorks2012Entities())
{
    var list = ctx.Customers.Select(a => a.Person.BusinessEntityID).ToList();
}
Or even using anonymous types:
using (AdventureWorks2012Entities ctx = new AdventureWorks2012Entities())
{
    var list = ctx.Customers.Select(a => new 
    { 
        a.CustomerID, 
        a.Person.BusinessEntityID 
    }).ToList();
}
You get the exception message:
The cast to value type 'Int32' failed because the materialized value is null. Either the result type's generic parameter or the query must use a nullable type.
The problem is that what C# sees as a non-nullable int is, in SQL, a non-nullable int column that happens to become nullable because of the LEFT OUTER JOIN.

The end result is that the user of the LINQ provider gets an exception that is hard to understand, because the type-mismatch breaks the LINQ illusion.

Here are some people having this particular problem 1, [2] (http://www.c-sharpcorner.com/forums/thread/235070/the-cast-to-value-type-int32-failed-because-the-materializ.aspx), but in Entity Framework this error is more often due to Sum translation.

This page of documentation tries to help the developer that finds an equivalent exception in our LINQ provider.

In the absence of non-nullable reference types, this problem is always going to be there, but currently the solution is also counter-intuitive:
using (AdventureWorks2012Entities ctx = new AdventureWorks2012Entities())
{
    var list = ctx.Customers.Select(a => (int?)a.Person.BusinessEntityID).ToList();
}
In the eyes of a user, how can a null value come out of an int property (BusinessEntityID), to be cast to int? only afterwards? Of course, the answer is that none of this matters in SQL; the only important thing is that the DataReader returns something that fits in the target C# container, but again, this breaks the LINQ illusion.

It gets worse using anonymous types:
using (AdventureWorks2012Entities ctx = new AdventureWorks2012Entities())
{
    var list = ctx.Customers.Select(a => new 
    { 
        a.CustomerID, 
        BusinessEntityID = (int?)a.Person.BusinessEntityID 
    }).ToList();
}
Here, not only is the solution counter-intuitive, you also have to repeat the property name.
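For comparison, this is a minimal sketch of how the same projection reads with ?. outside of expression trees, over in-memory objects. Customer, Person and NullPropagationDemo are hypothetical stand-ins for the AdventureWorks entities, not the real generated classes. Because ?. applied to an int member already produces int?, no explicit cast is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the AdventureWorks entities (assumption, not the generated model).
public class Person
{
    public int BusinessEntityID { get; set; }
}

public class Customer
{
    public int CustomerID { get; set; }
    public Person Person { get; set; } // optional navigation property, may be null
}

public static class NullPropagationDemo
{
    // LINQ to Objects: the lambda is compiled to a delegate, not to an expression tree.
    public static List<int?> BusinessEntityIds(IEnumerable<Customer> customers)
    {
        // c.Person?.BusinessEntityID has type int?, so the result type is inferred as List<int?>.
        return customers.Select(c => c.Person?.BusinessEntityID).ToList();
    }
}
```

Supporting the operator in expression trees would let the Entity Framework queries above be written the same way.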
Nov 1, 2014 at 8:46 PM
Edited Nov 2, 2014 at 10:25 AM

Solution 1: NullPropagationExpression

The first possible solution is to create a new NullPropagationExpression in System.Linq.Expressions like this:
public class NullPropagationExpression : Expression
{
    public Expression Receiver { get; private set; }
    public ParameterExpression AccessParameter { get; private set; }
    public Expression AccessExpression { get; private set; }
    private readonly Type type;

    public NullPropagationExpression(Expression receiver, ParameterExpression accessParameter, Expression accessExpression)
    {
        this.Receiver = receiver;
        this.AccessParameter = accessParameter;
        this.AccessExpression = accessExpression;
        this.type = AccessExpression.Type.Nullify();
    }
    
    public override Type Type
    {
        get { return type; }
    }
}
This would also involve adding a new NullPropagation static method to the Expression class and a NullPropagation member to the ExpressionType enum.

Also, the C# compiler will take advantage of this new class when generating expressions.

So if currently this:
Expression<Func<Person, string>> func = p => p.ToString();
Is translated to this:
ParameterExpression p;
Expression<Func<Person, string>> func = Expression.Lambda<Func<Person, string>>(
    Expression.Call(p = Expression.Parameter(typeof(Person), "p"), (MethodInfo)methodof(object.ToString)),
    p);
Then using null-propagation operator:
Expression<Func<Person, string>> func2 = p => p?.ToString();
Will be translated to:
ParameterExpression p;
ParameterExpression p2;
Expression<Func<Person, string>> func = Expression.Lambda<Func<Person, string>>(
    Expression.NullPropagation(
        receiver: p = Expression.Parameter(typeof(Person), "p"),
        accessParameter: p2 = Expression.Parameter(typeof(Person)),
        accessExpression: Expression.Call(p2, (MethodInfo)methodof(object.ToString))),
    p);
Pros:
  • This approach accurately reflects the intent of the developer in all cases.
Cons:
  • This operator could not be used in expression trees in .Net 3.5/4.0/4.5/4.5.1/4.5.2 because the class will not exist.
  • LINQ providers will need to be updated to take advantage of this operator.
The first problem is an especially hard one since, as far as I know, it will not be possible for two libraries, like EntityFramework and ASP.Net MVC, to each define a temporary System.Linq.Expressions.NullPropagationExpression class without conflicting with each other.
Nov 1, 2014 at 8:46 PM
Edited Nov 2, 2014 at 11:00 AM

Solution 2: Convert to ConditionalExpression

This solution is less accurate: it translates ?. to the equivalent ConditionalExpression. So:
Expression<Func<Person, string>> func2 = p => p?.ToString();
Will be translated to:
ParameterExpression p;
Expression<Func<Person, string>> func = Expression.Lambda<Func<Person, string>>(
    Expression.Condition(
        Expression.Equal(p = Expression.Parameter(typeof(Person), "p"), Expression.Constant(null, typeof(Person))),
        Expression.Constant(null, typeof(string)),
        Expression.Call(p, (MethodInfo)methodof(object.ToString))),
    p);
Pros:
  • It will work with any version of the .Net framework, just as it does outside of expression trees. No need to create a temporary NullPropagationExpression that could cause conflicts.
  • Any LINQ provider that already supports ConditionalExpression will work from day 1 with the new operator.
Cons:
  • Sub-optimal translations could appear if the pattern is not recognized by the LINQ provider, evaluating properties twice or generating superfluous CASE statements in SQL queries. I've created a Git repository with the code required by LINQ providers to recognize the Conditional pattern and create internal NullPropagationExpression.
  • 100% accurate translation will never be possible. For example:
customer.Where(a=>c.Increment()?.Name) //Written by the user 
customer.Where(a=>c.Increment() == null ? null : c.Increment().Name) //Incorrectly interpreted by an old LINQ provider

customer.Where(a=>c.Increment()?.Name) //Written by the user 
customer.Where(a=>c.Increment() == null ? null : c.Increment().Name) //Generated by the compiler
customer.Where(a=>c.Increment()?.Name) //Correctly interpreted by a new LINQ provider

customer.Where(a=>c.Increment() == null ? null : c.Increment().Name) //Written by the user
customer.Where(a=>c.Increment()?.Name) //Incorrectly interpreted by a new LINQ provider
Anyway, LINQ providers usually make strong transformations to the code, and side effects are rare and discouraged (increment/decrement/assign expressions are not generated by the compiler), so I cannot see any real problem that this could cause.
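To make the double evaluation concrete, here is a minimal sketch that builds the conditional form by hand and compiles it. DoubleEvaluationDemo, Next and Calls are hypothetical names introduced only for this illustration; the tree mirrors the shape Solution 2 would produce for Next()?.Length, which is an assumption about the generated code rather than actual compiler output.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public static class DoubleEvaluationDemo
{
    // Counts how many times the receiver is evaluated.
    public static int Calls;

    // Hypothetical receiver with a visible side effect.
    public static string Next()
    {
        Calls++;
        return "abc";
    }

    public static int? RunConditional()
    {
        Calls = 0;
        MethodInfo next = typeof(DoubleEvaluationDemo).GetMethod(nameof(Next));
        Expression receiver = Expression.Call(next);

        // Next() == null ? (int?)null : (int?)Next().Length
        // The same receiver node appears in the test and in the access, so it is emitted twice.
        Expression body = Expression.Condition(
            Expression.Equal(receiver, Expression.Constant(null, typeof(string))),
            Expression.Constant(null, typeof(int?)),
            Expression.Convert(Expression.Property(receiver, "Length"), typeof(int?)));

        return Expression.Lambda<Func<int?>>(body).Compile()();
    }
}
```

After calling RunConditional(), Calls is 2 because the compiled delegate invokes Next() once for the null test and once for the member access. A SQL-translating provider would not normally execute the receiver this way, which is why the issue mostly matters for Compile.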

I've prepared a pull request with minimal changes from cloneandgo/releases/Dev14CTP4 that implements this solution in Roslyn compiler https://roslyn.codeplex.com/SourceControl/network/forks/Olmo/NullPropagationExpresions/contribution/7642
Nov 2, 2014 at 3:06 AM
Edited Nov 2, 2014 at 3:09 AM
We should remember that expressions are also used for dynamic method generation (using Compile). Solution 2 will access the value twice inside the method prepared by Compile. So maybe we should discuss a 3rd solution: declare a temporary variable in the expression and then check it. Something like:
        ParameterExpression p;
        ParameterExpression temp;
        Expression<Func<Person, string>> func = Expression.Lambda<Func<Person, string>>(
            Expression.Block(
                new[] {temp = Expression.Variable(typeof(Person))},
                Expression.Assign(
                    temp,
                    p = Expression.Parameter(typeof(Person), "p")),
                Expression.Condition(
                    Expression.Equal(temp, Expression.Constant(null, typeof(Person))),
                    Expression.Constant(null, typeof(string)),
                    Expression.Call(temp, typeof(object).GetMethod("ToString")))),
            p);
This is not necessary when we work with a simple function parameter, but it is necessary when we check a complex expression.
Cons:
Most current LINQ providers will fail to parse it.
Nov 2, 2014 at 10:20 AM
darkman666 wrote:
We should remember that expressions are also used for dynamic method generation (using Compile). Solution 2 will access the value twice inside the method prepared by Compile. So maybe we should discuss a 3rd solution: declare a temporary variable in the expression and then check it.
It's true, but it will only be an issue when these compiler-generated expressions are compiled, for example when part of the expression is created using Expression<T> and then modified and compiled. Still, it is an issue, so here is my third solution:
Nov 2, 2014 at 10:59 AM
Edited Nov 2, 2014 at 11:00 AM

Solution 3: Hybrid approach

This solution tries to be a middle ground between Solution 1 (accurate translation) and Solution 2 (better backwards compatibility).

The idea is that the expression:
Expression<Func<Person, string>> func2 = p => p?.ToString();
Is translated to
ParameterExpression p;
ParameterExpression p2;
Expression<Func<Person, string>> func = Expression.Lambda<Func<Person, string>>(
    ExpressionCSharp60.NullPropagation(
        receiver: p = Expression.Parameter(typeof(Person), "p"),
        accessParameter: p2 = Expression.Parameter(typeof(Person)),
        accessExpression: Expression.Call(p2, (MethodInfo)methodof(object.ToString))),
    p);
Note that the only difference from Solution 1 is that the NullPropagation method is now defined in the ExpressionCSharp60 class.

In .Net 4.5.3, NullPropagationExpression will be defined as just another class in System.Linq.Expressions and will be included in ExpressionType and ExpressionVisitor.

Also ExpressionCSharp60 will be defined like:
public class ExpressionCSharp60
{
    public static NullPropagationExpression NullPropagation(Expression receiver, ParameterExpression accessParameter, Expression accessExpression)
    {
        return new NullPropagationExpression(receiver, accessParameter, accessExpression);
    }
}
For any previous version, the developer will get a compile-time error:
error CSXXXX: Predefined type 'System.Linq.Expressions.ExpressionCSharp60' is not defined or imported
error CSXXXX: Cannot find all types required to translate '?.' to expression trees. Are you targeting the wrong framework version, or missing a reference to an assembly?
This is very similar to what a developer gets in .Net 4.0 when trying to use async/await:
error CS0518: Predefined type 'System.Runtime.CompilerServices.IAsyncStateMachine' is not defined or imported
error CS1993: Cannot find all types required by the 'async' modifier. Are you targeting the wrong framework version, or missing a reference to an assembly?
Then, the developer can implement this simple class (or download a NuGet package) like this:
public class ExpressionCSharp60
{
    public static ConditionalExpression NullPropagation(Expression receiver, ParameterExpression accessParameter, Expression accessExpression)
    {
        var fullAccessExpression = ExpressionReplacer.Replace(accessExpression, accessParameter,
                Nullable.GetUnderlyingType(receiver.Type) == null ? receiver : Expression.Property(receiver, "Value"));

        var type = accessExpression.Type.Nullify();

        if (fullAccessExpression.Type != type)
            fullAccessExpression = Expression.Convert(fullAccessExpression, type);

        return Expression.Condition(Expression.Equal(receiver, Expression.Constant(null, receiver.Type)),
            Expression.Constant(null, type), fullAccessExpression);
    }
}
What this code does is translate the ?. pattern to a ConditionalExpression.
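The class above relies on two helpers that are not shown in this post, ExpressionReplacer.Replace and the Nullify extension method. The following is a minimal sketch of what they might look like, assuming Replace substitutes one parameter with an arbitrary expression and Nullify maps a non-nullable value type T to T?. The class name NullifyExtensions is hypothetical, and the versions in the Git repository may differ.

```csharp
using System;
using System.Linq.Expressions;

// Assumed helper: replaces every occurrence of one parameter with another expression.
public sealed class ExpressionReplacer : ExpressionVisitor
{
    private readonly ParameterExpression from;
    private readonly Expression to;

    private ExpressionReplacer(ParameterExpression from, Expression to)
    {
        this.from = from;
        this.to = to;
    }

    public static Expression Replace(Expression body, ParameterExpression from, Expression to)
    {
        return new ExpressionReplacer(from, to).Visit(body);
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        // Parameters are compared by reference, as expression trees do.
        return node == from ? to : base.VisitParameter(node);
    }
}

public static class NullifyExtensions
{
    // Assumed helper: int -> int?, while reference types and T? are returned unchanged.
    public static Type Nullify(this Type type)
    {
        return type.IsValueType && Nullable.GetUnderlyingType(type) == null
            ? typeof(Nullable<>).MakeGenericType(type)
            : type;
    }
}
```

With these in place, Replace(accessExpression, accessParameter, receiver) yields the access expression applied directly to the receiver, which is what the conditional form needs.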

Pros:
  • This approach accurately reflects the intent of the developer for .Net 4.5.3 or greater.
  • Developers in any older version of the framework will be able to take advantage of the operator. But they are aware that they are in a strange situation, just like with any other framework mocking (linqbridge, Microsoft.Bcl.Async, ...).
  • Older libraries (i.e.: EntityFramework 4.0) will not need to make any changes to support the new operator if the mock is in place. They can however still recognize the pattern using code like the one in my example Git repository to make better translations.
Cons:
  • New libraries will need to make code changes to accept NullPropagationExpression, maybe just calling Reduce.
  • There's a different behaviour depending on the version of the framework. A developer could use ?. intensively in .Net 4.0 (where it is treated as a ConditionalExpression) and then update to .Net 4.5.3 and realize that it is not supported by the library (now it is translated to NullPropagationExpression).
Still, I think this is probably the best solution, because it offers a path for versioning expression trees as C# evolves.

Conclusion

I hope I've been able to express why I think this change is useful and easy to implement.

Whatever solution is chosen, I hope this helps to make the null-propagating operator feature-complete.
Nov 2, 2014 at 12:39 PM
Nov 2, 2014 at 8:14 PM
The third solution looks better, but we should still discuss whether NullPropagationExpression should implement Reduce or not. If it does, I suppose it should be expanded to a ConditionalExpression with a temporary variable.

If this expression does not implement Reduce, some code generators (the built-in Compile, LightCompile from the DLR) should also be updated, and as far as I know we have a lot of different Compile implementations: classic .Net, .Net Native, Xamarin for iPhone.
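As a rough illustration of the Reduce option, here is a sketch of a reducible node that expands to a block with a temporary variable followed by a conditional. It is an assumption-laden sketch, not the code from the pull request: it uses ExpressionType.Extension because a NullPropagation enum member does not exist today, it reuses AccessParameter as the temporary, and it only handles receivers whose type matches AccessParameter (so not Nullable<T> receivers). ReduceDemo is a hypothetical usage helper.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical sketch; no such class exists in System.Linq.Expressions.
public sealed class NullPropagationExpression : Expression
{
    public Expression Receiver { get; }
    public ParameterExpression AccessParameter { get; }
    public Expression AccessExpression { get; }
    private readonly Type type;

    public NullPropagationExpression(Expression receiver, ParameterExpression accessParameter, Expression accessExpression)
    {
        Receiver = receiver;
        AccessParameter = accessParameter;
        AccessExpression = accessExpression;
        // The result must be able to hold null, so non-nullable value types become Nullable<T>.
        Type t = accessExpression.Type;
        type = t.IsValueType && Nullable.GetUnderlyingType(t) == null
            ? typeof(Nullable<>).MakeGenericType(t)
            : t;
    }

    public override ExpressionType NodeType => ExpressionType.Extension;
    public override Type Type => type;
    public override bool CanReduce => true;

    public override Expression Reduce()
    {
        Expression access = AccessExpression.Type == type
            ? AccessExpression
            : Expression.Convert(AccessExpression, type);

        // { temp = receiver; temp == null ? default(T?) : access(temp) }
        // The receiver is evaluated exactly once, into the temporary.
        return Expression.Block(
            new[] { AccessParameter },
            Expression.Assign(AccessParameter, Receiver),
            Expression.Condition(
                Expression.Equal(AccessParameter, Expression.Constant(null, AccessParameter.Type)),
                Expression.Default(type),
                access));
    }
}

public static class ReduceDemo
{
    // Builds the equivalent of p => p?.Length and compiles it.
    public static Func<string, int?> BuildLength()
    {
        ParameterExpression p = Expression.Parameter(typeof(string), "p");
        ParameterExpression p2 = Expression.Parameter(typeof(string), "p2");
        var body = new NullPropagationExpression(p, p2, Expression.Property(p2, "Length"));
        return Expression.Lambda<Func<string, int?>>(body, p).Compile();
    }
}
```

Because the node is reducible, the built-in Compile can handle it without knowing about the new type, while a LINQ provider remains free to recognize the node before reducing it.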
Nov 2, 2014 at 9:16 PM
darkman666 wrote:
The third solution looks better, but we should still discuss whether NullPropagationExpression should implement Reduce or not. If it does, I suppose it should be expanded to a ConditionalExpression with a temporary variable.

If this expression does not implement Reduce, some code generators (the built-in Compile, LightCompile from the DLR) should also be updated, and as far as I know we have a lot of different Compile implementations: classic .Net, .Net Native, Xamarin for iPhone.
You're right. I took the one from the previous solution.

It is now updated: https://github.com/olmobrutall/NullPropagation/blob/master/Class1.cs

I've managed to avoid the ExpressionReplacer call, but I have to declare two variables for nullable types. Not a big deal, I suppose.
Nov 9, 2014 at 8:41 PM
Nov 16, 2014 at 9:12 PM
Again, a pull request, which now works on top of releases/Dev14Preview (now merged into Roslyn), so it is easier to see the few changed files.

https://roslyn.codeplex.com/SourceControl/network/forks/Olmo/NullPropagationInExpression/contribution/7700#!/tab/comments