
C# Language Design Notes for Feb 3, 2014

Topics: C# Language Design
Mar 27, 2014 at 12:27 AM

Notes are archived here.

Agenda

We iterated on some of the features currently under implementation:
  1. Capture of primary constructor parameters <only when explicitly asked for with new syntax>
  2. Grammar around indexed names <details settled>
  3. Null-propagating operator details <allow indexing, bail with unconstrained generics>

Capture of primary constructor parameters

Primary constructors as currently designed and implemented lead to automatic capture of parameters into private, compiler-generated fields of the object whenever those parameters are used after initialization time.

It is becoming increasingly clear that this is quite a dangerous design. To illustrate, what’s wrong with this code?
public class Point(int x, int y)
{
    public int X { get; set; } = x;
    public int Y { get; set; } = y;
    public double Dist => Math.Sqrt(x * x + y * y);
    public void Move(int dx, int dy)
    {
        x += dx; y += dy;
    }
}
This appears quite benign, but is in fact catastrophically wrong. The use of x and y in Dist and Move causes these values to be captured as private fields. The auto-properties X and Y each cause their own backing fields to be generated, initialized with the x and y values passed in to the primary constructor. But from then on, X and Y lead completely distinct lives from x and y. Assignments to the X and Y properties will cause them to be observably updated, but the value of Dist remains unchanged. Conversely, changes through the Move method will be reflected in the value of Dist, but will not affect the value of the properties.
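The divergence can be reproduced in C# that compiles today by writing out by hand roughly what implicit capture would produce. This is only a sketch: the class name CapturedPoint and the field names _x and _y are illustrative stand-ins for the compiler-generated fields, not the compiler's actual output.

```csharp
using System;

// Hand-written approximation of the problematic Point class above.
public class CapturedPoint
{
    // Stand-ins (assumed names) for the hidden fields that capture x and y.
    private int _x;
    private int _y;

    public CapturedPoint(int x, int y)
    {
        _x = x;   // captured copy used by Dist and Move
        _y = y;
        X = x;    // the auto-properties get their own, separate backing fields
        Y = y;
    }

    public int X { get; set; }
    public int Y { get; set; }

    // Reads the captured copies, not the properties.
    public double Dist => Math.Sqrt(_x * _x + _y * _y);

    // Updates the captured copies, not the properties.
    public void Move(int dx, int dy)
    {
        _x += dx; _y += dy;
    }
}
```

Given new CapturedPoint(3, 4), setting X to 0 leaves Dist at 5, and a subsequent Move(3, 4) changes Dist to 10 while X and Y stay at 0 and 4.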

The way for the developer to avoid this is to be extremely disciplined about not referencing x and y except in initialization code. But that is like giving them a gun already pointing at their foot: sooner or later it will go subtly wrong, and they will have hard-to-find bugs.

There are other incarnations of this problem, e.g. where the parameter is passed to the base class and captured multiple times.

There are also other problems with implicit capture: we find, especially from MVP feedback, that people quickly want to specify certain things about the generated fields, such as readonly-ness, attributes, etc. We could allow those on the parameters, but they quickly don’t look like parameters anymore.

The best way for us to deal with this is to simply disallow automatic capture. The above code would be disallowed, and given the same declarations of x, y, X and Y, Dist and Move would have to be written in terms of the properties:
    public double Dist => Math.Sqrt(X * X + Y * Y);
    public void Move(int dx, int dy)
    {
        X += dx; Y += dy;
    }
Now this raises a new problem. What if you want to capture a constructor parameter in a private field and have no intention of exposing it publicly? You can do that explicitly:
public class Person(string first, string last)
{
    private string _first = first;
    private string _last = last;
    public string Name => _first + " " + _last;
}
The problem is that the “good” lower case names in the class-level declaration space are already taken by the parameters, and the privates are left with (what many would consider) less attractive naming options.

We could address this in two ways (that we can think of) in the primary constructor feature:
  1. Allow primary constructor parameters and class members to have the same names, with the excuse that their lifetimes are distinct: the former are only around during initialization, where access to the latter through this is not yet allowed.
  2. Introduce a syntax for explicitly capturing a parameter. If you ask for it, presumably you thought through the consequences.
The former option seems mysterious: two potentially quite different entities get to timeshare on the same name? And then you’d get confusing initialization code like this:
    private string first = first; // WHAT???
    private string last = last;
It seems that the latter option is the better one. We would allow field-like syntax to occur in a parameter list, which is a little odd, but kind of says what it means. Specifically, specifying an accessibility on a parameter (typically private) would be what triggers capture as a field:
public class Person(private string first, private string last)
{
    public string Name => _first + " " + _last;
}
Once there’s an accessibility specified, we would also allow other field modifiers on the parameter; readonly probably being the most common. Attributes could be applied to the field in the same manner as with auto-properties: through a field target.

Conclusion
We like option two. Let’s add syntax for capture and not do it implicitly.

Grammar for indexed names
For the lightweight dynamic features, we’ve been working with a concept of “pseudo-member” or indexed name for the $identifier notation.

We will introduce this as a non-terminal in the grammar, so that the concept is reified. However, for the constructs that use it (as well as ordinary identifiers) we will create separate productions, rather than unify indexed names and identifiers under a common grammatical category.

For the stand-alone dictionary initializer notation of [expression] we will not introduce a non-terminal.

Null-propagating operator details

To nail down the design of the null-propagating operator, we need to decide a few things:

Which operators does it combine with?

The main usage of course is with dot, as in x?.y and x?.m(…). It also potentially makes sense for element access x?[…] and invocation x?(…). And we also have to consider interaction with indexed names, as in x?.$y.

We’ll do element access and indexed member access, but not invocation. The former two make sense in the context that lightweight dynamic is addressing. Invocation seems borderline ambiguous from a syntactic standpoint, and for delegates you can always get to it by explicitly calling Invoke, as in d?.Invoke(…).
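As an illustration of the d?.Invoke(…) workaround, the sketch below raises an event only when it has subscribers. The Counter class and its Changed event are hypothetical names chosen for the example, and the code assumes a compiler that supports the ?. operator as described in these notes.

```csharp
using System;

public class Counter
{
    // Null until someone subscribes.
    public event Action<int> Changed;

    public int Value { get; private set; }

    public void Increment()
    {
        Value++;
        // There is no invocation form "Changed?(Value)"; go through Invoke.
        // If Changed is null, nothing is called and the result is discarded.
        Changed?.Invoke(Value);
    }
}
```

Calling Increment on a Counter with no subscribers does nothing beyond updating Value; once a handler is attached, the handler receives the new value.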

Semantics

The semantics are like applying the ternary operator to a null equality check, a null literal and a non-question-marked application of the operator, except that the expression e is evaluated only once:
e?.m(…)   =>   ((e == null) ? null : e0.m(…))
e?.x      =>   ((e == null) ? null : e0.x)
e?.$x     =>   ((e == null) ? null : e0.$x)
e?[…]     =>   ((e == null) ? null : e0[…])
Where e0 is the same as e, except if e is of a nullable value type, in which case e0 is e.Value.
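The expansion can be written out by hand to make the single evaluation explicit. This is a sketch under assumptions: the NullPropagation class, its method names and the Evaluations counter are made up for the example, and the temporary variable is one plausible way to get the "evaluated only once" behavior, not a statement of how the compiler implements it.

```csharp
using System;

public static class NullPropagation
{
    // Counts how often the receiver expression is evaluated.
    public static int Evaluations;

    public static string GetText(string value)
    {
        Evaluations++;
        return value;
    }

    // Hand-written expansion of e?.Trim(): evaluate e once into a
    // temporary, then branch on the null check.
    public static string TrimExpanded(Func<string> e)
    {
        string tmp = e();
        return (tmp == null) ? null : tmp.Trim();
    }

    // The same operation written with the operator itself.
    public static string TrimWithOperator(Func<string> e)
    {
        return e()?.Trim();
    }

    // Nullable value type receiver: e0 is e.Value.
    public static string FormatExpanded(int? e)
    {
        int? tmp = e;
        return (tmp == null) ? null : tmp.Value.ToString();
    }
}
```

Both Trim variants call the supplied delegate exactly once per use, whether or not the result is null.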

Type

The type of the result depends on the type T of the right hand side of the underlying operator:
  • If T is (known to be) a reference type, the type of the expression is T
  • If T is (known to be) a non-nullable value type, the type of the expression is T?
  • If T is (known to be) a nullable value type, the type of the expression is T
  • Otherwise (i.e. if it is not known whether T is a reference or value type) the expression is a compile time error.
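The first three typing rules can be seen in the return types below. The Sample and ResultTypes classes are hypothetical names for the example, and the code assumes a compiler that supports the ?. operator. The fourth case (a member whose type is an unconstrained type parameter) is left out because, under the rule above, it would not compile.

```csharp
using System;

public class Sample
{
    public string Name;   // reference type member
    public int Count;     // non-nullable value type member
    public int? Limit;    // nullable value type member
}

public static class ResultTypes
{
    // T is string (reference type): the result type is string.
    public static string NameOf(Sample s) { return s?.Name; }

    // T is int (non-nullable value type): the result type is int?.
    public static int? CountOf(Sample s) { return s?.Count; }

    // T is int? (nullable value type): the result type stays int?.
    public static int? LimitOf(Sample s) { return s?.Limit; }
}
```

Passing null to any of the three methods yields null; passing a Sample yields the member's value, lifted to int? in the Count case.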
Apr 6, 2014 at 1:57 PM
Edited Apr 8, 2014 at 3:15 PM
I really like primary constructors. I agree that implicit capturing is not desired and the proposed syntax seems fine to me. However, we really need a way to define a constructor body for the primary constructor for argument validation and maybe some other initialization steps. For instance, consider the following Texture2D class that I have in my game code:
public class Texture2D
{
   private readonly GraphicsDevice _device;
   private readonly Size _size;

   public Texture2D(GraphicsDevice device, Size size)
   {
       _device = device;
       _size = size;

       Debug.Assert(_device != null);
       Debug.Assert(_size < GraphicsDevice.MaxTextureSize);

       // Create texture using Direct3D or OpenGL
   }

   // Other members
}
I'd love to be able to rewrite that class using a primary constructor - however, I currently have no way to validate the arguments and to call the Direct3D or OpenGL initialization function. I either could not use primary constructors at all, or I'd have to provide an Initialize function that would have to be called explicitly, either by the user of the code or by some framework class. That significantly reduces the usefulness of primary constructors. In my opinion, without a way to define a body, primary constructors should not be added to C#.

As for the syntax for primary constructor bodies, I'm not really sure what would be best. I suppose that's also why you haven't come up with a design as of yet. F# allows the declaration of code in the body of the class, maybe something similar would be possible in C#? Or maybe an initialize keyword could be introduced?
public class Texture2D(private GraphicsDevice device, private Size size)
{
    initialize {
       Debug.Assert(device != null);
       Debug.Assert(size < GraphicsDevice.MaxTextureSize);

       // Create texture using Direct3D or OpenGL
    }

   // Other members
}
Apr 6, 2014 at 2:55 PM
@Expandable: I think the proper way to address the condition checking should be different. Your proposal doesn't play nicely with constructor chaining: if another constructor is going to change something, the checks would need to be redone.

I see two ways of making it better:
  1. Allow explicitly specifying constraints for backing field value on properties. Note that this must be valid only for the properties where the backing field exists and is implicit, since there is no other connection between a private field and a property.
  2. Create an explicit post-construction callback, like Delphi's TObject::AfterConstruction. It must run automatically after all the constructors of the most derived object have finished and the object initializers have run. (This can have numerous other uses as well.)
Apr 6, 2014 at 3:40 PM
Edited Apr 6, 2014 at 4:44 PM
@VladD: I was under the impression that the primary constructor is always the last constructor to run (Update: this is indeed not correct, see my next post). See also section 2.3 in the "Upcoming Features in CSharp.docx" document. In that case, the problem you mentioned wouldn't occur. Furthermore, this problem isn't specific to primary constructors at all. You would have to consider the same issues with multiple explicit constructors.

If there are multiple constructors that all need to validate the same set of constructor arguments, I would factor out that code into a private Validate(...) method and call that from all constructors. Again, this is unrelated to the primary/explicit constructor issue.
Apr 6, 2014 at 4:22 PM
Edited Apr 6, 2014 at 4:46 PM
@Expandable: I couldn't find the document you mentioned. However, my impression was that the primary constructor has to run first, since all other constructors must (directly or indirectly) invoke the primary one. This means that every constructor that has a chance of running last must invoke Validate, which would not be DRY enough.

Of course, as you correctly mentioned, the problem is the same for multiple constructors, and (to some extent) for constructors in derived classes, so I was looking for a common improvement for both old and new semantics.
Apr 6, 2014 at 4:43 PM
Edited Apr 6, 2014 at 4:46 PM
@VladD: You can find the document in the download section of the Roslyn SDK (https://connect.microsoft.com/VisualStudio/Downloads/DownloadDetails.aspx?DownloadID=52793). It's somewhat hidden and should be more visible on the front page of the Roslyn CodePlex site.

As you say, all explicit constructors must invoke the primary one - and of course you're right, the primary one therefore runs first, not last, as I originally thought for some reason. Sorry about that mix-up.

I'm not really sure if your suggested improvements are really needed from a philosophical point of view: I tend to see the validation problem in my example above as classical pre/post condition checks, and that's every method's own responsibility. Sure, there are class invariants that you might want to validate, but I'm fine with calling a validation method in the appropriate places. That's after all an implementation detail.

If you think it through, your second solution suggested above should likely be extended to all method, property, and indexer calls, etc., as you might want to check your class invariants after any of those have executed. So that would suggest adding a ValidateClassInvariants magic method that validates class invariants after each operation that possibly changes the internal state of an object. I'm not sure I'd really want that.

So in conclusion, I think that primary constructor bodies are a must-have feature in order to allow defensive programming, as is the default throughout all the classes of the .NET framework.

Edit: By the way, in some cases you could move internal state validation to the setters of your properties. Primary constructors and auto-initialized properties, however, do not invoke property setters, so that solution doesn't help in this case.
Apr 7, 2014 at 2:21 PM
Edited Apr 8, 2014 at 3:16 PM
Another interesting idea has come up regarding primary constructor syntax and bodies, see this discussion: https://roslyn.codeplex.com/discussions/541421

Update: And this one: https://roslyn.codeplex.com/discussions/541575
Apr 9, 2014 at 5:30 AM
Scala did this long ago and seems to have gotten it right.
Apr 9, 2014 at 10:01 AM
public class Person(private string first, private string last)
{
    public string Name => _first + " " + _last;
}
I'm assuming this was a typo and the correct code should have been
public class Person(private string first, private string last)
{
    public string Name => first + " " + last;
}
Apr 9, 2014 at 3:07 PM
So, with explicitly captured parameters, how is one supposed to xml document them?
Apr 9, 2014 at 3:44 PM
Regarding Capture of primary constructor parameters and allowing for a constructor body, why not allow for a default constructor that is executed after any initialization of constructor parameters?
public class Texture2D(private GraphicsDevice device, private Size size)
{
    Texture2D()
    {
       Debug.Assert(device != null);
       Debug.Assert(size < GraphicsDevice.MaxTextureSize);

       // Create texture using Direct3D or OpenGL
    }

   // Other members
}
Apr 20, 2014 at 6:49 PM
Edited Apr 22, 2014 at 2:41 PM
Couldn't the Indexed member access syntax be simplified by removing the dot?
// Now
obj.$name

// My suggestion
obj$name
In VBA, for instance, you can simply write obj!name. C# shouldn't be more complicated than VB!
Apr 20, 2014 at 10:38 PM
OlivierJ wrote:
Couldn't the Indexed member access syntax be simplified by removing the dot?
// Now
obj.$name

// My suggestion
obj$name
In VBA, for instance, you can simply write obj!name. C# shouldn't be more complicated than VB!
I like this idea. It makes the '$' character an 'index of literal' operator, versus '.$' being a 'member of/index of' operator.
Apr 22, 2014 at 2:26 PM
OlivierJ wrote:
Couldn't the Indexed member access syntax be simplified by removing the dot?
// Now
obj.$name

// My suggestion
obj$name
In VBA, for instance, you can simply write obj!name. C# shouldn't be more complicated than VB!
I think this might be a better syntax too. The .$ syntax is really about saving 2 characters foo.$dog vs foo["dog"] and enabling better tooling, right? So why not save 3 characters? Unlike the new ?. operator you're not actually accessing a member of the type, so at worst the dot is misleading, and at best it's not adding any value. I think the risk of confusing novice programmers by the operator including a dot is a real one.
Apr 22, 2014 at 2:41 PM
Edited Apr 22, 2014 at 10:28 PM
Null-propagating operator. Could the proposed syntax be simplified from e?.x to e?x and from e?.$x to e?$x? Instead of supplementing the dot operator the question mark would replace the dot operator.

I see that there might be a conflict with the ternary operator, but maybe someone sees a possibility how to resolve it. For instance another symbol could be used like e@x.
Apr 22, 2014 at 10:08 PM
OlivierJ wrote:
Null-propagating operator. Could the proposed syntax be simplified from e?.x to e?x and from e?.$x to e?$x? Instead of supplementing the dot operator the question mark would replace the dot operator.

I see that there might be a conflict with the ternary operator, but maybe someone sees a possibility how to resolve it.
As you said, that will conflict with the ternary operator.

What does a?b?c:null mean? a?.b ? c : null or a ? b?.c : null?
Apr 22, 2014 at 10:11 PM
MgSam wrote:
OlivierJ wrote:
Couldn't the Indexed member access syntax be simplified by removing the dot?
// Now
obj.$name

// My suggestion
obj$name
In VBA, for instance, you can simply write obj!name. C# shouldn't be more complicated than VB!
I think this might be a better syntax too. The .$ syntax is really about saving 2 characters foo.$dog vs foo["dog"] and enabling better tooling, right? So why not save 3 characters? Unlike the new ?. operator you're not actually accessing a member of the type, so at worst the dot is misleading, and at best it's not adding any value. I think the risk of confusing novice programmers by the operator including a dot is a real one.
For me, a.$b tells me that I'm not dotting into $b which means I'm indexing a with "b". I'm not dollaring into b. :)
Apr 22, 2014 at 10:42 PM
PauloMorgado wrote:
OlivierJ wrote:
Null-propagating operator. Could the proposed syntax be simplified from e?.x to e?x and from e?.$x to e?$x? Instead of supplementing the dot operator the question mark would replace the dot operator.

I see that there might be a conflict with the ternary operator, but maybe someone sees a possibility how to resolve it.
As you said, that will conflict with the ternary operator.

What does a?b?c:null mean? a?.b ? c : null or a ? b?.c : null?
I see two possibilities:
  1. Use another symbol (e.g. e@x or e#x). Your example would become: a@b ? c : null OR a ? b@c : null.
  2. Use parentheses when mixed with a ternary expression: (a?b) ? c : null OR a ? (b?c) : null.
Apr 22, 2014 at 10:51 PM
OlivierJ wrote:
I see two possibilities:
  1. Use another symbol (e.g. e@x or e#x). Your example would become: a@b ? c : null OR a ? b@c : null.
  2. Use parentheses when mixed with a ternary expression: (a?b) ? c : null OR a ? (b?c) : null.
  1. ? already has a meaning of conditional in C# in the ternary (?:) and null coalescing (??) operators. Why introduce another one?
  2. At this point you only need parentheses in an expression if you need to override operator precedence. Precedence could be defined here, but it's still too much confusion.
Does ?. bother you that much, or are you just exploring alternatives?
Apr 23, 2014 at 2:41 PM
I am primarily exploring alternatives. Expressions like e?.$x tend to be very cryptic and somehow remind me of the programming language APL with expressions like life←{↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵}.
Apr 23, 2014 at 6:33 PM
OlivierJ wrote:
e?.$x
This shows that it's not a .$ operator but a $ operator valid only as member access - . or ?..
Apr 25, 2014 at 6:52 AM
Perhaps we need to look into a way to declare preconditions and postconditions in a different way, through attributes or as part of the signature? Not only would this help in Expandable's case, it would also allow for more expressive interfaces, bringing in the power of ContractClasses found in Code Contracts as a first-class citizen.