Random ideas for a new language based on Roslyn

Topics: C# Language Design
May 30, 2014 at 2:48 PM
Edited May 31, 2014 at 6:27 AM
[The title is not completely true. It's an extension to the current C# rather than a new language. So I intend to call it Extended# for now, short for "C# with Extensions."]

Hi, before I dive into any real work soon, I'd like to get comments from the community first. Any comments would be welcome and appreciated.

Tentatively planned language features include (in random order):
  • Readonly properties
  • Extension properties
  • catch-case statements
  • Double backtick names
  • Generic type parameter type inference
  • Const type inference
  • Class field type inference
  • Jagged array type inference
  • New explicit cast syntax
  • C-like unions
  • try-block-scoped variables
  • is not operator
  • True division operator
  • Range operator
  • void keyword made optional

Readonly properties

Readonly properties will make currently proposed primary constructors unnecessary in most cases. The basic form is as follows:
class Person {
    public readonly string Name { get; set; }
}
It can have a field initializer:
class Person {
    public readonly string Name { get; set; } = "Default";
}
You can omit the setter:
class Person {
    public readonly string Name { get; } = "Default";
}
which is identical to:
class Person {
    public readonly string Name { get; private set; } = "Default";
}
Why private set rather than set? If a readonly property has a non-private setter, the setter can also be called from an object initializer outside its class:
class Person {
    public readonly string Name { get; set; }

    public Person(string name) {
        Name = name; // OK
    }

    public void Rename(string name) {
        Name = name; // Error
    }
}

var person = new Person { Name = "Rem" }; // OK

person.Name = "Bris"; // Error
But if it has a private setter, the setter can only be called from within the defining class:
class Person {
    public readonly string Name { get; private set; }

    public Person(string name) {
        Name = name; // OK
    }

    public void Rename(string name) {
        Name = name; // Error
    }

    public static Person CreateInstance(string name) {
        return new Person { Name = name }; // OK
    }
}

var person = new Person { Name = "Rem" }; // Error

person.Name = "Bris"; // Error
Readonly properties are very helpful when making immutable types.

Extension properties

Extension properties are extension methods with a getter inside:
static class RubyStyleExtensions {
    public static IEnumerable<int> Times(this int count) {
        get { return Enumerable.Range(0, count); }
    }

    public static void Do(this IEnumerable<int> source, Action<int> selector) {
        foreach (var element in source)
            selector(element);
    }
}

4.Times.Do(i => Console.Write(i)); // 0123
Another example:
static class TimeSpanExtensions {
    public static TimeSpan Hours(this int hours) {
        get { return TimeSpan.FromHours(hours); }
    }
}

Thread.Sleep(7.Hours);
The above line's much easier to read than:
Thread.Sleep(TimeSpan.FromHours(7));
You can always fall back to the static method invocation syntax if name collision occurs:
Thread.Sleep(TimeSpanExtensions.Hours(7));
Note that there's no setter for extension properties.
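For comparison, the closest thing in current C# is an ordinary extension method, which needs the trailing parentheses. A minimal sketch reusing the names from the examples above; folding Hours into the same class, renaming the Do parameter to action, and the ExtensionMethodDemo helper are choices made here for brevity, not part of the proposal:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class RubyStyleExtensions
{
    // Method form of the proposed Times property: yields 0 .. count - 1.
    public static IEnumerable<int> Times(this int count)
    {
        return Enumerable.Range(0, count);
    }

    public static void Do(this IEnumerable<int> source, Action<int> action)
    {
        foreach (var element in source)
            action(element);
    }

    // Method form of the proposed Hours property.
    public static TimeSpan Hours(this int hours)
    {
        return TimeSpan.FromHours(hours);
    }
}

// Hypothetical helper that only exists to exercise the extensions.
public static class ExtensionMethodDemo
{
    public static string Digits()
    {
        var text = new StringBuilder();
        4.Times().Do(i => text.Append(i));
        return text.ToString(); // "0123"
    }
}
```

With these in scope, Thread.Sleep(7.Hours()) is the method-call spelling of the proposed Thread.Sleep(7.Hours).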

catch-case statements

try {
   ...
} catch (Exception ex) {
case NullReferenceException:
    throw;
case InvalidOperationException:
    goto case NullReferenceException;
case ObjectDisposedException:
case IndexOutOfRangeException:
    Console.WriteLine(System.Environment.StackTrace);
    break;
default:
    throw new AnotherException(ex);
} /* No other catch blocks allowed */ finally {
   ...
}
There are some advantages to the new syntax:
  • Unlike the current multiple catch clauses, the exception case labels can be placed in any order.
  • Making exception filters is a piece of cake.

Double backtick names

This feature is taken from F#.
[Test]
public void ``Test if Equals() meets four properties``() {
    Assert.That(...);
}
It's more readable than:
[Test]
public void TestIfEqualsMeetsFourProperties() {
    Assert.That(...);
}

Generic type parameter type inference

var set = new HashSet { 1, 2, 3, 4, 5 };
is identical to
var set = new HashSet<int> { 1, 2, 3, 4, 5 };

Const type inference

const PI = 3.141593;

Class field type inference

public class NumericFormatter {
    const LocalizedFormatSpecifier = "L";
    public readonly var Default = new NumericFormatter();
}

Jagged array type inference

The current initialization form for jagged arrays is somewhat cumbersome:
int[][] matrix = new int[][] {
   new int[] { 0,1,2 },
   new int[] { 3,4,5 },
   new int[] { 6,7,8,9 }
};
It can be simplified to:
int[][] matrix = {
    { 0,1,2 },
    { 3,4,5 },
    { 6,7,8,9 }
};
or:
var matrix = int[][] {
    { 0,1,2 },
    { 3,4,5 },
    { 6,7,8,9 }
};
Warning: this shouldn't be confused with rectangular array initialization. You must let the compiler know what type of array you intend to initialize.

New explicit cast syntax

double rate = double(price) / sum;
is easier to read than the current syntax:
double rate = (double)price / sum;
Another example:
struct RichChar {
    public static explicit operator RichChar(char c) {
        return new RichChar(c);
    }
}
The old syntax:
RichChar r = (RichChar)c;
The new syntax:
RichChar k = RichChar(c);
In addition to readability, the new syntax more closely reflects the method signature:
    public static explicit operator RichChar(char c) {

C-like unions

struct SomeStruct {
    union {
        int x;
        int y;
    }
}
is much cleaner than the current:
[StructLayout (LayoutKind.Explicit)]
struct SomeStruct {
    [FieldOffset(0)] int x;
    [FieldOffset(0)] int y;
}
There are some limitations:
  • Unions are anonymous---they cannot have a name.
  • Only s/byte, u/int, u/long, float, double, char, decimal, U/IntPtr types and unions can be members of the union.
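To see what the explicit-layout form gives you today, here is a small sketch of two overlapping fields of different types. The IntOrFloat and UnionDemo names are made up for this illustration; only StructLayout and FieldOffset come from the example above.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
public struct IntOrFloat
{
    // Both fields start at offset 0, so they share the same four bytes.
    [FieldOffset(0)] public int Bits;
    [FieldOffset(0)] public float Value;
}

public static class UnionDemo
{
    public static int BitsOfOne()
    {
        var u = new IntOrFloat();
        u.Value = 1.0f;
        return u.Bits; // 0x3F800000, the IEEE 754 encoding of 1.0f
    }
}
```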

try-block-scoped variables

How many times have you written code like the following:
FileStream stream = null;
try {
    ...
} finally {
    if (stream != null)
        stream.Close();
}

stream.Write(); // stream is still reachable although it shouldn't be.
That just looks ugly. So try-block-scoped variables come to the rescue:
try (FileStream stream) {
    ...
} finally {
    if (stream != null)
        stream.Close();
}

stream.Write(); // Error: stream outside its scope
Note that although this looks similar to Java 7 try-with-resources blocks, the semantics are completely different.

is not operator

if (!(person is Employee)) {
    ...
}
The above is somewhat difficult to read. With the is not operator it becomes:
if (person is not Employee) {
    ...
}

True division operator

How many times do you write code like this:
int a = 100;
int b = 200;
double c = (double)a / b;
That's not elegant (see the last paragraph). So use the true division operator instead:
int a = 100;
int b = 200;
double c = a double(/) b; // c = 0.5
double here is completely optional and can be omitted:
double c = a (/) b;
It's translated into
double c = double(a) / double(b);
(using the new explicit cast syntax.)

If you're a Python 3 programmer, you'll already know that the true division operator is more convenient and less error-prone than the old "untrue" division operator. I think C# should have the same one, too.
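For reference, this is how the two behaviors look in current C#. The DivisionDemo class and its method names are hypothetical; TrueDivide is what the proposed a (/) b would expand to according to the translation above.

```csharp
public static class DivisionDemo
{
    // Current int / int: the fractional part is discarded.
    public static int Truncating(int a, int b)
    {
        return a / b;
    }

    // What the proposed true division operator would compute.
    public static double TrueDivide(int a, int b)
    {
        return (double)a / (double)b;
    }
}
```

DivisionDemo.Truncating(100, 200) gives 0, while DivisionDemo.TrueDivide(100, 200) gives 0.5.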

Range operator

int[] a = 1..5;
is identical with:
int[] a = { 1, 2, 3, 4, 5 };
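The nearest equivalent in current C# is Enumerable.Range, whose second argument is a count rather than an end value. A small sketch of an inclusive helper; RangeDemo and Inclusive are names invented here, and the helper assumes last is not smaller than first - 1 (otherwise Enumerable.Range throws):

```csharp
using System.Linq;

public static class RangeDemo
{
    // Inclusive on both ends, matching the proposed first..last form.
    public static int[] Inclusive(int first, int last)
    {
        return Enumerable.Range(first, last - first + 1).ToArray();
    }
}
```

RangeDemo.Inclusive(1, 5) produces the same array as { 1, 2, 3, 4, 5 }.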

void keyword made optional

The idea is simple: a constructor has no return type specified because it returns nothing, so why must void methods spell out the return type void? It can be omitted:
public static class Console {
    public static WriteLine() {
       ...
    }
}
That's all for now, folks!
May 30, 2014 at 5:46 PM
Readonly properties are already going into Dev14, but not exactly the same as the way you do them. We allow assignment in the constructor only, and it is done by direct assignment to the backing field rather than through a private set.
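A minimal sketch of that behavior as it appears in C# 6 getter-only auto-properties. The Person type mirrors the one in the original post; the Nickname property is added here only to show an initializer.

```csharp
using System;

public sealed class Person
{
    // Getter-only auto-properties: no setter is generated, not even a private one.
    public string Name { get; }
    public string Nickname { get; } = "Default";

    public Person(string name)
    {
        Name = name; // allowed only in a constructor; writes the backing field directly
    }

    // A method like this would not compile, because Name has no setter:
    // public void Rename(string name) { Name = name; }
}
```

Because there is no setter at all, new Person("Rem") { Name = "Bris" } is rejected as well, which is the main difference from the proposal at the top of the thread.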

catch-case is a bit of a hack. I'd wait for a more general pattern-matching facility that includes typecase as a special case.

Type inference is a bad idea as members of types because (1) those types are useful documentation, and (2) it introduces cycles in the semantics of the language that would require a substantial restructuring of Roslyn's semantic analysis.

Your proposed new cast syntax is incompatible with existing code, as it changes the lookup rules for an invocation expression.

The try-block scoped variables and "true" division don't appear to carry their weight.

"void keyword made optional" conflicts with the syntax for constructors. Are you suggesting the syntax be made context-sensitive? If so, you'll have to disable Roslyn's incremental parser, which depends on context insensitivity. Not sure why you see a benefit to this change.

Most of the others I don't really have much of an opinion about.
May 30, 2014 at 10:11 PM
If you're going to write a new language, I'd suggest that you avoid having it look too much like C (or C#, or Java). A common source of problems when people use new languages which resemble old ones is an expectation that things which look like constructs in the older language will behave the same way. Further, if you set out to distinguish yourself from existing languages, you can look for design traps those languages have fallen into and avoid them. To some extent you'll be limited by the .NET framework, though if you offer a means of saying "pretend, when compiling this class, that a certain member of an external class is tagged with some particular attribute" it should be possible to work around many such limitations.

As a simple example, Java's rules for primitive type conversions were not thought through very well, and C# follows them more than it fixes them. If you want a language's warnings to have a high "true positive" rate when programmers do things they don't intend, without an annoyingly high "false positive" rate (complaining about code that has one obvious meaning which coincides precisely with the programmer's intention), your language should distinguish between "precise" and "imprecise" value types, and use that distinction when allowing or forbidding type conversions. Rather than saying that implicit float-to-double conversions are always allowed, double-to-float conversions are always forbidden, and integer-to-float-or-double conversions should favor float when possible, it would be better to let programmers make clear whether a parameter, variable, or literal is expected to represent an approximate value or a precise one, and use such distinctions in deciding what to allow or disallow.

Consider, for example, field1 = Math.Sin(x); field2 = (float)Math.Sin(x);. If field1 were float, is there any way in which an implicit double-to-float conversion would violate programmer expectations? If field2 were double, what should a reader infer about the second statement's intention? If there were an "approximate-float" type which was stored as a System.Single, but was only implicitly convertible from (not to) double, the first statement would be accepted and the second (if float casts were by default considered "approximate") properly rejected.
May 30, 2014 at 10:26 PM
nmgafter wrote:
Type inference is a bad idea as members of types because (1) those types are useful documentation, and (2) it introduces cycles in the semantics of the language that would require a substantial restructuring of Roslyn's semantic analysis.
I would allow type inference in a few specific cases where the inferred type would be both the only one the programmer could plausibly have intended and the only one that would be obvious to a reader. For example, I would allow public const var n1 = 1234; and public const var n2 = 123456789012345L; but disallow public const var n3=123456789012345; [if the programmer wants the type inferred as long, make that intention clear via suffix; adding the suffix would among other things also ensure that changing the number to one that would fit into an int wouldn't change the type].

I would also allow public var foo1 = new Bar(); or public var foo2 = Bar.MemberWhoseTypeIsBar; but disallow public var foo3 = Bar.MemberWhoseTypeIsBoz;. Do you see any problems with such rules, either because they would yield "surprising" results or because evaluation would be difficult? Evaluating the type of foo2 would require examining Bar, but merely to ensure that the member in question genuinely was of type Bar; I see no possibility of circular dependency which couldn't be resolved by presuming that a static factory method would return something of the factory's own type and then later checking that the assumption held.
May 31, 2014 at 1:03 AM
I don't understand your rules. What is the difference between public var foo2 = Bar.MemberWhoseTypeIsBar; and public var foo3 = Bar.MemberWhoseTypeIsBoz; such that you allow the former and disallow the latter?

If the user writes

var n3 = Class1.n1 + Class2.n2;

then the compiler needs to figure out the types of Class1.n1 and Class2.n2 before figuring out the type of n3. The compiler code to do that has to be written in such a way that it can't get into an infinite recursion when people write recursive initialization. It is possible to do, but it is a substantial variation from the way the Roslyn implementation is organized today.
May 31, 2014 at 6:38 AM
nmgafter wrote:
I don't understand your rules. What is the difference between public var foo2 = Bar.MemberWhoseTypeIsBar; and public var foo3 = Bar.MemberWhoseTypeIsBoz; such that you allow the former and disallow the latter?
Basically, the goal would be to define the rules for public-member type inference such that:
  1. Looking solely at an initialization expression and active "using" imports, one could identify a type which the expression would have to be in order to be legal. One would have to look elsewhere to determine if the expression was legal, but one would not have to look elsewhere to know that it couldn't be any other type.
  2. The only changes which should change the type of an expression where public-member inference is legal should be those which deliberately change the type.
  3. The most useful cases are legal, to the extent possible given the above constraints.
Having the declaration public var foo = Bar.SomeMember; be legal when and only when Bar.SomeMember is of type Bar would allow a person or compiler examining the code to know that the type of the expression cannot be anything other than Bar, without having to look at Bar. With a short class name like Bar, var wouldn't help much, but not all type names are short. Some generic type names can get quite long, and there would be considerable benefits to readability if they didn't have to be repeated. It wouldn't really matter to me if the repetition is instead avoided by allowing declarations like:
SomeReallyLongTypeName = new(args); // Equivalent to ... = new SomeReallyLongTypeName(args);
and
SomeReallyLongTypeName = .SomeMember(args); // Equivalent to ... = SomeReallyLongTypeName.SomeMember(args);
but repetition of long types makes code less readable, since someone inspecting the code must examine both types to ascertain if they actually are the same.

I'm not really attached to the idea of allowing type inference with numeric types; given the way some C# types work in expressions, I think even allowing local var with general numeric-type expressions can be a bit dodgy. For example:
const int i = /* Some number */;
const uint u = 3u;
var q = u-i;
What is the type of q?
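To make the question concrete: under the current C# rules for predefined operators and implicit constant conversions, the answer depends on the value of i, not just on its type. A sketch, with the two constant values and all names chosen here purely for illustration:

```csharp
using System;

public static class InferenceDemo
{
    const uint u = 3u;
    const int positive = 1;   // in range for uint, so it converts implicitly
    const int negative = -1;  // out of range for uint, so both operands widen

    public static Type TypeWithPositive()
    {
        var q = u - positive; // binds to uint - uint
        return q.GetType();   // System.UInt32
    }

    public static Type TypeWithNegative()
    {
        var q = u - negative; // binds to long - long
        return q.GetType();   // System.Int64
    }
}
```

So changing nothing but the value of the constant changes the inferred type of q from uint to long.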
May 31, 2014 at 7:38 AM
nmgafter wrote:
Type inference is a bad idea as members of types because (1) those types are useful documentation,
IntelliSense provides me with accurate type information on any class fields whenever I need.

When I saw F# code for the first time I had a feeling that 'why are all variables, values, even functions typed let?! This language is very hard to read, so it must be bad.' Now I find myself using var everywhere. Implicit typing actually makes code easier to read.

I also find that most fields I write with the field initializer are one of:
  • Assignment from a literal:
static readonly string LocalizedFormatSpecifier = "L";
  • Assignment from a newly created instance of the same type as the field:
static readonly Dictionary<int, long[]> WeightTable = new Dictionary<int, long[]> {
In the former, it's obvious that LocalizedFormatSpecifier is a string even if the word string is omitted or is replaced with var. In the latter, Dictionary<int, long[]> is redundant.

Two simple rules would be sufficient for class field type inference:
  • The right-hand side of assignment is a literal.
  • The right-hand side of assignment is an object creation expression.
(2) it introduces cycles in the semantics of the language that would require a substantial restructuring of Roslyn's semantic analysis.
I should look into the Roslyn sources first. Thanks.
May 31, 2014 at 3:57 PM
junyoung wrote:
Two simple rules would be sufficient for class field type inference:
What do you think of my proposed rules? I forgot string literals, but otherwise how are they?
Jun 5, 2014 at 4:24 PM
+1 to "New explicit cast syntax": it's clearer not only on its own but ESPECIALLY with multiple casts!

MyClass(MyOtherClass(arg).Prop1).Prop2 = blah... - quite simple!
Jun 5, 2014 at 5:53 PM
While we are discussing potential language features:

Extend the VB.NET LINQ sequence operators?
To begin with, let LINQ expressions potentially start with a Let.
Add a To operator that generates a sequence of values from one bound To another (maybe with an optional Step stepsize):
Dim missingOdds = Let a = From w In i.Split(" "c)
                          Let n = Int32.Parse(w) ' I know this isn't strictly needed.
                          Select n ' = Int32.Parse(w)
                          Order By n
                  From x In (a.Min To a.Max) ' <== Eg -10 To 100 Step 2
                  Where Not a.Contains(x)
                  Where x Mod 2 = 1
Console.WriteLine(String.Join(","c, missingOdds))
Which is a solution to this CodeGolf challenge.

Be able to express that you want the index as well.
  From (value,index) In myEnumerable
  Where IsPrime( index ) ' Yields (value, index) and not just (value)
  Select value ' This would be required in this usage.
Also let VB.NET infer an implied AddressOf
 .Split(" "c).Select( Int32.Parse ) ' Infers AddressOf 
An alias for the Identity function
@ ==> Function(__) __

Think about the reduced typing involved for Join