
Remove requirement for semicolon (;) at end of line

Topics: C# Language Design
Nov 3, 2014 at 9:12 PM
Edited Nov 3, 2014 at 9:13 PM
You did that with TypeScript and it is great. For me, putting semicolons at the end of lines is very irritating: nonsense ceremony borrowed from C for unknown reasons. Do some devs still want to write programs on one line?
Nov 3, 2014 at 10:03 PM
I would suggest that good languages should follow one of three patterns:
  1. Every statement has a clear and unambiguous delimiter at the end of it.
  2. The end of a line marks the end of a statement in the absence of an explicit continuation indicator on that line.
  3. The end of a line marks the end of a statement in the absence of an explicit continuation indicator on the next line.
Even though some languages try to "guess" when statements should end in the absence of the above rules, such behavior increases the likelihood that a program will behave in a fashion contrary to programmer intent. Even if statement delimiters are "redundant", they represent a form of redundancy which increases the likelihood that various mistakes will yield a program that won't compile, instead of a program that will compile cleanly but will behave incorrectly.
Nov 3, 2014 at 11:06 PM
Edited Nov 3, 2014 at 11:07 PM
Let’s put aside the discussion on whether we want to do this for a moment. Obviously there is a long discussion to be had there.

Suppose we were to do this:

C# was not designed to have the semicolon delimiter be optional, so unlike Visual Basic or Python there would be cases where this introduces ambiguity in the code's meaning. The correct thing to do here would be to have the compiler identify ambiguous cases and show a warning. Unfortunately nearly everyone compiles with warnings as errors, so this would essentially be a breaking change. It’s not that C# never introduces breaking changes, but when we do the break needs to be mitigated as much as possible and the potential benefit needs to be huge relative to the size of the break.
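For illustration, here is one contrived sketch of the kind of ambiguity involved (GetPrinter and result are made-up names; assume GetPrinter is a property of a delegate type such as Func<string, string>, so that both readings type-check):
var result = GetPrinter
("hello").GetHashCode();
// Reading A (today's C#, one statement): invoke the delegate returned by the GetPrinter
// property with "hello", then call GetHashCode on the resulting string.
// Reading B (if line ends terminated statements): assign the delegate itself to result,
// then evaluate ("hello").GetHashCode() as a separate statement and discard the value.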

My point is: even if we all collectively agreed to make the semicolon delimiter optional, the risk of breaking existing code or the expense of mitigating that risk is significant enough to cause us to rethink things. That doesn't mean it can't be done, just that doing it is going to be expensive. In my opinion, even if we had consensus and everyone wanted to do this, the bang-for-your-buck would not be significant enough compared to other potential features.
Nov 4, 2014 at 1:13 AM
codefox wrote:
You did that with TypeScript and it is great. For me, putting semicolons at the end of lines is very irritating: nonsense ceremony borrowed from C for unknown reasons. Do some devs still want to write programs on one line?
The way in which statements are terminated is something that needs to be designed into a language from the very beginning in order to prevent ambiguity. TypeScript inherited it from JavaScript, where semicolons are optional because it was thought that programmers would forget them. The interpreter inserts the semicolon where it deems appropriate, based on a long list of unmemorable parsing rules. Unfortunately, like most things in JavaScript, this was poorly thought through, and despite it being an aspect of the very initial design there are still common ambiguous scenarios and omitting them is considered bad practice and flagged as an error by default by every JavaScript code-quality tool that exists.

And frankly, what's wrong with semicolons? It's not as if they're difficult to type, and the compiler/IDE is very quick to alert you when one is missing. What's next for the chopping block, braces for scope? There are many languages targeting the CLR; I'd probably recommend using a different one with a grammar that you prefer. Trying to retrofit this into C# would be a lot of work for absolutely no gain.
Nov 4, 2014 at 5:16 PM
there are still common ambiguous scenarios and omitting them is considered bad practice
Any worthy references would be veeery helpful here.

I have done without semicolons for as long as I can remember, and any "readable" - by "common" sense - code would never have had to include semicolons.

But I do agree with jmarolf completely.
Nov 4, 2014 at 5:50 PM
arekbal wrote:
there are still common ambiguous scenarios and omitting them is considered bad practice
Any worthy references would be veeery helpful here.

I have done without semicolons for as long as I can remember, and any "readable" - by "common" sense - code would never have had to include semicolons.

But I do agree with jmarolf completely.
What does this JavaScript code do?
// define a function
var fn = function (foo) {
    alert(foo)
} 

// then execute some code inside a closure
(function () {
    fn('Hello World')
})()
HINT: Probably not what you think it does. Without a semicolon after the closing brace, the parentheses on the next line are parsed as an argument list: the first function is invoked with the second function as its argument (alerting its source text), and the undefined result is then invoked, throwing a TypeError.

Or in other words, the 50 milliseconds you gained by not typing semicolons just led to minutes/hours of debugging for you or some other poor guy who inherits your code.

Adapted from this answer.
Nov 4, 2014 at 8:10 PM
arekbal wrote:
there are still common ambiguous scenarios and omitting them is considered bad practice
Any worthy references would be veeery helpful here.

I have done without semicolons for as long as I can remember, and any "readable" - by "common" sense - code would never have had to include semicolons.

But I do agree with jmarolf completely.
Item 3 in the list of "Awful Parts" in Douglas Crockford's book "JavaScript: The Good Parts" is semicolon insertion.

JSLint and JSHint both issue warnings by default when semicolons are omitted, even where not ambiguous.

The prototypical example of where it goes wrong is the following:
function foo()
{
    return
    {
        "x": "y"
    }
}
JavaScript terminates a bare return at the end of its line, so the object literal that follows is never the return value. Transpilers like TypeScript do attempt to compensate for these concerns, and you've probably already noticed that the JavaScript they generate does contain those missing semicolons. TypeScript balks at the above example, but it can still be tricked by the following contrived example:
function getEmptyObject()
{
    return
    {
    }
}
Nov 4, 2014 at 10:10 PM
Edited Nov 4, 2014 at 10:14 PM
@jmarolf
This could be implemented similarly to VB compiler options: globally and per file. Maybe in the future you will find other purposes for compiler options implemented this way :)

Without semicolons the code looks a bit better and cleaner. Also, for devs who often switch between multiple languages, writing these semicolons begins to irritate. Devs who write in C# most of the time are more tolerant and type them automatically with little awareness of doing so.
Nov 5, 2014 at 12:15 AM
@codefox
Yes, we could do that. Some people don't like compiler options to change language semantics the way VB does; that is a whole long discussion we could have. Honestly, there are many implementations that would work. The expense here is not the development work but the specification and testing. This is a very drastic change to the grammar. I have nothing against this idea by itself. I think when most people think about how to improve a programming language they ask, "What am I repeating a lot for no reason?" and semicolons fit the bill here. Unfortunately, removing them is more difficult than you would initially think, and I wanted to make sure everyone was aware going in that this isn’t a small feature.
Nov 5, 2014 at 12:39 AM
jmarolf wrote:
@codefox
Yes, we could do that. Some people don't like compiler options to change language semantics the way VB does; that is a whole long discussion we could have.
I really wish more consideration would be given to the concept of options. I don't like the "optional semicolon" notion at all, but there are many situations where the compiler requires typecasts that would clutter up code while offering little or no real semantic benefit, and many others where the compiler will perfectly happily accept code which is very likely wrong. Rather than try to have one set of rules which must be applied in all cases, it would be helpful if code could request what rules should apply when.

Consider, for example, the code
byteArray1[i] = (byte)(byteArray2[i] & 127);
longVariable = intVariable + 1234567890;
The semantics of the ampersand operator imply that if byteArray2[i] fits within the range 0..255, the result of the ampersand operator will do so as well. To have the result of byteValue & otherNumericType yield a byte rather than an int could potentially be a breaking change (not terribly likely, but it could affect overload selection elsewhere), but the typecast in the former expression really adds no value. In the latter expression, however, if intVariable happens to equal e.g. 2000000000, the compiler would be allowed to throw an exception or to store -1060399406, but would not be allowed to store 3234567890, nor to squawk and demand that the programmer either cast one of the operands of the addition to Int64 or else cast the result to Int32 before assigning it to the longer type.
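To make the overflow in the second line concrete, here is a small sketch (the values are chosen only to force the overflow):
int intVariable = 2000000000;
long longVariable = intVariable + 1234567890;       // 32-bit addition wraps first: stores -1060399406
long widenedFirst = (long)intVariable + 1234567890; // 64-bit addition: stores 3234567890
// In a checked context the first addition would throw OverflowException instead of wrapping.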

Changing the type system so that a cast was required on the second line above but not the first would be a breaking change unless it were optional, but if compilers are supposed to serve programmers rather than the other way round, language design shouldn't be dictated by the resource limitations of compilers running on 1970s hardware. Once upon a time C# was designed to be a "simple" language in the spirit of C and Java, but those days are long gone.
Nov 5, 2014 at 2:39 PM
It also comes down to the fact that the compiler interprets the characters 127 as a token for an integer, not as a token for any of the possible numeric types:
  • UByte
  • Byte
  • UInt16
  • Int16
  • UInt32
  • Int32
  • UInt64
  • Int64
  • Single
  • Double
  • Decimal
If the compiler instead chose the smallest numeric type into which the value could fit, then it would break the majority case, where the user just needs an Integer. One possibility is to cast the literal itself:
byteArray1[i] = (byteArray2[i] & (byte)127);
Another possibility is to extend the type literal suffixes to include byte:
byteArray1[i] = (byteArray2[i] & 127_SB);
Nov 5, 2014 at 3:06 PM
AdamSpeight2008 wrote:
It also comes down to the fact that the compiler interprets the characters 127 as a token for an integer, not as a token for any of the possible numeric types.

If the compiler instead chose the smallest numeric type into which the value could fit, then it would break the majority case, where the user just needs an Integer. One possibility is to cast the literal itself:
byteArray1[i] = (byteArray2[i] & (byte)127);
Another possibility is to extend the type literal suffixes to include byte:
byteArray1[i] = (byteArray2[i] & 127_SB);
The literal doesn't matter. If the code had been written using two byte variables the cast would still be required.

I'm not certain of this, but I think this behavior is inherited from IL, where none of the binary and arithmetic opcodes work on integral data types smaller than int32. The widening operation is implicit for both operands, but the C# compiler has to emit a conv.u1 after the and opcode in order to truncate the result back to an 8-bit value.
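A minimal sketch of the two-byte-variables point (b1, b2, and b3 are made-up locals):
byte b1 = 0x7F;
byte b2 = 0x0F;
// byte b3 = b1 & b2;         // error CS0266: both operands are promoted to int, so the
//                            // result is int and cannot be implicitly narrowed to byte
byte b3 = (byte)(b1 & b2);    // the explicit cast is required even though no literal is involved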
Nov 5, 2014 at 5:32 PM
Edited Nov 5, 2014 at 5:47 PM
Halo_Four wrote:
I'm not certain of this, but I think this behavior is inherited from IL, where none of the binary and arithmetic opcodes work on integral data types smaller than int32. The widening operation is implicit for both operands, but the C# compiler has to emit a conv.u1 after the and opcode in order to truncate the result back to an 8-bit value.
The IL does not have any concept of "values" smaller than 32 bits, and thus requires that anything smaller be promoted to that size, but that does not imply that a language's type system needs to follow suit. A language could support a much larger set of primitive types--most of them only available as compiler internals--whose representations would correspond with the .NET Framework types but whose promotion and implicit-conversion rules differed. For example, the result of performing a & where at least one operand was Byte could yield a compile-time type of "Int32 which should be implicitly convertible to Byte", while the result of performing + on two operands of type Int32 could yield a compile-time type of "Int32 which should not be convertible (implicitly or explicitly) to anything larger" [conversion to something larger would require first doing something that visibly clipped the precision of the result].

Nothing in the design of CIL would prevent a compiler from regarding byte1 = byte2 & intVal; as synonymous with byte1 = (byte)(byte2 & intVal);, nor would it prevent a compiler from squawking if code was written as long1 = int1+int2; rather than either long1 = (int)(int1 + int2); [if one wanted the semantics presently implied by the former] or long1 = (long)int1 + int2; [if one wanted arithmetically-correct semantics when the result couldn't fit in int]. Nor, for that matter, would it prevent the compiler from regarding the return type of "Int32+Int32" as "Int64 which is implicitly convertible to Int32". Note that such an approach could be especially helpful if there were types UInt31 and UInt62 [which would be the semantically-correct types for things like indices and count values], since even if integers auto-promoted to a "back-convertible Int64", a compiler which was given the code int1 = uint2+uint3+uint4; could recognize that the only way the result would fit in a UInt31 would be if all the intermediate results did so as well, and consequently the system could use overflow-checked 32-bit addition rather than using 64-bit addition and doing a range-checked conversion back to 32 bits.

PS--I'd also like to see languages support a variety of floating-point modes. Given a statement like:
float f5 = f1*f2+f3*f4;
there are a variety of things a programmer could want to have happen:
  1. Perform the computation using the best available accuracy supported by hardware and round to float.
  2. Perform the computation with intermediate results accurately rounded to double accuracy and round to float.
  3. Round every intermediate computed result to float.
  4. Use whatever combination of intermediate rounding is fastest, even if it happens to be float f5 = (float)((float)(f1*f2) + (double)f3*f4) [the last choice is not contrived, since some processors have instructions which add a float to the product of two floats, but do the intermediate computation at higher precision].
In many cases, it would be much more useful--for both writers and readers--to have code specify which approach should be used for floating-point computations than to require that code use typecasts all over the place to force a particular behavior. Personally, I think that outside of performance-critical situations #1 should be the default approach, but for it to work there must be a data type capable of holding any intermediate results. IMHO, .NET should have included a 16-byte TempDouble type whose internal format and precision would be unspecified, but which would match whatever was used by the hardware (default would be extended precision on x86 or double precision on x64, but configuration options could allow either style to be used on either platform).
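As a rough illustration of the difference between modes 2 and 3 above (the values are placeholders; today the programmer has to spell out whichever behavior is wanted with explicit casts, and the cast-free form leaves the choice of intermediate precision to the compiler/JIT):
float f1 = 0.1f, f2 = 0.3f, f3 = 0.2f, f4 = -0.15f;
float mode3 = (float)(f1 * f2) + (float)(f3 * f4);        // round every intermediate result to float
float mode2 = (float)((double)f1 * f2 + (double)f3 * f4); // keep intermediates at double, round once at the end
// The two can disagree in the last bit, or by much more when the products nearly cancel.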
Nov 5, 2014 at 6:12 PM
supercat wrote:
Halo_Four wrote:
I'm not certain of this, but I think this behavior is inherited from IL, where none of the binary and arithmetic opcodes work on integral data types smaller than int32. The widening operation is implicit for both operands, but the C# compiler has to emit a conv.u1 after the and opcode in order to truncate the result back to an 8-bit value.
The IL does not have any concept of "values" smaller than 32 bits, and thus requires that anything smaller be promoted to that size, but that does not imply that a language's type system needs to follow suit. A language could support a much larger set of primitive types--most of them only available as compiler internals--whose representations would correspond with the .NET Framework types but whose promotion and implicit-conversion rules differed. For example, the result of performing a & where at least one operand was Byte could yield a compile-time type of "Int32 which should be implicitly convertible to Byte", while the result of performing + on two operands of type Int32 could yield a compile-time type of "Int32 which should not be convertible (implicitly or explicitly) to anything larger" [conversion to something larger would require first doing something that visibly clipped the precision of the result].
Of course. I agree that the compiler could behave in such a manner, silently emitting the conv.u1 opcode where it is obviously safe and wouldn't result in truncation. VB.NET already does this.
Nothing in the design of CIL would prevent a compiler from regarding byte1 = byte2 & intVal; as synonymous with byte1 = (byte)(byte2 & intVal);, nor would it prevent a compiler from squawking if code was written as long1 = int1+int2; rather than either long1 = (int)(int1 + int2); [if one wanted the semantics presently implied by the former] or long1 = (long)int1 + int2; [if one wanted arithmetically-correct semantics when the result couldn't fit in int]. Nor, for that matter, would it prevent the compiler from regarding the return type of "Int32+Int32" as "Int64 which is implicitly convertible to Int32".
Doable, but I think that the compiler would have to emit the following for that to work:
.locals init (
    [0] int32 x,
    [1] int32 y,
    [2] int32 result
)
// int result = x + y;
ldloc.0 
conv.i8 
ldloc.1 
conv.i8 
add 
conv.ovf.i4
stloc.2 
Interestingly enough, when doing this manually by casting to long and back down to int, the C# compiler emits conv.i4 instead, so it simply truncates the extra bytes.
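A small sketch of that difference, assuming the default unchecked context (x, y, and sum are made-up locals):
int x = int.MaxValue;
int y = 1;
long sum = (long)x + y;             // 64-bit addition: 2147483648, no overflow
int truncated = (int)sum;           // unchecked (the default): conv.i4, silently wraps to -2147483648
// int guarded = checked((int)sum); // checked: conv.ovf.i4, throws OverflowException at run time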
Nov 5, 2014 at 7:08 PM
Halo_Four wrote:
Doable, but I think that the compiler would have to emit the following for that to work:
I would think that a compiler could include logic to detect situations where the conv.ovf.i4 instruction would fail in the same cases as a 32-bit add.ovf and omit the extra type conversions. If three values were being added rather than two, then the semantics of int1 = (int)(int2+int3)+int4; would match those of int1 = (long)int2+int3+int4; in unchecked contexts only; in some cases, programmers might regard the semantics of the former as adequate and prefer it to the latter, which would on x86 be slower but would work even in cases where the former might fail (incidentally, I wouldn't be surprised if the latter was actually faster on x64, since it would require only one "overflow check" rather than two).

My main point is that in cases where code will be working with large numbers, it may be necessary for semantic correctness to have computations that accept Int32 values as inputs perform their intermediate calculations using Int64; having an option to make promotions to Int64 automatic (while allowing the results of such implicit promotions to be implicitly converted back to Int32 after checking for overflow) may result in code which is slower than code which did everything as Int32, but could yield semantically-correct results in cases where code using Int32 would fail. Within such contexts, saying that casts of intermediate computations to Int32 may be necessary to achieve maximum speed, and that omission of such casts will likely result in a program which runs slightly slower than ideal, would seem safer than saying that casts to Int64 are sometimes necessary to achieve correctness, and that omission of such casts may result in programs that work 99% of the time but can fail in spectacular fashion.
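For instance, a contrived average computation (the values are chosen to overflow Int32):
int a = 1500000000;
int b = 1500000000;
int badAverage = (a + b) / 2;               // 32-bit sum wraps to -1294967296, so this yields -647483648
int goodAverage = (int)(((long)a + b) / 2); // 64-bit intermediate yields the intended 1500000000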