C# Language Design Notes for May 21, 2014 (Part II)

May 24, 2014 at 10:26 PM
This is Part II. Part I is here.

String interpolation

There have been a number of questions around how to add string interpolation to C#, some a matter of ambition versus simplicity, some just a matter of syntax. In the following we settle on these different design aspects.

Safety

Concatenation of strings with contents of variables has a long tradition of leading to bugs or even attack vectors, when the resulting string is subsequently parsed and used as a command. Presumably if you make string concatenation easier, you are more vulnerable to such issues – or at least, by having a dedicated string interpolation feature, you have a natural place in the language to help address such problems.

Consequently, string interpolation in the upcoming ECMAScript 6 standard allows the user to indicate a function which will be charged with producing the result, based on compiler-generated lists of string fragments and expression results to be filled in. A given trusted function can prevent SQL injection or ensure the well-formedness of a URI.
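
To make the idea concrete, here is a minimal C# sketch of what such a trusted function could look like. Everything in it is hypothetical (the SqlInterpolator class, the Build method and the @pN parameter naming are invented for illustration and are not part of any proposed C# feature). It only shows how a function that receives the literal fragments and the hole values separately can keep the values out of the command text:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration only: not a proposed C# or .NET API.
public static class SqlInterpolator
{
    // fragments: the literal pieces of the string, in order.
    // values:    the evaluated hole expressions (one fewer than fragments).
    // parameters: receives the values, keyed by generated parameter name.
    public static string Build(string[] fragments, object[] values,
                               IDictionary<string, object> parameters)
    {
        if (fragments.Length != values.Length + 1)
            throw new ArgumentException("Expected one more fragment than values.");

        var sql = new StringBuilder();
        for (int i = 0; i < values.Length; i++)
        {
            // The value never becomes part of the command text;
            // only a placeholder name does.
            string name = "@p" + i;
            sql.Append(fragments[i]).Append(name);
            parameters[name] = values[i];
        }
        sql.Append(fragments[fragments.Length - 1]);
        return sql.ToString();
    }
}
```

Called with the fragments "SELECT * FROM Users WHERE Name = " and "" plus a single hostile value, this returns "SELECT * FROM Users WHERE Name = @p0" and stores the value under "@p0" for the caller to bind as a parameter.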

Conclusion

We don’t think accommodating custom interpolators in C# is the sweet spot at this point. Most people are probably just looking for simpler and more readable syntax for filling out holes in strings. However, as we settle on syntax we should keep an eye on our ability to extend for this in the future.

Culture

In .NET there’s a choice between rendering values in the current culture or an invariant culture. This determines how common values such as dates and even floating point numbers are shown in text. The default is current culture, which even language-recognized functions such as ToString() make use of.

Current culture is great if what you’re producing is meant to be read by humans in the same culture as the program is run in. If you get more ambitious than that with human readers, the next step up is to localize in some more elaborate fashion: looking up resources and whatnot. At that point, you are reaching for heavier hammers than the language itself should probably provide.

There’s an argument that when a string is produced for machine consumption it is better done in the invariant culture. After all, it is quite disruptive to a comma-separated list of floating point values if those values are rendered with commas instead of dots as the decimal point!
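
The comma problem is easy to reproduce. The sketch below is an illustration only: CultureDemo and ToCsvLine are made-up names, and a hand-built NumberFormatInfo with a comma as decimal separator stands in for a current culture such as German, so the example does not depend on which culture data is installed.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration.
public static class CultureDemo
{
    // Joins the values with commas, rendering each one with the given provider.
    public static string ToCsvLine(double[] values, IFormatProvider provider)
    {
        var parts = new string[values.Length];
        for (int i = 0; i < values.Length; i++)
            parts[i] = values[i].ToString(provider);
        return string.Join(",", parts);
    }

    // Stand-in for a culture that uses ',' as the decimal separator.
    public static IFormatProvider CommaDecimals()
    {
        return new NumberFormatInfo { NumberDecimalSeparator = "," };
    }
}
```

With CultureInfo.InvariantCulture the values 1.5 and 2.25 produce "1.5,2.25", which parses back as two fields. With the comma-decimal provider the same call produces "1,5,2,25", which a CSV reader sees as four fields.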

Should a string interpolation feature default to current or invariant culture, or maybe provide a choice?

Conclusion

We think this choice has already been made for us, with the language and .NET APIs broadly defaulting to current culture. That is probably the right choice for most quick-and-easy scenarios. If we were to accommodate custom interpolators in the future, there could certainly be one for culture-invariant rendering.

Syntax

The general format is strings with “holes”, the holes containing expressions to be “printed” in that spot. We’d like the syntax to stress the analogy to String.Format as much as possible, and we therefore want to use curly braces {…} in the delimiting of holes. We’ll return to what exactly goes in the curly braces, but for now there is one central question: how do we know to do string interpolation at all?

There are two approaches we can think of:
  1. Provide new syntax around the holes
  2. Provide new syntax around the string itself
To the first approach, we previously settled on escaping the initial curly brace of each hole to mean this was a string interpolation hole, and the contents should be interpreted as expression syntax:
"Hello, \{name}, you have \{amount} donut{s} left."
Here, name and amount refer to variables in scope, whereas {s} is just part of the text.
This has a few drawbacks. It doesn’t look that much like a format string, because of the backslash characters in front of the curlies. You also need to visually scan the string to see if it is interpolated. Finally there’d be no natural place for us to indicate a custom interpolation function in the future.

An example of the second approach would be to add a prefix to the string to trigger interpolation, e.g.:
$"Hello, {name}, you have {amount} donut\{s\} left."
Now the holes can be expressed with ordinary braces, and just like format strings you have to escape braces to actually get them in the text (though we are eager to use backslash escapes instead of the double braces that format strings use). You can see up front that the string is interpolated, and if we ever add support for custom interpolators, the function can be put immediately before or after the $; whichever we decide:
LOC$"Hello, {name}, you have {amount} donut\{s\} left."
SQL$"…"
URI$"…"
The prefix certainly doesn’t have to be a $, but that’s the character we like best for it.

We don’t actually have to do it with a prefix. JavaScript is going to use back ticks to surround the string. But a prefix certainly seems better than yet another kind of string delimiter.

Conclusion

The prefix approach seems better and more future proof. We are happy to use $. It wouldn’t compose with the @ sign used for verbatim strings; it would be either one or the other.

Format specifiers

Format strings for String.Format allow various format specifiers in the placeholders introduced by commas and colons. We could certainly allow similar specifiers in interpolated strings. The semantics would be for the compiler to just turn an interpolated string into a call to String.Format, passing along any format specifiers unaltered:
$"Hello, {name}, you have {amount,-3} donut\{s\} left."
This would be translated to
String.Format("Hello, {0}, you have {1,-3} donut{{s}} left.", name, amount)
(Note that formatting of literal curlies needs to change if we want to keep our backslash escape syntax, which, tentatively, we do).
The compiler would be free to not call String.Format, if it knows how to do things more optimally. This would typically be the case when there are no format specifiers in the string.
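
For reference, the translated call can be run today, which shows what the alignment specifier and the doubled braces do. This is only a sketch of the String.Format side of the translation; the DonutMessage wrapper is a made-up name, and the $"..." syntax itself is not used because it is still being designed.

```csharp
using System;

// Hypothetical wrapper around the String.Format call the notes describe.
public static class DonutMessage
{
    public static string Build(string name, int amount)
    {
        // {1,-3} left-aligns the amount in a field of width 3;
        // {{ and }} produce literal braces in the output.
        return String.Format(
            "Hello, {0}, you have {1,-3} donut{{s}} left.", name, amount);
    }
}
```

DonutMessage.Build("Ann", 5) returns "Hello, Ann, you have 5   donut{s} left.": the amount is padded to three characters, so three spaces separate it from "donut".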

Conclusion

Allow all format specifiers that are allowed in the format strings of String.Format, and just pass them on.

Expressions in the holes

The final – important – question is which expressions can be put between the curly braces. In principle, we could imagine allowing almost any expression, but it quickly gets weird, both from a readability and from an implementation perspective. What if the expression itself has braces or strings in it? We wouldn’t be able to just lex our way past it (when to stop?), and similarly a reader, even with the help of colorization, would get mightily confused about what gets closed out when exactly.

Additionally the choice to allow format specifiers limits the kinds of expressions that can unambiguously precede those.
$"{a as b ? - c : d}" // ?: or nullable type and format specifier?
The other extreme is to allow just a very limited set of expressions. The common case is going to be simple variables anyway, and anything can be expressed by first assigning into variables and then using those in the string.

Conclusion

We want to be quite cautious here, at least to begin with. We can always extend the set of expressions allowed, but for now we want to be close to the restrictive extreme and allow only simple and dotted identifiers.
May 24, 2014 at 11:13 PM
Edited May 25, 2014 at 12:17 AM
One of the issues with string interpolation and String.Format is that the arguments inside the format string are validated not at compile time but at run time, and thus throw runtime exceptions. What if I rename the variable name to NomDePlume and forget to alter it in the string? It is a silent error until I encounter it at runtime.

Expressions in the holes: nope! Just named identifiers.

Escaped braces: stick with {{ and }}, please.
May 25, 2014 at 1:55 AM
Nice.

I know that it's probably post C#6 but I'd love to see the thoughts regarding the custom interpolation prefixes, e.g., do they correspond to instances of some interface or methods that meet some specific signature? IFormatProvider seems to get you most of the way there but perhaps something more structured like what had been suggested in the forums.
May 25, 2014 at 7:50 AM
Edited May 25, 2014 at 7:58 AM
First of all, the $ prefix is perfect for the job. I'm glad you decided against backslash characters in normal strings.

Any word on boxing of value types in string interpolation? Simply replacing the call with string.Format(string, params object[]) would cause boxing, which could be avoided with such a feature.

I feel like your proposed syntax for passing an interpolation function would be easy to implement and could very well be a killer feature for string interpolation. It would be nice to have in C# 6; it seems there could be some simple and easy-to-implement way to do that.
May 25, 2014 at 11:36 AM
Edited May 25, 2014 at 11:37 AM
I suggest doing it the Mustache way:
var person = new { name="mustache",amount=5 };
Console.Write("Hello, {{name}}, you have {{amount-3}} donut's left.".ToString(person));
//Hello, mustache, you have 2 donut's left

var people = new [] { new { name="mustache", amount = 5 }, new { name="mad", amount=3}};
IEnumerable<string> peopleString = "Hello, {{name}}, you have {{amount-3}} donut's left.".ToString(people);
foreach (var p in peopleString)
{
    Console.Write(p);
}
//Hello, mustache, you have 2 donut's left
//Hello, mad, you have 0 donut's left
May 25, 2014 at 2:16 PM
Edited May 26, 2014 at 8:33 AM
Can you support newlines in all string literals ("", $"")?
It would remove most needs for @"".
May 25, 2014 at 5:06 PM
Edited May 26, 2014 at 7:30 AM
1) Use {{ and }} to escape curly braces to be consistent with String.Format.

2) Why can't you combine $ and @? Why shouldn't it be possible to have multiline interpolated strings? For instance, when writing test cases for Roslyn, you typically use a verbatim string to write the C# code that you're going to test, especially if it's a full class declaration, for instance. Why shouldn't I be able to interpolate the type of a field of the class declared in the string, for instance?

3) Why do you default to the current culture? Actually, I would have expected it to default to the invariant one: The feature cannot be used with localized text and this is where you need the culture-specific stuff. It will mostly be used for debug messages or to generate strings of a fixed format that will later be used by other tools, in which case you need the string to be culture invariant.
May 26, 2014 at 9:10 AM
Edited May 26, 2014 at 9:11 AM
\{name} is hideous, and while I originally preferred simply using the $name prefix approach, {name} does make expressions with spaces and formatting nicer :)

The ability to create your own formatters e.g. SQL$"Select {DropDataBase} From X" sounds awesome. If possible, I would like to see this implemented in the first release rather than having to wait till next version. One potential problem I do see is how one adds custom parameters - e.g. wants to change culture - to the generator. And what is the nature of $... right now it's purely compiler magic, but with user-defined generators is it an operator or some other magic?

Not combining $ and @ seems to be a tad arbitrary - what difficulties are you trying to avoid?

Expressions in the holes - I understand wanting to limit these to a simple set initially. It would be interesting to see more about what should be allowed - would you allow method calls, for example? That said, if there is ambiguity, why not just tell us and let us use brackets to correct it? {a as b ? - c : d} can easily become {(a as b ? - c) : d} or {(a as b ? - c : d)} or even {(a as b ?) - c : d} or even more {a as (b ?) - c : d} or finally {a as (b ? - c : d)} (assuming my syntax parser is functional today :P). I'm a fan of giving power to the people, because I've seen some really interesting patterns develop thanks to language abuse (e.g. misusing using).
May 26, 2014 at 2:55 PM
A vote for {{ and }} escaping, for consistency with string.Format, here too.
May 27, 2014 at 11:54 AM
AdamSpeight2008 wrote:
One of the issues with string interpolation and String.Format is that the arguments inside the format string are validated not at compile time but at run time, and thus throw runtime exceptions. What if I rename the variable name to NomDePlume and forget to alter it in the string? It is a silent error until I encounter it at runtime.
After the compiler expands the $string to a string.Format(...), you would definitely get a compile-time error about name not being found.
Additionally the IDE would provide support for renaming variable uses in strings.
May 27, 2014 at 8:25 PM
String Interpolation could become my favorite feature of C# 6. :) The aspects you have settled on all sound pretty reasonable to me, with the exception that, like some others, I'd probably prefer {{ for escaping. What are the reasons for you preferring \{?
May 28, 2014 at 4:01 AM
Edited May 28, 2014 at 4:01 AM
My thinking on why they prefer \{ is that in C# string literals \ is the escape character, e.g. \n.

That is kind of reasonable if string interpolation is going to be a C#-specific feature, but my personal preference is {{, as that is the existing way to escape braces in String.Format, Console.Write and Console.WriteLine.


eldritchconundrum wrote:
After the compiler expands the $string to a string.Format(...), you would definitely get a compile-time error about name not being found.
Additionally the IDE would provide support for renaming variable uses in strings.
You've missed my point, which is that the error would be in the expanded code, not within the interpolated string itself.

Hence why I created the CodePlex project: String.Format Diagnostic
 OpeningBrace ::= "{"
 ClosingBrace ::= "}"
        Comma ::= ","
        Colon ::= ":"
        Digit ::= "0"..."9"
       Digits ::= Digit+
     ArgIndex ::= Digits

    IndexPart ::= ArgIndex | Identifier //* Additional possibility *//

AlignmentPart ::= Comma Minus? Digits
   FormatPart ::= Colon Formatting

    StringArg ::= OpeningBrace IndexPart AlignmentPart? FormatPart? ClosingBrace 
This just requires adding one more rule to IndexPart, and then an additional IssueReport for errors in the Identifier. If you can also ascertain the type of the argument being referenced (indexed arg or identifier), you can add further diagnostic checks to validate that the FormatPart is valid for that type.
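
As a rough sketch of how the grammar above could be applied, the following scanner pulls the IndexPart out of every placeholder in a format string. It is not the code of the String.Format Diagnostic project. FormatItemScanner and IndexParts are made-up names, the identifier pattern is an assumption (the grammar leaves Identifier undefined), and whitespace inside a placeholder is not handled.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch following the posted grammar.
public static class FormatItemScanner
{
    // Alternatives are tried left to right at each position:
    //   {{ or }}                    escaped brace, skipped
    //   { IndexPart AlignmentPart? FormatPart? }
    private static readonly Regex Item = new Regex(
        @"\{\{|\}\}|\{(?<index>\d+|[A-Za-z_][A-Za-z0-9_]*)(?<align>,-?\d+)?(?<format>:[^{}]*)?\}");

    // Returns the IndexPart (numeric index or identifier) of each placeholder.
    public static List<string> IndexParts(string format)
    {
        var result = new List<string>();
        foreach (Match m in Item.Matches(format))
        {
            // Escaped braces match without capturing an index.
            if (m.Groups["index"].Success)
                result.Add(m.Groups["index"].Value);
        }
        return result;
    }
}
```

A diagnostic could then check each numeric index against the argument count and each identifier against the names in scope. For the string "Hello {nomDePlune}! I'm {1,-3:N2} {{s}}" the scanner reports nomDePlune and 1, and ignores {{s}}.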
May 28, 2014 at 4:31 AM
AdamSpeight2008 wrote:
eldritchconundrum wrote:
After the compiler expands the $string to a string.Format(...), you would definitely get a compile-time error about name not being found.
Additionally the IDE would provide support for renaming variable uses in strings.
You've missed my point, which is that the error would be in the expanded code, not within the interpolated string itself.

Hence why I created the CodePlex project: String.Format Diagnostic


This just requires adding one more rule to IndexPart, and then an additional IssueReport for errors in the Identifier. If you can also ascertain the type of the argument being referenced (indexed arg or identifier), you can add further diagnostic checks to validate that the FormatPart is valid for that type.
I think you're conflating interpolation with string.Format. The named identifier doesn't exist at runtime. The following code would not successfully compile:
public void Foo()
{
    string nomDePlume = "My Name";
    string formatted = $"Hello {name}!";
}
The compiler is going to attempt to expand it out into the following, which wouldn't compile even if the compiler didn't do a first-pass check on the referenced symbols, which it will in order to provide an accurate text span as to where the error occurred.
public void Foo()
{
    string nomDePlume = "My Name";
    string formatted = string.Format("Hello {0}!", name);
}
And since it's not possible to manually mess with the indexing of the interpolated string you couldn't have a situation where the parameters would mismatch the indexed holes.

Am I misinterpreting what you're saying? You seem to imply that the above sample code would compile and that an error/exception wouldn't occur until runtime, which is not true.
May 28, 2014 at 5:06 AM
Edited May 28, 2014 at 5:12 AM
My thought was that it could be translated into something like the following, by just expanding the capabilities of String.Format.
string formatted = String.Format( "Hello {nomDePlune}! I'm {1}", myname); //* deliberate errors *//
As an added benefit you also get this capability in Console.Write and Console.WriteLine.
This isn't (currently) checked at compile time; instead an exception is thrown at runtime, because the format string is just a string with magic contents.

If you did include another type of string, that would mean C# has three different types of string.
""  //* Normal String        *//
@"" //* Verbatim String      *//
$"" //* Interpolation String *//
VB.NET manages with just one string type.

Also if you are going to have other magic strings, why not allow other types to pseudo-inherit from string? URI, URLs.

My version is compatible with C# and VB.net, but requires diagnostics to produce compile-time warnings.
It also demonstrates you don't require an additional string type, you just need better diagnostic tools.
May 28, 2014 at 12:26 PM
AdamSpeight2008 wrote:
My thought was that it could be translated into something like the following, by just expanding the capabilities of String.Format.
string formatted = String.Format( "Hello {nomDePlune}! I'm {1}", myname); //* deliberate errors *//
As an added benefit you also get this capability in Console.Write and Console.WriteLine.
This isn't (currently) checked at compile time; instead an exception is thrown at runtime, because the format string is just a string with magic contents.

If you did include another type of string, that would mean C# has three different types of string.
""  //* Normal String        *//
@"" //* Verbatim String      *//
$"" //* Interpolation String *//
VB.NET manages with just one string type.

Also if you are going to have other magic strings, why not allow other types to pseudo-inherit from string? URI, URLs.

My version is compatible with C# and VB.net, but requires diagnostics to produce compile-time warnings.
It also demonstrates you don't require an additional string type, you just need better diagnostic tools.
Oh I see, you're proposing a different implementation and complaining about Roslyn's non-support for it? It seems that the team has fairly locked down this design so I don't know how useful of a conversation that is.

Your method would require BCL changes and still puts the onus on the user to pass an object compatible with named holes, either a dictionary or a type with those properties. Otherwise the compiler would still be required to wire up arguments. If that was the case you're back in the same position as you can't anticipate what method you'd be calling so the compiler couldn't intelligently wire up the parameters, and existing strings could contain that syntax which would break existing code. The current proposal is concise. It is limited currently to String.Format but since Console.WriteLine, StringBuilder.AppendLine, etc. can all accept an already formatted string that limitation doesn't prevent its use in those scenarios at all.

As for the number of string types, I think it's not only reasonable but expected to add a separate type specifically for interpolation. VB.NET will get a separate string type as well. It's the only reasonable way to be able to add the compiler feature without breaking existing code. VB.NET only has one string today because VB.NET, with its BASIC heritage, does not have the concept of escape sequences, so a verbatim string wouldn't make any sense. C# got verbatim strings as a simple way to define paths and regular expressions since doing so otherwise is an absolute pain, and it makes more sense to allow that separate string type than to add a regular expression syntax which can only apply to regular expressions. Note that these string types aren't actually string types, they're simply compiler candy. .NET strings have no concept of escape sequences or quote-doubling at all.
May 28, 2014 at 1:54 PM
madst wrote:

Expressions in the holes

The final – important – question is which expressions can be put between the curly braces. In principle, we could imagine allowing almost any expression, but it quickly gets weird, both from a readability and from an implementation perspective. What if the expression itself has braces or strings in it? We wouldn’t be able to just lex our way past it (when to stop?), and similarly a reader, even with the help of colorization, would get mightily confused about what gets closed out when exactly.

Additionally the choice to allow format specifiers limits the kinds of expressions that can unambiguously precede those.
$"{a as b ? – c : d}" // ?: or nullable type and format specifier?
The other extreme is to allow just a very limited set of expressions. The common case is going to be simple variables anyway, and anything can be expressed by first assigning into variables and then using those in the string.

Conclusion

We want to be quite cautious here, at least to begin with. We can always extend the set of expressions allowed, but for now we want to be close to the restrictive extreme and allow only simple and dotted identifiers.
I think it would be important that at least simple property/field accessor expressions are permitted in the holes. Even the example on the Language feature implementation status demonstrates putting expressions like p.First and, especially combined with LINQ, I think that will be an extremely common use case.
Jun 4, 2014 at 8:54 PM
If you really believe that CurrentCulture is a better choice, then you should just drop the string interpolation feature altogether. For one thing, as noted above, CurrentCulture is generally only useful for UI strings - the one displayed to the user - and as those are meant to be localizable, they should be using indexed placeholders instead anyway so that they can be reordered for localization. For another, as string interpolation is a convenience feature, it will be used often by people, and by forcing it to be CurrentCulture you open the Pandora's box of numerous developers with English (US) locales being clueless about the existence of other decimal and date separators, and writing code that parses or outputs strings intended for machine consumption that work in US locales, but break everywhere else. We have already seen that in action with VB6, where the problems were so severe that a bug having to do with improper use of locales made it into the installer for the product itself.

Defaulting to CurrentCulture for anything (and, arguably, picking the default at all, instead of forcing the developer to choose and think about the consequences of his choice for every particular use case) has been a long-running mistake in the design of the .NET libraries. It cannot truly be fixed for legacy compat reasons, but please don't contribute to it.
Jun 5, 2014 at 12:43 PM
pminaev wrote:
If you really believe that CurrentCulture is a better choice, then you should just drop the string interpolation feature altogether. For one thing, as noted above, CurrentCulture is generally only useful for UI strings - the one displayed to the user - and as those are meant to be localizable, they should be using indexed placeholders instead anyway so that they can be reordered for localization. For another, as string interpolation is a convenience feature, it will be used often by people, and by forcing it to be CurrentCulture you open the Pandora's box of numerous developers with English (US) locales being clueless about the existence of other decimal and date separators, and writing code that parses or outputs strings intended for machine consumption that work in US locales, but break everywhere else. We have already seen that in action with VB6, where the problems were so severe that a bug having to do with improper use of locales made it into the installer for the product itself.

Defaulting to CurrentCulture for anything (and, arguably, picking the default at all, instead of forcing the developer to choose and think about the consequences of his choice for every particular use case) has been a long-running mistake in the design of the .NET libraries. It cannot truly be fixed for legacy compat reasons, but please don't contribute to it.
I think that we'd have to look at the target audience of the feature and their use cases. My opinion would be that the vast majority of those use cases would involve user-facing strings, either in the form of actual UI elements or exception messages. I completely agree with you that this is an anti-pattern due to the inability to internationalize the result. However, I also understand that the majority of software written is never intended to be internationalized, at all. I'm talking of the internal business applications written by small businesses that don't (yet) have an international presence. They're the ones that want interpolation in order to create these simple formatted strings.

Other programming languages that include interpolation seem to take the same approach. PHP uses the locale of the current script. I can't quite tell what Swift does based on the docs but the few examples that include English text and floating point values do use a period as the decimal mark. I suspect that they rely on the current culture as well.
Jul 3, 2014 at 9:41 AM
pminaev wrote:
If you really believe that CurrentCulture is a better choice, then you should just drop the string interpolation feature altogether. For one thing, as noted above, CurrentCulture is generally only useful for UI strings - the one displayed to the user - and as those are meant to be localizable, they should be using indexed placeholders instead anyway so that they can be reordered for localization. For another, as string interpolation is a convenience feature, it will be used often by people, and by forcing it to be CurrentCulture you open the Pandora's box of numerous developers with English (US) locales being clueless about the existence of other decimal and date separators, and writing code that parses or outputs strings intended for machine consumption that work in US locales, but break everywhere else. We have already seen that in action with VB6, where the problems were so severe that a bug having to do with improper use of locales made it into the installer for the product itself.

Defaulting to CurrentCulture for anything (and, arguably, picking the default at all, instead of forcing the developer to choose and think about the consequences of his choice for every particular use case) has been a long-running mistake in the design of the .NET libraries. It cannot truly be fixed for legacy compat reasons, but please don't contribute to it.
My opinion exactly.

Defaulting to the CurrentCulture for strings that can't be localized anyway is just plain useless, especially if there is no way to specify the invariant culture instead.

Anyway, when I first heard about this string interpolation feature, I thought it was great, but the more I think about it, the less enthusiastic I am. I think the issue of formatting strings with named placeholders would be better handled at the library level. A method similar to String.Format could accept named placeholders instead of numbered placeholders, and accept the arguments as an object:
// TODO: think of a better name
String.FormatWithNames("Hello, {name}, you have {amount} donut{{s}} left.", new { name, amount });
This approach is not as terse as string interpolation, but it has two compelling advantages:
  • the format string is localizable, so the feature isn't limited to non user-facing strings and non internationalized apps
  • the culture can be specified in a familiar way (just an IFormatProvider parameter, as with string.Format)
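
A library-level method along these lines could be sketched as follows. This is only an illustration under assumptions: the NamedFormat class, its regular expression and the error handling are invented here, no such method exists in the BCL, and the reflection lookup is done on every call with no caching.

```csharp
using System;
using System.Reflection;
using System.Text.RegularExpressions;

// Hypothetical sketch: not an existing .NET API.
public static class NamedFormat
{
    // Matches {{, }}, or {name} optionally followed by ,alignment and/or :format.
    private static readonly Regex Token = new Regex(
        @"\{\{|\}\}|\{(?<name>[A-Za-z_][A-Za-z0-9_]*)(?<rest>[,:][^{}]*)?\}");

    public static string FormatWithNames(IFormatProvider provider,
                                         string format, object args)
    {
        return Token.Replace(format, m =>
        {
            // Doubled braces are escapes, as in String.Format.
            if (m.Value == "{{") return "{";
            if (m.Value == "}}") return "}";

            // Look up a public property with the placeholder's name.
            string name = m.Groups["name"].Value;
            PropertyInfo property = args.GetType().GetProperty(name);
            if (property == null)
                throw new FormatException("No property named '" + name + "'.");

            // Delegate alignment and format specifiers to String.Format.
            string item = "{0" + m.Groups["rest"].Value + "}";
            return String.Format(provider, item, property.GetValue(args, null));
        });
    }
}
```

For example, NamedFormat.FormatWithNames(CultureInfo.InvariantCulture, "Hello, {name}, you have {amount} donut{{s}} left.", new { name = "Ann", amount = 5 }) returns "Hello, Ann, you have 5 donut{s} left.". A misspelled placeholder still fails only at run time, which is the trade-off against a compiler feature.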
Jul 3, 2014 at 2:36 PM
@tom103 Mads mentioned in the design notes that they could add custom interpolators in the future. So they should not do the feature currently because it doesn't solve 100% of your own particular use case? The alternative you propose incurs a runtime cost which involves reflection compared to a language feature which is compile time only.

There are a lot of strong positives for doing this feature with current culture:
  • The vast majority of software written is never internationalized.
  • It will solve 90% of use cases in its current form, the remaining 10% could be solved with a future language feature.
  • Almost every other modern language has or will have string interpolation. Regardless of arguments about the potential for bad practice or what not, clearly a lot of people find it very useful.
  • It's purely a compile-time sugar.
  • Current culture is consistent with the rest of .NET.
Jul 3, 2014 at 2:40 PM
tom103 wrote:
Defaulting to the CurrentCulture for strings that can't be localized anyway is just plain useless, especially if there is no way to specify the invariant culture instead.

Anyway, when I first heard about this string interpolation feature, I thought it was great, but the more I think about it, the less enthusiastic I am. I think the issue of formatting strings with named placeholders would be better handled at the library level. A method similar to String.Format could accept named placeholders instead of numbered placeholders, and accept the arguments as an object:
// TODO: think of a better name
String.FormatWithNames("Hello, {name}, you have {amount} donut{{s}} left.", new { name, amount });
This approach is not as terse as string interpolation, but it has two compelling advantages:
  • the format string is localizable, so the feature isn't limited to non user-facing strings and non internationalized apps
  • the culture can be specified in a familiar way (just an IFormatProvider parameter, as with string.Format)
I agree with much of this. String interpolation mostly serves to solve problems that represent anti-patterns, at least in enterprise applications. But most applications are never going to be enterprise applications and concerns like internationalization simply don't exist. This is also a feature that is becoming not uncommon amongst programming languages and those languages do follow these conventions.

I think the issue with your suggestion is that, like string.Format today, it remains error-prone, at least without additional compiler support. Granted, with Roslyn I imagine that it would be relatively simple to scan all invocations of string.Format (or similar methods) and determine those scenarios where the formatting is likely to fail at runtime.

Also, I'm waiting to see if the C# team has further ideas around the concept of prefix operators for string interpolation. It was suggested that this could be used for localization purposes, although I'm not sure how.

Honestly, I'd love to see improved support for internationalization in Visual Studio. Something that expands upon the support of resource project items and can add typed parameterization so that rather than simply exposing properties of the raw resource strings that it can also expose methods accepting the parameters expected to be formatted into the resource strings.
Jul 3, 2014 at 3:51 PM
Edited Jul 3, 2014 at 3:51 PM
@tom103 Mads mentioned in the design notes that they could add custom interpolators in the future.
"Could", not "will". It's completely possible that custom interpolators will never make it into the language.
So they should not do the feature currently because it doesn't solve 100% of your own particular use case?
My "own particular use case"? With the current design, this feature:
  • isn't suitable to generate strings that will be parsed by a machine (because it always uses the current culture, instead of the invariant culture)
  • isn't suitable for user-facing strings (unless the app is never internationalized), since the string has to be hard-coded and can't be extracted to resources.
    I'd say this makes a lot of use cases that are not covered...
The alternative you propose incurs a runtime cost which involves reflection compared to a language feature which is compile time only.
Indeed; however I suspect it could be optimized to be reasonably efficient. Reflection doesn't need to be done every time, you can maintain some kind of accessor cache to speed up the retrieval of the values. I've done it in my project NString; it's still 2 to 5 times slower than String.Format (depending on how you use it), but it's only my feeble attempt at it, I'm sure it could be optimized further.
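As a rough sketch of the accessor-cache idea (this is not NString's actual code; the class and method names are assumptions made for illustration): the reflection lookup and delegate compilation happen once per (type, property) pair, and later lookups reuse the cached delegate.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;

// Hypothetical cache of compiled property getters.
public static class AccessorCache
{
    private static readonly ConcurrentDictionary<(Type, string), Func<object, object>> Cache =
        new ConcurrentDictionary<(Type, string), Func<object, object>>();

    // Reads a named property from any object; the getter is built on first use only.
    public static object GetValue(object source, string propertyName)
    {
        Func<object, object> getter = Cache.GetOrAdd(
            (source.GetType(), propertyName),
            key => BuildGetter(key.Item1, key.Item2));
        return getter(source);
    }

    // Builds the delegate obj => (object)((T)obj).Property via an expression tree.
    private static Func<object, object> BuildGetter(Type type, string propertyName)
    {
        ParameterExpression obj = Expression.Parameter(typeof(object), "obj");
        Expression body = Expression.Convert(
            Expression.Property(Expression.Convert(obj, type), propertyName),
            typeof(object));
        return Expression.Lambda<Func<object, object>>(body, obj).Compile();
    }
}
```

For example, AccessorCache.GetValue("hello", "Length") returns the boxed value 5, and a second call for the same type and property skips reflection entirely.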
There are a lot of strong positives for doing this feature with current culture:
  • The vast majority of software written is never internationalized.
I don't know the numbers, but any app for the general public usually needs to be internationalized, so that case is far from negligible.

Also, "not internationalized" doesn't mean "in English"... and if you try to generate strings for an interpreter using string interpolation in a non-English culture, you will get an incorrect result because of different date or number formats. This scenario requires the use of the invariant culture.
  • It will solve 90% of use cases in its current form, the remaining 10% could be solved with a future language feature.
I don't know how you came up with that number, but I think the cases I mentioned above represent more than 10%...
What I know for sure is that with the current design, I will use string interpolation only for quick and dirty work, never for actual production code.
  • Almost every other modern language has or will have string interpolation. Regardless of arguments about the potential for bad practice or what not, clearly a lot of people find it very useful.
Just because other languages do it doesn't mean that C# should. I wouldn't like a feature that encourages bad practices to be added to C#...
Now, I'm not saying that string interpolation is necessarily a bad idea, but if it's included in the language, it should at least be done right.
Jul 3, 2014 at 4:39 PM
tom103 wrote:
  • The vast majority of software written is never internationalized.
I don't know the numbers, but any app for the general public usually needs to be internationalized, so that case is far from negligible.

Also, "not internationalized" doesn't mean "in English"... and if you try to generate strings for an interpreter using string interpolation in a non-English culture, you will get an incorrect result because of different date or number formats. This scenario requires the use of the invariant culture.
The point being that the vast majority of software written is not public facing. They are internal applications written by the internal development teams of small businesses which will never be consumed by anyone outside of that business. Of course that is short-sighted and if that business grows into new markets in new countries they will potentially run into problems but that is the development landscape in general.

Also, using the invariant culture makes even less sense since the invariant culture is effectively a deterministic flavor of the English culture. If the application was being written in France for a French speaking business and it used string interpolation which implicitly relied on the invariant culture the formatting would be incorrect. It makes more sense to continue using the culture of the current thread ensuring that the application formats as expected for the locale in which it runs.

We've established that string interpolation represents anti-patterns but we're talking specific use cases in which the proper methods are not required or desired. And, as mentioned, it's clear that they're considering localization scenarios with a form of prefix syntax.
Jul 3, 2014 at 7:32 PM
Halo_Four wrote:
Also, using the invariant culture makes even less sense since the invariant culture is effectively a deterministic flavor of the English culture. If the application was being written in France for a French speaking business and it used string interpolation which implicitly relied on the invariant culture the formatting would be incorrect. It makes more sense to continue using the culture of the current thread ensuring that the application formats as expected for the locale in which it runs.
A lot of strings are generated for users, and a lot of strings are generated for parsing by machines. I would expect that to be just as true of strings generated with a "string interpolation" feature as for any other kind. Rather than arguing about whether the default should be to favor one usage over the other, there should be equally-convenient ways of requesting human-readable (culture-sensitive) and machine-readable (invariant-culture) strings. Making current culture substantially more convenient than invariant will lead to the creation of lots of code that will mangle data with many cultures other than "En-US", and making it substantially less convenient will cause inconvenience for anyone wanting non-US-style formatting. Personally, I think the former is a bigger danger than the latter, since someone formatting a string for a person that would expect 1234567/10 to be written as "123.456,7" would immediately notice if it were formatted "123456.7", but someone generating code for machine parsing in an environment that would format the value as 1234567 might not realize that in other cultures it would get parsed as a fraction 123456/1000 followed by the value 7.
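A minimal sketch of the hazard, using today's string.Format with an explicit provider. The CommaDecimal provider is a hand-built stand-in for a culture that writes "123.456,7", so the example does not depend on the culture data installed on the machine. All class and member names are illustrative.

```csharp
using System;
using System.Globalization;

public static class CultureDemo
{
    // Hand-built stand-in for a culture with ',' as decimal and '.' as group separator.
    public static readonly NumberFormatInfo CommaDecimal = new NumberFormatInfo
    {
        NumberDecimalSeparator = ",",
        NumberGroupSeparator = "."
    };

    // Machine-readable: always invariant culture.
    public static string ForMachine(double value) =>
        string.Format(CultureInfo.InvariantCulture, "{0}", value);

    // Human-readable: grouped, one decimal place, in the reader's format.
    public static string ForReader(double value, IFormatProvider provider) =>
        string.Format(provider, "{0:N1}", value);

    // Formats with one provider and parses with the invariant culture,
    // as a mismatched writer/reader pair would.
    public static double WriteThenParseInvariant(double value, IFormatProvider writer) =>
        double.Parse(string.Format(writer, "{0}", value), CultureInfo.InvariantCulture);
}
```

ForReader(123456.7, CultureDemo.CommaDecimal) yields "123.456,7" and ForMachine(123456.7) yields "123456.7". The mismatched pair is the dangerous case: WriteThenParseInvariant(123456.7, CultureDemo.CommaDecimal) writes "123456,7", and the invariant parser treats the comma as a group separator and silently returns 1234567.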
Jul 4, 2014 at 12:56 AM
supercat wrote:
Halo_Four wrote:
Also, using the invariant culture makes even less sense since the invariant culture is effectively a deterministic flavor of the English culture. If the application was being written in France for a French speaking business and it used string interpolation which implicitly relied on the invariant culture the formatting would be incorrect. It makes more sense to continue using the culture of the current thread ensuring that the application formats as expected for the locale in which it runs.
A lot of strings are generated for users, and a lot of strings are generated for parsing by machines. I would expect that to be just as true of strings generated with a "string interpolation" feature as for any other kind. Rather than arguing about whether the default should be to favor one usage over the other, there should be equally-convenient ways of requesting human-readable (culture-sensitive) and machine-readable (invariant-culture) strings. Making current culture substantially more convenient than invariant will lead to the creation of lots of code that will mangle data with many cultures other than "En-US", and making it substantially less convenient will cause inconvenience for anyone wanting non-US-style formatting. Personally, I think the former is a bigger danger than the latter, since someone formatting a string for a person that would expect 1234567/10 to be written as "123.456,7" would immediately notice if it were formatted "123456.7", but someone generating code for machine parsing in an environment that would format the value as 1234567 might not realize that in other cultures it would get parsed as a fraction 123456/1000 followed by the value 7.
I have to disagree with you. I think that by targeting API integration by default you end up opening a massive can of worms. The user stories for string interpolation in those cases generally involve SQL, HTML, JavaScript, etc. All of those scenarios require something much more intelligent and context-aware than whether or not the culture happens to be invariant, such as proper escaping. I still bet that those scenarios represent a minority of the requests for this feature on UserVoice and the vast majority of users simply want to quickly and easily format human-readable text, whether that be displayed in a user interface, used for exception messages or written to a text log.

As far as I can tell every other language that includes the capacity for string interpolation of non-string values formats those values under the current culture. That includes web technologies like PHP and Apple's general purpose Swift language. This is also the precedent set by every single composite string formatting method that exists in the .NET framework today. This precedent has been well established, even if it still represents an anti-pattern. To diverge now would just be confusing to existing C# developers and any other developers learning C#.

That said, the whole notion of the prefix operators to provide context to interpolation does change the conversation a little bit. I would really like to see where the C# team has considered taking that concept. I kind of doubt that such a feature is slated for the initial release of C# 6.0 though, and even if it was it doesn't change the argument as to what the language should do by default. I side with the C# team: CultureInfo.CurrentCulture.
Jul 4, 2014 at 3:37 PM
Halo_Four wrote:
...written to a text log.
Text logs are an example of a case where I would consider current culture very bad, since even log files which are intended for human viewing will often, as a consequence of such viewing, need to be machine-processed for purposes such as filtering or data aggregation. If most logs get written with one culture, but some get written with another, processing may get very messy. If log files with undetected "cultural differences" get merged, data may be lost (e.g. if a file contains a mix of data formatted DD/MM/YY and MM/DD/YY, there may be no way to determine whether a record was supposed to represent September 10 or October 9).

Perhaps culture could be made part of a "context" in a manner somewhat like checked/unchecked arithmetic, since it's perfectly plausible that code written in Europe may prefer to have all logs written with European date format, but that should be accomplished by specifying European format rather than hoping the current culture will be set that way.
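The date ambiguity described above can be sketched as follows. The format patterns are written out explicitly rather than taken from real cultures, and the class and method names are illustrative only.

```csharp
using System;
using System.Globalization;

public static class LogDates
{
    // Day-first style, as a DD/MM/YY log writer would produce.
    public static string DayFirst(DateTime d) =>
        d.ToString("dd'/'MM'/'yy", CultureInfo.InvariantCulture);

    // Month-first style, as an MM/DD/YY log writer would produce.
    public static string MonthFirst(DateTime d) =>
        d.ToString("MM'/'dd'/'yy", CultureInfo.InvariantCulture);

    // Explicit, culture-independent format: year-month-day.
    public static string Iso(DateTime d) =>
        d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
}
```

DayFirst(new DateTime(2014, 9, 10)) and MonthFirst(new DateTime(2014, 10, 9)) both produce "10/09/14", so a merged log cannot tell September 10 from October 9. With Iso the same two dates come out as "2014-09-10" and "2014-10-09".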
Jul 4, 2014 at 3:45 PM
supercat wrote:
Halo_Four wrote:
...written to a text log.
Text logs are an example of a case where I would consider current culture very bad, since even log files which are intended for human viewing will often, as a consequence of such viewing, need to be machine-processed for purposes such as filtering or data aggregation. If most logs get written with one culture, but some get written with another, processing may get very messy. If log files with undetected "cultural differences" get merged, data may be lost (e.g. if a file contains a mix of data formatted DD/MM/YY and MM/DD/YY, there may be no way to determine whether a record was supposed to represent September 10 or October 9).
In which case you'd probably want to be using format specifiers anyway. If you're writing other logs using a shell script or other scripting language that supports string interpolation then you're already dealing with that data being formatted in the current culture.
Jul 5, 2014 at 6:28 AM
I think the custom interpolators prefix will be a massive aspect of the usefulness of C#'s Interpolation Feature. I feel that many of the potential problems outlined above would be addressed by a suitable prefix - e.g. UI$"...", LOG$"...", URI$"..." , API$"..." or JSON$"..." ...

Working on the assumption that the prefix feature is important enough to be implemented (and in a sensible way, that enables developers to define their own interpolators) all the default has to do is provide a simple default version. Do we prefer safeness (invariant) over simplicity (current culture)... while in my current work invariant would be nicer, I think that the choice has already been made for us with current culture being the default in similar circumstances (e.g. ToString)
Jul 5, 2014 at 7:00 PM
NPSF3000 wrote:
I think the custom interpolators prefix will be a massive aspect of the usefulness of C#'s Interpolation Feature. I feel that many of the potential problems outlined above would be addressed by a suitable prefix - e.g. UI$"...", LOG$"...", URI$"..." , API$"..." or JSON$"..." ...

Working on the assumption that the prefix feature is important enough to be implemented (and in a sensible way, that enables developers to define their own interpolators) all the default has to do is provide a simple default version. Do we prefer safeness (invariant) over simplicity (current culture)... while in my current work invariant would be nicer, I think that the choice has already been made for us with current culture being the default in similar circumstances (e.g. ToString)
Why "prefer safety over simplicity"? Why not offer equal-length ways of explicitly specifying both of the common cases? BTW, I think there either should have been two methods each for ToString(), Equals(), and GetHashCode(), or else they should have included a parameter to specify the usage case, since all three methods have two distinct usage cases with different semantics. Having the semantics of each object's implementation of those methods depend upon what it thinks its primary customer will want greatly reduces the usefulness of those methods in type-agnostic code.
Jul 7, 2014 at 10:37 AM
Edited Jul 7, 2014 at 11:00 AM
supercat wrote:
NPSF3000 wrote:
I think the custom interpolators prefix will be a massive aspect of the usefulness of C#'s Interpolation Feature. I feel that many of the potential problems outlined above would be addressed by a suitable prefix - e.g. UI$"...", LOG$"...", URI$"..." , API$"..." or JSON$"..." ...

Working on the assumption that the prefix feature is important enough to be implemented (and in a sensible way, that enables developers to define their own interpolators) all the default has to do is provide a simple default version. Do we prefer safeness (invariant) over simplicity (current culture)... while in my current work invariant would be nicer, I think that the choice has already been made for us with current culture being the default in similar circumstances (e.g. ToString)
Why "prefer safety over simplicity"? Why not offer equal-length ways of explicitly specifying both of the common cases? BTW, I think there either should have been two methods each for ToString(), Equals(), and GetHashCode(), or else they should have included a parameter to specify the usage case, since all three methods have two distinct usage cases with different semantics.
First, I wasn't 100% clear but I actually decided that simplicity was probably a better fit over safety.

I'll explain it this way, and address your question regarding length.

1) I assume that we'll have a good, custom prefix implementation.

2) Sensible defaults will be used per prefix (it's custom, you have control).

3) Now we have UI$"..." for UI needs that uses current culture and LOG$"..." for logging which uses the invariant culture. So on and so forth for every conceivable use case.

4) Now, what do we do when there is no custom prefix, à la $"..."? I see three options:

4.1) We force a custom prefix - so this is illegal. Seems a little heavy handed - just another bit of complexity in the way of simple code.

4.2) We have two symbols, $ for CurrentCulture and (say) # for invariant culture. This is great, except now we have two symbols with the complexities that causes - which means this is not really noob friendly. Furthermore it adds complexity when dealing with custom prefixes, should I do LOG$"..." or LOG#"..."??? What happens if I want to disable one type in the LOG interpolator, or change behaviour down the line? How do we deal with prefixes that don't fall under the invariant or current paradigm?

4.3) We keep $ and just default to current (simple) or invariant culture (safe). Given that the rest of the API defaults to simple, and the use case is for short quick code, I figure simple is best. What I ask myself is "If I was a Ukrainian Programmer, what would I expect $"..." to do?"
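The UI$"..." and LOG$"..." prefixes above are hypothetical syntax. As a rough approximation of points 2 and 3, the same per-purpose defaults can be written as ordinary methods taking a FormattableString, the type that C# 6 eventually shipped for interpolated strings. The names Ui and Log are assumptions made for this sketch.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-ins for the UI$ and LOG$ prefixes.
public static class Interpolators
{
    // LOG-style: holes are always formatted with the invariant culture.
    public static string Log(FormattableString text) =>
        FormattableString.Invariant(text);

    // UI-style: holes are formatted with the current thread's culture.
    public static string Ui(FormattableString text) =>
        text.ToString(CultureInfo.CurrentCulture);
}
```

A call such as Interpolators.Log($"took {1234.5} ms") returns "took 1234.5 ms" regardless of the machine's culture, while Interpolators.Ui($"took {1234.5} ms") follows whatever culture the thread is running under. A plain $"..." assigned to a string corresponds to option 4.3 with current culture as the default.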
Jul 7, 2014 at 5:10 PM
4.1) We force a custom prefix - so this is illegal. Seems a little heavy handed - just another bit of complexity in the way of simple code.
How about allowing the same mechanism that would be used for other custom interpolators to define the meaning of the empty-named $, but give all of the default interpolators short--but non-null--names? Then the null-named $ would have whatever meaning the programmer found most useful.
Jul 7, 2014 at 5:31 PM
supercat wrote:
4.1) We force a custom prefix - so this is illegal. Seems a little heavy handed - just another bit of complexity in the way of simple code.
How about allowing the same mechanism that would be used for other custom interpolators to define the meaning of the empty-named $, but give all of the default interpolators short--but non-null--names? Then the null-named $ would have whatever meaning the programmer found most useful.
I don't think it would be a good idea, because it would make the code harder to read; you would need to know what default behavior the programmer has chosen for $.
Jul 8, 2014 at 10:49 AM
supercat wrote:
4.1) We force a custom prefix - so this is illegal. Seems a little heavy handed - just another bit of complexity in the way of simple code.
How about allowing the same mechanism that would be used for other custom interpolators to define the meaning of the empty-named $, but give all of the default interpolators short--but non-null--names? Then the null-named $ would have whatever meaning the programmer found most useful.
Sure, we can do this. There is a possible precedent in LINQ - where code can change behaviour depending on what provider is referenced.

Problem is all you're doing is moving the problem around, but not addressing it head on. Do you provide a single default implementation (e.g. System.Linq)? In that case all you've done is move the problem to the library writers. If you provide multiple implementations, what do you include as default? And do you change the default between project types? E.g. System.Interpolation.Current for WPF and Forms and System.Interpolation.Invariant for Console and DLLs? If so, how do you want to deal with issues where shared code changes functionality subtly between projects (if the code is copied) or within the project (if the shared code includes the namespaces)?

The advantage of having the default interpolator using current culture is that you introduce no new problems - the only issue already exists and programmers will already have to deal with it. If you think the existing system sucks, then use custom interpolators (which I suspect most developers will quickly migrate to) - which'll (hopefully) be easy and clear.
Jul 8, 2014 at 10:29 PM
NPSF3000 wrote:
Problem is all you're doing is moving the problem around, but not addressing it head on. Do you provide a single default implementation (e.g. System.Linq)? In that case all you've done is move the problem to the library writers. If you provide multiple implementations, what do you include as default? And do you change the default between project types? E.g. System.Interpolation.Current for WPF and Forms and System.Interpolation.Invariant for Console and DLLs? If so, how do you want to deal with issues where shared code changes functionality subtly between projects (if the code is copied) or within the project (if the shared code includes the namespaces)?
Elsewhere a feature has been suggested--and I heartily approve of it and alluded to it above--which would allow namespace aliasing to be applied in tightly-controlled scopes [e.g. at the method level]; methods which want to control which style of interpolation is used could easily specify it.

Otherwise, perhaps I'm in the minority on this, but if something may be done in slightly-different ways that will usually yield the same results, and neither way is obscure, I believe languages should try hard to avoid favoring one over the other; the more similar the two methods, the more important it is to avoid favoritism. If one reads const float OneTenth = 0.1f; and later float f2 = f1*OneTenth; double d2 = d1*OneTenth;, should one assume that the intention was to multiply both values by the fraction 13421773/134217728, that the intention was to multiply both values by 1/10, but the programmer thought 13421773/134217728 was probably "good enough", or that the programmer intended to multiply both numbers by 1/10 and the programmer didn't realize that 13421773/134217728 wasn't really good enough?
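The arithmetic behind that example can be checked directly. This small sketch (class and method names are illustrative) shows that the float constant, once widened to double, is exactly 13421773/134217728 rather than the double nearest to 1/10.

```csharp
public static class FloatConstant
{
    // The nearest float to 1/10 is 13421773 / 2^27.
    public const float OneTenth = 0.1f;

    // OneTenth is widened to double before the multiplication,
    // so the float's rounding error is carried into the double result.
    public static double Scale(double d1) => d1 * OneTenth;
}
```

Here (double)FloatConstant.OneTenth == 13421773.0 / 134217728.0 is true, while (double)FloatConstant.OneTenth == 0.1 is false. As a result FloatConstant.Scale(10.0) is slightly above 1.0, whereas 10.0 * 0.1 in double arithmetic is exactly 1.0.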

Incidentally, although ToString() seems to use the current culture, String.Format() does not seem to use it in the default rendering of numerical values. My expectation would be that, if anything, string interpolation arguments would behave like numbered arguments in String.Format, since that would be the closest analogue.
Jul 12, 2014 at 9:04 AM
Elsewhere a feature has been suggested--and I heartily approve of it and alluded to it above--which would allow namespace aliasing to be applied in tightly-controlled scopes [e.g. at the method level]; methods which want to control which style of interpolation is used could easily specify it.
So instead of simply spending a couple of characters to clearly and unambiguously select the right interpolator, you'd rather suggest adding redundant using statements on top of every method to select a default that may or may not be suitable... and that still won't stop the problem of copy and paste? Instead of a nice simple system, you're proposing complex and non-intuitive 'fixes'.