Block strings

description

I've often heard people suggest adding core C# language support for XML, JSON, regexes, or some other DSL. In my experience, the real problem is that string literals are a pain to write: even a "verbatim" string still has to escape one very common character, the double quote (").

My suggestion is for a true block string to be added to the language, so you can copy and paste code, DSLs, whatever you need into a string and not have to worry about escaping it (not to mention trying to maintain the escaped string after the fact). It would work like this:
var a = """Write whatever you want in here! "Look Ma! No escaping!" I can use quotes and \r\n all in the same place!""";
var b = """
<outerNode foo="true">
  <innerNode bar="false"/>
</outerNode>
""";
It would have zero escape characters, so what you see is always what you get. This makes your string a lot easier to read and maintain than is currently possible with either verbatim or regular strings.

comments

JanKucera wrote Apr 10, 2014 at 6:18 PM

You still need to escape three quotes, don't you?

MgSam wrote Apr 10, 2014 at 6:36 PM

I don't think so. I think 3 quotes is rare enough that if you have it you just have to use one of the other string formats.

JanKucera wrote Apr 10, 2014 at 7:03 PM

Might be rare enough now.
"My suggestion is for a true block string to be added to the language, so you can copy and paste code, ..."
But now you can't copy and paste code without worrying about whether it contains a true block string itself...

MgSam wrote Apr 10, 2014 at 7:23 PM

I have never seen three double quotes in a row used in any legitimate context outside of a programming language. I don't remember offhand what other languages with block quotes of this form do. That would be a good starting point to look at.

JanKucera wrote Apr 11, 2014 at 10:01 AM

My point was that you are introducing just such a language, and that prevents you from using the feature on your own language in the first place. Then someone else can come along and say, "I want my language to allow simple copy & paste from your code," and introduce """" or whatever.

By the way, C# is a language where several double quotation marks in a row come up pretty easily: in the verbatim string!
string xml = @"<element emptyattribute="""" />";
string quote=@"""Welcome!""";

Moreover, Visual Basic does not have verbatim strings as far as I know, and this kind of escaping is present in its standard strings. You even write a single double-quote Char literal as """"c in Visual Basic; see http://blogs.msdn.com/b/vbteam/archive/2014/04/03/taking-a-tour-of-roslyn.aspx, "Example of updating the compiler", (4).

mdanes wrote Apr 11, 2014 at 10:43 AM

This can be solved by allowing arbitrary string delimiters like C++11 does:
char *p = R"***(fo
"o)***";
This is equivalent to "fo\n\"o". The *** is just an example; you can use any delimiter of up to 16 characters (no whitespace, parentheses, or backslashes), or none at all.
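
To make the equivalence concrete, here is a minimal C++11 sketch. The names raw_example, escaped_example, with_triple_quotes, and check_equivalence are made up for illustration, as is the blk delimiter. It compares the raw literal above with its escaped spelling, and shows that a delimiter lets the body contain """ or even )" without any escaping.

```cpp
#include <cassert>
#include <string>

// Raw literal from the comment above: the delimiter is the sequence ***.
const std::string raw_example = R"***(fo
"o)***";

// The same characters written as an ordinary, escaped literal.
const std::string escaped_example = "fo\n\"o";

// With a delimiter (here the hypothetical "blk"), the body may contain
// triple quotes, and even )" which would end a raw literal that has
// no delimiter.
const std::string with_triple_quotes = R"blk(a = """x"""; b = f(")");)blk";

// Asserts that the raw and escaped spellings hold identical characters.
void check_equivalence() {
    assert(raw_example == escaped_example);
}
```

Calling check_equivalence() passes silently, since both spellings produce the same five characters: f, o, a newline, a double quote, and o.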

JanKucera wrote Apr 11, 2014 at 6:24 PM

Agreed, that would solve that issue in a pretty usable way.

madst wrote Apr 11, 2014 at 11:15 PM

I do see the point of this feature. On the other hand, it feels to me like a problem that may not be worth having yet another form of string literal for.

Maybe this is more a tooling issue - maybe you could have a "paste as string literal" function?

mdanes wrote Apr 12, 2014 at 10:43 AM

"maybe you could have a "paste as string literal" function?"
Where? In Notepad? :)
"it feels to me like a problem that may not be worth having yet another form of string literal for"
I'd say it's worth more than binary literals. I rarely felt the need for binary literals but I often run into strings containing " and in that case verbatim literals aren't really useful, you just trade the \" for a "".

MgSam wrote Apr 14, 2014 at 5:55 PM

@madst Thanks for your feedback. I agree that tooling can help, but it is still a major pain to maintain strings that have escape sequences in them.

I think the best example of this is regular expressions. Say I have the regex .*?\\"(.*?)"\\.* which is supposed to match the contents of double quotes between two backslashes. I already have to escape the backslashes for the regex engine. If I make this a normal string in C#, it looks like ".*?\\\\\"(.*?)\"\\\\.*". This is such a mess of escaping that it is already nearly impossible to maintain. If I make this a verbatim string in C#, it looks like @".*?\\""(.*?)""\\.*". Better, but you still must be very careful to have the correct number of escaping quotes. Whereas, with a block string, what you see is what you get: """.*?\\"(.*?)"\\.*"""
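
As a rough cross-check of the escaping above, here is a sketch using the C++11 raw literal mentioned earlier in the thread (C++ because that is where this literal form exists). The names raw_pattern, escaped_pattern, check_patterns_agree, and quoted_between_backslashes, and the re delimiter, are made up for illustration. The escaped literal is spelled exactly like the normal C# string above, since both languages write \\ and \" the same way. It assumes std::regex's default ECMAScript grammar handles this simple pattern the same way the .NET engine would.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The regex exactly as the engine should see it, written as a raw literal.
// The "re" delimiter is needed because the pattern itself contains )" .
const std::string raw_pattern = R"re(.*?\\"(.*?)"\\.*)re";

// The same regex as an ordinary escaped literal.
const std::string escaped_pattern = ".*?\\\\\"(.*?)\"\\\\.*";

// Asserts that both spellings produce the same pattern text.
void check_patterns_agree() {
    assert(raw_pattern == escaped_pattern);
}

// Returns the text captured between \" and "\ in the input,
// or an empty string when the whole input does not match.
std::string quoted_between_backslashes(const std::string& input) {
    std::smatch m;
    if (std::regex_match(input, m, std::regex(raw_pattern))) {
        return m[1].str();
    }
    return "";
}
```

For the input a\"hello"\b, quoted_between_backslashes returns hello. Note that the pattern could not be written with a bare R"(...)" literal, because the )" inside it would close the literal early, which is the same delimiter-collision concern raised about """ above.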

As another example, I have some unit tests where I manipulate some XML-based data structures. I verify that they are in the correct form by putting the expected XML in strings that get compared to a ToString() of the data structures at runtime. Because of the annoyance of verbatim string escaping, I'm usually writing the expected output in an XML editor and then copying and pasting it into the code using an extension which automatically escapes the double quotes. This works, but it's an annoying extra step and also means that I need to unescape the strings again if I ever want to copy them out of the unit test to put somewhere else.