Convenient floating-point math

Topics: C# Language Design, VB Language Design
May 5, 2014 at 2:31 AM
With the present floating-point rules, it is often necessary for code to concern itself with differences between float/Single and double/Double types even in cases where such distinction is counterproductive. If some code is intended to pass the most accurate representation of a value that a method can accept, there should be a concise way for code to do that. Given code like:
surface.LineTo((float)X, (float)Y); // C#
surface.LineTo(csng(X), csng(Y)); // VB.NET
the typecasts or conversions not only clutter up the code, but they also make it unclear whether the programmer actually wants to round the values to single precision, and would want to do so even if the methods could accept Double, or is including the typecasts merely because the compiler demands them.

Being able to instruct the compiler that X and Y should perform double-precision calculations, but should down-convert to single precision without complaint, would allow the above statements to be rewritten as:
surface.LineTo(X,Y); // Either language
Much cleaner, and less likely to cause erroneous rounding.

My proposal would be for C# and VB.NET to add two new compile-time aliases each for System.Single and System.Double. When used as method parameters or return values they would be marked with attributes, but otherwise the runtime would view them as Single and Double. The difference would be in what the compiler accepts, and in what code the compiler would generate.

The new types would be:
  • strict float or Strict Single : "I want exact IEEE 32-bit semantics." This type would behave like float, except that it would not implicitly convert to, or accept implicit conversions from, any type except other aliases of System.Single, and use of this type would force the runtime to round all intermediate results to 32-bit precision regardless of other floating-point mode-options.
  • strict double or Strict Double : "I want exact IEEE 64-bit semantics." This type would behave like double, except that it would not accept implicit conversions from any type except other aliases of System.Double, and would force the runtime to round all intermediate results to 64-bit precision regardless of any other floating-point-mode options.
  • short real or Short Real : "I'm only willing to spend 4 bytes of storage per value, but would otherwise use Double." This type would generally be stored as a Single, but most operations would eagerly promote it to the longest convenient precision (typically 64-bit unless 80-bit can be handled just as well). The Short Real type would accept implicit conversions from other 32-bit types, Double, or Long Real; it would implicitly convert to Long Real but not Double.
  • long real or Long Real : "I'd like to compute things precisely, and pass this to code in the most accurate format it will accept." This type would behave mostly like Double, but it would accept conversions from Short Real and would be convertible to Single.
Note that in most cases where implicit conversions could create ambiguity, it would be right and proper for the compiler to demand a typecast to clarify the programmer's intention. An expression like someShortReal == someLongReal should be considered ambiguous, since there are situations where a programmer would want to determine whether someShortReal holds the best possible 32-bit representation for a 64-bit value, and other situations where the question is whether someLongReal is an exact representation for the value of someShortReal. Compatibility requires that someFloat == someDouble remain legal, but that doesn't mean new types must allow such usage.
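
Part of the "long real" behaviour can be approximated today with a user-defined struct. The following is only a rough sketch under that assumption, not the proposed compiler feature: LongReal, Surface and LongRealDemo are hypothetical names invented for illustration, and the struct defines no arithmetic of its own.

```csharp
using System;

// Hypothetical stand-in for the proposed "long real" alias: it stores a double,
// accepts double implicitly, and narrows to float without an explicit cast.
public readonly struct LongReal
{
    private readonly double value;

    public LongReal(double value) { this.value = value; }

    public static implicit operator LongReal(double d) => new LongReal(d);
    public static implicit operator double(LongReal r) => r.value;

    // The one deliberately lossy conversion: rounds to the nearest float.
    public static implicit operator float(LongReal r) => (float)r.value;
}

// Hypothetical drawing API that only offers single-precision parameters.
public sealed class Surface
{
    public float LastX { get; private set; }
    public float LastY { get; private set; }

    public void LineTo(float x, float y) { LastX = x; LastY = y; }
}

public static class LongRealDemo
{
    public static Surface Draw()
    {
        // Both values are computed in double precision, then wrapped.
        LongReal x = Math.Sqrt(2.0) * 100.0;
        LongReal y = 1.0 / 3.0;

        var surface = new Surface();
        surface.LineTo(x, y);   // no (float) casts at the call site
        return surface;
    }

    public static void Main()
    {
        Surface s = Draw();
        Console.WriteLine(s.LastX == (float)(Math.Sqrt(2.0) * 100.0)); // True
        Console.WriteLine(s.LastY == (float)(1.0 / 3.0));              // True
    }
}
```

A wrapper like this only covers the case where the method takes float alone. If a double overload were added, ordinary overload resolution would not necessarily pick it for a LongReal argument, which is one reason the request is for compiler support rather than a library type.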

If adding new compile-time types would be overly difficult, allowing different floating-point modes to be specified within code would probably be almost as good. The main requirement is to avoid the need for ugly hard-to-read bug-inducing typecasts.
May 5, 2014 at 4:02 AM
Actually I think not requiring the type-casts would be more error prone. Such a conversion is narrowing, and the convention in both languages in that case has been to require that the programmer explicitly declare their intention. Conversion from double to float will definitely cause data truncation, which will have ramifications on any calculations. You won't see runtime errors, but you could end up with float.PositiveInfinity or float.NegativeInfinity if the double is out of range. VB.NET does let you drop the explicit type-cast requirement if you set Option Strict Off, which is something that C# should never get.
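
The two effects described here (silent overflow to infinity and silent loss of precision) can be seen in a short C# sketch. NarrowingDemo and Narrow are hypothetical names used only for illustration.

```csharp
using System;

public static class NarrowingDemo
{
    // Explicit double -> float conversion; it never throws.
    public static float Narrow(double d) => (float)d;

    public static void Main()
    {
        double outOfRange = 1e300;   // far above float.MaxValue (about 3.4e38)
        double oneTenth = 0.1;       // not exactly representable in either type

        Console.WriteLine(float.IsPositiveInfinity(Narrow(outOfRange)));   // True
        Console.WriteLine(float.IsNegativeInfinity(Narrow(-outOfRange)));  // True

        // The float nearest to 0.1 differs from the double nearest to 0.1.
        Console.WriteLine((double)Narrow(oneTenth) == oneTenth);           // False
    }
}
```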

The real complaint should be not having overloads of all of these APIs that accept double as well as float.
May 5, 2014 at 6:21 AM
I am quite familiar with the rationale which Gosling used in the design of Java, and which Microsoft followed in designing VB.NET and C#. I would suggest that it would make sense in cases where floating-point values are used to represent precise quantities, but such cases represent an extreme minority of places where floating-point math is actually used. The vast majority of the time when a programmer writes float x=1000000.1f;, the programmer doesn't want to set x to the precise value 1000000.125. Rather, the programmer wants to set x to the precise value 1000000.100, but is willing to settle for something that's within about 0.1ppm. More generally, I would suggest that in the vast majority of places where floating-point numbers are used, a value like the aforementioned 1000000.125 isn't intended to represent an exact quantity, but rather the result of a numerical calculation for which that is a good approximation.
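
The 1000000.125 figure can be checked directly. This is a small illustrative sketch; LiteralDemo and its method names are hypothetical.

```csharp
using System;

public static class LiteralDemo
{
    // The literal from the text, stored in a float.
    public static float Stored() => 1000000.1f;

    // Relative distance between the stored float and the intended value.
    public static double RelativeError() =>
        Math.Abs((double)Stored() - 1000000.1) / 1000000.1;

    public static void Main()
    {
        // Near one million, adjacent float values are 0.0625 apart,
        // so the literal is rounded to the nearest representable value.
        Console.WriteLine((double)Stored() == 1000000.125);  // True

        // The rounding error is well under 0.1 parts per million.
        Console.WriteLine(RelativeError() < 1e-7);           // True
    }
}
```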

Fundamentally, I would consider the deciding factor for whether a compiler should implicitly accept a conversion that can never throw an exception to be "How likely is it that the programmer might have intended some other meaning?" I would suggest that while there are a few cases where an implicit Double to Single conversion might have unintended effects, they are vastly outnumbered by cases where precision is lost by code that uses or casts things to float purely to mollify the compiler.

As for your comment that API methods which don't even need a full 24 bits of precision should have overloads for double as well as float, what advantage would that have over tagging parameters with an attribute saying that they should accept either a float or an implicitly-cast double?

I would suggest that in cases where floating-point variables are intended to represent exact values, a programmer should be allowed to declare them using strict types, thus ensuring that a typecast would be required when e.g. assigning an int to a strict 32-bit float, or a long to a strict 64-bit float (since such conversions may cause rounding). In the far more common cases where floating-point variables are used to represent approximate values, a programmer should be allowed to declare them as approximate types, and have that declaration be adequate notice to the compiler: "Yes, I am aware that these values might get rounded off".
May 5, 2014 at 11:25 AM
To which I counter that the vast majority of programmer uses for int are to store a value lower than 32,767, but I hardly think that means the compiler should permit implicit conversions to short. And in both cases the programmer is fully capable of declaring what they want to do.
May 5, 2014 at 1:51 PM
How often, when int is converted to short, is the int expected to be precisely representable in short? I would suggest the vast majority of the time.

How often, when double is converted to float, is the double expected to be precisely representable in float? I would suggest very rarely, and in those cases where it was particularly expected, it would be most helpful to have the conversion fail if it wasn't.
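
A conversion that fails when the value is not exactly representable can be written by hand today. The sketch below assumes a hypothetical helper, ToFloatExact, which is not a framework method; the int-to-short half uses the existing checked keyword.

```csharp
using System;

public static class ExactConversions
{
    // Hypothetical helper: narrows only when no information is lost.
    public static float ToFloatExact(double d)
    {
        float f = (float)d;
        // NaN never compares equal to itself, so let it through explicitly.
        if (double.IsNaN(d) || (double)f == d)
            return f;
        throw new ArgumentException("Value is not exactly representable as float.", nameof(d));
    }

    // int -> short that throws OverflowException instead of wrapping around.
    public static short ToShortChecked(int i) => checked((short)i);

    public static void Main()
    {
        Console.WriteLine(ToFloatExact(0.5));      // 0.5 is a power of two, so it converts exactly
        Console.WriteLine(ToShortChecked(1234));   // fits in a short

        try { ToFloatExact(0.1); }
        catch (ArgumentException) { Console.WriteLine("0.1 would be rounded"); }

        try { ToShortChecked(40000); }
        catch (OverflowException) { Console.WriteLine("40000 does not fit in short"); }
    }
}
```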

I have an object Foo with a method Bar that takes a single floating-point argument. Given a variable X of type double, how should I call Bar with the most precise possible value of X? You say programmers are perfectly capable, so how about it? Is the correct call Foo.Bar(X); or Foo.Bar((float)X);?

I would suggest that if there's no way of declaring X to say "I'm using double here to retain precision when possible, but fall back to float when needed", or "This method doesn't need more than 24 bits of precision, but should be invokable with double", there should at minimum be a notation which says to "down-convert only as needed".

Otherwise, I would posit that the majority of float casts are not declaring what the programmer wants to do, but rather something that a programmer is willing to have happen in order to allow what they want (passing a double to a method which doesn't ask for more precision than it's going to use). I would suggest that there are at present two things a programmer could really be wanting when using a float cast:
  • "I want this value to be rounded to float precision, even if an overload (or for that matter the only overload) accepts double."
  • "I want to call this method using the most precise available overload, but I don't believe it takes double."
At present, there is no means of expressing the latter meaning. Consequently, if someone sees:
Foo.Bar((float)X);
and Foo.Bar in fact does have an overload that takes Double, it's unclear what the programmer wanted. If the programmer could simply have written:
Foo.Bar(X);
regardless of whether the method took float or double, then the meaning would have been clear whether Foo accepts float, double, or both.
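
To make the ambiguity concrete, here is a small sketch in which Bar has both overloads. Foo is modelled as a static class purely for brevity, and the string return values are an illustrative assumption used to show which overload was chosen.

```csharp
using System;

// Hypothetical class shaped like the Foo.Bar example in the text.
public static class Foo
{
    public static string Bar(float value) => "float";
    public static string Bar(double value) => "double";
}

public static class OverloadDemo
{
    public static void Main()
    {
        double X = 1.0 / 3.0;

        // The cast forces the float overload even though a double one exists.
        Console.WriteLine(Foo.Bar((float)X));       // float

        // Without the cast, the double overload receives the full value.
        Console.WriteLine(Foo.Bar(X));              // double

        // The cast version has already discarded precision.
        Console.WriteLine((double)(float)X == X);   // False
    }
}
```

A reader of the first call cannot tell whether the rounding was wanted or whether the cast simply predates the double overload.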