This project is read-only.

Keyword to enforce method purity

Topics: C# Language Design
Apr 15, 2014 at 3:52 PM
Edited Apr 15, 2014 at 4:58 PM
I would like to see a keyword to enforce purity in methods. There is the [Pure] attribute already but it doesn't create a compiler error.
pure int Product(int i, int j) 
    i * j;
A pure method could not call any other method unless these are pure as well. Also, they couldn't use any outer variables. Purity plays very well with composition, parallelism and thread safety, so I think it would be a nice addition, specially thinking long-term. However this is mostly an idea since I am not very sure how this could be implemented.

Once this is enforced I guess the compiler and the runtime could do some optimizations more easily. Anyone one with a deeper understanding of of Roslyn, please discuss if this is not possible or just not important enough to deserve the effort.
Apr 15, 2014 at 9:51 PM
Edited Apr 15, 2014 at 9:52 PM
I really agree with this idea. Compiler enforced pureness could open up a lot of possibilities for optimizations and even just safety within the application.

The compiler would need to scan for anything that results in a method call, and determine if any of them aren't marked pure. Then it'd have to determine that it does not write to any non-local variables (fields, out/ref arguments etc).

Existing libraries do not use this keyword, so there's going to some problems unless the base libraries are updated (you'd have to make your own Math.Round function).

One thing that could increase adoption, at the expense of muddying the concept would be to allow it to call methods with the [Pure] attribute as well. These methods can't be compiler enforced without a breaking change, but existing libraries that use this attribute would let you do it.

It'd also let you do a little hack to get around the pure check. If you knew that some crypto library was pure, and wanted to use it in a pure method you could so something like this:
pure PasswordMatchesOldPasswords(string password, string salt, List<string> PasswordHistory)
      return PasswordHistory.Any(hash=> VerifyPassword(password,salt,hash));
bool VerifyPassword(string password, string salt, string passwordHash)
This would be very convient for adoption purposes, but it'd make the feature less useful in the future. It could be defined that using [Pure] and an impure method can cause undefined behaviour, so that it can still optimize it, but is this desired?

I've heard issues surrounding checking a pure chain, and it's a very common issue people have with haskell. If you want to just add a check print statement to check something (or maybe a log statement), suddenly your pure function becomes impure, and if things are calling it, then you potentially have a very big chain of functions you have to make impure just to do a little debugging, which is obviously not ideal. Having this hack with the [Pure] attribute would make it a lot easier to debug, but again you sacrifice some clarity (what if you forget to remove that log statement).

So I guess the issue comes down to, do you want a solution that gives you some of the safety, all the optimizations, and easy to make/use, or do you want all the safety and optimizations, but more difficult to integrate with existing systems?

We should also consider other places this could be useful.

Would a pure property be possible? (obviously it could only support get).

What about a pure class? Something that can read and write to private fields, but can't read/write to anything visible to other classes? This could potentially be useful (basically turn the class into a state machine that has no external dependencies).

What about lambdas? Should the compiler automatically determine if a lambda is pure? Should there be a way to explicitly mark it?

What about delegates? It'd be required to work with lambdas most likely.

What about type constraints? Or does have a pure interface that it implements good enough?

What about virtual methods? Can the compiler determine purity and enforce it? Should it?

This would be an awesome feature, but we need a lot of discussion around it.
Apr 16, 2014 at 4:03 PM
Some points I have been thinking about:
Then it'd have to determine that it does not write to any non-local variables.
I think that forbidding writing is not enough, a pure function shouldn't even read from non-local variables, because that could cause different results at different times of the computation. It could operate only on its parameters and local variables.

About lambdas, it would be nice if its purity could be assumed and inferred, since many lambdas are very simple and pure already. Of course it would be necessary that you were able to write the keyword just to let the compiler help you.
We should also consider other places this could be useful.
  1. Automatic memoization, with an attribute, for example [Memoization]. It would be, however, a dangerous attribute, since an unwise use of it could make you run out of memory very quickly.
  2. Automatic lazy evaluation. Evaluations could be easily chained, no need to do it right now… Again, there could be memory issues, like when you chain many computations in Haskell, for example in very large sums (when they are poorly-implemented).
  3. What I love the most about purity enforcement is safety and predictability, not optimizations. It would prevent a lot of side-effect bugs, thousands! Also, I think it's more or less pushing the way that C# started with LINQ.
But after all, C# has a lot of story behind, It would be bad to break some old project because of this idea, although I think it would be even worse (IMHO) implementing a not-all-good feature. Since pure methods can be called from regular methods, the biggest problem would be making sure that old methods are pure, which must be millions of code lines just for .NET libraries, for anyone that wanted to make use of 'pure' capabilities. That, and the chance that 'pure' is a very common name for variables, which I think is not...

This transition is already half-implemented for static classes, and in fact, static classes with no fields are already pure, which might be inferred by the compiler as well.
Apr 16, 2014 at 6:41 PM
a pure function shouldn't even read from non-local variable
It's an interesting thing to think about. In terms of CIL, a instance method takes this as an argument, so does reading from a field of this imply non-pureness? And if it can't access anything within the same class then it doesn't make sense to have a non-static pure method. If you say that they can't be instance methods then, then you'd run into a pretty big issue with anything that uses getters and setters, since you couldn't access that data from an object parameter.

You also run into some distinction between extension methods and instance methods. If extension methods can access the fields, then can instance methods? Transforming a instance method into an extension method does change how it's called, so it'd be odd if it changed purity. Perhaps it's enough to say they can't access private members of the class.

Then we have constants. I assume compile time constants don't affect purity (since constants could just be replaced at compile time), but what about readonly? It can only be set in the constructor, so within a particular instance it's constant, but it's not globally constant.

This does bring up an interesting question around lambdas too. Can you capture local variables in a lambda and still have it be pure. For instance:
var list = new int[]{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20};
var score = 10;
var list2 = list.Where(x=>x>score);
Would the lambda passed to where be considered pure? It can't be, because you can modify the local variables and alter the return value of the function, but it'd eliminate a lot of the usefulness of closures, and would just exist as a short hand for defining a new function, and not have any of the local capturing that makes them so great.

There is definitely a lot to be discussed here, and to consider what exactly pure means in an OOP language, as the narrow view of pure would mean an entire shift in coding style for any library that wants to allow people to use pure.

I am guessing it's probably ridiculous to think, but just how feasible would it be for the compiler to completely check purity? As in you can call a method not labelled as pure and it'd check if the method was pure. I'm guessing there'd be a lot of work, but could optimizations make it feasible?
Apr 24, 2014 at 3:45 PM
Edited Apr 24, 2014 at 3:50 PM
I have been thinking about your points, mirhagk, and I have reached some conclusions (at least for the moment). Pure methods are defined by being commutative with respect to the order in which they are invoked. The result of calling method A and then B, should be equivalent to calling B and then A. This implies that this methods don't create side effects and that they always have the same result. This definition might not be strict enough, I am still trying to find some reference about this topic (or figuring out a definition in this discussion). However, it is worth making a difference between methods that cause side effects and methods that consume side effects.

Methods that cause side effects are completely out of question, they will never be pure. On the other hand, methods that consume side effects are somehow special.

This is because side-effects-consumer methods commute with pure methods. For example, take a method that reads a file. It's consuming a side effect, because it reads from a global state. However, if you don't write anything to that file, the method can be considered somewhat pure, because the result will be always the same, no matter how many times you it. That will hold until you use any method that could write, delete, move or do anything with that file.

This brings a problem at the time of implementation: the compiler should know which methods consume side effects, and the runtime should check wether a consumed side effect is changed or not, in order to parallelize/delay/lazy-evaluate a method call. This seems an overwhelming project, at least to me. Other option is to not use methods that consume side effects, which makes the feature simpler while remaining meaningful (although less powerful).

Therefore I think that const fields could be properly used in a pure method: several invokes would commute. I also think it would be possible the same with readonly, although that made think a little more. Every instance you have could be different, so you could say that method is having different results. However, if you consider the tuple method-instance, you actually have a unique pure method. It's as if creating an instance would create a different method too. Those methods commute with each other and also with themselves. Good news for us, functional people, this is readonly.

We could extend the postulate of the definition to make it more consistent: a pure method commutes with any other method.

Regarding to lambdas, I guess that they would be pure as long as you capture only immutable fields. If you captured a mutable field, you would be consuming a side-effect.
There is definitely a lot to be discussed here, and to consider what exactly pure means in an OOP language
You made me think a lot there! Maybe it doesn't even make sense to have this discussion in an OOP language. What happens if you make a million calls from someInstance and then do someInstance = null? What happens with the calls if they were parallelized already but not evaluated yet? Maybe that is a problem belonging to the implementation, not to the concept, but it made me fear that maybe the garbage collector would need to treat instances differently if they had pure methods.
Apr 24, 2014 at 4:18 PM
However, if you consider the tuple method-instance, you actually have a unique pure method.
This is the tricky part for me. Knowing the assembly it produces (even just the ILASM), this is simply the first argument to the function. If our definition is that it takes in parameters (does not modify them) and outputs a result, and relies on nothing else for input, and produces nothing else as output, that still means that reading this would be allowed.

It is kinda counter-intuitive to think that something that reads a field from the class could be considered pure, but as long as it's an instance method we can say it's simply a parameter.

If we disallow this, then what about extension methods? There the field is definitely a parameter, so reading from that should seem perfectly fine, but if we allowed extension methods but not instance methods to read from instance fields, then we'd have a weird case where you have things that look like instance methods allowed to read local state, but actual instance methods can't.

I personally think that instance methods should be allowed to read instance fields, as they are simply a parameter passed to the function. If you call the method with the same arguments it will always return the same results, it's just that you can't consider this to always be the same argument. But this is true of any parameter. For instance:
var a = new Point(){X = 2, Y = 3};
//some other work
The compiler still has to ensure that the 2 locations of a are indeed the same object, and furthermore that the fields have not been modified in between. This second point is much harder to prove, and perhaps even impossible in some cases. If the field is visible to multiple threads you'd have to ensure it is never modified ever, not just in between. Perhaps we could just say that the variable can never be visible to any other non-pure methods in order for the compiler to optimize that CalculateLength function, but that does limit the optimizations (although still allows the correctness check associated with pure methods).

Another option would be to say that anything passed to a pure method must be immutable, but that carries it's own set of tricky checks. We need to ensure not only that the object is immutable, but any fields that the object refers to are also immutable. We'd need to decide on an entire set of rules for what objects we can consider immutable, and whatever set of rules we find, there's going to be some objects that are immutable that the compiler misses. Besides this severely restricts the usefulness of the pure function as most of the existing libraries do not have immutable types.

I would absolutely love all these problems to be magically solved, but I think any solution would either be impure, or require vastly changing every existing library, and at that point why don't we just create a new language?

I believe a similar issue was brought up with non-nullable reference types, and eric lippert addressed it as would love to have, but just not feasible.

In the blog post he mentions Spec#, perhaps this is a feature for Spec#? Perhaps a new, C# like language that can guarantee a lot of safety can be created?
Apr 24, 2014 at 11:31 PM
I would think that at the moment such a feature belongs more to a static code analysis tool (like the static contracts checker tool or fxcop) rather than being built into the C# compiler.

That doesn't mean I don't like the actual idea, quite the opposite.