Program proofs

by Carl Burch, Hendrix College, September 2012

Creative Commons License
Program proofs by Carl Burch is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.
Based on a work at www.toves.org/books/axiom/.

Contents

1. Straight-line programs
1.1. Basics
1.2. An example
1.3. Proving backwards
2. Control statements
2.1. Proving a program to compute factorials
2.2. Another example
3. Shortcomings
3.1. Assumptions about expressions
3.2. Partial and total correctness
3.3. Complexity
4. Additional problems

Being able to prove programs correct is a desirable element of a programming language, since this leads to more reliable programs. While the proof process is very tedious and is impractical for most situations, knowing how to prove a program correct can help a programmer reason about a program's correctness. For example, when I write an algorithm, I often informally think about how it might be proven, and sometimes this leads to insights about important, forgotten cases.

In this chapter we examine a system for proving programs in an imperative language. We use a proof system designed for a severely restricted version of C, but the system can easily be extended to a more complete language.

1. Straight-line programs

Straight-line programs are programs that include no control statements. We'll begin with this type of program to gain an understanding of how the proof system works; we'll incorporate control statements into our proof system later. For this restricted version, in fact, we'll consider only assignment statements.

1.1. Basics

A hypothesis involves a combination of program code with assertions that say what is true at particular points in the program's execution. Assertions are enclosed in braces and are written in mathematical notation. Consider the following hypothesis, for example.

{ fact = i! ∧ i ≥ 1 }
i = i + 1;
fact = fact * i;
{ fact = i! ∧ i ≥ 1 }

This hypothesis consists of three parts: an assertion called the precondition, followed by some C code, followed by another assertion called the postcondition. It indicates that, were the precondition to hold before executing the given C code, then the postcondition would hold upon completing the code.

We will prove such a hypothesis using a sequence of formal steps. Each step will be justified either by being a straightforward mathematical fact (involving no assertions) or by applying one of a set of rules, or axioms. We will begin with the following five axioms useful for proving straight-line programs.

Rule of Sequence:
    {P} S {Q}
    {Q} T {R}
    -----------
    {P} S T {R}

Rule of Consequence:
    P ⇒ Q
    -------
    {P} {Q}

Rule of Assignment:
    ---------------------
    {P[x → E]} x = E; {P}

Rule of Preconditions:
    P ⇒ Q
    {Q} S {R}
    ---------
    {P} S {R}

Rule of Postconditions:
    {P} S {Q}
    Q ⇒ R
    ---------
    {P} S {R}

Axioms are written in a form involving some set of facts above a line, followed by a single fact below the line. This indicates that if we can prove every fact above the line, then we can infer the fact below the line using the axiom.

For example, the Rule of Sequence requires that we first show two things. The first, {P} S {Q}, directs us to show that if P is true, then after executing S, Q is true. The second, {Q} T {R}, directs us to show that if Q is true, then after executing T, R is true. If we can show both of these, then we can conclude that if P is true and we execute S followed by T, then R is true. (If this sounds obvious, that's exactly the point: Axioms are supposed to be self-evident, here just as in geometry.) This rule gives us a way of combining statements into a sequence.

Then we have the Rule of Consequence, which requires a previous step of the form P ⇒ Q. This is a mathematical assertion with no reliance on programming at all; consequently, the justification that would allow us to include such a step would be a mathematical argument.

In the Rule of Assignment, the notation P[x → E] indicates the proposition we obtain by taking P and replacing every occurrence of x by the expression E. This axiom says that, if P[x → E] is true, then after we assign E to x, P is true. We don't need any previous steps to conclude this, and so nothing appears above the line in the axiom statement.

This rule is somewhat peculiar in that conceptually it proceeds backward in time: You take a proposition P that you want to be true after the assignment statement, and from this you can derive a proposition (P[x → E]) that, if true before the assignment takes place, would make P be true after the assignment.

Finally, we have the Rule of Postconditions and Rule of Preconditions. Technically, these are unnecessary, as each is simply a combination of the Rule of Sequence and the Rule of Consequence.

1.2. An example

Below, we use our five axioms to demonstrate a complete formal proof building up to the conclusion in the final step. This proof is a sequence of steps numbered in increasing order. Each step consists of a fact, followed by a justification of the fact. When the axiom used requires previous steps, they're referenced in the proof by listing the lines where those steps can be found.

1. fact = i! ∧ i ≥ 1 ⇒ fact = ((i + 1) − 1)! ∧ i + 1 ≥ 1
   Mathematical fact
2. { fact = ((i + 1) − 1)! ∧ i + 1 ≥ 1 }
   i = i + 1;
   { fact = (i − 1)! ∧ i ≥ 1 }
   Assignment
3. { fact = i! ∧ i ≥ 1 }
   i = i + 1;
   { fact = (i − 1)! ∧ i ≥ 1 }
   Preconditions: 1, 2
4. fact = (i − 1)! ∧ i ≥ 1 ⇒ fact ⋅ i = i! ∧ i ≥ 1
   Mathematical fact
5. { fact ⋅ i = i! ∧ i ≥ 1 }
   fact = fact * i;
   { fact = i! ∧ i ≥ 1 }
   Assignment
6. { fact = (i − 1)! ∧ i ≥ 1 }
   fact = fact * i;
   { fact = i! ∧ i ≥ 1 }
   Preconditions: 4, 5
7. { fact = i! ∧ i ≥ 1 }
   i = i + 1;
   fact = fact * i;
   { fact = i! ∧ i ≥ 1 }
   Sequence: 3, 6

For example, the justification for step 3 states that the fact is proven using the Rule of Preconditions, based on steps 1 and 2. Recall the Rule of Preconditions.

P ⇒ Q
{Q} S {R}
---------
{P} S {R}

The Rule of Preconditions states that we need a statement of the implication “P ⇒ Q”, which we have in step 1, and a statement of the form “{Q} S {R}” whose precondition matches the implication's consequent, which we have in step 2. We can then conclude “{P} S {R}” in step 3, matching “{Q} S {R}” except that the precondition has changed.

Step 2 involves the Rule of Assignment.

---------------------
{P[x → E]} x = E; {P}

The Rule of Assignment requires nothing to be proven, and so the justification of step 2 refers to no other lines of the proof. However, you can see that step 2 follows the required form “{P[x → E]} x = E; {P}”: Every instance of i in its postcondition is replaced by i + 1 in the precondition.

Step 7 illustrates the Rule of Sequence at work.

{P} S {Q}
{Q} T {R}
-----------
{P} S T {R}

In this case, the step references steps 3 and 6. Since the postcondition of step 3 matches the precondition of step 6, this assertion can be validly understood as the Q of the Rule of Sequence. When we write the conclusion given in step 7, the precondition of step 3 stands in for P and the postcondition of step 6 stands in for R.

Steps 1 and 4 are justified as “mathematical facts.” This is a cheat: Technically, they too should be justified by a sequence of logical inferences. In the case of step 1 of our proof, we can easily observe that fact = i! implies fact = ((i + 1) − 1)!, since (i + 1) − 1 is simply i; and of course if i is at least 1, then i + 1 is also, since it is even larger. In the case of step 4, we can multiply both sides of fact = (i − 1)! by i. Doing this formally, however, is tedious, and it lies beyond the scope of proving programs. Thus, we'll settle for simply asserting that such steps are easily proven. (Students sometimes take this informality as a license to claim some things as mathematical facts that are actually false. No: It's incumbent on the proof writer to verify that such facts are indeed easily proven.)

1.3. Proving backwards

That one can write a proof is nice, but ultimately we want to know how to construct a proof. Some guidelines about how to construct such a proof are in order.

To construct a program proof, it's most convenient to go backwards: We'll start with our desired conclusion and eventually work ourselves back to the beginning. Suppose we're trying to prove the following as the conclusion of our proof.

100. { fact = i! ∧ i ≥ 1 }
     i = i + 1;
     fact = fact * i;
     { fact = i! ∧ i ≥ 1 }
     ???

We've numbered this as step 100 in our proof. Of course, we won't be using all of steps 1 to 99, but we don't know how many there will be yet.

We reason that the code of step 100 is a combination of two smaller pieces of code; we'll have to use the Rule of Sequence to combine them. Thus, we assume that the Rule of Sequence must be the rule implying step 100. To apply this rule to conclude with step 100, we need to have proven two facts previously.

50. { fact = i! ∧ i ≥ 1 }
    i = i + 1;
    { A }
    ???

99. { A }
    fact = fact * i;
    { fact = i! ∧ i ≥ 1 }
    ???

We don't know exactly what A is yet, and we don't know how to prove these two steps. But we do know that if we can prove steps 50 and 99, then we could apply the Rule of Sequence using steps 50 and 99 to derive step 100.

We'll work on proving step 99 first. Step 99 looks very much like something proven by the Rule of Assignment. We can determine the nature of assertion A based on the Rule of Assignment, by substituting fact ⋅ i for fact in the desired conclusion. Thus, step 99 becomes the following.

99. { fact ⋅ i = i! ∧ i ≥ 1 }
    fact = fact * i;
    { fact = i! ∧ i ≥ 1 }
    Assignment

The assertion A thus derived is called the weakest precondition. The name comes from its being the least restrictive assertion that must hold beforehand to guarantee the desired postcondition.

With step 99 out of the way, only step 50 remains to be proven. With A determined, we know exactly what we have left to prove.

50. { fact = i! ∧ i ≥ 1 }
    i = i + 1;
    { fact ⋅ i = i! ∧ i ≥ 1 }
    ???

This statement doesn't match any of our rules exactly. However, it does contain an assignment statement, which indicates that the Rule of Assignment must be involved. The Rule of Assignment can't yield step 50 directly, but we can go ahead and add the step that it does yield, using the same postcondition that step 50 requires.

49. { fact ⋅ (i + 1) = (i + 1)! ∧ i + 1 ≥ 1 }
    i = i + 1;
    { fact ⋅ i = i! ∧ i ≥ 1 }
    Assignment

This doesn't match step 50 exactly, since the preconditions don't match. But the rest does match, which suggests that the Rule of Preconditions could come into play. To use the Rule of Preconditions, we'd also need the following step.

48. fact = i! ∧ i ≥ 1 ⇒ fact ⋅ (i + 1) = (i + 1)! ∧ i + 1 ≥ 1 ???

We can quickly examine this to verify that it is a mathematical fact: If we know

fact = i!,

then we can multiply both sides by i + 1 to get

fact ⋅ (i + 1) = i! ⋅ (i + 1) = (i + 1)!.

With this demonstrated, our overall proof has all its steps completed. Putting the parts together, the final product is below.

48. fact = i! ∧ i ≥ 1 ⇒ fact ⋅ (i + 1) = (i + 1)! ∧ i + 1 ≥ 1
    Mathematical fact
49. { fact ⋅ (i + 1) = (i + 1)! ∧ i + 1 ≥ 1 }
    i = i + 1;
    { fact ⋅ i = i! ∧ i ≥ 1 }
    Assignment
50. { fact = i! ∧ i ≥ 1 }
    i = i + 1;
    { fact ⋅ i = i! ∧ i ≥ 1 }
    Preconditions: 48, 49
99. { fact ⋅ i = i! ∧ i ≥ 1 }
    fact = fact * i;
    { fact = i! ∧ i ≥ 1 }
    Assignment
100. { fact = i! ∧ i ≥ 1 }
     i = i + 1;
     fact = fact * i;
     { fact = i! ∧ i ≥ 1 }
     Sequence: 50, 99

This proof is actually shorter than our earlier one: The backward-derived proof has only five steps, while the other had seven. More important than its brevity, however, is our systematic construction of it. You could even conceive of writing a program to generate a proof like this automatically.

2. Control statements

Things get somewhat more difficult when we throw control statements like if and while statements into the batch. We'll add two new axioms to give us the ability to prove programs that incorporate if and while statements. These are the last two axioms we'll see. (In case you've forgotten your logical notation: The symbol ∧ stands for logical AND and the symbol ¬ stands for logical NOT.)

Rule of Condition:
    {P ∧ B} S {Q}
    {P ∧ ¬B} T {Q}
    -------------------------
    {P} if (B) S else T {Q}

Rule of Iteration:
    {I ∧ B} S {I}
    ----------------------------
    {I} while (B) S {I ∧ ¬B}

The Rule of Condition states that if we want to prove that P implies that executing an if statement makes Q hold, then we must have two previous steps: {P ∧ B} S {Q} and {P ∧ ¬B} T {Q}. If we know these things, and we reach the if statement with P holding, then B is either true or false. If B is true, then both P and B hold and we will execute S; because we have proven “{P ∧ B} S {Q},” we know that Q must hold after executing S. If B is false, then P and ¬B hold before executing T; because we have proven “{P ∧ ¬B} T {Q},” we know that Q must hold after executing T. Regardless of whether B is true, then, Q holds upon completing the if statement.

The Rule of Iteration states that, if we know that the body of a while loop preserves the truth of some assertion I (assuming that the while condition B also holds before executing the body), then we can conclude that if I holds on reaching a while statement, it will continue to hold after each iteration, and thus it will hold once B is finally false and hence the loop completes.

The assertion I is called an invariant of the while loop, since it invariably holds as the while loop completes each iteration.

2.1. Proving a program to compute factorials

Now suppose we want to prove the following.

{ n ≥ 1 }
fact = 1;
i = 1;
while (i != n) {
    i = i + 1;
    fact = fact * i;
}
{ fact = n! }

This hypothesis boils down to proving that this C code correctly computes n!: If n is at least 1 before we start executing the code, then fact holds n! following it.

To prove that the code works, we'll have to decide on an invariant for the loop. This invariant I needs three properties: it holds on first reaching the loop, it continues to hold after each execution of the loop's body, and I ∧ ¬B implies the desired postcondition once the loop exits.

In general, coming up with such an invariant involves a bit of insight; practice definitely helps recognize the invariant. In this case, we'll use the following invariant for I.

fact = i! ∧ i ≥ 1

This is typical of invariants for loops that iterate over a sequence of integers: There is a reference to the loop control variable i within the invariant, and the value of i on first reaching the loop makes for a trivial assertion, while the value of i on completing the loop gives us what we want. (We'll see an example of a similar invariant in the next section.)

Below is a completed proof using this invariant.

1. { fact = i! ∧ i ≥ 1 }
   i = i + 1;
   fact = fact * i;
   { fact = i! ∧ i ≥ 1 }
   From previous section
2. fact = i! ∧ i ≥ 1 ∧ i ≠ n ⇒ fact = i! ∧ i ≥ 1
   Mathematical fact
3. { fact = i! ∧ i ≥ 1 ∧ i ≠ n }
   i = i + 1;
   fact = fact * i;
   { fact = i! ∧ i ≥ 1 }
   Preconditions: 1, 2
4. { fact = i! ∧ i ≥ 1 }
   while (i != n) {
       i = i + 1;
       fact = fact * i;
   }
   { fact = i! ∧ i ≥ 1 ∧ ¬(i ≠ n) }
   Iteration: 3
5. fact = i! ∧ i ≥ 1 ∧ ¬(i ≠ n) ⇒ fact = n!
   Mathematical fact
6. { fact = i! ∧ i ≥ 1 }
   while (i != n) {
       i = i + 1;
       fact = fact * i;
   }
   { fact = n! }
   Postconditions: 4, 5
7. { fact = 1! ∧ 1 ≥ 1 }
   i = 1;
   { fact = i! ∧ i ≥ 1 }
   Assignment
8. { 1 = 1! ∧ 1 ≥ 1 }
   fact = 1;
   { fact = 1! ∧ 1 ≥ 1 }
   Assignment
9. { 1 = 1! ∧ 1 ≥ 1 }
   fact = 1;
   i = 1;
   { fact = i! ∧ i ≥ 1 }
   Sequence: 8, 7
10. n ≥ 1 ⇒ 1 = 1! ∧ 1 ≥ 1
    Mathematical fact
11. { n ≥ 1 }
    fact = 1;
    i = 1;
    { fact = i! ∧ i ≥ 1 }
    Preconditions: 10, 9
12. { n ≥ 1 }
    fact = 1;
    i = 1;
    while (i != n) {
        i = i + 1;
        fact = fact * i;
    }
    { fact = n! }
    Sequence: 11, 6

The proof begins by referencing our earlier proof and continues in three basic parts: Steps 2 and 3 establish that the loop's body preserves the invariant; steps 4 through 6 apply the Rule of Iteration and show that, once the loop exits, the invariant implies the desired postcondition; and steps 7 through 11 show that the initialization code establishes the invariant. Step 12 joins the initialization to the loop with the Rule of Sequence.

2.2. Another example

To get a better feel for how this works, let's examine another program. This one finds the sum of all the factors of n (excepting n itself).

{ n ≥ 1 }
i = 1;
sum = 0;
while (i != n) {
    if (n % i == 0) {
        sum = sum + i;
    } else {
    }
    i = i + 1;
}
{ sum = ∑_{j=1}^{n−1} j ⋅ δ_div(n, j) }

The postcondition uses the notation δ_div(n, j); this specialized mathematical notation expresses a function that is 1 when j divides into n exactly and 0 otherwise.

δ_div(n, j) = { 1   if n mod j = 0
              { 0   otherwise

The postcondition, then, says that sum is the sum of the factors of n, up to but not including n itself.

The first thing we need to do is to determine our invariant. For this loop, a good invariant is the following.

sum = ∑_{j=1}^{i−1} j ⋅ δ_div(n, j)

Note the similarity to the invariant for the factorial program:

fact = i!.

In both cases, each additional iteration adds a new term to the summation/product.

1. n ≥ 1 ⇒ 0 = ∑_{j=1}^{1−1} j ⋅ δ_div(n, j)
   Mathematical fact
2. { 0 = ∑_{j=1}^{1−1} j ⋅ δ_div(n, j) }
   i = 1;
   { 0 = ∑_{j=1}^{i−1} j ⋅ δ_div(n, j) }
   Assignment
3. { n ≥ 1 }
   i = 1;
   { 0 = ∑_{j=1}^{i−1} j ⋅ δ_div(n, j) }
   Preconditions: 1, 2
4. { 0 = ∑_{j=1}^{i−1} j ⋅ δ_div(n, j) }
   sum = 0;
   { sum = ∑_{j=1}^{i−1} j ⋅ δ_div(n, j) }
   Assignment
5. { n ≥ 1 }
   i = 1;
   sum = 0;
   { sum = ∑_{j=1}^{i−1} j ⋅ δ_div(n, j) }
   Sequence: 3, 4
6. sum = ∑_{j=1}^{i−1} j ⋅ δ_div(n, j) ∧ i ≠ n ∧ n mod i = 0
   ⇒ sum + i = ∑_{j=1}^{(i+1)−1} j ⋅ δ_div(n, j)
   Mathematical fact
7. { sum + i = ∑_{j=1}^{(i+1)−1} j ⋅ δ_div(n, j) }
   sum = sum + i;
   { sum = ∑_{j=1}^{(i+1)−1} j ⋅ δ_div(n, j) }
   Assignment
8. { sum = ∑_{j=1}^{i−1} j ⋅ δ_div(n, j) ∧ i ≠ n ∧ n mod i = 0 }
   sum = sum + i;
   { sum = ∑_{j=1}^{(i+1)−1} j ⋅ δ_div(n, j) }
   Preconditions: 6, 7
9. sum = ∑_{j=1}^{i−1} j ⋅ δ_div(n, j) ∧ i ≠ n ∧ ¬(n mod i = 0)
   ⇒ sum = ∑_{j=1}^{(i+1)−1} j ⋅ δ_div(n, j)
   Mathematical fact
10. { sum = ∑_{j=1}^{i−1} j ⋅ δ_div(n, j) ∧ i ≠ n ∧ ¬(n mod i = 0) }
    { sum = ∑_{j=1}^{(i+1)−1} j ⋅ δ_div(n, j) }
    Consequence: 9
11. { sum = ∑_{j=1}^{i−1} j ⋅ δ_div(n, j) ∧ i ≠ n }
    if (n % i == 0) {
        sum = sum + i;
    } else {
    }
    { sum = ∑_{j=1}^{(i+1)−1} j ⋅ δ_div(n, j) }
    Condition: 8, 10
12. { sum = ∑_{j=1}^{(i+1)−1} j ⋅ δ_div(n, j) }
    i = i + 1;
    { sum = ∑_{j=1}^{i−1} j ⋅ δ_div(n, j) }
    Assignment
13. { sum = ∑_{j=1}^{i−1} j ⋅ δ_div(n, j) ∧ i ≠ n }
    if (n % i == 0) {
        sum = sum + i;
    } else {
    }
    i = i + 1;
    { sum = ∑_{j=1}^{i−1} j ⋅ δ_div(n, j) }
    Sequence: 11, 12
14. { sum = ∑_{j=1}^{i−1} j ⋅ δ_div(n, j) }
    while (i != n) {
        if (n % i == 0) {
            sum = sum + i;
        } else {
        }
        i = i + 1;
    }
    { sum = ∑_{j=1}^{i−1} j ⋅ δ_div(n, j) ∧ ¬(i ≠ n) }
    Iteration: 13
15. sum = ∑_{j=1}^{i−1} j ⋅ δ_div(n, j) ∧ ¬(i ≠ n)
    ⇒ sum = ∑_{j=1}^{n−1} j ⋅ δ_div(n, j)
    Mathematical fact
16. { sum = ∑_{j=1}^{i−1} j ⋅ δ_div(n, j) }
    while (i != n) {
        if (n % i == 0) {
            sum = sum + i;
        } else {
        }
        i = i + 1;
    }
    { sum = ∑_{j=1}^{n−1} j ⋅ δ_div(n, j) }
    Postconditions: 14, 15
17. { n ≥ 1 }
    i = 1;
    sum = 0;
    while (i != n) {
        if (n % i == 0) {
            sum = sum + i;
        } else {
        }
        i = i + 1;
    }
    { sum = ∑_{j=1}^{n−1} j ⋅ δ_div(n, j) }
    Sequence: 5, 16

The proof has several parts, structured around the application of the Rule of Iteration in step 14.

3. Shortcomings

The proof system examined in this chapter has several shortcomings that are worth noting.

3.1. Assumptions about expressions

In this proof system, we have treated expressions as if they could be incorporated directly into assertions' mathematical expressions. Unfortunately, integer arithmetic in many languages does not correspond to mathematical arithmetic: Languages such as C have a maximum integer beyond which arithmetic is invalid. This is a significant shortcoming of the proof of the factorial program's correctness: Although our “proof” shows that it operates correctly, in fact it behaves incorrectly when n is even moderately large (like 25), due to problems with overflow.

For such languages, a complete proof of a program's correctness would involve a proof that overflow is impossible. Such additional details add new degrees of complexity to the proof. The complexity is even worse for programs involving floating-point arithmetic.

3.2. Partial and total correctness

Another shortcoming of this proof system is that it works only for demonstrating partial correctness. A partial correctness proof assumes that all loops terminate, rather than proving that they do. For an example of the shortcoming of partial correctness, consider the following program intended to compute n!.

fact = 1;
i = 1;
while (i != n) {
}

The same invariant we examined for our earlier factorial program works for this loop, too: Since nothing happens in executing the loop's body, nothing happens that changes the invariant's truth. We could easily establish a partial correctness proof for this program, even though it clearly does not work.

Total correctness requires, in addition to partial correctness, a proof that each loop terminates. For the factorial program, we would need to argue that each time through the loop, i starts out below n and becomes closer to n with each iteration; thus, it must eventually reach n. This is where we would bring in the fact that n ≥ 1, which we ignored in our prior proof.

3.3. Complexity

This chapter's proof system is already difficult to use; the added complexity of handling computer arithmetic, total correctness, and a more complete language only increase the difficulty of proving a program correct. Because of this complexity, it is extremely rare that anybody produces a complete proof of a program's correctness.

The backward-proof technique provides some hope that much of the proving process can be mechanized. Researchers have made substantial progress in alleviating the work of program proofs, but the technology isn't close to mainstream acceptance.

That doesn't mean that program proofs are irrelevant in practice, however. It helps for a programmer writing some confusing code to occasionally think: If I were proving this, how would the argument go? This can be particularly helpful in reasoning about a difficult loop: You can think about what the invariant might be. A formal argument isn't necessary, but some awareness of the invariant can help in formulating the program and in catching any missing pieces.

4. Additional problems

Following are a few more small programs that you could try proving to gain additional practice.

  1. { y ≥ 0 }
     z = 0;
     i = 0;
     while (i != y) {
         z = z + x;
         i = i + 1;
     }
     { z = x ⋅ y }
  2. { n ≥ 1 }
     a = 0;
     b = 1;
     i = 1;
     while (i != n) {
         c = a + b;
         a = b;
         b = c;
         i = i + 1;
     }
     { b = fib_n }
  3. { n ≥ 2 }
     i = 2;
     while (n % i != 0) {
         i = i + 1;
     }
     b = (i == n);
     { b = isPrime(n) }
  4. { x ≥ 0 }
     z = 0;
     b = x;
     i = y;
     while (i != 0) {
         if (i % 2 == 0) {
             i = i / 2;
         } else {
             z = z + b;
             i = (i - 1) / 2;
         }
         b = 2 * b;
     }
     { z = x ⋅ y }