Alex Sanchez-Stern
forall x such that x is a
∀ (x: ℕ) (P : ℕ -> Prop), P(x) ->
First major theorem proven primarily in a computer
Solved after 400 years using computer-assisted proof
We can also write proofs about computer programs as mathematical artifacts
Verified compiler for low-level software
Verified operating system kernel
Verified encryption algorithm implementations
Theorem of_intu_of_int_3:
forall x,
of_intu x = sub (of_int (Int.and x ox7FFF_FFFF)) (of_int (Int.and x ox8000_0000)).
Proof.
intros.
set (hi := Int.and x ox8000_0000).
set (lo := Int.and x ox7FFF_FFFF).
assert (R: forall n, integer_representable 53 1024 (Int.signed n)).
{ intros. pose proof (Int.signed_range n).
apply integer_representable_n. smart_omega. }
unfold sub, of_int. rewrite BofZ_minus by auto. unfold of_intu. f_equal.
assert (E: Int.add hi lo = x).
{ unfold hi, lo. rewrite Int.add_is_or.
- rewrite <- Int.and_or_distrib. apply Int.and_mone.
- rewrite Int.and_assoc. rewrite (Int.and_commut ox8000_0000). rewrite Int.and_assoc.
change (Int.and ox7FFF_FFFF ox8000_0000) with Int.zero. rewrite ! Int.and_zero; auto.
}
assert (RNG: 0 <= Int.unsigned lo < two_p 31).
{ unfold lo. change ox7FFF_FFFF with (Int.repr (two_p 31 - 1)). rewrite <- Int.zero_ext_and by lia.
apply Int.zero_ext_range. compute_this Int.zwordsize. lia. }
assert (B: forall i, 0 <= i < Int.zwordsize -> Int.testbit ox8000_0000 i = if zeq i 31 then true else false).
{ intros; unfold Int.testbit. change (Int.unsigned ox8000_0000) with (2^31).
destruct (zeq i 31). subst i; auto. apply Z.pow2_bits_false; auto. }
assert (EITHER: hi = Int.zero \/ hi = ox8000_0000).
{ unfold hi; destruct (Int.testbit x 31) eqn:B31; [right|left];
Int.bit_solve; rewrite B by auto.
- destruct (zeq i 31). subst i; rewrite B31; auto. apply andb_false_r.
- destruct (zeq i 31). subst i; rewrite B31; auto. apply andb_false_r.
}
assert (SU: - Int.signed hi = Int.unsigned hi).
{ destruct EITHER as [EQ|EQ]; rewrite EQ; reflexivity. }
unfold Z.sub; rewrite SU, <- E.
unfold Int.add; rewrite Int.unsigned_repr, Int.signed_eq_unsigned. lia.
- assert (Int.max_signed = two_p 31 - 1) by reflexivity. lia.
- assert (Int.unsigned hi = 0 \/ Int.unsigned hi = two_p 31)
by (destruct EITHER as [EQ|EQ]; rewrite EQ; [left|right]; reflexivity).
assert (Int.max_unsigned = two_p 31 + two_p 31 - 1) by reflexivity.
lia.
Qed.
forall x such that x is a
∀ (x: ℕ) (P : ℕ -> Prop), P(x) ->
But first...
In this talk we'll go over just a few of them
Specifications!
(mathematical statements describing the behavior of the program on all valid inputs)
∀ x, ...
1. Structuring Proofs
For the rest of this talk we'll cover three stages of proof writing that AI can assist
2. Synthesizing Steps
3. Searching for QED
1. Structuring Proofs
2. Synthesizing Steps
3. Searching for QED
Proof
Specification
forall (L: List T),
sorted(isort(L)) AND
length(isort(L)) = length(L) AND
elements(isort(L)) = elements(L)
Like other proofs, computer proofs need decomposition
Proof
Specification
forall (L: List T),
sorted(isort(L)) AND
length(isort(L)) = length(L) AND
elements(isort(L)) = elements(L)
Like other proofs, computer proofs need decomposition
forall (L: List T) (x: T),
length([x] + L) = length(L) + 1
Proof
Specification
forall (L: List T),
sorted(isort(L)) AND
length(isort(L)) = length(L) AND
elements(isort(L)) = elements(L)
Helper Lemmas
Like other proofs, computer proofs need decomposition
forall (L: List T) (x: T),
sorted(L) ->
sorted(sinsert(x, L))
forall (L: List T) (x: T),
length([x] + L) = length(L) + 1
In the CompCert verified compiler
~1 top-level correctness theorem
6257 other lemmas and theorems
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
Program
Specification
∀ x, ...
∀ x, ...
∀ x, ...
∀ x, ...
Term Rewriting
Enumeration
Isaplanner, Rippling
Quickspec, Hipspec
Goal Directed
Limited Expressiveness
Expressive
Undirected
Most common prior approach,
Generalization: 20%
[1] Sivaraman, Aishwarya, Alex, Sanchez-Stern, Bretton, Chen, Sorin, Lerner, and Millstein Todd. "Data-Driven Lemma Synthesis for Interactive Proofs." . In Object-Oriented Programming, Systems, Languages & Applications. ACM SIGPLAN, 2022
LFind
65% Success Rate
~~~
~
Theorem
Partial Proof
∀ x, ...
Helper Lemmas
∀ x, ...
∀ x, ...
∀ x, ...
Stuck Goal
rev
=
a :: x
(
rev x
++
(a :: nil)
)
Generalize
Stuck Goal
...
...
=
.....
rev
(
.....
y
++
.....
z
)
rev
=
a :: x
(
rev x
++
(a :: nil)
)
Generalize
Synthesize
Stuck Goal
...
...
...
...
=
.....
rev
(
.....
y
++
.....
z
rev
=
a :: x
(
rev x
++
(a :: nil)
)
)
.....
rev
(
.....
y
++
.....
z
)
=
(rev z) ++ (rev y)
Generalize
Synthesize
Filter
Stuck Goal
...
...
...
...
=
.....
rev
(
.....
y
++
.....
z
)
(rev z) ++ (rev y)
.....
rev
(
.....
y
++
.....
z
)
=
.....
rev
(
.....
y
++
.....
z
)
=
(rev z) ++ (rev y)
rev
a :: x
(
rev x
++
(a :: nil)
)
=
Generalize
Synthesize
Filter
Rank
Candidate
Lemmas
Stuck Goal
...
...
1.
...
...
rev (y ++ z)
= (rev z) ++ (rev y)
1.
2.
3.
∀ x, ...
∀ x, ...
∀ x, ...
rev
a :: x
(
rev x
++
(a :: nil)
)
=
.....
rev
(
.....
y
++
.....
z
)
=
.....
rev
(
.....
y
++
.....
z
)
=
(rev z) ++ (rev y)
(rev z) ++ (rev y)
.....
rev
(
.....
y
++
.....
z
)
=
1. Structuring Proofs
2. Synthesizing Steps
3. Searching for QED
1. Structuring Proofs
2. Synthesizing Steps
3. Searching for QED
“forall (n: nat), (S n) > n” ?
induction n.
"1 > 0"?
reflexivity.
auto.
“forall (n: nat), (S n) > n” ?
?
...
...
...
...
...
...
Proof Contexts
Actions
1.23
2.34
3.45
4.56
1.23
2.34
3.45
4.56
≠
≠
forall (n: nat),
(S n) > n
induction n.
forall (n: nat), (S n) > n.
induction n.
Rango
Kyle Thompson, Nuno Saavedra, Pedro Carrott, Kevin Fisher, Alex Sanchez-Stern, Yuriy Brun, Joao F Ferreira, Sorin Lerner, Emily First. To appear in ICSE 2025
forall (n: nat), (S n) > n.
induction n.
Input (Proof Context)
Output (Proof Step)
forall (n: nat), (S n) > n.
induction n.
[1] Alex Sanchez-Stern, Yousef Alhessi, Lawrence Saul, and Sorin Lerner. "Generating Correctness Proofs with Neural Networks." . In Machine Learning in Programming Languages. ACM SIGPLAN, 2020.
Input (Proof Context)
Output (Proof Step)
{
Command
Argument
forall (n: nat), (S n) > n.
induction n.
[1] Alex Sanchez-Stern, Emily First, Timothy Zhou, Zhanna Kaufman, Yuriy Brun, and Talia Ringer. "Passport: Improving Automated Formal Verification Using Identifiers." (2022).
Input (Proof Context)
Output (Proof Step)
{
Command
Argument
Identifier Embeddings
“forall (n: nat), (S n) > n” ?
...
...
...
...
...
...
Proof Contexts
Actions
induction n.
induction n.
(S n) > n
n: nat
Theorem pred_Sn :
forall n:nat, n = pred (S n).
Proof.
intros H.
apply Pos2Nat.inj.
now rewrite id_succ.
Qed.
RAG
(with BM-25)
Without RAG: 18.6%
Proof Retrieval: + 10.4%
Premise Retrieval: + 1.0%
Theorem pred_Sn :
forall n:nat, n = pred (S n).
18.6%
29%
30%
1. Structuring Proofs
2. Synthesizing Steps
3. Searching for QED
1. Structuring Proofs
2. Synthesizing Steps
3. Searching for QED
90% prediction accuracy doesn’t mean much if you need to get 10 proof steps right.
Proof.
Proof.
intros.
Proof.
intros.
simplify.
Proof.
intros.
induction n.
induction n.
simplify.
Proof.
intros.
reflexivity.
reflexivity.
...
Unbounded Search Space
There are some techniques that can help us prune the search tree
It's hard to explore when you don't know where you are!
n = n
forall n,
n + 1 > n
forall n,
False
Easy
Medium
Hard
26% Shorter Proofs
in 27% Fewer Steps
Proof.
Qed.
...
...
...
...
...
Alex Sanchez-Stern, et al. "QEDCartographer: Automating formal verification using reward-free reinforcement learning." To appear at ICSE 2025
1. Structuring Proofs
2. Synthesizing Steps
3. Searching for QED
1. Structuring Proofs
2. Synthesizing Steps
3. Searching for QED
Emily First
Yousef Alhessi
Sorin Lerner
Zhanna Kaufman
Abhishek Varghese
Yuriy Brun
Talia Ringer
Timothy Zhou
Aishwarya Sivaraman
Todd Millstein
Bretton Chen
Ana Brendel
Machine-Checkable Proofs + Search + AI = ❤️
Rango
LFind