Lost in a Forest

This is still a work in progress.

The title might be misleading. I’m not lost, nor in a forest 🙂. In this post I’ll share some work that I’ve been doing on specifying iteration.

Cursors

Iteration, be it the traversal of some data structure or the result enumeration of some algorithm, can be conceived via small-step cursors. This concept is well-known in popular high-level languages such as Java or C++, the “for each” construct for (E x: ...) is nothing short of syntatic sugar for a cursor that might1 reach exhaustion via controlled small-steps. Cursors are typically defined via 3 functions, one of creation (create), one that tests termination (has_next), and one that produces the next element in the iteration (next).

c = create collection
while has_next c do
  x <- next c
  ...

Reasoning about Iteration

Reasoning about iteration can be done using two predicates, permitted that defines the produced values and their well-formed sequence, and complete that states an iterator’s eventual completion. For instance, regarding the iteration over a tree can be expressed via the permitted sequence of produces elements that is a prefix of the sequence that order that the nodes will be visited. The order function is an oracle.

\[\texttt{permitted}(v, \mathit{tree}) \triangleq \texttt{prefix}(v, \texttt{order}(\mathit{tree}))\]

and similarly, the completeness of an iteration can be tested by comparing the length of the visited sequence and the number of nodes in the tree.

\[\texttt{completed}(v, \mathit{tree}) \triangleq ||v|| = \texttt{count}(\mathit{tree})\]

Forests

Forests are n-ary trees, each tree node can have as many child nodes as it may want. In OCaml, we can define forests as a mutually recursive Algebraic Data-Type (ADT).

type 'a tree =
  | E
  | Node of 'a * 'a forest
and 'a forest = ('a tree) list

A tree node is either Empty or some Node that is a (value, list of children) pair. Reasoning about mutually recursive ADTs is not trivial. Take for example a function that iterates over a tree and applies a function to each of its elements.

let rec iter_tree f tree =
  match tree with
  | E -> ()                   (* do nothing *)
  | Node (a, forest) ->
     let _ =  f a in          (* apply function *)
     iter_forest f forest     (* visit its children *)

and iter_forest f forest =
  match forest with
  | [] -> ()                  (* do nothing *)
  | x :: t ->
     iter_tree f x;           (* tree recurse on current child node *)
     iter_forest f t          (* visit the rest of the list *)

It might not be so clear what our specification contracts should be, or better yet what should be decorated with them. In other words, what should be the contracts of iter_tree and iter_forest and how would they depend on each other? We can take a step back and see the iteration as a whole, and not as two independent recursive functions.

In that regard, let’s build a Cursor, in WhyML, over our tree that is able to enumerate its elements and allow us to better reason about its iteration. Firstly defining a cursor, this can be expressed as a record with 3 fields, the sequence of visited elements, the enumerated elements and its initial collection:

type cursor 'a = private {
  ghost mutable    visited : seq 'a;
  ghost         enumerated : seq 'a;
                collection : tree 'a;
}

Noting that we end up doing a pre-order visit on the tree, we can define a function that enumerates the whole elements in that same order.

let rec function elements_tree (t: tree 'a) (acc: seq 'a) : seq 'a =
    variant { t } (* t is strictly decreasing at each recursive call *)
    match t with
    | E -> acc
    | Node e r -> elements_forest r (cons e acc)
    end
with elements_forest (f: forest 'a) (acc: seq 'a) : seq 'a =
  variant { f } (* f is strictly decreasing at each recursive call *)
  match f with
    | Nil -> acc
    | Cons x r ->
      let children = (elements_tree x acc) ++ acc in
      elements_forest r children
  end

We can use this function to populate the enumerate sequence when we create the cursor:

val create (c: tree 'a) : cursor 'a
  ensures { result.collection = c }
  ensures { result.visited    = empty}
  ensures { result.enumerated = elements_tree c empty }
predicate permitted (c: cursor 'a) = forall e. mem e c.visited  -> mem e c.elements
predicate complete  (c: cursor 'a) = length c.elements = length c.visited
  1. We can define iteration over infinite streams, in which case, has_next would always yield true