Recursive variant types. Binary search trees.

Recursive variant types

Recursive variant types, in combination with pattern matching, provide the means for creating tree-like recursive data structures.

Natural numbers

Every natural number is either a zero or the successor of a natural number:

type nat = Z | S of nat

let rec add a b =
  match b with
  | S bb -> add (S a) bb
  | Z -> a

add (S (S Z)) (S Z)  -->  (S (S (S Z)))

Lists

Every list is either empty denoted as Nil, or the pair of a head and a tail denoted as Cons(h, t):

We can define a list of integers as follows:

type lst = Nil | Cons of (int * lst)

Nil

[]

Cons (1, Nil)

[1]

Cons (1, Cons (2, Nil))

[1; 2]

Cons (1, Cons (2, Cons (3, Nil)))

[1; 2; 3]

The length of a list can be then computed as follows:

let rec length ls =
  match ls with
  | Nil -> 0
  | Cons (_, tl) -> 1 + length tl

Equivalently, since we match the last argument, we can also write it as:

let rec length = function
  | Nil -> 0
  | Cons (_, tl) -> 1 + length tl

length (Cons (1, Cons (2, Cons (3, Nil))))  -->  3

Polymorphic (parameterized) type declarations

The above type declaration for lst is called monomorphic, since it can represent only objects of one specific type, lists of integers.

That declaration can be made polymorphic, that is, being able to represent lists of elements of any type, not only integer lists. It can be done by using a type variable 'a to denote the type of the individual elements. Then the type of the whole list is denoted as 'a lst:

type 'a lst = Nil | Cons of ('a * 'a lst)

# Cons (3.14159265, Cons (2.71828, Cons (1.41421, Nil))) ;;
- : float lst = Cons (3.14159265, Cons (2.71828, Cons (1.41421, Nil)))

# Cons ("Apple", Cons ("Orange", Cons ("Pineapple", Nil))) ;;
- : string lst = Cons ("Apple", Cons ("Orange", Cons ("Pineapple", Nil)))

# Cons ([1], Cons ([2;3], Cons ([4;5;6], Nil))) ;;
- : int list lst = Cons ([1], Cons ([2; 3], Cons ([4; 5; 6], Nil)))

# Cons (Nil, Cons (Nil, Cons (Nil, Nil))) ;;
- : 'a lst lst = Cons (Nil, Cons (Nil, Cons (Nil, Nil)))

Exercise: Write a list concatenation function for the type 'a lst:

concat (Cons ("mouse", Cons ("cat", Nil))) (Cons ("dog", Nil))
   --> (Cons ("mouse", Cons ("cat", Cons ("dog", Nil))))

🐤 click to open

let rec concat ls1 ls2 =
  match ls1 with
  | Cons (hd, tl) -> Cons(hd, concat tl ls2)
  | Nil -> ls2

Exercise: Write a list reversing function for the type 'a lst:

reverse (Cons ("mouse", Cons ("cat", Cons ("dog", Nil))))
   -->  (Cons ("dog", Cons ("cat", Cons ("mouse", Nil))))

🐤 click to open

let reverse ls =
  let rec rev acc ls =
    match ls with
    | Nil -> acc
    | Cons (hd, tl) -> rev (Cons(hd, acc)) tl
  in
  rev Nil ls

Binary Search Tree

A binary search tree (BST) is a binary tree with the following representation invariant:: For any node n, every node in the left sub-tree of n has a value less than n’s value, and every node in the right sub-tree of n has a value greater than n’s value.

BSTs can be defined as a variant type, where each internal node contains a key, a left sub-trees, and a right sub-tree; and all leaf nodes are empty:

type 'a tree =
    Node of ('a * 'a tree * 'a tree)
  | Leaf

Alternatively, we can use variants with records:

type 'a tree =
    Node of {key : 'a; left : 'a tree; right : 'a tree}
  | Leaf

(** [is_empty t] returns [true] iff the tree [t] is empty. *)
let is_empty t =
  match t with
  | Leaf -> true
  | Node _ -> false

There are also shorter ways to define this function:

let is_empty = function
  | Leaf -> true
  | _ -> false

let is_empty t = (t = Leaf)

let is_empty = (=) Leaf
  (* operator as function and its partial application *)

Exercise: Membership check.

mem k t returns true if and only if the tree t contains k.

mem 10 Leaf  -->  false
mem 20 (Node (10, Leaf, Node (20, Leaf, Leaf)))  -->  true

🐤 click to open

(** [mem k t] is [true] iff [k] is in [t] *)
let rec mem k = function
  | Leaf -> false
  | Node (q, l, r) -> (q = k) || if k < q then mem k l else mem k r

(** [mem k t] is [true] iff [k] is in [t] *)
let rec mem k = function
  | Leaf -> false
  | Node (q, l, r) -> (q = k) || (k < q && mem k l) || (mem k r)

Exercise: Insertion.

insert k t returns a new tree with the key k inserted into t.

insert 10 Leaf  -->  Node (10, Leaf, Leaf)

 
     .                 10
             -->      /  \
                     .    .

insert 15 (Node (10, Leaf, Node (20, Leaf, Leaf)))
  -->  (Node (10, Leaf, Node (20, Node (15, Leaf, Leaf), Leaf)))

 
    10                 10
   /  \               /  \
  .   20      -->    .   20
     /  \               /  \
    .    .             15   .
                      /  \
                     .    .

🐤 click to open

(** [insert k t] inserts [k] into tree [t] *)
let rec insert k t =
  match t with
  | Leaf -> Node (k, Leaf, Leaf)
  | Node (q, l, r) ->
      if q = k then t
      else if k < q then Node (q, insert k l, r)
      else Node (q, l, insert k r)

Exercise: In-order traversal.

iter f t traverses the tree in-order, calling f k for each key k.

The in-order traversal first check the left sub-tree, then the node itself, then the right sub-tree.

Let us construct a bigger input tree:

let animals = Leaf |> insert "cat" |> insert "bird" |> insert "horse" |> insert "dog"

Since insertion is using < to find the place for a new node, strings get arranged in lexicographic order:

(Node ("cat", Node("bird", Leaf, Leaf), Node ("horse", Node ("dog", Leaf, Leaf), Leaf)))

 
        cat
      /     \
   bird     horse
   /  \     /   \
  .    .   dog   .
          /   \
         .     .

iter print_endline animals

bird
cat
dog
horse

🐤 click to open

(** [iter f t] iterates over tree [t], computing [f k] for each key [k] *)
let rec iter f = function
  | Leaf -> ()
  | Node (q, l, r) -> iter f l; f q; iter f r

Exercise: Map.

map f t computes a new BST containing f k for each key k from the original tree t.

Can we simply map each key in the tree, keeping the original tree structure? No, we cannot. If we do so, then the tree may not preserve the BST invariant:

 
        cat                     3
      /     \                /     \
   bird     horse    -->    4       5
   /  \     /   \          / \     / \
  .    .   dog   .        .   .   3   .
          /   \                  / \
         .     .                .   .

We have to build a new tree by inserting each new element f k one by one (multiple insertions don’t add multiple entries in the tree):

map String.length animals
  --> Node (4, Node (3, Leaf, Leaf), Node (5, Leaf, Leaf))

 
        4
      /   \
     3     5
    / \   / \
   .   . .   .

(Depending on the tree traversal order in your function, you might get a differently-balanced tree.)

🐤 click to open

(** [map f t] returns a new tree containing [f k] for each [k] in [t] *)
let map f t =
  let rec mp acc = function
    | Leaf -> acc
    | Node (k, l, r) ->
         let acc2 = mp acc l in
         let acc3 = insert (f k) acc2 in
         mp acc3 r
  in
  mp Leaf t

This implementation goes over all tree elements in-order and inserts them in a new tree. The mp helper function is actually a fold function that traverses the tree in-order, while accumulating the new tree in the accumulator variable acc.

Alternatively, to avoid multiple local variables:

(** [map f t] returns a new tree containing [f k] for each [k] in [t] *)
let map f t =
  let rec mp t acc =
    match t with
    | Leaf -> acc
    | Node (k, l, r) ->
        acc |> mp l |> insert (f k) |> mp r
  in
  mp t Leaf

Pre-order traversal is more likely to produce better-balanced trees than in-order traversal. It can be implemented by inserting the current node key first, before inserting the left and the right sub-trees:

(** [map f t] returns a new tree containing [f k] for each [k] in [t] *)
let map f t =
  let rec mp t acc =
    match t with
    | Leaf -> acc
    | Node (k, l, r) ->
        acc |> insert (f k) |> mp l |> mp r
  in
  mp t Leaf

Exercise: Deletion.

delete "cat" animals

 
        cat                       dog
      /     \                   /     \
   bird     horse     -->    bird     horse
   /  \     /   \            /  \     /   \
  .    .   dog   .          .    .   .     .
          /   \
         .     .

delete "cat" (animals |> insert "fish" |> insert "fox")

 
        cat                       dog
      /     \                   /     \
   bird     horse     -->    bird     horse
   /  \     /   \            /  \     /   \
  .    .   dog   .          .    .  fish   .
          /   \                     /  \
         .   fish                  .   fox
             /  \                      / \
            .   fox                   .   .
                / \
               .   .

🐤 click to open

(** [find_min t] find the smallest element in the tree [t]. Raises [Not_found] if the tree is empty. *)
let rec find_min = function
  | Leaf -> raise Not_found
  | Node (q, Leaf, _) -> q
  | Node (_, l, _) -> find_min l

(** [delete k t] removes the element [k] from the tree [t] *)
let rec delete k = function
  | Leaf -> Leaf (* alternatively, raise Not_found *)
  (* descend: *)
  | Node (q, l, r) when k < q -> Node (q, delete k l, r)
  | Node (q, l, r) when k > q -> Node (q, l, delete k r)
  (* node is found, delete:  *)
  | Node (_, Leaf, r) -> r
  | Node (_, l, Leaf) -> l
  | Node (_, l, r) ->
      let m = find_min r in
      Node (m, l, delete m r)

Also see this StackOverflow discussion.

Question: What are the time complexities of mem, insert, iter, map, and delete?

🐤 click to open

map is quadratic in the number of nodes, O(n²). The other functions are linear, O(n).

Red-black trees

Simple BST will always give linear time complexity for most operations in the worst case, when its elements were inserted in ascending or descending order.

To get logarithmic time, one needs to use balanced tree data structures, such as AVL trees, 2-3-4 trees, or red-black trees. For red-black trees see:

The textbook section 8.6.
The specific implementation of the red-black tree insertion operation discussed in the textbook is due to (Chris Okasaki, 1999) "Red-Black Trees in a Functional Setting". This article is very easy to read and it discusses the topic in more detail than the textbook, also comparing it to the imperative implementation of this data structure. Chris Okasaki is also the author of the book "Purely functional data structures".
The deletion operation is significantly harder and messier than Okasaki’s insertion: See (Germane and Might, 2014) (Appel, 2011), (Kahrs, 2001).