Recursive variant types
Recursive variant types, in combination with pattern matching, provide the means for creating tree-like recursive data structures.
Natural numbers
Every natural number is either a zero or the successor of a natural number:
type nat = Z | S of nat
let rec add a b =
match b with
| S bb -> add (S a) bb
| Z -> a
add (S (S Z)) (S Z) --> (S (S (S Z)))
Lists
Every list is either empty denoted as Nil
, or the pair of a head and a tail denoted as Cons(h, t)
:
We can define a list of integers as follows:
type lst = Nil | Cons of (int * lst)
|
|
|
|
|
|
|
|
The length of a list can be then computed as follows:
let rec length ls =
match ls with
| Nil -> 0
| Cons (_, tl) -> 1 + length tl
Equivalently, since we match the last argument, we can also write it as:
let rec length = function
| Nil -> 0
| Cons (_, tl) -> 1 + length tl
length (Cons (1, Cons (2, Cons (3, Nil)))) --> 3
Polymorphic (parameterized) type declarations
The above type declaration for lst
is called monomorphic, since it can represent only objects of one specific type, lists of integers.
That declaration can be made polymorphic, that is, being able to represent lists of elements of any type, not only integer lists.
It can be done by using a type variable 'a
to denote the type of the individual elements. Then the type of the whole list is denoted as 'a lst
:
type 'a lst = Nil | Cons of ('a * 'a lst)
# Cons (3.14159265, Cons (2.71828, Cons (1.41421, Nil))) ;; - : float lst = Cons (3.14159265, Cons (2.71828, Cons (1.41421, Nil))) # Cons ("Apple", Cons ("Orange", Cons ("Pineapple", Nil))) ;; - : string lst = Cons ("Apple", Cons ("Orange", Cons ("Pineapple", Nil))) # Cons ([1], Cons ([2;3], Cons ([4;5;6], Nil))) ;; - : int list lst = Cons ([1], Cons ([2; 3], Cons ([4; 5; 6], Nil))) # Cons (Nil, Cons (Nil, Cons (Nil, Nil))) ;; - : 'a lst lst = Cons (Nil, Cons (Nil, Cons (Nil, Nil)))
Exercise: Write a list concatenation function for the type 'a lst
:
concat (Cons ("mouse", Cons ("cat", Nil))) (Cons ("dog", Nil))
--> (Cons ("mouse", Cons ("cat", Cons ("dog", Nil))))
🐤 click to open
let rec concat ls1 ls2 =
match ls1 with
| Cons (hd, tl) -> Cons(hd, concat tl ls2)
| Nil -> ls2
Exercise: Write a list reversing function for the type 'a lst
:
reverse (Cons ("mouse", Cons ("cat", Cons ("dog", Nil))))
--> (Cons ("dog", Cons ("cat", Cons ("mouse", Nil))))
🐤 click to open
let reverse ls =
let rec rev acc ls =
match ls with
| Nil -> acc
| Cons (hd, tl) -> rev (Cons(hd, acc)) tl
in
rev Nil ls
Binary Search Tree
- A binary search tree (BST) is a binary tree with the following representation invariant:
-
For any node n, every node in the left sub-tree of n has a value less than n’s value, and every node in the right sub-tree of n has a value greater than n’s value.
BSTs can be defined as a variant type, where each internal node contains a key, a left sub-trees, and a right sub-tree; and all leaf nodes are empty:
type 'a tree =
Node of ('a * 'a tree * 'a tree)
| Leaf
Alternatively, we can use variants with records:
type 'a tree =
Node of {key : 'a; left : 'a tree; right : 'a tree}
| Leaf
(** [is_empty t] returns [true] iff the tree [t] is empty. *)
let is_empty t =
match t with
| Leaf -> true
| Node _ -> false
There are also shorter ways to define this function:
|
Exercise: Membership check.
mem k t
returns true
if and only if the tree t
contains k
.
mem 10 Leaf --> false
mem 20 (Node (10, Leaf, Node (20, Leaf, Leaf))) --> true
Or 🐤 click to open
(** [mem k t] is [true] iff [k] is in [t] *)
let rec mem k = function
| Leaf -> false
| Node (q, l, r) -> (q = k) || if k < q then mem k l else mem k r
(** [mem k t] is [true] iff [k] is in [t] *)
let rec mem k = function
| Leaf -> false
| Node (q, l, r) -> (q = k) || (k < q && mem k l) || (mem k r)
Exercise: Insertion.
insert k t
returns a new tree with the key k
inserted into t
.
insert 10 Leaf --> Node (10, Leaf, Leaf)
. 10 --> / \ . .
insert 15 (Node (10, Leaf, Node (20, Leaf, Leaf)))
--> (Node (10, Leaf, Node (20, Node (15, Leaf, Leaf), Leaf)))
10 10 / \ / \ . 20 --> . 20 / \ / \ . . 15 . / \ . .
🐤 click to open
(** [insert k t] inserts [k] into tree [t] *)
let rec insert k t =
match t with
| Leaf -> Node (k, Leaf, Leaf)
| Node (q, l, r) ->
if q = k then t
else if k < q then Node (q, insert k l, r)
else Node (q, l, insert k r)
Exercise: In-order traversal.
iter f t
traverses the tree in-order, calling f k
for each key k
.
The in-order traversal first check the left sub-tree, then the node itself, then the right sub-tree. |
Let us construct a bigger input tree:
let animals = Leaf |> insert "cat" |> insert "bird" |> insert "horse" |> insert "dog"
Since insertion is using <
to find the place for a new node, strings get arranged in lexicographic order:
(Node ("cat", Node("bird", Leaf, Leaf), Node ("horse", Node ("dog", Leaf, Leaf), Leaf)))
cat / \ bird horse / \ / \ . . dog . / \ . .
iter print_endline animals
bird cat dog horse
🐤 click to open
(** [iter f t] iterates over tree [t], computing [f k] for each key [k] *)
let rec iter f = function
| Leaf -> ()
| Node (q, l, r) -> iter f l; f q; iter f r
Exercise: Map.
map f t
computes a new BST containing f k
for each key k
from the original tree t
.
Can we simply map each key in the tree, keeping the original tree structure? No, we cannot. If we do so, then the tree may not preserve the BST invariant:
cat 3 / \ / \ bird horse --> 4 5 / \ / \ / \ / \ . . dog . . . 3 . / \ / \ . . . .
We have to build a new tree by inserting each new element f k
one by one (multiple insertions don’t add multiple entries in the tree):
map String.length animals
--> Node (4, Node (3, Leaf, Leaf), Node (5, Leaf, Leaf))
4 / \ 3 5 / \ / \ . . . .
(Depending on the tree traversal order in your function, you might get a differently-balanced tree.)
This implementation goes over all tree elements in-order and inserts them in a new tree.
The Alternatively, to avoid multiple local variables: Pre-order traversal is more likely to produce better-balanced trees than in-order traversal.
It can be implemented by inserting the current node key first, before inserting the left and the right sub-trees: 🐤 click to open
(** [map f t] returns a new tree containing [f k] for each [k] in [t] *)
let map f t =
let rec mp acc = function
| Leaf -> acc
| Node (k, l, r) ->
let acc2 = mp acc l in
let acc3 = insert (f k) acc2 in
mp acc3 r
in
mp Leaf t
mp
helper function is actually a fold function that traverses the tree in-order, while accumulating the
new tree in the accumulator variable acc
.(** [map f t] returns a new tree containing [f k] for each [k] in [t] *)
let map f t =
let rec mp t acc =
match t with
| Leaf -> acc
| Node (k, l, r) ->
acc |> mp l |> insert (f k) |> mp r
in
mp t Leaf
(** [map f t] returns a new tree containing [f k] for each [k] in [t] *)
let map f t =
let rec mp t acc =
match t with
| Leaf -> acc
| Node (k, l, r) ->
acc |> insert (f k) |> mp l |> mp r
in
mp t Leaf
Exercise: Deletion.
delete "cat" animals
cat dog / \ / \ bird horse --> bird horse / \ / \ / \ / \ . . dog . . . . . / \ . .
delete "cat" (animals |> insert "fish" |> insert "fox")
cat dog / \ / \ bird horse --> bird horse / \ / \ / \ / \ . . dog . . . fish . / \ / \ . fish . fox / \ / \ . fox . . / \ . .
Also see this StackOverflow discussion.
🐤 click to open
(** [find_min t] find the smallest element in the tree [t]. Raises [Not_found] if the tree is empty. *)
let rec find_min = function
| Leaf -> raise Not_found
| Node (q, Leaf, _) -> q
| Node (_, l, _) -> find_min l
(** [delete k t] removes the element [k] from the tree [t] *)
let rec delete k = function
| Leaf -> Leaf (* alternatively, raise Not_found *)
(* descend: *)
| Node (q, l, r) when k < q -> Node (q, delete k l, r)
| Node (q, l, r) when k > q -> Node (q, l, delete k r)
(* node is found, delete: *)
| Node (_, Leaf, r) -> r
| Node (_, l, Leaf) -> l
| Node (_, l, r) ->
let m = find_min r in
Node (m, l, delete m r)
Question: What are the time complexities of mem
, insert
, iter
, map
, and delete
?
🐤 click to open
map
is quadratic in the number of nodes, O(n2). The other functions are linear, O(n).
Red-black trees
Simple BST will always give linear time complexity for most operations in the worst case, when its elements were inserted in ascending or descending order.
To get logarithmic time, one needs to use balanced tree data structures, such as AVL trees, 2-3-4 trees, or red-black trees. For red-black trees see:
-
The textbook section 8.6.
-
The specific implementation of the red-black tree insertion operation discussed in the textbook is due to (Chris Okasaki, 1999) "Red-Black Trees in a Functional Setting". This article is very easy to read and it discusses the topic in more detail than the textbook, also comparing it to the imperative implementation of this data structure. Chris Okasaki is also the author of the book "Purely functional data structures".
-
The deletion operation is significantly harder and messier than Okasaki’s insertion: See (Germane and Might, 2014) (Appel, 2011), (Kahrs, 2001).