Thunk and delayed computation

Delayed computation is a code put in an anonymous function:

(fun () -> ... )

Such anonymous function is commonly called thunk. It represents delayed computation, because the code it carries is not evaluated until you actually call it.

For example, one can define:

let t =
  (fun () ->
    let pi = 4.0 *. atan 1.0 in
    let area = pi *. 15.0 *. 15.0 in
    Printf.printf "log: computed area %g\n" area;
    Some area
  )
in

and then call it elsewhere:

t()

Sequence (Delayed functional iterator)

If you want to create a sequence of elements, but don’t want to generate the whole thing right away (maybe because it is too long, possibly even infinite, or because it involves a lot of computation), you can use lazy evaluation, or you can use delayed computation. We will consider delayed computation case.

Define two types, 'a seq and 'a node:

type 'a seq = unit -> 'a node
and 'a node =
    Nil
  | Cons of ('a * 'a seq)

These are mutually recursive types, so we use and keyword to define them both simultaneously.

  • 'a node represents a node of our delayed sequence — (head and tail) or empty.

  • 'a seq is the first node of the sequence stored as a delayed computation.

Iterate over a sequence of integers and print its elements:

let rec print seq =
  match seq () with
  | Cons (x, rest) ->
      Printf.printf "%i " x;
      print rest
  | Nil -> ()

In the following examples we will use notation (a; b; c) to denote sequence of elements a, b, c . This is not real OCaml syntax, it is only used here to describe what is stored in the sequence.

Infinite constant sequence

const n  -->  (n; n; n; ...)
let rec const n =
  (fun () ->
    Cons (n, const (n))
  )

Infinite sequence of natural numbers (0; 1; 2; 3; …​) (not perfect, since it will eventually reach integer overflow):

nats  -->  (0; 1; 2; 3; ...)

🐤 click to open

let nats =
  let rec seq_ints n =
    (fun () ->
      Cons (n, seq_ints (n+1))
    )
  in
  seq_ints 0

Take n first elements from seq:

take 10 nats  -->  (0; 1; 2; 3; 4; 5; 6; 7; 8; 9)

🐤 click to open

let rec take n seq =
  (fun () ->
    if n > 0 then
      match seq() with
      | Cons (x, rest) -> Cons (x, take (n-1) rest)
      | Nil -> Nil
    else
      Nil
  )

Join two sequences

join (take 4 nats) (take 3 (const 0))  -->  (0; 1; 2; 3; 0; 0; 0)

join nats nats |> take 5  -->  (0; 1; 2; 3; 4)
(* note that joining two infinite sequence does not create infinite loop *)

🐤 click to open

let rec join seq1 seq2 =
  (fun () ->
    match seq1 () with
    | Cons (x, rest) -> Cons(x, join rest seq2)
    | Nil -> seq2 ()
  )

Module Seq

In OCaml, delayed sequence type is implemented in the standard library in the module Seq. All other data types, List, Array, Map, Set, Hashtbl, allow transformation to and from Seq.t.

For example:

module M = Map.Make(String)

let () =
  let fruits = M.empty
    |> M.add "apple" 10
    |> M.add "orange" 7
    |> M.add "grapes" 18
    |> M.add "pineapple" 41
    |> M.add "cherry" 12
  in

  fruits
  |> M.to_seq
  |> Seq.iter (fun (k,v) ->
    Printf.printf "%s %i\n" k v
  );

  print_newline();

  fruits
  |> M.to_seq
  |> Seq.filter (fun (k,v) -> v > 15)
  |> Seq.iter (fun (k,v) ->
    Printf.printf "%s %i\n" k v
  )
apple 10
cherry 12
grapes 18
orange 7
pineapple 41

grapes 18
pineapple 41

Note that the elements are sorted by the key (the fruit name). This is because the function M.to_seq produces the sequence of its key-value pairs sorted by the key (in-order iteration).

Memoization

Memoized function caches its return values so that in the future, instead of recomputing it again, it simply looks up for the cached answer.

We will use hash tables (module Hashtbl) for caching. This is an imperative data structure with the following operations:

Hashtbl.create n

creates a new table, where n determines the initial table size.

Hashtbl.replace ht k v

adds a new key-value pair (k, v) in the table ht. If a binding to the key k already exists, it gets replaced with the new value.

Hashtbl.find_opt ht k

searches table ht for the value corresponding to the key k. Returns option, None or Some v.

For example, cached Fibonacci numbers:

let fib =
  (* make a hash table for caching *)
  let ht = Hashtbl.create 10 in
  Hashtbl.replace ht 0 0;
  Hashtbl.replace ht 1 1;

  let rec f n =
    match Hashtbl.find_opt ht n with
    | Some v -> v  (* return answer it found *)
    | None ->      (* otherwise: *)
        let v = f (n-1) + f (n-2) in  (* compute *)
        Hashtbl.replace ht n v;       (* save *)
        v                             (* return *)
  in
  f
fib 50  -->  12586269025

Higher-order functions for memoization

We can define higher order functions, mem, and mem_rec to turn any function into its memoized version.

Non-recursive memoization

let mem f =
  let ht = Hashtbl.create 10 in
  ( fun x ->
      try Hashtbl.find ht x with
      | Not_found ->
          let res = f x in
          Hashtbl.replace ht x res;
          res
  )

Example: Memoized square roots of integer numbers:

let sqrt_of_int = mem (fun x -> sqrt (float x))

sqrt_of_int 123 --> 11.0905365064094177

Recursive memoization

let mem_rec f =
  let ht = Hashtbl.create 10 in
  let rec self =
    ( fun x ->
        try Hashtbl.find ht x with
        | Not_found ->
            let res = f self x in
            Hashtbl.replace ht x res;
            res
    )
  in
  self

Example: Memoized Fibonacci numbers:

let fib =
  mem_rec
    (fun self n ->
      if n < 2 then n
      else self (n-1) + self (n-2)
    )

fib 50  -->  12586269025

Additional resources and reading:

Monads

Consider the following program that reads the filename from the first command line argument, opens that file, and reads an integer number from it:

let get_arg n =
  if Array.length Sys.argv <= n then None
  else Some (Sys.argv.(n))

let open_file filename =
  try
    Some (open_in filename)
  with
  | Sys_error _ -> None

let input_int_opt chan =
  try
    let ib = Scanning.from_channel chan in
    Some (Scanf.bscanf ib "%d" (fun x -> x))
  with
  | _ -> None

let read_int_from_file () =
  match get_arg 1 with
  | None -> None
  | Some filename ->
      begin match open_file filename with
      | None -> None
      | Some chan ->
          begin match input_int_opt chan with
          | None -> None
          | Some int -> Some int
          end
      end

let () =
  match read_int_from_file () with
  | Some x -> Printf.printf "%i\n" x
  | None -> Printf.printf "Error\n"

If the integer is read successfully, the program prints it out:

$ ./a.out good-file
1234

Otherwise, if something goes wrong (there is no command-line argument, the file does not exist, or there is no integer in the file), the program prints Error:

$ ./a.out bad-file
Error

The program works fine, but the function read_int_from_file contains repetitive pattern matchings, and that can be improved.

Define a helper wrapping type 'a m

It is implemented as an option type with two additional operations, return and >>= (also called "bind"):

type 'a m = 'a option

(*type:  'a -> 'a m  *)
let return x = Some x

(*type:  'a m -> ('a -> 'b m) -> 'b m  *)
let (>>=) am f =
  match am with
  | Some a -> f a
  | None -> None
sNpepPO

Using this type (called Maybe monad), the read_int_from_file function reduces to a much cleaner program:

let read_int_from_file () =
  get_arg 1 >>= open_file >>= input_int_opt

Or somewhat more verbosely:

let read_int_from_file () =
  get_arg 1 >>= fun filename ->
  open_file filename >>= fun chan ->
  input_int_opt chan

According to monad laws, (m >>= f) >>= g is equivalent to m >>= (fun x -> f x >>= g), so the two above programs are equivalent, after computing get_arg 1, first the function open_file is applied, then the function input_int_opt.

Monads are very useful in pure languages such as Haskell, often for representing imperative computation (state, IO, etc). However, they are also employed in non-pure languages such as OCaml. Examples include (see link):

  • Concurrency (see Lwt and Async libraries)

  • Non-deterministic choice

  • First-class continuations

  • Ambivalent choice and backtracking

  • Parsers

  • Polymorphic state

  • Linear resources

Additional resources and reading:

Lambda calculus (Theoretical foundation of functional programming)

Lambda calculus is a model of computation introduced by Alonzo Church in 1930s. It describes computation based on anonymous functions and their application. It lays the foundation of functional programming and many other fields of theoretical computer science and mathematical logic.

Basics

Consider an OCaml anonymous function:

(fun x -> (fun y -> x y))

It takes argument x, then argument y, and then applies x to y. If we apply this function to two functions (fun w → w) and u, we will evaluate the whole expression to u:

     (fun x -> (fun y -> x y)) (fun w -> w) u
-->  (fun y -> (fun w -> w) y) u
-->  (fun w -> w) u
-->  u

If you understand how the above evaluation works, you understand how Lambda calculus works. It turns out, all computations can be expressed by using this type of function definition (abstraction) and function application (substitution).

To rewrite the OCaml program in the example above as a Lambda calculus term, we have to do simple syntactic substitution:

  • replace fun with λ (greek lambda), and

  • replace with . (period)

So we get:

(λ x . (λ y . x y))

Following the OCaml example above, we can apply this lambda term to (λ w . w) and u, and it will reduce to u, exactly the same way as the OCaml program did:

(λ x . (λ y . x y)) (λ w . w) u
(λ y . (λ w . w) y) u
(λ w . w) u
u

The interesting question is: How to express familiar computation primitives using only lambda terms (anonymous function and their application)?

Infinite loop

Consider the following lambda term (λ x . x x). If you apply it to any other term M, you will get:

(λ x . x x) M
M M

Well, things get interesting if that term M is (λ x . x x), so we get:

(λ x . x x) (λ x . x x)
(λ x . x x) (λ x . x x)
(λ x . x x) (λ x . x x)
→ …​ it will keep doing the same reduction indefinitely.

Such indefinite loop program cannot be run in OCaml, because the type checker will complain about the type of x:

# (fun x -> x x) (fun x -> x x) ;;
Error: This expression has type 'a -> 'b
       but an expression was expected of type 'a
       The type variable 'a occurs inside 'a -> 'b

In untyped lambda calculus, this is not a problem, and (λ x . x x) (λ x . x x) is a perfectly fine lambda term.

Local name binding let in as lambda term

Consider the following example of let-in name binding:

let x = y in
f x

I can be rewritten as

(fun x -> f x) y

So, the corresponding lambda term is:

(λ x . f x) y

Additional resources and reading:

Functional Programming in other languages

Aspects of functional programming:

  • First-class functions: functions can be saved in a variable, passed as an argument, or returned from a function.

  • Referential transparency (pure functions): function producing no side-effects and returning the same result every time you call it.

  • Tail-call optimization

  • Immutable data: persistent data types

  • Pattern matching

  • Variant types: such as Option type

Other aspects that are important in functional programming, but not limited to functional paradigm:

  • Lexical scoping and closures

  • Garbage collection

Modern general-purpose languages such as Kotlin, Rust, Swift, Clojure, Scala use many functional programming concepts. Old programming languages, such as Java, and C++, acquire functional-programming features too.

JavaScript

JavaScript was created as a mix of Scheme and Self with a Java-style syntax. It had first-class functions from the very beginning, however its type system is too permissive to develop safe and robust code. It resulted in development of better functional languages such as TypeScript and Clojure (ClojureScript) that compile into JS, and new libraries such as Immutable.js, lodash/fp, and Ramda to provide other functional programming features in JavaScript.

OCaml compiles to JS too, using compilers js_of_ocaml (also called JSOO) and BuckleScript. Programming language Reason ML is an alternative syntax for OCaml that is designed for better interoperability with JavaScript, relying on BuckleScript compiler.