Let’s get started with a code example

When learning a new programming language, it’s usually very helpful to see some example code in that language.

Example 1. Comparing C and OCaml factorial programs. Note that the factorial function in mathematics is undefined for negative arguments. Our implementation (incorrectly) returns 1 for these input values.
factorial.c - C program
#include <stdio.h>
/* product of integers from 1 to n */
int factorial(int n) {
    int ans = 1;
    for (int i = 2; i <= n; i++)
        ans *= i;
    return ans;
}

int main() {
    printf("%i", factorial(10));
}
factorial.ml — OCaml equivalent
(* product of integers from 1 to n *)
let rec factorial n =
  if n <= 1 then
    1
  else
    factorial (n-1) * n

let () =
  Printf.printf "%i" (factorial 10)

How to compile and run the program?

$ ocamlopt -o factorial factorial.ml
$ ./factorial
3628800

The compilation command above uses the native code compiler ocamlopt. There is also a byte code compiler ocamlc, which is useful if you want to debug your code, and an interactive REPL ocaml, which is usually referred as toplevel. There is also a new community-developed toplevel that provides more features called utop.

ocaml

interactive toplevel (REPL) (official documentation)

ocamlc

byte code compiler

ocamlopt

native code compiler

utop

a new more feature-rich toplevel (REPL)

In the beginning, we will be using a toplevel, ocaml or utop, for running our code interactively, and later we will switch to compilation. However, it is good to know from the start that all of these options are available for building and running your programs. Moreover, there is also a possibility to compile OCaml to JavaScript with js_of_ocaml and BuckleScript compilers, and although discussing them in depth would probably go beyond the scope of this class, you might want to use them in the future.

There are many powerful tools in the modern OCaml toolchain. For this course, in addition to the compilers, we will also be using the following programs:

  • opam package manager. See the docs on how to install it on your own computer.
    Using opam, install ocamlfind and alcotest packages.

  • ocamlfind, a tool for compiling programs with dependencies (see docs).

  • alcotest, a simple unit testing library.

The textbook also describes how to use ocamlbuild building system and OUnit unit testing library, feel free to use them too, but we will be using ocamlfind and alcotest instead in this course.

Expressions

Open terminal and start an interactive OCaml toplevel session. (We use a Unix tool rlwrap to make it more user-friendly, alternatively, if you have installed utop, you can use it instead of the official toplevel, then you don’t need rlwrap.)

$ rlwrap ocaml

or

$ utop

We can use toplevel as a calculator, for instance here we compute several arithmetic expressions:

        OCaml version 4.07.1

# 1 + 2 ;;
- : int = 3

# 10 + 3 * 5 ;;
- : int = 25

# (10 + 3) * 5 ;;
- : int = 65

Notice that we need to put double semicolon ;; after each expression to tell the program process our input. This double semicolon is not required in actual programs.

Table 1. Some integer operators. See more in the standard library documentation.

+

addition

-

subtraction and unary negation

*

multiplication

/

integer division (remainder discarded)

mod

remainder

Think of a functional program as an expression, like (1 + 2) * (20 / 3), which gets evaluated the same way as arithmetic expressions get computed in mathematics:

    (1 + 2) * (20 / 3)
--> (3) * (20 / 3)
--> (3) * (6)
--> 18

There is no statements in functional programming, only expressions. Even if then else conditions are expressions:

# 50 + (if 1 < 2 then 200 else 300) * 5 ;;
- : int = 1050

Parentheses are used to control the order of expression evaluation. Compare to:

# 50 + if 1 < 2 then 200 else 300 * 5 ;;
- : int = 250
# if 1 < 2 then
    (if 100>20 then 5 else 10) + 15
  else
    145 ;;
- : int = 20
Table 2. Polymorphic comparison operators. See more in the standard library documentation.

<

less

>

greater

<=

less or equal

>=

greater or equal

=

equal

<>

not equal

All comparison operators return a value of type bool, which is either true or false.

In OCaml, there are also operators == and !=, which determine physical equality of objects (that is, checking that they are located at the same place in the computer memory, and so are physically the same). There are very few situations when you might want to use those operators.

Instead, use = and <> for checking equality and inequality.

Name binding

We define x equal to 123 and then define a function square that returns its argument squared:

# let n = 123 ;;
val n : int = 123

# n + n ;;
- : int = 246

# let square x = x * x ;;
val square : int -> int = <fun>

# square 12 ;;
- : int = 144

# square n ;;
- : int = 15129

Define a function max_of_two that takes two arguments and returns the one that is larger:

# let max_of_two x y =
    if x > y then
      x
    else
      y ;;
val max_of_two : 'a -> 'a -> 'a = <fun>

# max_of_two 11 222 ;;
- : int = 222

# max_of_two 17 15 ;;
- : int = 17

# square (max_of_two 10 25) ;;
- : int = 625
Operator Precedence

As a rule of thumb, remember that in OCaml, the order of expression evaluation is as follows:

1

Function application

2

Unary operators (unary +, -, not, etc.)

3

Binary operators (*, /, binary +, -, <, > &&, ||, etc.)

4

if then else and other control flow constructs

Because of that,

  • max_of_two 40 20 + 100, functions are applied before operators, evaluating to 140, not 120.

  • max_of_two (square 5) (4 + 10), using functions and operators inside function arguments require parentheses.

  • In particular, max_of_two (-5) 10, negated function arguments require parentheses.

  • if 1 > 0 then 100 else 200 + square 50, the plus is inside the else clause, (200 + square 50), so the expression is evaluating to 100, not 2600.

For details, see the operator precedence and associativity table in the reference manual.

Keeping declarations in a file

Instead of typing declarations in the command line, let’s put them in a file, which we can call, for example, square_max.ml:

square_max.ml
let n = 123

let square x = x * x

let max_of_two x y =
  if x > y then
    x
  else
    y

Now, open toplevel again. The program we just wrote can be loaded in the toplevel using directive #use:

$ rlwrap ocaml
        OCaml version 4.07.1

# #use "square_max.ml" ;;
val n : int = 123
val square : int -> int = <fun>
val max_of_two : 'a -> 'a -> 'a = <fun>

# n + 50 ;;
- : int = 173

# square 12 ;;
- : int = 144

# square (5 + 2) ;;
- : int = 49

# max_of_two (square 5) (-4) ;;
- : int = 25

Type inference in OCaml

Why the type of the function square is int → int, whereas for the function max_of_two, it is 'a → 'a → 'a?

In OCaml, types are inferred by the compiler if you don’t specify them yourself. The type inference algorithm is using the fact that each previously defined operator or function already has a type.

Since the multiplication operator * has the type int → int → int (takes two integers and returns an integer), the square function must have the type int → int to conform to the type of *.

On the other hand, the > operator is polymorphic, which means it can compare not only integers, but also floats, strings, and any other compound data types. The polymorphism of the > operator is expressed in its type: 'a → 'a → bool (here, 'a is a type variable, which indicates that the operator arguments can be of any type). Because of that, the type inference algorithm deduces that the function max_of_two is also polymorphic, and has the type 'a → 'a → 'a.

Dynamic typing and static typing

Exercises

max_of_three x y z

the maximum of three arguments

is_even x

check if the argument is even

is_odd x

check if the argument is odd

Table 3. Boolean operators. See more in the standard library documentation.

&&

logical and

||

logical or

not

logical negation

is_leap_year y

check if y is a leap year. The logic of the decision is as follows:

if (year is not divisible by 4) then (it is a common year)
else if (year is not divisible by 100) then (it is a leap year)
else if (year is not divisible by 400) then (it is a common year)
else (it is a leap year)

Recursive declarations

Recursion

Recursive function declarations must start with let rec instead of simple let:

let rec factorial n =
  if n <= 1 then
    1
  else
    factorial (n-1) * n

Exercises

sum_range a b

The sum of all integers in the range from a to b (that is, for all axb ).
Example: sum_range 1 5 should be equal to 1 + 2 + 3 + 4 + 5 = 15.

🐤 click to open

let rec sum_range a b =
  if a > b then
    0  (* base case: empty range *)
  else
    sum_range a (b-1) + b

prod_range a b

The product of all integers in the range from a to b

🐤 click to open

let rec prod_range a b =
  if a > b then
    1
  else
    prod_range a (b-1) * b

num_digits x

The number of (decimal) digits in the integer x. (Should work correctly for all integers.)

🐤 click to open

let rec num_digits n =
  if abs n < 10 then
    1
  else
    num_digits (n/10) + 1

sum_digits x

The sum of all (decimal) digits of x

🐤 click to open

let rec sum_digits n =
  if abs n < 10 then
    abs n
  else
    sum_digits (n/10) + abs (n mod 10)

Without using function abs:

let rec sum_digits n =
  if n < 0 then
    sum_digits (-n)
  else
    if n < 10 then
      n
    else
      sum_digits (n/10) + (n mod 10)

nth_digit i x

i-th (decimal) digit of the integer x. Assume digits are indexed left-to-right, starting at i=1. The function should return 0 for invalid values of i.

Examples:

nth_digit 1 7025 = 7
nth_digit 2 7025 = 0
nth_digit 3 7025 = 2
nth_digit 4 7025 = 5
nth_digit 1 0 = 0
nth_digit 1 (-7025) = 7
nth_digit 3 (-7025) = 2
nth_digit (-1) 7025 = 0
nth_digit 0 7025 = 0
nth_digit 5 7025 = 0

binom n k

Binomial coefficient C(n,k), implemented recursively using Pascal’s identity.

Since this mathematical function is normally defined only for 0 ≤ k ≤ n, make the function return zero for other values of n and k. This extension of the definition is consistent with Pascal’s identity and the meaning of the coefficient (for instance, there is 0 ways to select 10 items out of 9 available).

Examples:

binom 5 0 = 1
binom 5 1 = 5
binom 5 2 = 10
binom 5 100 = 0

🐤 click to open

A simpler solution that does not handle impossible cases of, e.g. k < 0 or k > n:

(*  (n, k) = (n-1,k-1) + (n-1,k)   *)
let rec binom n k =
  if k = 0 then
    1  (* Left edge of Pascal's tringle *)
  else if k = n then
    1  (* Right edge of Pascal's triangle *)
  else
    binom (n-1) (k-1) + binom (n-1) k

A more complex solution that handles impossible cases:

let rec binom n k =
  if k < 0 || k > n then 0
  else if k = 0 || k = n then 1
  else binom (n-1) (k-1) + binom (n-1) k

Or:

let rec binom n k =
  if k < 0 || k > n then 0
  else if k = 0 && n = 0 then 1
  else binom (n-1) (k-1) + binom (n-1) k

Functions are values. Anonymous functions

The following two function definitions are equivalent:

let square1 x = x * x

let square2 = (fun x -> x * x)

Functions are values, just like numbers. (fun x -> x * x) is a valid expression for a square function, and we save it in the variable square2.

Example:

# let square2 = (fun x -> x * x) ;;
val square2 : int -> int = <fun>

# square2 12 ;;
- : int = 144

Moreover, we can apply (fun x -> x * x) directly to an argument, and it will return that argument squared:

# (fun x -> x*x) 7 ;;
- : int = 49

Such functions that are created without giving them a name are called anonymous functions.

# (fun a b -> a < b) 100 200 ;;
- : bool = true

# (fun a b c -> a + b + c) 10 20 30 ;;
- : int = 60

# 5 + (fun a b c -> a + b + c) 10 20 30 * 2 ;;
- : int = 125

The computational model of functional programming (which is called lambda calculus) is based on simple substitution: you plug in arguments, then simplify. Repeat until you are done (you still have to pay attention to the order of evaluation, just like in arithmetic, * goes before + and -, and so on, according to the operator precedence order).

The example above gets evaluated as follows:

    5 + (fun a b c -> a + b + c) 10 20 30 * 2

--> 5 + (10 + 20 + 30) * 2

--> 5 + 60 * 2

--> 5 + 120

--> 125

Since anonymous functions cannot refer to themselves by name, it seems that they should not be able to establish recursive computation on their own.

This is not actually the case. It is possible to use so called fixed-point combinators, such as Y combinator, or a self-application U combinator to implement recursion using only anonymous functions. This is not a very practical approach for implementing recursion, but it does have some use cases (1) (2) besides its theoretical value in lambda calculus.

A minimal example with Y combinator:
(* Here, we define Y combinator using a let-rec name binding.
   It is also possible to define it as an anonymous function
   (see the link above) but that's a bit more cumbersome in
   a typed functional language like OCaml. *)
# let rec y f x = f (y f) x ;;
val y : (('a -> 'b) -> 'a -> 'b) -> 'a -> 'b = <fun>

(* Factorial of 5 *)
# y (fun g x -> if x = 0 then 1 else x * (g (x-1))) 5 ;;
- : int = 120

You can try to evaluate this expression on paper by doing substitutions. You will see that it works.