Variant types

H6o8WD6

Many programming tasks deal with heterogeneous data

  • Directions: Left, Right, Up, Down.

  • Numbers that can be either an Int or a Float.

  • Geometric shapes: Square, Circle, Triangle, etc. Each such shape can have different properties (radius for the circle, but width and height for a rectangle).

  • Colors in various color models (RGB, CMYK, etc.)

  • UI events or possible actions available to a user.

  • Various map landmarks.

  • Types of food.

  • etc.

From a programming language, we need:

  • Convenient means for declaring such heterogeneous data types.

  • The ability to define operations on these data types.

  • Type-checking by the compiler.

In C, one can use enum and union to model data can assume one of several possible values.

A type for storing four directions:

enum Dir {LEFT, RIGHT, UP, DOWN};

int main () {
  enum Dir dir = LEFT;
}

A type for storing either int or double value:

union Number {int i; double d;};

int main() {
  union Number num;
  num.i = 15; // if used i, it is int
              // if used d, it is double
}

See this link on how union and struct types can be used to implement a variant type in C. union types are somewhat tricky to use correctly, so in C++, one would often use a hierarchy of classes instead.

In object-oriented programming, one can use classes for similar purposes. For example, a parent class Number with child classes Float and Int, or a parent class Shape with child classes Circle, Square, and Rectangle.

Introducing variant types

In functional programming, the situations described above can be modeled with variant types.

A type representing four directions and a simple example of its usage:

type dir = Left | Right | Up | Down
let () =
  let dir = Left in

  match dir with
  | Left -> print_endline "Going left"
  | Right -> print_endline "Going right"
  | _ -> print_endline "Going up or down"

A function reversing direction:

let reverse d =
  match d with
  | Left -> Right
  | Right -> Left
  | Up -> Down
  | Down -> Up

A geometric shape type, and a function area that computes the area of a shape:

type shape =
    Square of float             (** width *)
  | Rectangle of (float*float)  (** width and height *)
  | Circle of float             (** radius *)
let area sh =
  match sh with
  | Square w -> w *. w
  | Rectangle (w,h) -> w *. h
  | Circle r ->
      let pi = 4.0 *. atan 1.0 in
      pi *. r *. r

A variant number type that stores int or float values:

type number = Int of int | Float of float

let () =
  let v = Int 15 in

  match v with
  | Int x -> Printf.printf "It is int %i\n" x
  | Float x -> Printf.printf "It is float %g\n" x

A function that adds two variant numbers defined above. It automatically promotes int to float when necessary:

let sum n1 n2 =
  match n1, n2 with
  | Int x, Int y -> Int (x+y)
  | Int x, Float y -> Float (float x +. y)
  | Float x, Int y -> Float (x +. float y)
  | Float x, float y -> Float (x +. y)

We are using function float, it is the same as float_of_int, converting integer to float.

The option type is also a variant type. Its values can be either None or Some v, where v is any OCaml values. It is defined as follows:

type 'a option = None | Some of 'a

Here, 'a is a type variable indicating that the type option is polymorphic. We will consider polymorphic variant types later, it is a topic for another lecture.

A more complex example with geometric shapes

Let us implement a condition that checks whether or not a shape can fit into another shape:

(** outer radius (circumradius) of a shape *)
let outer_radius sh =
  match sh with
  | Circle r -> r
  | Square w -> w /. sqrt(2.0)
  | Rectangle (w,h) -> 0.5 *. sqrt (w*.w +. h*.h)

(** inner radius *)
let inner_radius sh =
  match sh with
  | Circle r -> r
  | Square w -> 0.5 *. w
  | Rectangle (w,h) -> 0.5 *. (min w h)

(** [fits x y] if shape [x] can fit inside [y] *)
let rec fits x y =
  match x, y with
  | Square w1, Square w2 -> w1 <= w2
  | Circle r, _ -> r <= inner_radius y
  | _, Circle r -> outer_radius x <= r
  | Rectangle (w1,h1), Rectangle (w2,h2) ->
      w1 <= w2 && h1 <= h2 ||
      w1 <= h2 && h1 <= w2
      (* not counting possible diagonal placement *)
  | Rectangle _, Square w -> fits x (Rectangle (w, w))
  | Square w, Rectangle _ -> fits (Rectangle (w, w)) y

Note that rectangles can also fit diagonally, with the condition described here, however we only consider orthogonal placement of rectangles in our code.

Example: A calculator language (without input parsing)

type expr =
    Add of expr * expr
  | Sub of expr * expr
  | Mul of expr * expr
  | Div of expr * expr
  | V of int

let rec eval e =
  match e with
  | V x -> x
  | Add (e1, e2) -> eval e1 + eval e2
  | Sub (e1, e2) -> eval e1 - eval e2
  | Mul (e1, e2) -> eval e1 * eval e2
  | Div (e1, e2) -> eval e1 / eval e2

let () =
  let e = Mul (V 5, Sub (V 10, V 4)) in (*  5*(10-4)  *)

  Printf.printf "%i \n" (eval e)  (* would print 30 *)

This type declarations is recursive: type expr refers to itself in its definition (expr can be Add of (expr*expr), for instance). Such recursive types are useful for defining complex recursive data structures.

Pretty printer for calculator expressions (the function should return a fully-parenthesized string corresponding to the expression).

string_of_expr (V 100) --> "100"
string_of_expr (Add (V 7, V 8))) --> "(7 + 8)"
string_of_expr (Mul (V 5, Sub (V 10, V 4))) --> "(5 * (10 - 4))"

🐤 click to open

let rec string_of_expr e =
  match e with
  | V x -> string_of_int x
  | Add (x, y) -> "(" ^ string_of_expr x ^ " + " ^ string_of_expr y ^ ")"
  | Sub (x, y) -> "(" ^ string_of_expr x ^ " - " ^ string_of_expr y ^ ")"
  | Mul (x, y) -> "(" ^ string_of_expr x ^ " * " ^ string_of_expr y ^ ")"
  | Div (x, y) -> "(" ^ string_of_expr x ^ " / " ^ string_of_expr y ^ ")"

Functional languages with algebraic datatypes (tuples and variants) are particularly well-suited for implementing other programming languages. See more on this topic, for example, here:

Reading a file

The following program reads the file file.txt from the current folder and prints it out on the screen:

(** [get_line ic] reads a line from the input channel [ic] *)
let get_line ic =
  try
    Some (input_line ic)
  with
  | End_of_file -> None

let () =
  (* open input channel *)
  let ic = open_in "file.txt" in

  (* read and print the file content line by line *)
  let rec read () =
    match get_line ic with
    | Some line ->
        Printf.printf "%s\n" line;
        read ()
    | None -> ()
  in
  read ();

  (* close input channel *)
  close_in ic

MiniMake

We are going to implement a small utility MiniMake that works similarly to Make and can be used for building programs with separate compilation.

Useful references:

MiniMakefile format

MiniMake will be reading the rules from the file called MiniMakefile. This file looks a lot like a normal Makefile (in fact, the most basic Makefiles will be compliant MiniMakefiles):

MiniMakefile
program : utils.o main.o
	echo "#Linking#"
	g++ -o program utils.o main.o

utils.o : utils.cpp utils.h  # Comment
	echo "#Compiling utils.o#"
	g++ -c utils.cpp

main.o : main.cpp utils.h
	echo "#Compiling main.o#"
	g++ -c main.cpp

# Remove .o files and executable
clean :
	rm *.o program
CkmIHtF

MiniMakefile

is a sequence of rules

rule

consists one target line and 0 or more recipe lines

target line

consists of one target followed by a colon ':' and 0 or more prerequisites separated by whitespace.

target

string that does not contain whitespace or colon

prerequisite

string that does not contain whitespace

recipe line

starts with a whitespace (' ' or '\t') followed by a shell command to be executed

comment

starts with '#' and can occur only on target lines or take up an entire line on its own. There can be no comments in recipe lines, however, because '#' can be part of a command that we want to execute.

In the MiniMakefile example above, the echo commands are unnecessary for building the program, instead, they demonstrate that a recipe can take several lines. They also show that '#' can occur in a recipe line and should not be discarded as a comment.

If we type minimake, it will execute the commands for the first target from the MiniMakefile (in the above example, the first target is called program), building all the prerequisites along the way, just like make would do:

$ ./minimake
echo "#Compiling utils.o#"
#Compiling utils.o#
g++ -c utils.cpp
echo "#Compiling main.o#"
#Compiling main.o#
g++ -c main.cpp
echo "#Linking#"
#Linking#
g++ -o program utils.o main.o

Parsing a MiniMakefile

When parsing a MiniMakefile, our first step is to scan every line and determine what type this line is:

minimake.ml
type line =
    Target of (string * (string list))
  | Recipe of string
  | Empty
  | Error (* non-compliant, cannot be parsed *)

Target line (contains a target and a list of prerequisites), Recipe line (contains a shell command), Empty line, and Error line. We need the Error case for non-compliant lines that cannot be parsed.

Classifying a line:

  • If the line starts with a whitespace (' ' or '\t') → recipe line

  • Otherwise: Remove comments ('#'), and then:

    • if target : prerequisitestarget line

    • else if empty → empty line

    • else → error line

classify "  g++ -o program utils.o main.o"
  --> Recipe "g++ -o program utils.o main.o"

classify "program : utils.o main.o"
  --> Target ("program", ["utils.o"; "main.o"])

classify "clean:  #Some comment"
  --> Target ("clean", [])

classify "clean"
  --> Error

Let us implement this classify function. We are going to use regular expressions (module Str) for that.

minimake.ml
let remove_comment line =
  try
    let i = String.index line '#' in
    String.sub line 0 i
  with
  | Not_found -> line

let rx_recipe = Str.regexp "^[ \t]+\\(.*\\)$"
let rx_target = Str.regexp "^\\([^: \t]+\\)[ \t]*:\\(.*\\)$"
let rx_space = Str.regexp "[ \t]+"

let classify line =
  if Str.string_match rx_recipe line 0 then
    let command = Str.matched_group 1 line in
    Recipe command
  else begin
    let line = String.trim (remove_comment line) in
    if String.length line = 0 then
      Empty
    else if Str.string_match rx_target line 0 then
      let tgt = Str.matched_group 1 line in
      let prereqs = Str.matched_group 2 line in
      let prereq_ls = Str.split rx_space prereqs in
      Target (tgt, prereq_ls)
    else
      Error
  end

Explanation of Str regular expressions:

  • [ \t]+ matches 1 or more whitespaces (' ' or '\t')

  • .* matches 0 or more characters of any kind. Used to match recipe commands and lists of prerequisites.

  • \\( and \\) are brackets marking a matched groups we want to extract

  • [^: \t]+ matches 1 or more character other than ':', ' ', or '\t'. Used to match targets

  • ^ at the front and $ at the end require matching from the beginning to the end of the line, so the entire line should match not just some part of it

The function can be tested in toplevel:

# #load "str.cma" ;;      (* need to explicitly load Str library *)
# #use "minimake.ml" ;;   (* load the function *)

# classify "program : utils.o main.o" ;;
- : line = Target ("program", ["utils.o"; "main.o"])

# classify "  g++ -o program utils.o main.o" ;;
- : line = Recipe "g++ -o program utils.o main.o"

Let us write a simple program now that reads all lines from the file MiniMakefile:

minimake.ml
let get_line ic =
  try
    Some (input_line ic)
  with
  | End_of_file -> None

(* returns all lines from the file as a list of strings, arranged in the reverse order *)
let read_all filename =
  let ic = open_in filename in

  let rec read acc =
    match get_line ic with
    | Some line -> read (line :: acc)
    | None ->
        close_in ic; (* close input channel *)
        acc (* return accumulator *)
  in
  read []

let () =
  (* read the file and clasify each line *)
  let ls = List.map classify (read_all "MiniMakefile") in

  (* print how many lines is read *)
  Printf.printf "%i lines read" (List.length ls)

If all the above declarations are combined in the file minimake.ml, we can compile it as follows (note the explicit mention of the library file str.cmxa):

ocamlopt -o minimake str.cmxa minimake.ml

Alternatively, compiling with the byte-code compiler ocamlc (need the byte-code library str.cma):

ocamlc -o minimake str.cma minimake.ml

Alternatively, compiling with the ocamlfind tool:

ocamlfind ocamlopt -o minimake -package str -linkpkg minimake.ml