Arjun Krishna Babu bio photo

Arjun Krishna Babu

Python. Machine Learning. Systems. Open source.

Email LinkedIn Github

(An updated version of this article can be found on my newer blog here).

Let’s implement a binary tree in Haskell.

data Tree a = Leaf a
  | Node (Tree a) (Tree a)
  deriving (Show, Eq)

The above defines a Tree that takes a value of type a. Because we have not explicitly specified any datatype (such as an Int or a [Char], for instance) and instead defined it in terms of a which can stand for any type, this tree is capable of handling values of any datatype.

It has 2 constructors:

  1. Constructor for a tree with just a Leaf.
  2. Constructor for a tree with two subtrees, each of which maybe
    • another tree (with further subtrees), or
    • a leaf.

We’re deriving Show (line 3) so that we can pretty-print our binary-tree.

It’s evident that defining a tree in Haskell is way more easier than doing the same in Java, C++, or even Python.

Let’s go ahead and create elements using this tree. Save the code above into a file (say my_tree.hs), and load it using the ghci interpreter.

$ ghci my_tree.hs
GHCi, version 8.0.1:  :? for help
[1 of 1] Compiling Main             ( my_tree.hs, interpreted )
Ok, modules loaded: Main.

So far so good. Let’s create two trees (using the Leaf value constructor) and see their type.

*Main> a = Leaf "cat"
*Main> b = Leaf "rabbit"

*Main> :type a
a :: Tree [Char]
*Main> :type b
b :: Tree [Char]

Both a and b are infact Tree that stores [Char] (a.k.a String).

Let’s create a new Node that contains a and b defined above as subtrees.

*Main> n = Node a b
*Main> :t n
n :: Tree [Char]
*Main> n
Node (Leaf "cat") (Leaf "rabbit")

Creating a Node was simple – we passed in leaves a and b we created earlier into the Node constructor.

The last line in the snippet displays the contents of the tree n – it has two subtrees, each of which is a Leaf. We were able to display the contents in this way because we had derived the Show typeclass.

Note that n is a tree that stores data of type [Char] because all its subtrees store data of type [Char]. If we had instead tried to create a tree with different datatypes, ghci would complain, as the following example demonstrates:

*Main> p_num = Leaf 20
*Main> q_str = Leaf "Linda"

*Main> :t p_num
p_num :: Num a => Tree a
*Main> :t q_str
q_str :: Tree [Char]

*Main> r_bad = Node p_num q_str

<interactive>:15:14: error:
    • No instance for (Num [Char]) arising from a use of ‘p_num’
    • In the first argument of ‘Node’, namely ‘p_num’
      In the expression: Node p_num q_str
      In an equation for ‘r_bad’: r_bad = Node p_num q_str

The problem was that p_num and q_str stores values of different types, and we can’t create a Tree that combines subtrees which store values of two different types.

Tree Map

Let’s create a treeMap function that maps over a tree, converting values in the leaves to another value.

treeMap should take as arguments:

  1. A function that converts a value of type a (ie., a value stored in a Leaf or a Node) to a value of type b; a and b need not necessarily be the same.
  2. A Tree to work on.

Add the following code to my_tree.hs we were working on earlier.

treeMap :: (a -> b) -> (Tree a) -> (Tree b)
treeMap f (Leaf a) = Leaf (f a)
treeMap f (Node t1 t2) = Node (treeMap f t1) (treeMap f t2)

Line #2 handles the case when the Tree is a Leaf, and line #3 handles the case when the Tree is a Node.

First up, reload my_tree.hs into the interpreter.

*Main> :r
[1 of 1] Compiling Main             ( my_tree.hs, interpreted )
Ok, modules loaded: Main.

Let’s add some more nodes to the tree we created earlier.

*Main> m = Leaf "horse"
*Main> p = Node m n
*Main> p
Node (Leaf "horse") (Node (Leaf "cat") (Leaf "rabbit"))

Now, let’s run treeMap on the p to get a new tree with the length of each of the leaves.

*Main> treeMap length p
Node (Leaf 5) (Node (Leaf 3) (Leaf 6))

Leaves of the new tree contains the string length of values in the original tree p. The structure of the tree remains unchanged.

Since the leaves of the resultant tree stores Int (because length returns values of type Int), we will see this change reflected when we examine its type.

*Main> :t (treeMap length p)
(treeMap length p) :: Tree Int

Tree Fold

Let’s now implement a function that folds a tree into a single value.

treeFold should take as arguments:

  1. Function that accepts a value of type a and returns a value of type b, to operate on Leaf.
    (a -> b)
  2. Function that accepts two values of type b, and returns a third value of type b. Operates on the values the first function returns (after processing Leaf).
    (b -> b -> b)
  3. A Tree to work on.

Finally, treeFold would return a value of type b.

Append the following code to my_tree.hs:

treeFold :: (b -> b -> b) -> (a -> b) -> (Tree a) -> b
treeFold fnode fleaf (Leaf n) = fleaf n
treeFold fnode fleaf (Node t1 t2) = fnode (treeFold fnode fleaf t1) (treeFold fnode fleaf t2)

Reload my_tree.hs into the interpreter.

*Main> :r
[1 of 1] Compiling Main             ( my_tree.hs, interpreted )
Ok, modules loaded: Main.

Now, we’ll use treeFold to add the lengths of all elements stored in the leaves of tree p, which we defined earlier.

*Main> p
Node (Leaf "horse") (Node (Leaf "cat") (Leaf "rabbit"))
*Main> treeFold (+) length p

Simple as that!

See the full my_tree.hs below: