Sunday, December 23, 2012

Solving the Countdown problem in Haskell

Introduction

One of the challenges in the British game show Countdown was to find a way to create a number using other numbers and basic mathematical operators. For example, given the set {2,3,4,5} generate 14, using +, -, *, /. A solution is 5 *4 - 2 * 3. Note that each  number should be used exactly once.

An interesting problem is writing an application that lets the computer do all the work for you, so that it finds a solution for a given target number, or maybe even all the solutions!

Problem analysis

To find all the ways to combine numbers to yield a certain result we have to generate all the possible ways of  combining the numbers (subproblem 1) and then filter out the ones that have our desired result (subproblem 2). But how do we generate all possibilites? What we really want is to generate all the possible valid mathematical expressions with our numbers. We recognize that these expressions can be written as a tree:

For example, $$5 \cdot 6 - 2 \cdot 3$$:

Tree representation of mathematical expression
Since the operators we have chosen all have two elements, our expressions are binary trees. Thus, this means that our algorithm has to generate all possible trees.

For this task I have chosen Haskell, since it is by far the easiest language to work with trees, as we will soon see. If you are interested in another example why Haskell - or other functional languages - are very convenient to work with if you have trees, check out my fully fledged  spl compiler. This project is a compiler for a custom programming language, written in Haskell in only 900 lines! I have written my own parser, abstract syntax tree and Intel backend using the same techniques you'll see below. If you are interested in this project, let me know, and I may write a tutorial on this.

Implementation

First, we have to define the grammar of our mathematical expressions:
data Exp = Num Int | Plus Exp Exp | Sub Exp Exp | Mul Exp Exp | Div Exp Exp deriving Eq

Our expression is defined as being a number, or any of our binary operators. The elements (branches) of each operator are mathematical expressions themselves as well. You can immediately see that this recursive definition of a tree structure is much, much smaller than it would have been in for example Java or C.

In an object oriented language the solution for such a structure would be to make a base class Exp, that has 5 subclasses. Each subclass then has two variables of type Exp, except for the Num subclass which has an integer variable.

Before moving to the main algorithm, I'll show some code for convenience functions. The first implements a Show function for our custom data type, making a tree easier to read:
instance Show Exp where
 show (Num n) = show n
 show (Plus a b) = "(" ++ show a ++ "+" ++ show b ++ ")"
 show (Sub a b) = "(" ++ show a ++ "-" ++ show b ++ ")"
 show (Mul a b) = "(" ++ show a ++ "*" ++ show b ++ ")"
 show (Div a b) = "(" ++ show a ++ "/" ++ show b ++ ")"

If you now type
show (Plus (Num 5) (Mul (Num 6) (Num 2)))

in the Haskell interpreter, the result would be (5 + (6 * 2)). In the code above you can see pattern matching. If you are not familiar with functional languages, you can read about this feature here.

We also need a function that actually calculates the result of our abstract expression. Using pattern matching again:
reduce :: Exp -> Int
reduce (Num n) = n
reduce (Plus a b) = reduce a + reduce b
reduce (Sub a b) = reduce a - reduce b
reduce (Mul a b) = reduce a * reduce b
reduce (Div a b) = reduce a `div` reduce b

Now we can think about the algorithm itself. We want to use all the numbers and use them only once, so we have to generate all the possible ways to select an element from a list and also keep track of which items we have not used yet. That's what the following function is for:
-- list should have at least 2 elements
split :: [a] -> [a] -> [(a, [a])]
split [] _ = []
split (x:xs) acc = [(x, acc ++ xs)] ++ (split xs (acc ++ [x]))
Split turns a list [1,2,3] into [(1, [2,3]), (2,[1,3]), (3, [1,2]). Thus, when we select a binary operator, we use the first number of the tuple as its left branch and we can use the list of remaining number to generate a subtree for the right branch:
pos :: [Int]-> [Exp]
pos [n] = [Num n] -- one number left? no binary operators possible
pos n = concatMap allpos subtree
 where
 subtree = map (\(a,r) -> (Num a, pos r)) (split n [])
 allpos (a,b) = concatMap (\e -> pos' a e) b
 pos' a b = [Plus a b, Sub a b, Sub b a, Mul a b, Div a b, Div b a]
Let's look at the subtree function first. This generates all possible subtrees where the left branch is a given number. The function pos' gives all the possible binary operators where a and b are branches. As you can see, for the subtraction and division operator, the left and right branches interchanged is also a valid option. That's because 1 - 2 != 2 - 1.

Finally, the allpos function generates puts all these possiblities in a list. Of course, some of these expressions may be invalid due to the rules of the game. For example, it's not allowed to have a negative value or a fractional values in any of the subresults. Another thing that should be filtered is duplicates: 1 + 2 is the same expression as 2 + 1, so it should appear only once. That's why we have the following filter function:

-- remove illegal attempts and attempts that are the same
valid :: Exp -> Bool
valid (Num n) = True
valid (Plus a b) = valid a && valid b && reduce a > reduce b -- only allow one
valid (Sub a b) = valid a && valid b && reduce a - reduce b >= 0 -- subsolutions should be positive
valid (Mul a b) = valid a && valid b && reduce a > reduce b
valid (Div a b) = valid a && valid b && reduce b > 0 && reduce a `mod` reduce b == 0
And finally we have a function that receives a list of numbers and the target number as input and that returns all the possible expressions:
findVal :: [Int] -> Int -> [Exp]
findVal n res = filter (\e -> valid e && reduce e == res) (pos n)

And that's the solution to the problem! If you run this application, you will see that contrary to what you may expect, very little memory is used. This is because Haskell does lazy evaluation, so the lists are actually not computed beforehand but each element is evaluated one at a time.

There are some optimzations, which I leave as an exercise for the reader: for example, you can reject wrong expressions much earlier. If you generate a binary operator with two numbers, say Plus (Num 2) (Num 4), you can reject it immediately (because we only allow 4 + 2 and not 2 + 4).

And that's it for today. I hope you've learned something new and interesting :)