Sunday, June 24, 2012

Google Code Jam in Scala : Recycled Numbers

Recycled Numbers is the first problem in the code jam where we have to think about computational complexity. Any solution that uses for(n<-a to b; m<-a to b) is O(n^2), thus will not finish in time for the extreme a=1,b=2000000 case. And believe me, Google stress test your code a testcase with the extreme case.  So first we have to create a function that can recycle 12345, and output 51234. Basic math, the only thing needed is somehow to compute the multiplier for the last digit, 10000.
def tens(n:Int):Int = if (n < 10) 1 else tens(n/10)*10
def recycle(n:Int, t:Int): Int = n/10 + n%10*t

Now we have the two basic functions, but how do we generate all recycles? What you need is the iterate function of the various container classes. It helps you compute multiple applications of a function, and gives you back a series of x,f(x),f(f(x)),f(f(f(x))),.... Since we don't know beforehand how many steps do we want to take, let's use Stream:
for(m<-Stream.iterate(12345)(recycle(_,10000))) {
  println(m)
}
12345
51234
45123
34512
23451
12345
51234
45123
34512
23451
12345
...

This is cool, we have an infinite list of recycles, but we only need them until they start looping, and anyway, the first one is 12345, we don't want that either. Fortunately the API has all the methods we need, so at the end, we arrive at this function:
def recycled(n:Int):Stream[Int] = {
  val t = tens(n)
  Stream.iterate(n)(recycle(_,t)).tail.takeWhile(_!=n)
}

Now, how do we use our cool new method to generate all possible (n,m) pairs for given a and b? You use for loops, of course. And once you've got a list of all possible pairs, the length of the list is the answer you seek.
def solveFunc(a:Int, b:Int):Int = {
  val pairs = for(
      n <- a to b;
      m <- recycled(n);
      if (n < m && m <= b))
    yield (n,m)
  pairs.length
}

Neat, eh? Easy to read and understand, not much you can screw up. As a comparison here's the imperative solution to the problem:
def solveImp(a:Int, b:Int) = {
  var count = 0
  var n = a
  while(n <= b) {
    val tens = {
      var y = n/10
      var m = 1
      while(y>0) { y/=10; m*=10 }
      m
    }
    var temp = n/10 + (n%10)*tens
    while (temp != n) {
      if (temp > n && temp <=b) {
          count+=1
      }
      temp = temp/10 + (temp%10)*tens
    }
    n+=1
  }
  count
}

Much more harder to understand. It has one big advantage, though: its runtime is about 4% of the functional solution. And uses constant memory, while solveFunc uses up O(pairs.length). The question is can we speed up the functional solution? And why does it use up that much memory?

To understand the memory issue, you have to know that for comprehension is nothing more than syntactic sugar over map, flatMap and filter. Thus, when you compute pairs.length, you're forcing the whole dataset to materialise:
def solve(a:Int, b:Int):Int = {
  val pairs = (a to b).flatMap(n=> {
    recycled(n)
    .filter(
        m=>n < m && m <= b
    )
    .map(
        m=>(n,m)
    )
  })
  pairs.length
}
But since you don't actually need all that data, you might as well use a view. In scala you can switch back and forth between lazy and instant evaluation by using view and force.

Generating a the recycled numbers with stream is also very slow. Stream with its lazy evaluation and memoisation takes a lot of CPU cycles. If we use List, we would be much faster, but of course you can't have an infinite list. Fortunately since there is an upper limit b=2000000, we know that we don't need an infinite list of recycles, because we know that we arrive back at the original after 7 iterations. Computing some excess recycles is so much less work than the Stream overhead, that it's worth switching.

Last, there's no need to generate the pairs at all; we're dropping them on the floor anyway, and only count the length.

With these, the optimized-but-still-pure-functional solution looks like this:
def recycled(n:Int) = {
  val t = tens(n)
  List.iterate(n,7)(recycle(_,t)).tail.takeWhile(_!=n)
}

def solve(a:Int, b:Int) = {
  val recycles = for(
      n <- (a to b).view;
      ms = recycled(n))
    yield ms.count(m=> n < m && m <= b)
  
  recycles.sum
}

It uses constant memory, and the run time is 20% of the original. That's still 5 times slower than the imperative solution, but well, you can't have everything.

No comments: