Saturday, May 31, 2008

Dynamic Language Optimizations: Memoization

The real raison d'ĂȘtre for this blog is to simply record my progress and notes in my current exploration of dynamic language implementation and optimization. First up I'd like to talk about memoization, which is basically trading space for time. If we have a function or method that is not affected by external variables, i.e. it references no global values or instance data,, then we can cache the result of each call to the function, and on subsequent calls with the same input, return the cached value instead of redoing the computation.

For example, here's a naive recursive implementation of the Fibonacci series in Ruby and in C, along with execution times:

def fib(n)
if n == 0 || n == 1
n
else
fib(n-1) + fib(n-2)
end
end

36.times do |i|
puts "n=#{ i} => #{fib(i)}"
end

real 0m54.742s
user 0m54.613s
sys 0m0.077s

#include "stdio.h"

int fib(int n)
{
if (n == 0 || n == 1)
return n;
else
return fib(n-1) + fib(n-2);
}
main()
{
int i;
for (i = 0; i < 36; ++i)
printf("%d => %d\n",i,fib(i));
}

real 0m0.666s
user 0m0.663s
sys 0m0.002s


"OMG! Ruby is eighty times slower than C!", I hear you say. But I would add, for certain values of X, specifically those values that are in the set of stupid benchmarks.

So let's apply memoization to this Ruby script, a whopping two additional code lines and time it again:


def fib(n)
@cache ||= Hash.new
return @cache[n] if @cache.has_key? n
if n == 0 || n == 1
@cache[n] = n
else
@cache[n] = fib(n-1) + fib(n-2)
end
end

36.times do |i|
puts "n=#{ i} => #{fib(i)}"
end

real 0m0.009s
user 0m0.005s
sys 0m0.004s


Hmmm. With a little less naivete, our intrepid Ruby programmer has made his version run over seventy times faster than the naive C version. I suppose we could improve the C version:

#include "stdio.h"
#include "stdlib.h"

int fib(int n)
{
static int length = 0;
static int* cache = 0;

if (length < n) {
cache = malloc(sizeof(int)*(n+1)*2);
length = (n + 1) * 2;
}
if (n == 0 || n == 1)
return n;
else if (cache[n] == 0)
cache[n] = fib(n-1) + fib(n-2);

return cache[n];
}
main()
{
int i;

for (i = 0; i < 36; ++i)
printf("%d => %d\n",i,fib(i));
}
real 0m0.004s
user 0m0.001s
sys 0m0.003s


Looks like it can help C also. But the improvement to Ruby is more drastic. In fact, Ruby has a gem called memoize that allows you to wrap methods you wish to cache the result of.

Wouldn't it be nice if our compiler or interpreter could recognize when it could cache method returns automatically? In fact there has been a bit or more
of research on this.

Frist Psot!

Just what Teh Internets needs, another blog (well and maybe some of this).