Wednesday, June 25, 2008

VALUE: One thing to reference them all.

Common to the implementation of a lot of Lisps, Schemes, and other dynamic languages is VALUE: a machine word that usually holds a pointer to whatever is considered an object in the system. For performance reasons, some values are stored directly in the VALUE; these are commonly called 'immediate values'. Since pointers to objects are at least 4-byte aligned on modern architectures, the low two bits of a real object pointer are always zero. So any VALUE whose low two bits are 1, 2, or 3 can't be a pointer to some heap or stack allocated object, and we can use this to our advantage to store immediate values for things like integers that would be too costly to allocate. Ruby does this by left shifting integers one bit and adding one to them. Thus, a VALUE with the low bit set is an integer (Fixnum), and its value can be found by right shifting one bit. Ruby uses a special macro for this to handle architectures that don't preserve the sign bit in right shifts. Though I don't know which architectures do this, I do know that LLVM has an instruction (ashr, arithmetic shift right) that specifically preserves this bit.
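Here's a minimal sketch of that Fixnum tagging in C. The names (value_t, int_to_fixnum, fixnum_to_int, is_fixnum) are made up for illustration and are not Ruby's actual macros, and the sign-safe shift is just one way to avoid relying on the platform's right-shift behaviour. It also assumes a two's complement machine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; a sketch of the scheme, not Ruby's source. */
typedef uintptr_t value_t;          /* one machine word: pointer or immediate */

#define FIXNUM_FLAG ((value_t)1)    /* low bit set means "this is an integer" */

/* True when the word carries an immediate integer rather than a pointer. */
static inline bool is_fixnum(value_t v) {
    return (v & FIXNUM_FLAG) != 0;
}

/* Encode: shift left one bit and set the tag bit (n * 2 + 1).
 * The shift is done on the unsigned type so it is well defined for
 * negative n; the top bit of n is lost, so the usable range is one
 * bit smaller than intptr_t. */
static inline value_t int_to_fixnum(intptr_t n) {
    return ((value_t)n << 1) | FIXNUM_FLAG;
}

/* Decode: shift right one bit, keeping the sign.
 * Right-shifting a negative signed value is implementation-defined in C,
 * so negative words are complemented, shifted as non-negative numbers,
 * and complemented back. */
static inline intptr_t fixnum_to_int(value_t v) {
    intptr_t s = (intptr_t)v;       /* assumes two's complement */
    return s < 0 ? ~(~s >> 1) : s >> 1;
}
```

For example, int_to_fixnum(3) gives the word 7 (binary 111), and fixnum_to_int turns it back into 3; a negative number such as -3 round-trips the same way.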

Also in Ruby, false (Qfalse in the C code) is implemented as 0x00, Qtrue is 0x02, and Qnil is 0x04. None of these has the low bit set, so they can never be mistaken for a Fixnum.
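One nice consequence of those particular values is that "is this false or nil?" becomes a single mask-and-compare. The sketch below uses made-up names (V_FALSE, V_TRUE, V_NIL, is_truthy) with the constants quoted above; it shows the idea and is not a copy of Ruby's headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the numeric values are the ones quoted in the post. */
typedef uintptr_t value_t;

enum {
    V_FALSE = 0x00,
    V_TRUE  = 0x02,
    V_NIL   = 0x04
};

/* nil is a single fixed bit pattern, so testing for it is one compare. */
static inline bool is_nil(value_t v) {
    return v == V_NIL;
}

/* false is 0 and nil is 4, so clearing the nil bit leaves 0 for exactly
 * those two words. Everything else (true, integers, object pointers)
 * keeps at least one other bit set and counts as true. */
static inline bool is_truthy(value_t v) {
    return (v & ~(value_t)V_NIL) != 0;
}
```

Note that the tagged integer zero (the word 1 under the shift-and-add-one scheme) is truthy here, which matches Ruby treating 0 as true.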

There are other ways to do this. Ages ago I worked on a system called LeLisp, which used separate memory zones for each type of object. Thus no tagging was necessary: you just rounded the address, and it gave you a zone type address.
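To give a feel for the zone idea, here's a rough C sketch. This is not how LeLisp actually laid out memory (I'm not reproducing its details); the zone size, the header at the start of each zone, and all the names (zone, zone_new, zone_alloc, type_of) are assumptions for illustration. Each zone is aligned to its own size, so masking off the low bits of any object address lands on the zone header, which records the type.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout: every zone is ZONE_SIZE bytes, aligned to ZONE_SIZE. */
#define ZONE_SIZE ((size_t)4096)

typedef enum { T_CONS, T_SYMBOL, T_STRING } obj_type;

/* Header stored at the start of each zone. */
typedef struct zone {
    obj_type type;   /* every object in this zone has this type */
    size_t   used;   /* bytes handed out so far, header included */
} zone;

/* Create a zone for one type. aligned_alloc (C11) gives the alignment
 * that the address rounding below depends on. */
static zone *zone_new(obj_type type) {
    zone *z = aligned_alloc(ZONE_SIZE, ZONE_SIZE);
    if (z == NULL) return NULL;
    z->type = type;
    z->used = sizeof(zone);
    return z;
}

/* Bump-allocate size bytes (rounded up to 8) from the zone, or NULL if full. */
static void *zone_alloc(zone *z, size_t size) {
    size = (size + 7) & ~(size_t)7;
    if (size > ZONE_SIZE - z->used) return NULL;
    void *p = (char *)z + z->used;
    z->used += size;
    return p;
}

/* No tag bits needed: round the address down to the zone boundary
 * and read the type from the header found there. */
static obj_type type_of(const void *p) {
    uintptr_t base = (uintptr_t)p & ~(uintptr_t)(ZONE_SIZE - 1);
    return ((const zone *)base)->type;
}
```

So after zone *z = zone_new(T_SYMBOL) and void *p = zone_alloc(z, 24), calling type_of(p) gives T_SYMBOL without p itself carrying any tag bits.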

I was reading a paper the other day on implementing Scheme for microcontrollers, and it gives a good overview of the various methods of implementing VALUE.
