28 November 2007

Firefox fragmentation?

As Firefox 3 nears release, some of its developers are taking a close look at memory fragmentation issues. There is good information over at pavlov.net that I won't repeat here. One recurring theme, though, is that some users reported problematic memory usage in version 2, and fragmentation is a suspected culprit. This has motivated an investigation of fragmentation before version 3 ships.

As the author of jemalloc, I have a deep (read: obsessive) interest in memory fragmentation issues, so I spent some time dusting off my malloc plotting tools today. Here is a plot of Firefox 2.0.0.9 running on FreeBSD-current. To generate the allocation trace, I launched firefox, then went through several cycles of opening lots of tabs/windows and then closing most of them.
Time starts at the left, and execution ends at the right. Each vertical column of pixels represents a snapshot of memory usage at a particular moment during program execution (time is measured by allocation events). Since there are millions of allocation events, most snapshots are left out to keep the plot size manageable. Similarly, each vertical column has far fewer pixels than there are bytes of memory to represent, so each pixel represents a 256kB bucket. Low addresses are at the bottom of the plot.
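
The bucketing itself is nothing fancy. Something like the following hypothetical sketch captures the idea (only the 256kB bucket size comes from the description above; all of the names are invented):

    #include <stddef.h>
    #include <stdint.h>

    #define BUCKET_SIZE ((uintptr_t)(256 * 1024)) /* Address space per pixel. */

    /*
     * Credit a live allocation's bytes to the 256kB bucket(s) it spans.
     * Tallying every live allocation at a snapshot yields one vertical
     * column of pixels; each pixel's color is driven by how full its
     * bucket is, i.e. live[i] relative to BUCKET_SIZE.
     */
    static void
    tally_alloc(uintptr_t addr, size_t size, uintptr_t base, size_t *live)
    {
        if (size == 0)
            return;

        uintptr_t first = (addr - base) / BUCKET_SIZE;
        uintptr_t last = (addr - base + size - 1) / BUCKET_SIZE;

        for (uintptr_t i = first; i <= last; i++) {
            /* Clamp the allocation to bucket i's address range. */
            uintptr_t lo = base + i * BUCKET_SIZE;
            uintptr_t hi = lo + BUCKET_SIZE;
            uintptr_t b = addr > lo ? addr : lo;
            uintptr_t e = addr + size < hi ? addr + size : hi;

            live[i] += e - b;
        }
    }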

Note the peaks that are mostly green. Those occur during periods of peak memory usage, and overall, the plot shows that fragmentation isn't bad. Take this with a grain of salt, though, since the plot represents only perhaps 15 minutes of heavy web browsing.

If you want to see much more detail (each bucket is 4kB -- one page), take a look at this image. It is big enough to cause most image viewers to choke, so beware.

10 November 2007

Fixed-precision (n choose k) and overflow

I recently found myself needing to compute (n choose k) with 64-bit integers. Recall that (n choose k) is equal to n!/[k!(n-k)!]. Mathematically, this is not a difficult computation, but when considered in the context of integer overflow, the problem becomes much harder.

To illustrate the problem, consider the computation of (9 choose 4) using 8-bit signed integers. We can start off by doing some straightforward cancellation, which leaves us with [9*8*7*6]/[4*3*2]. Where do we go from here, though? If we multiply all of the terms in the numerator first, we get an intermediate result of [3024]/[4*3*2], and 3024 clearly does not fit in the [-128..127] range, even though the final result, 126, does. The method that we are taught on paper is to cancel factors until none are left in the denominator, then multiply the remaining factors in the numerator to get the answer. We can write a program that effectively does the same thing, but do we really have to create vectors of terms and duplicate the hand method?
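
As a throwaway demonstration of the failure mode, consider this sketch (the exact wrapped value is implementation-specific; only the [9*8*7*6]/[4*3*2] arithmetic comes from the discussion above):

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        /* Naive evaluation of (9 choose 4) = [9*8*7*6]/[4*3*2]. */
        int8_t num = 1;
        for (int8_t i = 9; i >= 6; i--) {
            /*
             * The multiplication itself happens in int, but storing the
             * result back into int8_t truncates it; 504 and 3024 do not
             * fit in [-128..127].
             */
            num = (int8_t)(num * i);
        }
        int8_t den = 4 * 3 * 2;

        printf("naive (9 choose 4): %d (should be 126)\n", num / den);
        return 0;
    }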

I searched high and low for information about how best to implement (n choose k) with fixed precision integers, without success. While considering the mechanics of coding the hand method, I realized that computing greatest common divisors (GCDs) would be a critical component. I then began to wonder if there might be an iterative algorithm that does not require manipulating vectors of integers. Here is what I came up with. (n choose k) is [n*(n-1)*...*(n-k+1)] / [k*(k-1)*...*1]. Let us call the vectors of terms in the numerator and denominator [C] and [D], respectively, so (n choose k) is [C]/[D].
  1. If (k > n/2), set k <-- n-k. This does not change the result, but it reduces the computational overhead for later steps.
  2. Initialize accumulators A and B for the numerator and denominator of the result to 1, so that A/B is 1/1. Note that upon completion, B will always be 1, thus leaving the result in A, but during computation, B may be greater than 1.
  3. While possible without overflowing A (and while [C] is non-empty), repeatedly merge the first term of [C] (call it c) into A and remove the term from [C]. This is achieved via the following steps:
    1. Compute g <-- GCD(c, B).
    2. B <-- B/g and c <-- c/g. This removes common factors.
    3. A <-- A*c.
  4. While possible without overflowing B (and while [D] is non-empty), repeatedly merge the first term of [D] into B and remove the term from [D]. This is achieved using the same algorithm as for step 3, with the roles of A and B (and of [C] and [D]) exchanged.
  5. If no progress was made in steps 3 or 4, fail due to unavoidable overflow.
  6. If [C] or [D] is non-empty, go back to step 3.
Here is a reference implementation in C.
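
In outline, it looks something like this minimal sketch (assuming unsigned 64-bit arithmetic and the precondition k <= n):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Euclid's algorithm. */
    static uint64_t
    gcd(uint64_t a, uint64_t b)
    {
        while (b != 0) {
            uint64_t t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    /*
     * Compute (n choose k) via the steps above.  On success, return true
     * and store the result in *r; on unavoidable overflow, return false.
     * Assumes k <= n.
     */
    static bool
    choose(uint64_t n, uint64_t k, uint64_t *r)
    {
        /* Step 1: (n choose k) == (n choose n-k). */
        if (k > n - k)
            k = n - k;

        /*
         * [C] is n, n-1, ..., n-k+1 and [D] is k, k-1, ..., 1; c and d
         * are the next unconsumed terms of each vector.
         */
        uint64_t c = n, d = k;
        uint64_t A = 1, B = 1;      /* Step 2: the result is A/B. */

        while (c > n - k || d > 0) {
            bool progress = false;

            /* Step 3: merge numerator terms into A while they fit. */
            while (c > n - k) {
                uint64_t g = gcd(c, B);  /* Step 3.1. */
                uint64_t t = c / g;      /* Step 3.2, applied to a copy of c. */
                if (A > UINT64_MAX / t)
                    break;               /* A*t would overflow. */
                B /= g;                  /* Step 3.2. */
                A *= t;                  /* Step 3.3. */
                c--;
                progress = true;
            }

            /* Step 4: merge denominator terms into B, roles exchanged. */
            while (d > 0) {
                uint64_t g = gcd(d, A);
                uint64_t t = d / g;
                if (B > UINT64_MAX / t)
                    break;
                A /= g;
                B *= t;
                d--;
                progress = true;
            }

            /* Step 5: no term could be merged without overflowing. */
            if (!progress)
                return false;
        }

        *r = A;  /* B is always 1 upon completion. */
        return true;
    }

For example, choose(64, 32, &r) succeeds and leaves 1832624140942590534 in r, even though naively multiplying out the numerator would overflow 64 bits many times over.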

Since implementing the algorithm, I have been troubled by a seemingly simple question: does this algorithm ever fail even though the final result can be expressed without overflow? My intuition is that the algorithm always succeeds, but a proof has thus far eluded me. I have exhaustively tested the algorithm for 32-bit integers, and it never fails. Unfortunately, I really need to move on to other work, since the algorithm certainly works well enough for my needs.
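
For what it's worth, a cheap cross-check of the success cases (this is just a sketch, not the exhaustive testing mentioned above) is to build Pascal's triangle in place and compare, reusing gcd() and choose() from the previous listing:

    #include <assert.h>

    int
    main(void)
    {
        enum { NMAX = 60 };  /* C(60, 30) still fits in uint64_t. */
        static uint64_t row[NMAX + 1] = {1};

        for (uint64_t n = 0; n <= NMAX; n++) {
            /* Extend Pascal's triangle in place, rightmost entry first. */
            for (uint64_t k = n; k >= 1; k--)
                row[k] += row[k - 1];

            /* row[k] is now C(n, k); compare against choose(). */
            for (uint64_t k = 0; k <= n; k++) {
                uint64_t r;
                assert(choose(n, k, &r) && r == row[k]);
            }
        }
        return 0;
    }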