Gas station without pumps

2014 May 6

Sum of probabilities in log-prob space

Filed under: Uncategorized — gasstationwithoutpumps @ 22:11
Tags: , ,

One of programmers I work with asked me for a write-up or source code for a careful implementation of sum of probabilities in log-prob representation.  I lecture on the topic at least once a year, but the talks are always extemporaneous white-board talks, so I don’t have any written notes. I’m sure I’ve written it up carefully at some point in the past, but after an hour I was unable to find either well-written code or a good written description of the method, so I gave up looking and decided to redo the derivation on my blog (where, I hope, I can find it again).

The problem is mathematically very simple.  We have two non-negative numbers (usually probabilities), p and q, that are represented in the computer by floating-point numbers for their natural logs: a=\ln p and b=\ln q, and we want to represent their sum: \ln(p+q).

The brute-force way to do this in code is log(exp(a) + exp(b)), but this can run into floating-point problems (underflow, overflow, or loss of precision). It is also fairly expensive, as it involves computing three transcendental functions (2 exponentiation and 1 logarithm).

We can simplify the problem: \ln(e^{a} + e^{b}) = \ln\left( e^{b}(e^{a-b} + 1) \right) = b + \ln(e^{a-b} +1) , so all we need to compute is the function f(x) = \ln(e^{x} +1).  Furthermore, if we swap the inputs as needed, we can make sure that b\geq a, so that f(x) is only needed for x\leq 0.  This means that we only need two transcendental functions (one exponential and one logarithm), and the logarithm is always going to be of a number between 1 and 2 (which is generally where logarithm implementations are most efficient). We can get much better accuracy by eliminating the addition and using library function log1p which computes \ln(1+x). Our function f is then computed as log1p(exp(x)).

But we can still run into problems.  If x is very negative, then exp(x) could underflow to 0. This is not a serious problem, unless the programming language treats that as an exception and throws an error. We can avoid this problem by setting a threshold, below which the implementation of f just returns 0. The threshold should be set at the most negative value of x for which exp(x) still returns a normal floating-point number: that is, the natural log of the smallest normalized number, which depends on the precision. For the IEEE standard floating-point representations, I believe that the smallest normalized numbers are

representation smallest positive float approx ln
float16 2-15 -10.397207708
float32 2-127 -88.029691931
float64 2-1023 -709.089565713
float128 2-16383 -11355.8302591

By adding one test for the threshold (appropriately chosen for the precision of floating-point number used in the log-prob representation), and returning 0 from f when below the threshold, we can avoid underflowing on the computation of the exponential.

I was going to write up the choice of a slightly higher cutpoint, where we could use the Taylor expansion \ln(1+\epsilon) \approx \epsilon + O(\epsilon^2), and just return exp(x), but I believe that log1p already handles this correctly. So reasonably efficient code that does the sum of probabilities in log-prob representation looks something like this in c++:

inline float log1pexp(float x)
{   return x<-88.029691931? 0.: log1p(exp(x));
}
inline float sum_log_prob(float a, float b)
{   return a>b? a+log1pexp(b-a):  b+log1pexp(a-b);
}
inline double log1pexp(double x)
{   return x<-709.089565713? 0.: log1p(exp(x));
}
inline double sum_log_prob(double a, double b)
{   return a>b? a+log1pexp(b-a):  b+log1pexp(a-b);
}
inline long double log1pexp(long double x)
{   return x<-11355.8302591? 0.: log1p(exp(x));
}
inline long double sum_log_prob(long double a, long double b)
{   return a>b? a+log1pexp(b-a):  b+log1pexp(a-b);
}

A careful coder would check exactly where the exp(x) computation underflows, rather than relying on the theoretical estimates made here, as there could be implementation-dependent details that cause underflow at a higher threshold than strictly necessary.

3 Comments »

  1. This was pretty cool!Very clear.

    Comment by xykademiqz — 2014 May 12 @ 05:17 | Reply

  2. Is there any optimizations for summing n probabilities. For example using this n times would use n log and exp calls. Is it possible to avoid this?

    Comment by brugel — 2015 April 10 @ 20:05 | Reply

    • I don’t know of a simplification for n additions, since factoring out one term would still leave n-1 of them. One could probably speed things up substantially by replacing the log and exp calls by an approximation algorithm for log(1+exp(x)), perhaps using CORDIC techniques.

      Comment by gasstationwithoutpumps — 2015 April 10 @ 20:13 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: