# Gas station without pumps

## 2014 May 6

### Sum of probabilities in log-prob space

Filed under: Uncategorized — gasstationwithoutpumps @ 22:11

One of the programmers I work with asked me for a write-up or source code for a careful implementation of the sum of probabilities in log-prob representation.  I lecture on the topic at least once a year, but the talks are always extemporaneous whiteboard talks, so I don’t have any written notes. I’m sure I’ve written it up carefully at some point in the past, but after an hour I was unable to find either well-written code or a good written description of the method, so I gave up looking and decided to redo the derivation on my blog (where, I hope, I can find it again).

The problem is mathematically very simple.  We have two non-negative numbers (usually probabilities), p and q, that are represented in the computer by floating-point numbers for their natural logs: $a=\ln p$ and $b=\ln q$, and we want to represent their sum: $\ln(p+q)$.

The brute-force way to do this in code is log(exp(a) + exp(b)), but this can run into floating-point problems (underflow, overflow, or loss of precision). It is also fairly expensive, as it involves computing three transcendental functions (two exponentiations and one logarithm).

We can simplify the problem: $\ln(e^{a} + e^{b}) = \ln\left( e^{b}(e^{a-b} + 1) \right) = b + \ln(e^{a-b} +1)$, so all we need to compute is the function $f(x) = \ln(e^{x} +1)$.  Furthermore, if we swap the inputs as needed, we can make sure that $b\geq a$, so that f(x) is only needed for $x\leq 0$.  This means that we need only two transcendental functions (one exponential and one logarithm), and the logarithm is always of a number between 1 and 2 (which is generally where logarithm implementations are most efficient). We can get much better accuracy by eliminating the explicit addition and using the library function log1p, which computes $\ln(1+x)$ accurately even for tiny x. Our function f is then computed as log1p(exp(x)).

But we can still run into problems.  If x is very negative, then exp(x) could underflow to 0. This is not a serious problem, unless the programming language treats that as an exception and throws an error. We can avoid this problem by setting a threshold, below which the implementation of f just returns 0. The threshold should be set at the most negative value of x for which exp(x) still returns a normal floating-point number: that is, the natural log of the smallest normalized number, which depends on the precision. For the IEEE standard floating-point representations, I believe that the smallest normalized numbers are

| representation | smallest positive normalized | approx. natural log |
|----------------|------------------------------|---------------------|
| float16        | $2^{-14}$                    | -9.704060528        |
| float32        | $2^{-126}$                   | -87.336544751       |
| float64        | $2^{-1022}$                  | -708.396418532      |
| float128       | $2^{-16382}$                 | -11355.137111933    |

By adding one test for the threshold (appropriately chosen for the precision of floating-point number used in the log-prob representation), and returning 0 from f when below the threshold, we can avoid underflowing on the computation of the exponential.

I was going to write up the choice of a slightly higher cut point, where we could use the Taylor expansion $\ln(1+\epsilon) \approx \epsilon + O(\epsilon^2)$ and just return exp(x), but I believe that log1p already handles this correctly. So reasonably efficient code that does the sum of probabilities in log-prob representation looks something like this in C++:

```cpp
#include <cmath>

// Each threshold is approximately ln of the smallest positive normalized
// number in that precision; below it, exp(x) would underflow, so return 0.
inline float log1pexp(float x)
{   return x < -87.336544751f ? 0.f : log1p(exp(x));
}
inline float sum_log_prob(float a, float b)
{   return a > b ? a + log1pexp(b-a) : b + log1pexp(a-b);
}
inline double log1pexp(double x)
{   return x < -708.396418532 ? 0. : log1p(exp(x));
}
inline double sum_log_prob(double a, double b)
{   return a > b ? a + log1pexp(b-a) : b + log1pexp(a-b);
}
inline long double log1pexp(long double x)
{   return x < -11355.137111933L ? 0.L : log1p(exp(x));
}
inline long double sum_log_prob(long double a, long double b)
{   return a > b ? a + log1pexp(b-a) : b + log1pexp(a-b);
}
```


A careful coder would check exactly where the exp(x) computation underflows, rather than relying on the theoretical estimates made here, as there could be implementation-dependent details that cause underflow at a higher threshold than strictly necessary.

1. This was pretty cool! Very clear.

Comment by xykademiqz — 2014 May 12 @ 05:17

2. Are there any optimizations for summing n probabilities? For example, using this n times would use n log and exp calls. Is it possible to avoid this?

Comment by brugel — 2015 April 10 @ 20:05

• I don’t know of a simplification for n additions, since factoring out one term would still leave n-1 of them. One could probably speed things up substantially by replacing the log and exp calls by an approximation algorithm for log(1+exp(x)), perhaps using CORDIC techniques.

Comment by gasstationwithoutpumps — 2015 April 10 @ 20:13
