Gas station without pumps

2014 August 19

Two papers submitted

I got two papers with a co-author finished and submitted today.

One of them is based on the segmenter algorithms that I wrote up in this blog (with some improved parameterization): Segmenting noisy signals from nanopores, Segmenting filtered signals, and Segmenting noisy signals revisited.

The other one is an HMM paper that uses the automatic segmentation and an HMM that models the DNA motor behavior to do the same analysis that was done by hand in an earlier paper “Error Rates for Nanopore Discrimination Among Cytosine, Methylcytosine, and Hydroxymethylcytosine Along Individual DNA Strands” in PNAS.  The automatic HMM method was more accurate than hand curation and feature extraction followed by sophisticated machine-learning methods like support vector machines and random forests; I think that we were limited before by the biases of the hand curation and feature extraction.

Unfortunately, we were not able to do side-by-side comparison on exactly the same data, as the original data for the PNAS paper was lost in a 2-drive failure on a RAID that had not been properly backed up.

The paper writing did not take much of my time this summer, as my co-author did a lot of the writing and kept me from procrastinating too much.

I’ve also been working on my book, tentatively titled Applied Electronics for Bioengineers, but without a co-author to keep me honest, I’ve been doing a lot of procrastinating.  Yesterday I had planned to record the waveform of the gate signal on an nFET being used for pulse-width modulation, to show the “flat spot” that results from feedback through the gate-drain capacitance.  I fussed with that for quite a while, but never got a recording I liked the looks of, so I ended up getting nothing written on the book.  I’ll probably try again tomorrow, but with a backup plan of working on a different section if that one gets too frustrating, or of writing some of the text that will go around the figure, once I get a figure I can use.

The book is currently 215 pages, but that includes the front matter and back matter.  The core of the book is only about 188 pages, with 74 figures.  I probably need a bunch more figures (schematics, graphs, photos, …), but those are what take the longest to produce.  There’s a little over a month left until classes start, but I don’t see myself finishing by then.

2010 November 26

Lagrangian Multipliers and HMMs


Where in an undergraduate STEM major is the technique of Lagrangian multipliers for solving constrained optimization problems taught?

A search for it on my campus found a few classes, all of which are upper-division or graduate classes: MATH 140 (Industrial Math), Econ 100A (Intermediate Microeconomics), Econ 205C (Advanced Macroeconomic Theory III), ISM 206 (Optimization Theory and Applications), Physics 105 (Classical Mechanics), CMPS 242 (Machine Learning), CMPS 290C (Advanced Topics in Machine Learning), and BME 205 (Bioinformatics: Models and Algorithms). Students in my class had a vague recollection of having heard of Lagrangian multipliers, but had never actually used them (except for a couple of exchange students from Europe).

My personal belief is that such a fundamental, useful technique should be included in the lower-division multi-variable differential calculus class. Now, this isn’t a curmudgeonly complaint about how students nowadays don’t learn things the way we did, back in the good old days of slide rules and tables of logarithms (and, yes, I am just barely old enough to have learned with those tools). I didn’t learn about Lagrangian multipliers until I was a professor (having completed a B.S. and M.S. in math and a Ph.D. in computer science).  I was introduced to the method by an engineering grad student (from India, if I remember correctly), who had had a much more practical math education than mine.  In fact, most of my practical math education came either from my father or was self-taught.  Very little of my math education in school or college turned out to have much lasting value to me, other than allowing me the confidence to learn on my own the math I actually needed.

For those of you completely unfamiliar with Lagrangian multipliers, I recommend the Wikipedia page, Lagrangian multipliers. For bioinformatics students, I will try to present one example, using the technique to derive the Baum-Welch algorithm, which is the expectation-maximization algorithm used for training hidden Markov models (HMMs). These are the notes for one lecture in my graduate class in bioinformatics, coming after we have developed the forward and backward algorithms for HMMs. For those who want to learn this stuff on their own, I recommend Chapter 3 of Biological Sequence Analysis by Durbin, Eddy, Krogh, and Mitchison, though I don’t remember them using Lagrangian multipliers.

[I’m presenting these notes in my blog for two reasons: 1) I wanted to plead for more teachers to teach Lagrangian multipliers, and 2) my presentation of it in class sucked.  In my defense, I had only had four hours of sleep the night before, and live-action math requires more alertness than I had with so much sleep debt.]
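To see the shape of the technique on the smallest useful example, consider choosing probabilities p_c to maximize the log-likelihood \sum_c n_c \log p_c of a sample in which character c was observed n_c times, subject to the constraint \sum_c p_c = 1.  The Lagrangian objective is

G(p,\lambda) = \sum_c n_c \log p_c + \lambda\left(1 - \sum_c p_c\right)~,

and setting \partial G/\partial p_c = n_c/p_c - \lambda = 0 gives p_c = n_c/\lambda.  Imposing the constraint then gives \lambda = \sum_c n_c, so p_c = n_c/\sum_{c'} n_{c'}: the optimal probabilities are just the normalized counts.  The Baum-Welch updates derived below have exactly this shape, with observed counts replaced by expected counts.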

Our goal is to choose the parameters of the hidden Markov model to maximize the probability of the HMM generating a particular observed set of data.  The EM (Expectation Maximization) algorithm is an iterative approximation algorithm.  It starts with a setting of the parameters, then chooses a new setting of parameters that increases the probability, and repeats.  What we are working on here is doing a single step: given a setting of the parameters, how do we find a better setting?

The parameters are e_S(c), the probability of emitting character c in state S, and a_{ST}, the probability of making a transition to state T from state S (I’m using the notation of the book here, though it is awkward to mix subscript and function notation, and it would have been cleaner to use e_{Sc}). It is common to refer to the parameters collectively as a vector \Theta, so that we can concisely refer to functions of \Theta. We have the constraints that \sum_c e_S(c) = 1 and \sum_T a_{ST} = 1, that is, the emission probabilities for a state S sum to one, and so do the transition probabilities out of state S.  (There are also constraints that all probabilities are non-negative, but those constraints don’t need to be explicitly represented in solving the problem, as the solution we’ll come up with automatically satisfies them.)

The probability of a sequence X_1, \ldots , X_n is the sum over all paths of length n through the HMM of the probability of taking that path and emitting X, that is P_\Theta(X) = \sum_{\pi_1, \ldots, \pi_n} \prod_{i=1}^n a_{\pi_{i-1}\pi_{i}} e_{\pi_i}(X_i). The products are a bit messy to take derivatives of, because some of the parameters may be repeated, but we can simplify them by grouping together repeated factors:

P_\Theta(X) = \sum_{\pi_1, \ldots, \pi_n} \prod_{c,S,T} a_{ST}^{n_{ST}} (e_{T}(c))^{m_{Tc}},

where n_{ST} is the number of times \pi_{i-1}=S and \pi_{i}=T along a particular path and m_{Tc} is the number of times \pi_{i}=T and X_i=c.  Both of these new count variables are specific to a particular path, so should really have the \pi vector as subscripts, but we’ve got too many subscripts already.
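To make the sum-over-paths formula concrete, here is a tiny brute-force check in Python (not anyone’s production code: the two-state HMM, alphabet, and probabilities are made up for illustration, and I’ve added an explicit initial-state distribution to play the role of a_{\pi_0\pi_1}).  It is only feasible for short sequences, since the number of paths grows exponentially with n:

    from itertools import product

    # Toy parameters, made up for illustration
    states = ["S", "T"]
    start = {"S": 0.5, "T": 0.5}            # initial-state probabilities (the a_{pi_0 pi_1} factor)
    a = {"S": {"S": 0.9, "T": 0.1},         # a_{ST}: transition probabilities, each row sums to 1
         "T": {"S": 0.2, "T": 0.8}}
    e = {"S": {"A": 0.7, "B": 0.3},         # e_S(c): emission probabilities, each row sums to 1
         "T": {"A": 0.1, "B": 0.9}}
    X = "AABBA"                             # observed sequence

    # P_Theta(X) = sum over all paths pi_1..pi_n of prod_i a_{pi_{i-1} pi_i} e_{pi_i}(X_i)
    total = 0.0
    for path in product(states, repeat=len(X)):
        p = start[path[0]] * e[path[0]][X[0]]
        for i in range(1, len(X)):
            p *= a[path[i - 1]][path[i]] * e[path[i]][X[i]]
        total += p
    print(total)    # the forward algorithm computes the same value in O(n K^2) time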

If we try to maximize this function without paying attention to the constraints, we find that all the parameters go to infinity—making the parameters bigger makes the product bigger. To keep the constraints relevant, we use Lagrangian multipliers, adding one new variable for each constraint.  I’ll use \lambda_S for the constraints on the emission table of state S and \lambda^\prime_S for the constraints on the transitions out of state S.  The objective function to take derivatives of is thus

F(X,\Theta) = \sum_{\pi_1, \ldots, \pi_n} \prod_{c,S,T} a_{ST}^{n_{ST}} (e_{T}(c))^{m_{Tc}} + \sum_S \lambda_S (1-\sum_c e_S(c)) + \sum_S \lambda^\prime_S(1-\sum_T a_{ST}) ~.

Let’s look at the partial derivatives of our objective function with respect to the emission parameters:

\frac{\partial F(X,\Theta)}{\partial e_R(d)} = \sum_{\pi_1, \ldots, \pi_n} \frac{m_{Rd}}{e_R(d)} \prod_{c,S,T} a_{ST}^{n_{ST}} (e_T(c))^{m_{Tc}} - \lambda_R~.

If the partials are set to zero, we get

\lambda_R e_R(d) = \sum_{\pi_1, \ldots, \pi_n} m_{Rd} \prod_{c,S,T} a_{ST}^{n_{ST}} (e_{T}(c))^{m_{Tc}}~.

The sum over all paths is P_\Theta(X) times the expected number of times that state R produces character d (the factor of P_\Theta(X) will cancel when we normalize, so we can ignore it). Note that this can happen only at positions i where X_i=d, so we can rewrite the sum as a sum over all such positions of the (unnormalized) probability of being in state R at that position, which is precisely what the forward-backward algorithm computes for us:

\lambda_R e_R(d) = \sum_{i: X_i=d} f(i,R)b(i,R)~.

We can now impose the constraint \sum_d e_R(d) = 1, by adding all these equations:

\lambda_R= \sum_{d} \sum_{i: X_i=d} f(i,R) b(i,R) ~.

Finally we get the new value of the emission parameters:

e_R(d) = \frac{\sum_{i: X_i=d} f(i,R)b(i,R)}{\sum_{d} \sum_{i: X_i=d} f(i,R)b(i,R)} ~.
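In code, this update is just an accumulation over the forward and backward matrices.  Here is a minimal sketch (not the paper’s code), assuming numpy arrays f and b of shape n-by-K that follow the book’s conventions (f(i,R) includes the emission at position i, b(i,R) does not); the function and variable names are just for illustration:

    import numpy as np

    def reestimate_emissions(f, b, X, alphabet):
        """Sketch of the e_R(d) update: f and b are n-by-K forward/backward
        matrices, X is the observed sequence, alphabet lists the characters."""
        n, K = f.shape
        weights = f * b                     # proportional to the probability of being in each state at each position
        new_e = np.zeros((K, len(alphabet)))
        for i, c in enumerate(X):
            new_e[:, alphabet.index(c)] += weights[i, :]      # numerator: sum over positions with X_i = d
        return new_e / new_e.sum(axis=1, keepdims=True)       # denominator: sum over all characters d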

We can do a very similar series of operations to get update equations for the transition parameters also:

a_{ST} = \frac{\sum_{i} f(i,S)a_{ST}e_T(X_{i+1})b(i+1,T)}{\sum_{R} \sum_{i} f(i,S)a_{SR}e_R(X_{i+1})b(i+1,R)}~.
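A corresponding sketch for the transition update, under the same assumptions, with the old a and e appearing on the right-hand side just as in the equation above:

    import numpy as np

    def reestimate_transitions(f, b, a, e, X, alphabet):
        """Sketch of the a_{ST} update: a is K-by-K, e is K-by-|alphabet|,
        f and b are n-by-K forward/backward matrices for the sequence X."""
        n, K = f.shape
        counts = np.zeros((K, K))
        for i in range(n - 1):
            d = alphabet.index(X[i + 1])
            # expected (unnormalized) use of each transition S -> T between positions i and i+1:
            # f(i,S) * a_{ST} * e_T(X_{i+1}) * b(i+1,T)
            counts += np.outer(f[i, :], e[:, d] * b[i + 1, :]) * a
        return counts / counts.sum(axis=1, keepdims=True)     # normalize each row (the sum over T) to 1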

Note that the optimization we did uses the old parameter values inside the forward-backward algorithm (and on the right-hand sides of the update equations), so we haven’t really found “the optimum”, just a better setting of the parameters.  We therefore need to iterate the whole process, recomputing the forward-backward matrices after resetting all the parameters.
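For completeness, here is roughly what one full step of that iteration might look like, reusing the two re-estimation sketches above, assuming a fixed initial-state distribution start (which the derivation above does not re-estimate), and doing no rescaling, so it is only suitable for short sequences (a real implementation would rescale or work in log space):

    import numpy as np

    def forward_backward(start, a, e, X, alphabet):
        """Forward and backward matrices with the book's conventions and no
        rescaling.  start is a length-K numpy array of initial-state
        probabilities, a is K-by-K, e is K-by-|alphabet|."""
        n, K = len(X), len(start)
        x = [alphabet.index(c) for c in X]          # character indices
        f = np.zeros((n, K))
        b = np.ones((n, K))
        f[0] = start * e[:, x[0]]
        for i in range(1, n):
            f[i] = e[:, x[i]] * (f[i - 1] @ a)      # f(i,T) = e_T(X_i) * sum_S f(i-1,S) a_{ST}
        for i in range(n - 2, -1, -1):
            b[i] = a @ (e[:, x[i + 1]] * b[i + 1])  # b(i,S) = sum_T a_{ST} e_T(X_{i+1}) b(i+1,T)
        return f, b

    def baum_welch_step(start, a, e, X, alphabet):
        """One EM step: recompute forward-backward with the current parameters,
        then apply the two update equations derived above."""
        f, b = forward_backward(start, a, e, X, alphabet)
        new_a = reestimate_transitions(f, b, a, e, X, alphabet)
        new_e = reestimate_emissions(f, b, X, alphabet)
        return new_a, new_e         # iterate until the likelihood P(X) stops improving

Each pass is guaranteed not to decrease P_\Theta(X), which is the usual EM guarantee.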
