Gas station without pumps

2023 February 6

Proofreading my book

Filed under: Circuits course — gasstationwithoutpumps @ 22:46

World Scientific Publishing sent me round four of proofs recently (round three had terrible problems with scrambled citations, and I sent the citation fixes and only a little other stuff on round 3—they still had not fixed some serious errors from round 2). I’ve been working pretty much full-time on reading the proofs, lately, trying to catch as many of the problems as I can.

Proofreading is different from copy editing—I’m not taking a single PDF file and looking for typos, spelling errors, punctuation errors, and other minor problems, but taking two different PDF files and comparing them. I’m looking for all the differences between their version of the book and my version, then deciding for each difference whether to correct their version, correct my version, or allow them to remain different.

You would think that there would be good tools for taking differences of PDF files, but I haven’t found one (at least, not a free one). All the tools I found were designed to work a page at a time, assuming that the files were almost identical.  But World Scientific has set the book in a smaller font, with less white space, so their version of the book has about 12% fewer pages than mine. So I fell back on ancient tools like diff (originally written in 1974).

The diff program compares two text files, trying to come up with a small set of insertions and deletions that convert one file into the other. So the first task was to extract text from the PDF files. Both my son and I had written tools (using different PDF parsing packages) to extract URLs from the hot links in a PDF file, for checking all the links to the web. But rather than wrestle with those packages for this problem (I’d not had much luck getting clean text last time I tried), I used an off-the-shelf program, pdf2txt.py, which comes with the Anaconda distribution of Python and uses the pdfminer package. Running that program on each PDF file created a corresponding text file that had most of the text, though somewhat scrambled around any math formulas. I called these the “unpdf” files.

Unfortunately, running diff on the unpdf files was pretty useless, as diff considers lines to be the objects to compare, and the line breaks were totally different between the unpdf files, even in regions where the text was really functionally identical.

My first thought was to take the unpdf file and break it into one word per line, so that diff could find matching blocks of words. I spent a day or two doing proofreading with the words files, but it was very tedious, as diff often matched common words from different sentences, so I had to spend a lot of time deciphering whether a particular change was real or not. I used emacs with the ediff-buffers command to display the matches, but it did not really give me enough context.
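For the record, a words file is trivial to produce—a minimal sketch in Python (the function name is mine, not from the actual script):

```python
def to_words(text):
    """Break extracted "unpdf" text into one word per line,
    so that diff can align on individual words."""
    return "\n".join(text.split())
```

str.split() with no arguments splits on any run of whitespace, so the original line breaks do not matter.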

I thought that using a more modern diff algorithm might help, so I wasted some time figuring out how to apply “git diff” with the histogram algorithm to a pair of files. Unfortunately, “git diff” also uses a more modern output format, which the emacs ediff package does not understand, and I did not want to write yet another program to convert the “git diff” output to normal diff format.

Instead, I realized that words were not the right size unit—I wanted to compare something more like blocks of sentences. So I wrote yet another Python program to take the unpdf text file and break it into sentences. More precisely, I split it into sentence-like chunks—I just merged everything into one long string then split after every sentence-ending punctuation mark that was followed by white space, and compressed all white space to a single space.
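A minimal sketch of that sentence-chunking in Python (the actual script differs in details; the regexes here are my approximation of the description above):

```python
import re

def to_sentences(text):
    """Compress all runs of whitespace to single spaces, then split
    after every sentence-ending punctuation mark followed by a space."""
    one_line = re.sub(r"\s+", " ", text.strip())
    return re.split(r"(?<=[.!?]) ", one_line)
```

The lookbehind keeps the punctuation attached to the chunk it ends, so each line of the sentences file is a complete sentence-like chunk.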

Running diff via ediff-buffers on sentences files worked fairly well, highlighting sentences that had changed and showing me what words within each sentence were different. Diff has some major problems dealing with structural variation, though—if a figure floats to a different location relative to the text, then its caption (or the text that it floated over) will get handled as a deletion in one place and an insertion in the other, without being compared for changes. There is a very similar problem in genomics, looking for structural variation between two copies of a genome, with the added complexity of having inversions possible as well as simple rearrangement, but I don’t know of a (free) text tool that works well on text rearrangements. In any case, the figure captions are relatively small, so I could hand check them when diff was unable to match them.
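Python’s standard-library difflib can do a comparable sentence-level comparison (a sketch only—my actual workflow used diff and emacs ediff-buffers, not this):

```python
import difflib

def sentence_diff(mine, theirs):
    """Return unified-diff lines comparing two lists of sentence chunks."""
    return list(difflib.unified_diff(mine, theirs,
                                     fromfile="mine", tofile="theirs",
                                     lineterm=""))
```

Like classic diff, difflib matches the longest blocks it can, so it has the same structural-variation problem with floated figure captions.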

I could get most of the plain text to match ok, but pdf2txt.py scrambled stuff rather badly around each math formula, and the scrambling was very different in the two unpdf files, so a lot of the lines in the sentences files were different, even when the corresponding text and formulas on the page were the same. So I often had to visually inspect the differences that diff found, or ignore changes near math.

I found hundreds of differences between their version and mine. Many were small copy-editing changes they had made, of which I accepted maybe half and incorporated them into my version. Many were copy-editing errors, where they had introduced a change that I regarded as unacceptable—like referring to Lab 13 consistently as Lab 42, or swapping two of the references in the reference list, so that citations for A-weighting pointed to an article on action potentials.

A few other differences were ones I could live with, but didn’t like, so I left them in their version but did not incorporate them into mine.  One of the most common was that they replaced essentially all my ellipses (…) with “etc.”, which I could live with, except when the ellipsis was at the end of a sentence (I don’t like overloading the period to be both the end of the abbreviation and the end of the sentence) or when it was in the middle of a list, rather than at the end. I think that the reason they did not like ellipses is that they set them horribly—not using the ellipsis glyph in Unicode, nor the closely spaced periods that TeX uses. Instead they had periods separated by word spaces, like some high-schooler might use.

Another change that I allowed, but did not incorporate, was a change in the format for numbering the subfigures.  I used numbers like “Figure 3.4a”, while they used “Figure 3.4(a)”.  It was clear that they had hand-edited the cross-references, though, as they often removed or broke the corresponding hot links to the figures, while the unchanged figure references still had hot links.

There were also a few capitalization and punctuation differences, where the copy editors were not so wrong that they needed to be corrected, nor so right that I felt I should match them.

The whole process took several 10-hour days to make one pass through the book. I would never have been able to do the proofreading manually without a program to point out all the little differences. I’ve still not checked the math typesetting, which I hope they did not mess up too badly. The few places I did check looked ok, except for some examples in the LaTeX tutorial sections, where they had “corrected” the examples that were supposed to be showing what happens when you don’t do it right.

Part of the reason for some of the biggest hassles I’ve had in the proof is that the publisher seems to do a lot of stuff manually that should be automated (like all the reference list and citations—if it were done with a tool like BibTeX, then there would not have been the hundreds of citation errors in round 3 of the proofs). They fixed the citation errors in round 4 (except for one swap), but I dread what they are going to do with the index—I suspect that it is going to be hand-crafted and full of errors, while the index I generated will be clean and mostly correct (I put a lot of work into annotating the LaTeX files for the index and formatting the index nicely).

I regard my PDF file as the “true” version of the book, with the World Scientific one as a slightly inferior copy, but I suspect that most people would not distinguish between them, and they will be producing paper copies to sell, which I will not be doing, so there is value in what they are doing.

With them just starting on correcting the errors, I suspect that the book will be coming out in June 2023 (rather than the original target of June 2022).  The official web page now says March 2023, but I suspect it will slip again.

[Figure: Blue cover]

2022 April 16

ECG: 2-electrode vs. 3-electrode

Filed under: Circuits course — gasstationwithoutpumps @ 12:23

In Lower PVC frequency, I said “I did not do direct comparisons of the 2-electrode and 3-electrode configurations—I’ll have to try that sometime soon.” So I did that earlier this week, recording resting ECGs first with a 3-electrode configuration (with the bias electrode on my sternum, halfway between the LA and RA electrodes) and then with a 2-electrode configuration (with the bias wire clipped to the RA electrode).  The 60 Hz noise was slightly higher with the 2-electrode configuration, but after filtering and signal averaging the two recordings were almost identical:

[Figure: resting-2022-Apr-13]

The waveforms after signal averaging were remarkably similar. The PVC burden was also similar (20.1% for the 3-electrode recording and 20.5% for the 2-electrode recording).

[Figure: bpm-resting-2022-Apr-13]

The pulse rate from looking at time between spikes worked well for the resting recordings, but the autocorrelation method failed completely, so I did not plot it. The rapid fluctuation in heart rate within a narrow range is real, not an artifact of the algorithm—the heart beats are not perfectly periodic, and the PVCs may be making them even less periodic. The 2-electrode recording probably started a little after 400 seconds—PteroDAQ only time-stamps when the file was saved, not when t=0 s was. I should probably fix PteroDAQ to record both.

[Figure: exercise-2022-Apr-13]

I tried recording a session on the exercise bike also. The PVCs occurred mainly while resting at the beginning of the session and at the end of the recovery period—the PVC burden was only 1.4%.

[Figure: bpm-exercise-2022-Apr-13]

For the exercise recording, the noise really disrupted the spike-based pulse detection, but did not interfere as much with the autocorrelation-based pulse detection. My peak pulse rate was about 151.5 bpm, by the autocorrelation measure. I’m not sure whether the sudden changes in pulse rate at 100s (when I started pedaling) and around 556s (about 130s into the recovery time) are real or not—the noise in the recording makes it a little difficult to determine the “correct” pulse rate.

The noise during exercise was not 60Hz noise and seemed to vary with whether I was inhaling or exhaling, so I think that it was probably caused by EMG signals from the pectoral muscles or perhaps the diaphragm. The spike detector was clearly missing a lot of the spikes, but making it more sensitive would probably result in false triggering on the EMG noise. I’m wondering whether putting the electrodes on my back, over the scapulae, would reduce the EMG noise, but placing those electrodes and clipping to them would be difficult without an assistant.

The autocorrelation-based pulse detection seems more reliable when exercising, as my pulse is more periodic and has few PVCs, and the autocorrelation method is less susceptible to aperiodic noise.  The spike-based pulse detection seems more reliable when resting, when the pulse is not as periodic and PVCs disrupt the pattern.

I’m also wondering whether a more strenuous exercise session would raise my pulse rate, or whether I’m getting close to my maximum heart rate.  The standard formula for maximum heart rate by age suggests that this may be close to my maximum, but the exercise does not seem all that strenuous, and a couple of years ago I could routinely push to 170 bpm (though perhaps on a device that was an unreliable reporter—it was built into a treadmill at the gym).  So sometime in the next few weeks I’ll try using a higher power output and seeing where my heartbeat tops out.  I’ll probably need to increase the cadence, rather than the resistance, as I’ve been using about 70 rpm and 28 N·m to get about 205 W.  Raising that to 80 rpm or even 90 rpm is probably easier than increasing the torque.
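For anyone checking the arithmetic: power is torque times angular velocity, with cadence converted from rpm to rad/s (the function name is mine):

```python
import math

def pedal_power(cadence_rpm, torque_Nm):
    """Mechanical power in watts from cadence (rpm) and torque (N·m)."""
    return torque_Nm * cadence_rpm * 2 * math.pi / 60

# 70 rpm at 28 N·m gives about 205 W; at the same torque,
# 90 rpm would give about 264 W.
```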

2022 April 8

Press release for my book

Filed under: Circuits course — gasstationwithoutpumps @ 22:29

World Scientific Publishing has released a press release for my book at https://www.eurekalert.org/news-releases/949205.  That press release is lightly edited from one I wrote for them.

The press release does not point to the lower-cost PDF version at https://leanpub.com/applied_analog_electronics, but I can’t really expect World Scientific to advertise the cheaper option, even though they did allow me to continue selling it.  They have not given me permission to use their cover design

[Figure: Blue cover]

for my PDF sales, so I will continue to use my own design:

[Figure: colorized-draft4-464x600]

I’m not that fond of their all-caps typesetting anyway, and the generic “tech” background has nothing to do with the contents of the book—but if they think that generic tech backgrounds sell better, I’m willing to believe them, as they have much more experience selling books than I do.

The text for the two versions of the book should be identical, but they are resetting it (with smaller headings and some other formatting changes), so the pagination will be different.

2022 March 29

Better heartbeat detection

Filed under: Circuits course — gasstationwithoutpumps @ 14:37

In Lower PVC frequency, I promised “I’ll report on the algorithms when I get something a little better than I have now.”

I’ve played with three algorithms this week:

  • Doing spike detection, then measuring the time for 2n periods by looking at the time between the spikes n before and n after the current spike.  This gives fine resolution both in time and frequency, and provides smoothing because adjacent measurements overlap by 2n-1 periods. It is, however, very susceptible to errors from miscalling the spikes: missed spikes result in too low a frequency, and extraneous spikes result in too high a frequency. I could increase n to reduce the effect of miscounts, but at some loss of time resolution when the pulse rate changed.
  • Taking the FFT of a block of samples (say 4096, which is about 17 seconds at 240 Hz) and looking for a high-energy frequency.  Temporal resolution is poor (even with 50%-overlapping blocks we get one measurement every 8.5 s), and frequency resolution is also poor (bins are about 3.5 bpm wide). I tried improving the frequency resolution by looking at the phase change for the peak between adjacent windows, but that didn’t solve the main problem, which was that choosing the right peak in the spectrum was often difficult.  The simple algorithms I tried for choosing the peak often failed, so I eventually gave up on this technique.
  • Taking the autocorrelation of a block of samples (using rfft and irfft) and looking for a peak. The time of that peak is the period, which can be inverted to get the frequency.  This method has the same coarse time resolution as the FFT method (same size blocks), but much better frequency resolution, as even the fastest reasonable pulse rate (240 bpm or 4 Hz) has 60 samples at a sampling rate of 240 Hz. I tried accentuating peaks “of the right width” by filtering the autocorrelation, and I tried looking for harmonic errors (where 150 bpm might result in a larger autocorrelation peak at 75 bpm, 50 bpm, or 37.5 bpm). Even with all the tweaks I could think of, I still had a number of way-off estimates, though median filtering removed most of the anomalies.  Of course, a median-of-5 filter makes the time resolution even worse, as the median could have come from any of 5 windows (with 50% overlap, that means a time range of 12288 samples, or 51.2 seconds!).

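The autocorrelation method can be sketched with numpy as below. This is a minimal version: the zero-padding (to get a linear rather than circular autocorrelation) and the lag-range limits are my choices, and the peak-width filtering and harmonic checks mentioned above are omitted.

```python
import numpy as np

def autocorr_bpm(x, fs=240.0, min_bpm=40.0, max_bpm=240.0):
    """Estimate pulse rate from the largest autocorrelation peak,
    computing the autocorrelation via rfft/irfft."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    npad = 2 * len(x)                  # pad to avoid circular wraparound
    spectrum = np.fft.rfft(x, n=npad)
    autocorr = np.fft.irfft(spectrum * np.conj(spectrum), n=npad)
    lo = int(fs * 60.0 / max_bpm)      # shortest period to consider (samples)
    hi = int(fs * 60.0 / min_bpm)      # longest period to consider (samples)
    lag = lo + int(np.argmax(autocorr[lo:hi]))
    return 60.0 * fs / lag
```

One estimate per block is all this yields, which is why the temporal resolution is so coarse.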
I did most of my algorithm testing on one data set (the exercise set from 23 March 2022), and the algorithm is almost certainly overtrained on that data.

[Figure: bpm-2022-Mar-23]

Here are both algorithms applied to the two data sets from 23 March. On the exercise set, the autocorrelation method did an excellent job (except right at the end of the run), but the 12-period measure clearly shows missing and extra peaks. On the resting set, the 12-period measurements were very good, but the autocorrelation ones failed at one point, even with median-of-five filtering. The autocorrelation measurements were also consistently somewhat low.

[Figure: bpm-resting-2022-Mar-23]

To try to figure out why the autocorrelation estimates of the pulse rate were low, I tried superimposing the filtered ECG signal on the plot. The PVCs are visible as large downward spikes. Having one or more PVCs in the window seems to make the autocorrelation estimates somewhat too low. I still have no explanation for why the autocorrelation measure fails so badly around 50 seconds.

Although the autocorrelation measure makes a nice smooth plot on the exercise data set, I sacrificed a lot of temporal resolution to get that. I think that I would do better to make a more robust spike detector to improve the period-based measurements.

2022 March 24

Lower PVC frequency

Filed under: Circuits course — gasstationwithoutpumps @ 09:58

In PVC: Premature Ventricular Contraction, PVC and pulse, PVC again, and No PVC while exercising I posted some ECG recordings of my heart to show the premature ventricular complexes. Yesterday I tried recording the ECGs again, this time using only two electrodes for Lead I (LA–RA), with the bias wire connected to the right-arm electrode.  When I’ve tried that in the past it has not worked well, but with the new amplifier I got a reasonably good recording, with only a bit more noise than I usually get with a 3-electrode configuration.  I did not do direct comparisons of the 2-electrode and 3-electrode configurations—I’ll have to try that sometime soon.

The interesting result is that my PVCs seem to be much less frequent now.  In a fairly long (454-second) recording, I had 355 normal spikes and 13 PVCs—a PVC burden of only 3.5%, with a pulse rate of 48.5 bpm.

[Figure: Lead-I-2022-Mar-23]

The resting pulses looked pretty much like other recordings I made (averaged over all the pulses).

I also made a recording while on the bicycle ergometer: 100s just sitting, 400s pedaling moderately hard, and 500s recovering.  There were very few PVCs in that recording also (only 13 detected, for a PVC burden of 0.9%).  The averaging of waveforms across the entire recording did not produce a very useful result, as the heart rate varied from around 45bpm to around 154bpm.  I probably should modify the software to produce averages over just a dozen or so beats, to get the noise reduction without the confounding effect of variation in rate.
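Averaging over just a dozen or so beats could be as simple as the sketch below (numpy; the window width, spike indices, and beat count are assumptions, not the actual software):

```python
import numpy as np

def average_beats(signal, spike_indices, half_width, max_beats=12):
    """Average fixed-width windows centered on detected spikes, using
    only the first max_beats spikes so that the heart rate does not
    drift much across the average."""
    windows = [signal[i - half_width : i + half_width]
               for i in spike_indices[:max_beats]
               if half_width <= i <= len(signal) - half_width]
    return np.mean(windows, axis=0)
```

A sliding version of this (a dozen beats at a time, stepped through the recording) would give noise reduction without the confounding rate variation.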

My attempt to produce a bpm vs time graph was not very successful—the signal was noisy enough that the spike detection algorithm was occasionally missing a spike or picking up an extra one, so that the method I was using of just inverting the time for n periods resulted in a very noisy graph at the higher heart rates.  I spent most of yesterday looking at two other ways of determining the period—one using the peak in an FFT and phase changes at that peak between overlapping windows and one using peaks in autocorrelation.  None of the three techniques worked well enough to produce a smooth graph, so I’ll probably work on them some more to try to come up with a more robust pulse finder.
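The period-inversion method amounts to the following (a sketch; the function name is a placeholder, and n=6 corresponds to a 12-period measure):

```python
def spike_rate_bpm(spike_times, n=6):
    """For each spike with n neighbors on each side, estimate the pulse
    rate in bpm from the time spanning those 2n periods."""
    rates = []
    for i in range(n, len(spike_times) - n):
        span = spike_times[i + n] - spike_times[i - n]   # 2n periods
        rates.append((spike_times[i], 60.0 * 2 * n / span))
    return rates
```

A single missed or extra spike corrupts 2n of these estimates, which is why the graph gets noisy when the spike detector misbehaves.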

I’ll report on the algorithms when I get something a little better than I have now.
