Gas station without pumps

2014 February 1

More on segmenting noisy signals

Filed under: Uncategorized — gasstationwithoutpumps @ 10:34

Last August, I posted on Segmenting noisy signals from nanopores, but I’ve always been a bit unsatisfied with the parameterization of that method.  The basic idea is to have a parameterized probabilistic model for a segment, Pθ(s:t), where θ is some parameter setting for the model optimized for the half-open time interval [s,t).  We compare the probability of the segment modeled as two subsegments with the segment modeled as a single unit:

L_{i}=\log \frac{P_{\theta_{1}}(s:i) P_{\theta_{2}}(i:t)}{P_{\theta_{0}}(s:t)} = \log P_{\theta_{1}}(s:i) + \log P_{\theta_{2}}(i:t) - \log P_{\theta_{0}}(s:t),

where i is the breakpoint, θ1 and θ2 are the parameters for the two segments, and θ0 is the parameter setting for the whole segment.  I showed in the previous post how to simplify this for stepwise Gaussian models to L_{i} = (s-i)\ln \sigma_{1} + (i-t)\ln \sigma_{2} - (s-t)\ln \sigma_{0}.  Note that translation or scaling of the samples will not affect this log-likelihood: the means have already cancelled, and scaling adds the same constant to each \ln\sigma term, whose coefficients sum to zero, so that constant cancels as well.

To find the best breakpoint in an interval, we check each possible breakpoint, and pick the one that maximizes Li. We then decide whether that Li is big enough to keep the breakpoint or not using a user-settable threshold. If the threshold is set very low, then the user is saying that there is not much noise and we should follow the signal closely. If the threshold is set high, then the user is saying that there is a lot of noise and we need to be very confident of any breakpoint we claim.

What has been bothering me is how we’ve been setting the threshold.  I’d really like to have an easily interpreted parameter that we can set independent of sampling rate or other confounding factors.

I started out by looking at how L_{i} is distributed for sequences that are generated from a single Gaussian model (with no step).  I did a bunch of simulation experiments, and I was surprised to find that the distribution appears to be a simple unit exponential, with Pr(L_{i} \geq x ) = e^{-x} for x\geq 0, and that this distribution does not depend on how long a sequence is being split.

That lack of dependence on how big the sequence is means that I don’t have to adjust the threshold based on how big a window I’m looking at. I had originally thought that there ought to be such an adjustment, but I got better results by keeping the threshold fixed.
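To make that concrete, here is a minimal sketch of the kind of simulation experiment I mean (a reconstruction for illustration, not my original script): generate pure Gaussian noise, compute L_{i} at a fixed breakpoint from the maximum-likelihood sigmas, and compare the empirical survival function with e^{-x}.

    import numpy as np

    def L_split(x, i):
        """Log-likelihood ratio for splitting x at i (stepwise Gaussian model),
        using maximum-likelihood sigmas (ddof=0)."""
        n = len(x)
        s0, s1, s2 = x.std(), x[:i].std(), x[i:].std()
        return n * np.log(s0) - i * np.log(s1) - (n - i) * np.log(s2)

    rng = np.random.default_rng(42)
    n, trials = 1000, 10000
    L = np.array([L_split(rng.standard_normal(n), n // 2) for _ in range(trials)])
    for x0 in (1.0, 2.0, 4.0):
        print(x0, (L >= x0).mean(), np.exp(-x0))  # empirical vs predicted survival

For a single fixed breakpoint, this is also what asymptotic theory predicts: twice the log-likelihood ratio is approximately chi-squared with two degrees of freedom (the two extra parameters of the second segment), which is exactly a unit exponential for L_{i}. That may explain why the distribution does not depend on the sequence length.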

If the empirically observed exponential distribution of L_{i} is correct, then we can parameterize the threshold in terms of a false-positives-per-second parameter, F. If we divide by the sampling frequency, we get a false-positive-per-sample F/f_{s}. The threshold T needs to be set so that Pr(L_{i} \geq T ) = e^{-T} \leq F/f_{s}, or T\geq -\ln(F/f_{s}).

But specifying a tolerance for false positives is not good enough. One generally wants about the same false-positive rate regardless of how fast the signal is changing, but the threshold should be tighter if you expect the signal to remain constant for a long time, and looser if you want to track rapid signal changes.  I think this can be handled by using the posterior odds ratio instead of the likelihood ratio, in a properly Bayesian way.  The user of the segmenter expresses how finely they want things divided in terms of expected steps per second, sps, as well as an acceptable overcalling rate, over (say, 1% of segments getting split), which sets the false-positive rate F=sps*over.

The prior probability of a step at each sample is \mbox{\it sps}/f_{s}, so the prior odds ratio is \frac{ \mbox{\it sps} }{f_{s}-\mbox{\it sps}}, and the posterior odds ratio is e^{L_{i}} \frac{ \mbox{\it sps} }{f_{s}-\mbox{\it sps}}.  The threshold on the log-likelihood ratio would then be set to T=-\ln \frac{ \mbox{\it sps} }{f_{s}-\mbox{\it sps}}-\ln(F/f_{s})\approx -2\ln(\mbox{\it sps}) -\ln(\mbox{\it over})+2\ln(f_{s}).
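In code, the whole parameterization collapses to a couple of lines. A sketch (the function and parameter names are mine, not from my implementation):

    from math import log

    def threshold(sps, over, fs):
        """Threshold on L_i from expected steps/second (sps), acceptable
        overcall fraction (over), and sampling frequency fs in Hz."""
        F = sps * over                  # false positives per second
        return -log(sps / (fs - sps)) - log(F / fs)

    print(threshold(10, 0.01, 10000))   # about 18.4

With the settings used below (10 expected segments/second, 1% overcalling), and assuming a 10kHz sampling rate as in the earlier posts, this gives a threshold of about 18.4.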

On the sample event that I’ve been using for testing, setting the segments/second to 10, and the oversegmentation rate to 0.01 gets me pretty much the segmentation I want to see, with an average segment length of 68.79ms (14.5 segments per second).  Adjusting the prior segments/second to 1.0 gets me 11.6 segments per second, and to 100.0 gets me 18.4 segments/second, so the prior is having about the right effect on the segmentation, adjusting the rate of segmentation without overwhelming the data.  Since the underlying segmenter is the same as before (only the parameters used for setting the threshold have changed),  it may still be difficult to find settings that work for some nanopore experiments.

2013 August 10

Segmenting noisy signals from nanopores

Filed under: Uncategorized — gasstationwithoutpumps @ 10:05

Today’s post is not about what or how I teach, but about some of the research I’ve been doing lately.  I’ve been collaborating with another faculty member and his students on analyzing nanopore data.

The basic idea of a nanopore is simple: you have a thin membrane separating two salt-water baths, and stick a very tiny hole in it.  You put a voltage difference between the two baths and measure the current through the hole.  If you can arrange for large molecules to pass through the hole, they block some of the current flow, and you can track the passage of the molecules by monitoring the current.

In my collaborator’s lab, they use a lipid bilayer (similar to a cell membrane) as the separating membrane, and a porin protein to punch a hole in the membrane.  (They used to use α-hemolysin as the nanopore, but are now trying MspA, which has some better properties.)  Mostly they pass DNA through the pore, since funding is most available for DNA sequencing technologies, but they’ve also explored other large biomolecules.

Side view of the MspA pore, from PDB file 1UUN

Top view of the MspA pore, showing the 13.8Å diameter hole that it makes in a membrane.

One problem with nanopores, as with almost all single-molecule techniques, is that the signals are really small, nearly buried in noise (see my post Noise in nanopores).  We are often looking for changes of less than a pA (that’s 10^{-12} amps) with noise that may be 2pA RMS or more.

When things are working the way we want, the large molecule moves stepwise through the pore. For example, with DNA in the pore, we want to attach some sort of DNA motor to pull the DNA through. But we don’t want a smooth, continuous motion—we want a motor that advances the DNA by one nucleotide rapidly, then stays there for a long time so we can have a constant current to identify what is in the pore, then advances again by one and stays there, and so forth. Luckily for us, many DNA motors do indeed demonstrate this sort of stepwise motion.

One of the subproblems I’ve been helping students with for the past several months is trying to recover a clean stepwise signal from the noisy data. The problem is to summarize the continuous-time current trace by a small number of “segments”, each of which can be summarized by the mean, standard deviation, and duration of the segment.

There were a couple of really bright students working on it, and they had reasonable success, but the methods always seemed a bit fragile—a new set of data taken under different conditions required a lot of tweaking to get the method to segment the data adequately, and sometimes we were unable to find parameters that would work for all parts of the trace—particularly when some parts had higher noise levels than others, or much shorter durations for the fragments.

Since the data is not mine to share, I made up some fake data that looks sort of like a nanopore trace (but cleaner—no 60Hz noise, no baseline drift, just clean steps between Gaussian distributions):

This is a fake nanopore trace that has been segmented by my program. The light green trace is the raw data, the red vertical lines are the breakpoints and the dark blue lines are the means for each segment. The noise levels for each segment are fairly realistic, as are the durations and separations between the steps. The segmentation is almost perfect (there are a couple of very tiny extra segments added).

The method we were using was a fairly straightforward one: use a low-pass filter to remove noise, then look for large changes in the filtered signal to determine that a breakpoint had occurred, and look for a peak in the derivative of the filtered signal to place the breakpoint precisely. This is a fairly standard technique, and it works well on some signals. But it wasn’t working well for us on some of our data, because some of the time the signal changes very slowly, and sometimes it changes very fast. Tweaking the filter and threshold parameters helped, but when we got adequate segmentation on one part of the signal we either over-segmented some parts or missed segments (or both). We were coming up with more and more baroque schemes to post-process the segments, and still not getting something that was clean enough to pass on to the machine-learning algorithms that were supposed to do the actual analysis we needed.

Finally this week I did what I should have done months earlier—I spent some time trying to figure out what had been published in the past. Segmentation is not a new problem, and it has gone by many names—figuring out what names to look for it under was a big part of the problem. Eventually I hit on “change-point detection” and found a free PDF textbook: Detection of Abrupt Changes: Theory and Application by Michèle Basseville and Igor V. Nikiforov.  I had only read the first two chapters of the book when I realized that everything we had done so far was a variant of the “filtered derivative” technique, and that there was a large class of techniques we hadn’t tried.  The other techniques are based on maximum-likelihood estimates of where the breakpoints are, which is a comfortable framework for me to work in.  The authors talk mainly about “online” methods (which detect a change after only a small delay), and the first chapters consider only stepwise changes in mean with a constant standard deviation.

I grabbed onto one idea fairly quickly—trying to determine whether there was a step in a given time interval and, if so, where it occurred. (This is Section 2.6 of their book: Off-line Change Detection.)  The idea is to have a parameterized probabilistic model for a segment, Pθ(s:t), where θ is some parameter setting for the model optimized for the half-open time interval [s,t).  We compare the probability of the segment modeled as two subsegments with the segment modeled as a single unit:

L_i=\log \frac{P_{\theta_{1}}(s:i) P_{\theta_{2}}(i:t)}{P_{\theta_{0}}(s:t)} = \log P_{\theta_{1}}(s:i) + \log P_{\theta_{2}}(i:t) - \log P_{\theta_{0}}(s:t),

where i is the breakpoint, θ1 and θ2 are the parameters for the two segments, and θ0 is the parameter setting for the whole segment.

For a stepwise model, the probabilistic model can be just an independent, identically distributed model for the samples in the segment, with each sample chosen from a Gaussian distribution. The parameters for each model are then just the mean μ and standard deviation σ for the Gaussian, and we can get the maximum-likelihood estimates easily from the samples in the segment.  (In the implementation, I actually keep cumulative sums of x and x^2 in order to compute the mean and variance quickly for any interval.) Note that with this choice of model, the log-likelihood ratio Li is always non-negative if we choose maximum-likelihood estimates for each segment, since we could get 0 by choosing the (possibly non-optimal) parameters θ1 = θ2 = θ0.  That means that we can always model the data more closely if we add more breakpoints.
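Concretely, the cumulative-sum bookkeeping looks something like this (a sketch of the idea, not my actual code):

    import numpy as np

    class IntervalStats:
        """O(1) mean and maximum-likelihood variance for any half-open
        interval [s, t), from prefix sums of x and x**2."""
        def __init__(self, x):
            x = np.asarray(x, dtype=float)
            self.c1 = np.concatenate(([0.0], np.cumsum(x)))
            self.c2 = np.concatenate(([0.0], np.cumsum(x * x)))

        def mean_var(self, s, t):
            n = t - s
            mean = (self.c1[t] - self.c1[s]) / n
            var = (self.c2[t] - self.c2[s]) / n - mean * mean
            return mean, var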

The probability of a segment in this model is just the product of the probabilities of the samples in the segment, and the log probability of the segment is the sum of the log probabilities of the samples.  So what is the log probability of a single sample from a Gaussian distribution?

\ln \left( \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \right) = -0.5 \ln(2\pi) -\ln(\sigma) - \frac{(x-\mu)^2}{2\sigma^2}

Now comes the magic.

When we chose the optimal μ and σ for a segment, we did it so that \sigma^{2} is the variance, that is, the average value of (x-\mu)^2. So when we sum the log probabilities over a segment of length n, using μ and σ optimized for that segment, we get n( -0.5 \ln(2\pi) -\ln(\sigma) - 0.5).

That means that the log probability ratio we want simplifies to L_i = (s-i)\ln \sigma_{1} + (i-t)\ln \sigma_{2} - (s-t)\ln \sigma_{0}.

To find the best breakpoint in an interval, we check each possible breakpoint, and pick the one that maximizes Li. We then decide whether that Li is big enough to keep the breakpoint or not using a user-settable threshold. If the threshold is set very low, then the user is saying that there is not much noise and we should follow the signal closely. If the threshold is set high, then the user is saying that there is a lot of noise and we need to be very confident of any breakpoint we claim.
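Together with the cumulative sums above, that gives a breakpoint scorer along these lines (again a sketch, using the IntervalStats helper from earlier):

    from math import log

    def best_breakpoint(stats, s, t, min_len=2):
        """Best single breakpoint in [s, t): returns (i, L_i),
        or (None, -inf) if no acceptable breakpoint exists."""
        _, var0 = stats.mean_var(s, t)
        if var0 <= 0:
            return None, float("-inf")
        best_i, best_L = None, float("-inf")
        for i in range(s + min_len, t - min_len):
            _, var1 = stats.mean_var(s, i)
            _, var2 = stats.mean_var(i, t)
            if var1 <= 0 or var2 <= 0:      # degenerate (constant) subsegment
                continue
            # ln(sigma) = 0.5 * ln(variance)
            L = 0.5 * ((t - s) * log(var0)
                       - (i - s) * log(var1) - (t - i) * log(var2))
            if L > best_L:
                best_i, best_L = i, L
        return best_i, best_L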

With this simple method for finding a single breakpoint, I wrote an algorithm for recovering a series of steps, with three parameters: the threshold for keeping a breakpoint, a minimum segment duration m, and a window width w.

Here’s how it works for segment(s:t) (a code sketch follows the list):

  1. I look at a window starting at the beginning of the data (s:s+w), and see if there is a keepable breakpoint in that window (only examining possible breakpoints in (s+m:s+w-m) to avoid creating a too-short segment).
  2. If there is a keepable breakpoint at i, I recursively look at the data on either side, returning segment(s:i) , i, segment(i:t).
  3. If there is no keepable breakpoint I advance the window halfway and try again.  I do this iteratively, but it is equivalent to returning segment(s+w/2:t).
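In Python, the control flow is roughly the following (a sketch only, omitting the maximum-segment-duration complication mentioned below; T is the threshold on Li, m the minimum segment length, and w the window width, all in samples):

    def segment(stats, s, t, T, m, w):
        """Return the list of kept breakpoints in [s, t)."""
        breaks = []
        while t - s >= 2 * m:
            hi = min(s + w, t)
            i, L = best_breakpoint(stats, s, hi, min_len=m)
            if i is not None and L >= T:
                breaks.extend(segment(stats, s, i, T, m, w))  # left side
                breaks.append(i)
                s = i               # continue with the right side
            elif hi == t:
                break               # window already reaches the end; give up
            else:
                s += w // 2         # advance the window halfway and retry
        return breaks

Typical usage, with the window width I mention below and illustrative values for the other parameters, would be breakpoints = segment(IntervalStats(x), 0, len(x), T=18.4, m=10, w=30000).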

In the current implementation, I have another parameter for the maximum segment duration, and I force in breakpoints if I haven’t seen a keepable one for a long enough interval, but I think that was an unnecessary complication, and I’ll probably remove it from an optimized implementation.

One cool thing about this stepwise segmenter is that it does not need any change in the mean values of the segments—a change in the variance suffices.  I made fake data consisting of 1-second-long (10,000-sample) stretches of Gaussian noise, with each successive second having a standard deviation that was 90% of the previous second’s.  With a little tweaking to get the right range of thresholds for keeping breakpoints, the algorithm had no trouble segmenting the data:

No change in means is needed for segmenting data. A 10% change in standard deviation is enough (if the segments are long enough).  The boundaries are hard to find by eye, but the algorithm has no trouble nailing them to within a few hundred samples (tens of milliseconds).

Of course, finding real segments is nice, but what if the standard deviation changes smoothly? Do we still end up segmenting the data?

Yes, even if the standard deviation changes smoothly, the algorithm will still break the trace into segments. The exact placement of the breakpoints is a bit random, but the average length of the segments depends on the threshold that is chosen.  So we can’t claim that because the algorithm has chosen a breakpoint that there is really an abrupt change in characteristics there—there may just have been enough accumulated change to say that what is after the break is not the same as what was before the break.

My algorithm is not quite an online algorithm, since it re-examines data after setting a breakpoint, but it never backtracks further than the window width, so there is limited delay in the decision making. I’ve found it useful to use fairly wide windows (about 30,000 data points or 3 seconds at 10kHz), even when the steps in the data are pretty short (3–100 msec).

Having to choose a threshold for when to insert a breakpoint is a bit of a nuisance, and much of Detection of Abrupt Changes: Theory and Application seems to be dedicated to various schemes for setting thresholds in robust ways that depend on the lengths of the steps, but there is always a parameter that has to be set by the user to indicate what level of detail is the interesting signal, so I’m not sure that there is a lot to gain for our purposes in the fancier schemes.

The magic that simplifies the log-likelihood function does not just apply to stepwise models.  Any model that can be represented as D(t) + \epsilon(0,\sigma), a deterministic function of t plus an independent Gaussian noise term with 0 mean, will have the same property.  I did another implementation of the segmenter that does linear regression on each segment, producing slanting line segments (not a piecewise-linear function, since there is no continuity at segment boundaries).  This may turn out to be useful for current traces for some molecules, which do not seem to have a stepwise pattern, but instead ramp the current up or down. Here is an example of a fake trace made with sloping segments, reconstructed by the stepwise algorithm and the slanted-steps one:

Fake current trace with slanted segments requires a lot of steps to model the data as a stepwise signal.

Starting from the same deterministic underlying model (but different random noise), modeling the signal with stepwise slanted line segments does not need nearly as many breakpoints.  All the reconstructed breakpoints here are within 3 samples (0.3msec) of the breakpoints in the original function, mainly because the segments were all separated by big steps.
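The only change needed for the slanted-steps version is which σ goes into the log-likelihood: the standard deviation of the residuals around the least-squares line for the interval, rather than around the interval mean. That can still be computed in constant time per interval by keeping a few more prefix sums (over the sample index and the index times the sample). A sketch, following the same conventions as the stepwise helper above:

    import numpy as np

    class SlantStats:
        """O(1) maximum-likelihood residual variance for a least-squares
        line fit on any half-open interval [s, t)."""
        def __init__(self, x):
            x = np.asarray(x, dtype=float)
            k = np.arange(len(x), dtype=float)

            def cum(a):
                return np.concatenate(([0.0], np.cumsum(a)))

            self.sx, self.sxx = cum(x), cum(x * x)
            self.sk, self.skk, self.skx = cum(k), cum(k * k), cum(k * x)

        def resid_var(self, s, t):
            n = t - s
            Sx = self.sx[t] - self.sx[s]
            Sxx = self.sxx[t] - self.sxx[s]
            Sk = self.sk[t] - self.sk[s]
            Skk = self.skk[t] - self.skk[s]
            Skx = self.skx[t] - self.skx[s]
            b = (n * Skx - Sk * Sx) / (n * Skk - Sk * Sk)  # slope
            a = (Sx - b * Sk) / n                          # intercept
            # mean squared residual around the fitted line a + b*k
            return (Sxx - 2 * a * Sx - 2 * b * Skx
                    + n * a * a + 2 * a * b * Sk + b * b * Skk) / n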

The slanted-steps algorithm is slightly slower to compute than the stepwise algorithm, but even with my 4-year-old laptop I can do the computations nearly in real time for 10kHz signals in unoptimized Python.  Rewriting in C should run at least 10 times faster than the current Python implementation, and may be able to go 100 times faster with careful coding.  Of course, we have to choose whether the stepwise or slanted-steps function is the more useful representation of the underlying data.  If we really have a stepwise signal, the slanted steps could merge adjacent steps into a slope and obscure the signal.

2013 June 2

Nanopore capacitance

Filed under: Uncategorized — gasstationwithoutpumps @ 20:48

I recently wrote about the noise in nanopores, as a thinking-out-loud exercise to clarify for myself what the noise sources were and what the consequences of changing the concentration of salt ions would be. Today I did another, smaller exercise, thinking about how much capacitance the membrane of a nanopore device has and whether it would have any effect on the measurements.

I was going to try to use the standard parallel-plate capacitance formula, using the area, the separation, and the dielectric constant. The nanopore devices used at UCSC have a lipid bilayer closing a 1 µm hole between the two wells of the device, giving a plate area of about 785E-15 m², but I don’t know the thickness or the dielectric constant of the lipid bilayer, both of which undoubtedly depend on what lipid is used.

Since I’m mainly interested in a ballpark estimate, I looked up the capacitance of lipid bilayers on the web and found

O Alvarez and R Latorre
Voltage-dependent capacitance in lipid bilayers made from monolayers.
Biophys J. 1978 January; 21(1): 1–17.
PMCID: PMC1473368

which gives a measurement of capacitance (at 0 volts) of 0.65 to 0.81 µF/cm² for various lipids.  For the nanopore, that would give a capacitance of 5–6 fF, far less than the capacitance of the wires connecting to the electrodes in the wells, which are probably in the pF range.  So for practical purposes, we can mostly ignore the capacitance of the lipid bilayer, and regard the nanopore as a variable resistor, with a resistance of about 10GΩ.
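The arithmetic is quick to check (a back-of-the-envelope sketch using the numbers above):

    from math import pi

    r = 0.5e-6                     # 1 µm diameter hole -> 0.5 µm radius, in m
    area_cm2 = pi * r * r * 1e4    # ~7.85e-9 cm^2
    for spec_cap in (0.65e-6, 0.81e-6):   # F/cm^2, from Alvarez & Latorre
        print(spec_cap * area_cm2)        # ~5.1e-15 to 6.4e-15 F, i.e. 5-6 fF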

2013 April 21

Noise in nanopores

Filed under: Uncategorized — gasstationwithoutpumps @ 16:32

I’ve been trying to understand the sources for noise in current measurements through nanopores, so that I can better understand the signals that are generated in the nanopore lab.  I’ve not studied noise in small signals before, so I’ve been having to rely on Wikipedia for information about sources of noise.  I’ve probably missed some important ones and would appreciate those more knowledgeable pointing out important noise sources that I missed.

I think that there are two main noise sources:  the nanopore itself and the resistors used in converting the nanopore current to a voltage in the first stage of amplification.  Each of these noise sources has two types of white noise: thermal noise (dependent on temperature) and shot noise (independent of temperature).

Thermal noise

The RMS current for thermal noise is i_{\mbox{\scriptsize thermal}} = \sqrt{\frac{4k_B T \Delta f}{R}} for a resistance of R at temperature T, with kB being Boltzmann’s constant, 1.380649E-23 J/K, and \Delta f being the bandwidth of the filter looking at the noise.  Normally, nanopore scientists don’t report the resistance of the nanopore, but rather the bias voltage and the DC current through the pore; we can use Ohm’s law to rewrite the noise formula as i_{\mbox{\scriptsize thermal}} = \sqrt{\frac{4k_B T I \Delta f}{V}}.  A typical setup for a nanopore may have a voltage of 180mV and 60pA (for an open channel in 0.3M KCl), and a temperature of 25°C = 298.15K, for 4 k_B T = \mbox{16.47E-21 J}, and noise of \mbox{2.34E-15} \sqrt{\Delta f} A/\sqrt{\mbox{Hz}}.

Shot noise

The RMS current for shot noise is i_{\mbox{\scriptsize shot}} = \sqrt{2 q I \Delta f}, where q is the magnitude of the charge of the carriers (here 1.60217646E-19 C, since the charge carriers are K⁺ and Cl⁻), and I is the DC current.  Again \Delta f is the bandwidth of the filter looking at the noise.  For a current of 60pA, the shot noise would be \mbox{4.39E-15} \sqrt{\Delta f} A/\sqrt{\mbox{Hz}}, nearly twice the thermal noise.

Combined noise in nanopore

The combined noise in the nanopore from both shot noise and thermal noise should then be i_{\mbox{\scriptsize pore}} = \sqrt{\left(2 q + \frac{4k_B T}{V}\right)I \Delta f}, which for 180mV and 60pA would be \mbox{4.97E-15} \sqrt{\Delta f} A/\sqrt{\mbox{Hz}}.  If a 1kHz low-pass filter is used, that makes an RMS noise level of 0.157pA, and with a 10kHz low-pass filter, 0.497pA.
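Plugging in the numbers (a quick sanity-check script for the formulas above):

    from math import sqrt

    kB = 1.380649e-23       # Boltzmann's constant, J/K
    q = 1.602176634e-19     # elementary charge, C
    T, V, I = 298.15, 0.18, 60e-12   # kelvin, volts, amps

    thermal = sqrt(4 * kB * T * I / V)     # ~2.34e-15 A/sqrt(Hz)
    shot = sqrt(2 * q * I)                 # ~4.39e-15 A/sqrt(Hz)
    pore = sqrt(thermal ** 2 + shot ** 2)  # ~4.97e-15 A/sqrt(Hz)
    for bw in (1e3, 1e4):                  # low-pass filter bandwidths, Hz
        print(pore * sqrt(bw))             # ~0.157 pA and ~0.497 pA RMS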

Increasing the ionic concentration would provide a larger DC current (current proportional to concentration), and the noise current only grows with the square root of the DC current, so the signal to noise ratio also grows with the square root of the ionic concentration.

Amplifier noise

These noise levels are lower than what is actually observed, of course, because we haven’t taken into account noise generated in the first stage of amplification.

In the UCSC nanopore lab, they use an Axon Axopatch 200B Capacitor Feedback Patch Clamp Amplifier, which costs about $2000 used (if you have to ask the new price, you can’t afford it—you can get a quote, but there is no list price).  The web site claims “By introducing active cooling of components in the headstage to well below 0°C, the open-circuit noise in patch mode has been reduced to unprecedented levels, as low as <15 fA (RMS) at 0.1–1 kHz.”  But the lab uses a resistive headstage (so as to be able to make DC current measurements), with a 500MΩ feedback resistor in the current-to-voltage converter, which introduces thermal noise of about \mbox{5.74E-15} \sqrt{\Delta f} A/\sqrt{\mbox{Hz}}, larger than the nanopore noise. The shot noise for the resistor in the I-to-V converter should be the same as the shot noise for the nanopore, since they carry the same DC current. Combining the thermal and shot noises for both the nanopore and the resistor at 180mV and 60pA, I estimate \mbox{8.77E-15} \sqrt{\Delta f} A/\sqrt{\mbox{Hz}}, which would be 0.277pA at 1kHz, 0.620pA at 5kHz, and 0.877pA at 10kHz.  The noise levels observed in the signals are close to those levels, so there is not much that can be done to improve the amplifiers to get a better signal-to-noise ratio.

We could be interested in lower-cost amplifiers than the $2k lab instrument, but a very low noise off-the-shelf op amp (like the AD8432) has a noise level of about 2.0 pA/\sqrt{\mbox{Hz}}, which is much larger than what is observed in the lab.  Designs have been done at UCSC for much lower noise amplifiers (Gang Wang and W.B. Dunbar, “An integrated, low noise patch-clamp amplifier for biological nanopore applications,” Engineering in Medicine and Biology Society (EMBC), 2010 Annual International Conference of the IEEE, pp. 2718–2721, Aug. 31–Sept. 4 2010, doi: 10.1109/IEMBS.2010.5626570), claiming “input referred noise of the amplifier is 0.49 pA RMS over a 5 kHz bandwidth”, from simulation (I don’t know whether they have fabricated and tested the amp yet).

UPDATE 2013 Nov 13:  A student showed me a data sheet for an off-the-shelf ($3–$4) op amp that has low enough noise for this purpose: the AD8625 (or AD8626 or AD8627).  They have only a 0.25pA bias current and a current noise density of 0.4 fA/\sqrt{\mbox{Hz}}. That’s actually better than the AxoPatch 200B, whose manual reports about 0.1 pA at 10kHz, which is 1 fA/\sqrt{\mbox{Hz}}.   Even better specs are available for the AD549L (about $40), with a 60fA bias current and a current noise density of 0.11 fA/\sqrt{\mbox{Hz}}.   Of course, once the amplifiers are this good, most of the noise is coming from thermal noise in the feedback resistor (about 6 fA/\sqrt{\mbox{Hz}}), unless one uses capacitive feedback (which precludes measuring DC currents).

I don’t know why I missed these op amps before, as they have been around since 2002 or 2003.

2012 March 19

Nanopore sequencing

Filed under: Uncategorized — gasstationwithoutpumps @ 22:18

The big news last month was the announcement by Oxford Nanopore that they would have a commercial version of nanopore sequencing of DNA out this year:

Santa Cruz Sentinel: UK-based company unfurls new technology to sequence whole strands of DNA.

Omics! Omics!: Oxford Nanopore Doesn’t Disappoint

Omics! Omics!: Why Oxford Nanopore Needs to Release Some Data Pronto (Besides Bailing Me Out)

Oxford Nanopore’s press release makes some pretty strong claims:

Each GridION node and cartridge is initially designed to deliver tens of Gb of sequence data per 24 hour period, with the user choosing whether to run for minutes or days according to the experiment.

Oxford Nanopore has also miniaturised these devices to develop the MinION; a disposable DNA sequencing device the size of a USB memory stick whose low cost, portability and ease of use are designed to make DNA sequencing universally accessible.   A single MinION is expected to retail at less than $900.

I was not able to figure out from the press release how much data the disposable MinION generates for that $900.  It is a pretty low entry point for a sequencer—especially as apparently no library prep is needed, so I expect that the MinION is likely to knock Ion Torrent out of the market (if it can deliver a reasonable number of bases for that $900-per-experiment cost).

On their page about DNA sequencing, they claim “read lengths of many tens of kilobases”.  They are also claiming low error rates, but I’ve not seen a number in print yet.  If any two of their three claims (low cost, long reads, and low error rate) are true, they will knock Pacific Biosciences out of the sequencing market.

My research lately has centered on using long PacBio reads to reconstruct genes with long, recent internal repeats, which could not be resolved with 454 or Sanger reads.  Much of this work should also be applicable to the long reads of the nanopore sequencing, though I’m sure that the error models will be quite different.  Of course, with no data available yet, it is hard to guess what the error models will be, and so hard to write any software.

I suspect that in a year or two, the standard approach for genomic sequencing will be to combine long nanopore reads with short Illumina reads, using the cheaper Illumina reads to do error correction on the long nanopore reads.  With current short-read technologies, it takes a lot of coverage to get contigs long enough to find genes in.  If the long reads of nanopores are not vaporware, then we may be seeing full genes even at fairly low coverage.

