Gas station without pumps

2011 May 31

A use for an Ion Torrent

Filed under: Uncategorized — gasstationwithoutpumps @ 09:21

I’ve been wondering what an Ion Torrent sequencer is useful for.  I mainly deal with de novo assembly of genomes, which needs a lot more data than an Ion Torrent sequencer provides, even for assembling bacterial genomes.  The high error rate and relatively short read length of the Ion Torrent reads are also problems.  For de novo sequencing, almost everyone is going with the Illumina platform, which provides barely long enough reads (a little over 100 bases at each end of a pair) at the lowest cost.  I like to have some longer 454 reads to throw into the mix, but they are more expensive and confuse some of the de Bruijn graph assemblers.

This past week, though, I’ve been working on a problem that might be ideal for the Ion Torrent: assembling the mitochondrial sequence of the banana slug, Ariolimax dolichophallus.  I’ve been trying to assemble it from 10x whole-genome shotgun sequence using Illumina reads (with paired ends that were too close together, so many of the reads overlap in the middle).  The library prep looks like it was very good at excluding mitochondria:  the mitochondrial genomes seem to have little more coverage than the nuclear genome. [Correction: I must have dropped a decimal point somewhere—the coverage is indeed much higher for the mitochondrion: more like 200x than 10x.]

Since mitochondrial genomes are the primary way of identifying eukaryotic species (often using only a tiny snippet, the “barcode of life”), there is a lot of value in being able to determine the genome quickly and cheaply. A mitochondrial genome is much shorter than bacterial genomes (only about 15 kbases), which makes the low coverage, short reads, and high error rates not much of a problem. If you have over 100x coverage on a short genome, you can still align and assemble it despite the noise, especially since repeats are not a problem in mitochondria.
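To put rough numbers on the coverage argument, here is a minimal Python sketch. It assumes the usual back-of-the-envelope definition (coverage = number of reads × read length / genome length); the 50-base read length and the function names are illustrative assumptions, not figures from any particular run.

```python
import math

def expected_coverage(n_reads, read_length, genome_length):
    """Average coverage: total sequenced bases divided by genome length."""
    return n_reads * read_length / genome_length

def reads_needed(genome_length, read_length, coverage):
    """Smallest number of reads whose total bases reach the target coverage."""
    return math.ceil(coverage * genome_length / read_length)

mito_length = 15_000   # about 15 kbases, as stated in the text
read_length = 50       # ASSUMPTION: usable read length after trimming
target = 100           # 100x coverage, as in the text

n = reads_needed(mito_length, read_length, target)
print(n, "reads give", expected_coverage(n, read_length, mito_length), "x coverage")
```

Under these assumptions, a few tens of thousands of short reads would be enough for a mitochondrial genome, which is a tiny amount of data by the standards of any current sequencer.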

Isolating mitochondrial DNA is also supposed to be relatively easy, so it might be good for Ion Torrent to put out a “mitochondrial genome kit” that makes isolating the mitochondrial DNA, sequencing it, and assembling the resulting genome very cheap.  This would take the rather thin taxonomic sampling of 2654 mitochondrial genomes at NCBI to hundreds of thousands in just a few years.  The key thing is to make the library prep very cheap and simple, since otherwise one could do barcoding to multiplex samples and piggyback on sequencing runs on the larger batch machines.

Average error rate and length of Ion Torrent reads after trimming off the bad bases at the end of the read. Note: this is not my data, so I can't provide any information about the library prep, reagents or chips used, or any of the other information that may have a major bearing on the quality of the data.  Anyone who has Ion Torrent data that can be mapped to a reference genome can do a similar plot.  It would be useful to do such plots for all the platforms in common use, but I don't have the data for it.

For this data, the error rate on 50-base-pair reads is about 1%.  That is much worse than Illumina (which reaches a 1% error rate at around 90-base-pair reads) or 454 (which reaches 1% at around 400-base-pair reads).  Note that the Q values are reasonably (perhaps slightly optimistically) calibrated, in that an error rate of x needs a Q cutoff of about -10 log_10 x.
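The relation between error rate and Q cutoff mentioned above is the standard Phred scale, Q = -10 log_10(error rate). A minimal Python sketch of the conversion in both directions (the function names are made up for illustration):

```python
import math

def error_rate_to_q(error_rate):
    """Phred-scaled quality for a given per-base error probability."""
    return -10 * math.log10(error_rate)

def q_to_error_rate(q):
    """Per-base error probability implied by a Phred-scaled quality."""
    return 10 ** (-q / 10)

# Print the quality value that corresponds to a few error rates.
for rate in (0.1, 0.01, 0.001):
    print(f"error rate {rate} corresponds to Q{round(error_rate_to_q(rate))}")
```

So if the Q values are well calibrated, a 1% error rate corresponds to a cutoff of about Q20.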

I expect that Ion Torrent will improve their read length and accuracy, but without plots like this one, it is difficult to compare how platforms really perform.  The raw "number of bases" and "read length ignoring error" figures that get touted by the companies are so misleading as to verge on fraud.


2011 May 28

Dunk Tank

Filed under: Uncategorized — gasstationwithoutpumps @ 12:57

I was asked by the Society of Women Engineers to sit in the dunk tank as part of their fundraiser.  So last Wednesday, I spent about 25 minutes sitting on the cold seat in the E2 courtyard. It was not great weather for a dunk tank, as it had rained mid-day.  Luckily the sun had come out before my time in the dunk tank, but it was still quite chilly and the dunk tank water was very cold.

I don’t know how much money they raised (at 3 throws for $2), but I do know I was dunked 6 times. One of the students made a video recording of about half the time, and managed to capture three of the dunkings, which I edited into a 55-second video:

2011 May 26

Maker Faire complaints

Filed under: Uncategorized — gasstationwithoutpumps @ 08:08

In my earlier post on the Maker Faire, I provided a lot of pictures and tried to convey some of the excitement of the event.  In this post, I’ll grouse a little about some of the negative aspects.  If you are put off by criticism, go read the other post, and ignore the rest of this one.

Here are a few things that bothered me:

  • There were few (no?) vendors selling the capability to make tiny quantities of printed-circuit boards. I know such vendors exist locally, because EE and computer engineering classes at UCSC use their services, but I didn’t see any there.  A few small companies were selling boards and a few were selling or giving away plans for boards, but no one (that I saw) was advertising the essential service of making the boards.
  • I also did not see vendors selling useful parts that normally are hard to find retail except sight-unseen from the web (like servos, h-bridges, project boxes, brass gears, propellers, wheels, and so forth).  It would have been great to have a few of the big web sellers (like Digikey) there in person with some of the more commonly needed parts. Even if they were just showing and not selling at the Faire, it would be great to see some of the parts in person before ordering.
    Perhaps Make magazine wanted no competition for their highly marked-up goods?  (I know they have a huge markup, because the OWI arm that they sell for $50 is available retail from the manufacturer’s web site for $30.)
  • The San Mateo Fairgrounds are far too small for the size of the crowd.  It was uncomfortable to move around, lines were very long to get into the fairgrounds, at lunch time, and at all the bathrooms, and we probably missed some of the more popular displays because we couldn’t get close enough to see what they were. There were no quiet places to get away from the crowds.
  • Food was ridiculously overpriced.  This is a standard problem at fairgrounds, and was no worse here than at other fairgrounds, but it was still a rather large hit to the wallet.  I would likely have bought more stuff from the main vendors, if the sticker shock from the food had not been so high.  Most of the food vendors posted prices that included all taxes (the only reasonable way to quote prices), but the gelato vendor surprised a lot of people by adding 9% on top of the posted price.  Given that they were already charging $5 for a scoop, they could have followed custom and posted prices that included all taxes.
  • The ticket booths did not have functioning credit card machines, so I had to pay cash for entrance.  Luckily I had been to an ATM just the day before in anticipation of needing cash for food and purchases at the Faire, so I had enough with me, though spending so much of it on tickets for everyone cut down on how much I was willing to spend once I was inside.
  • The Faire charged $5 for the schedule, so we never knew when any special events were supposed to happen.  The last time I went, there was a highlights schedule on the free map that said when the major events were.  I missed that courtesy.
  • The fire sculptures this year seemed less impressive than in some previous years.

OK, that’s enough complaints from me for today.  Maker Faire was still worth going to, despite the problems, and I’ll probably go again next year.  But I hope they fix the more easily fixed things (like getting more parts suppliers and prototype production companies to participate).

2011 May 24

Why Discrete Math Is Important and The Calculus Trap

Filed under: Uncategorized — gasstationwithoutpumps @ 20:19

My son is nearing the end of his Art of Problem Solving precalculus class, and it is still going well, as I reported earlier.  We are now looking at what to do next.  Should he take Calculus BC at his high school in the fall? Should he take the AoPS Calculus class this fall?  Or should he detour into a different branch of math?

The Art of Problem Solving people have written a couple of essays about pre-college math preparation:

  • The Calculus Trap. The basic premise of this piece is that the lock-step march through arithmetic, algebra, geometry, algebra, trigonometry, precalculus, and calculus is not the only or best way to study math.  Elementary and secondary math education is not a race to see who gets to the “finish line” (calculus) soonest.  Indeed, from a mathematician’s viewpoint, calculus is just one of many starting points for interesting math.  A lot of what gets tossed aside in a race to calculus is more interesting.
    Even more important: “the standard curriculum is not designed for the top students.”  They point out that being the top student in your class is not the way to make progress in your learning—you are better off getting to a level of challenge that really makes you exercise your problem-solving skills.  Racing through courses full of drill problems is not going to do that the way working on harder problems will.  There are plenty of hard problems that do not need a lot of mathematical machinery to solve, so students can start on them before having learned all the machinery.
  • Why Discrete Math Is Important. As a computer scientist and computer engineer who taught applied discrete math a few times, and as a bioinformatician who has to teach grad students all over again how to count (that is, how to do simple combinatorics) and how to do simple Bayesian probability, I am certainly in agreement that discrete math is important.  In fact, a big part of the reason I ended up in computer science rather than pure math was that I liked discrete math (graph theory and combinatorics) better than real or complex analysis, and I had made the mistake of starting my graduate math education in a department that had no discrete math. Luckily there were four or five great people doing combinatorics, graph theory, and graph algorithms in the computer science department, and I was able to switch departments.  That turned out very well for me, so perhaps it was a good thing that I’d chosen the wrong math department.

Most likely we’ll continue the standard progression, doing either Calculus BC or AoPS calculus next year.  After that, there will be time for applied discrete math and for probability and statistics before he goes off to college.

Median Earnings by Major and Subject Area

Filed under: Uncategorized — gasstationwithoutpumps @ 09:14

The Chronicle of Higher Education just posted Median Earnings by Major and Subject Area, a graphical depiction of census data about “full-time, full-year workers ages 25 to 64 whose highest degree is a bachelor’s. ”

The economic value of a bachelor’s degree varies by college major. New data from the U.S. Census Bureau show that median earnings run from $29,000 for counseling-psychology majors to $120,000 for petroleum-engineering majors. Even when majors are looked at by groups, such as business or health, there is variation in pay depending on the specific major.

The results make it clear that the value of a bachelor’s degree varies enormously by major, in unsurprising ways (engineering tends to pay well with a B.S., then computer fields, business, health, physical sciences, … ). I notice that my son’s two favorite subjects, computer science and theater, are at opposite ends of the spectrum ($75k and $40k).  Of course, he is planning on going on to grad school, which changes the salary picture.

Going to the census tables that aggregate over all majors (25-64, all races), we see the median income for different levels of degree holders:

Education level               Median income ($)
Total no bachelor’s                      35,727
Less Than 9th Grade                      17,095
9th to 12th Nongrad                      19,844
Graduate (Incl Ged)                      27,967
Some College No Degree                   32,363
Associate Degree                         36,374
Total bachelor’s and up                  52,256
Bachelor’s Degree                        47,345
Master’s Degree                          60,957
Professional Degree                     100,374
Doctorate Degree                         80,944

One thing that the Chronicle data do not show is the level of unemployment, since they cover only full-time, full-year workers. A high salary for those employed is not much consolation for those with the same degree who can’t get a job. The Census medians, by contrast, do include the unemployed (at $0), so that median measure does reasonably show the effect of unemployment.
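A small Python sketch of why counting the unemployed at $0 matters for a median. The salary figures here are made up purely for illustration and are not taken from the Census or Chronicle data; the function name is likewise hypothetical.

```python
from statistics import median

def median_with_unemployed(employed_salaries, n_unemployed):
    """Median when each unemployed person is counted with a salary of $0."""
    return median(list(employed_salaries) + [0] * n_unemployed)

# ASSUMPTION: invented salaries for five employed degree holders.
employed = [40_000, 50_000, 60_000, 70_000, 80_000]

print(median(employed))                     # employed only
print(median_with_unemployed(employed, 2))  # same field, two unemployed added
```

With two unemployed people added, the median of this toy sample drops from the third employed salary to the second, so a median over everyone with the degree does reflect unemployment, while a median over full-time workers alone cannot.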

Another likely distortion is that the median salary measure hides any effect of age or years of employment. If some field has been unpopular for the past 30 years, then almost everyone in it is nearing retirement age, and the salaries may seem quite high, even if they are actually comparable to salaries in a different field.  There are ways to compensate for age effects, but one either needs a lot more data or some pretty strong assumptions about how years of employment affect salary (for instance, that current salary is the product of two independent effects: the degree and years of employment).  I don’t know whether this distortion is responsible for biological engineering being the lowest of the engineering degrees, or whether it is carryover from the lower salaries of biology majors.  Biomedical engineering (which is almost the same field) reports much higher median salaries.
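The "product of two independent effects" assumption can be sketched in Python as follows. Everything numeric here is a made-up assumption for illustration: the 2%-per-year experience factor, the $50,000 degree effect, and the two age profiles. The point is only that two fields with the same degree effect can show very different medians if their workers differ in years of employment.

```python
from statistics import median

def experience_factor(years):
    """ASSUMPTION: salary grows by a hypothetical 2% per year of employment."""
    return 1.02 ** years

def field_salaries(degree_effect, years_list):
    """Salary modeled as (degree effect) x (experience effect)."""
    return [degree_effect * experience_factor(y) for y in years_list]

# Two hypothetical fields with the SAME degree effect but different age profiles.
young_field = field_salaries(50_000, [2, 5, 8, 10, 12])
old_field = field_salaries(50_000, [25, 28, 30, 32, 35])

print(round(median(young_field)), round(median(old_field)))

# Dividing out the experience factor recovers the shared degree effect.
print(round(median(young_field) / experience_factor(8)))
print(round(median(old_field) / experience_factor(30)))
```

Under this (strong) assumption, dividing each salary by its experience factor would let one compare degrees directly, but that requires knowing years of employment for each person, which a table of medians does not provide.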
