Gas station without pumps

2015 January 27

Build a better life by blogging

Filed under: Uncategorized — gasstationwithoutpumps @ 18:27

Today in the mail I got the second swag as a result of my blogging.  (The first was a prototype of the Bitscope DP01 active differential probe, which I blogged about in First blogging swag and Bitscope differential probe).

This time, what I got was American Science and Surplus’s “Customer Thank You Gift Bag”, which contains

The total retail value of this swag is about $10, though I would not have spent that much to acquire any of it.  I might have picked up the mug in a thrift store for 50¢, as I need some more coffee mugs to use as beakers in the circuits course (for the thermistor lab and the electrode lab).  We can use the hot sauce (if it is any good), but the rainbow glasses and the plastic tops will need to be given away.  (I wonder if I should give them to my freshman design students for good class participation or something, or whether I should have my wife give them to the first-grade teacher at her school.)

The reason that I was sent this rather valueless $10 gift bag is that American Science and Surplus noticed my post about a motor I bought from them a few years ago:

Kudos and thanks to the author – we have been perplexed about this motor for too long. Please contact us at http://www.sciplus.com so we can send you a little gift.
With kind regards from American Science and Surplus

It was nice of them to send me something, but I would rather have had some random motors than what they did send.

At about $85 in swag from 1467 posts, I’m making about 6¢ a post—good thing I’m not doing it for the financial rewards!

2015 January 26

More senior thesis pet peeves

Filed under: Uncategorized — gasstationwithoutpumps @ 22:17

I previously posted some Senior thesis pet peeves. Here is another list, triggered by another group of first drafts (in no particular order):

  • An abstract is not an introduction. Technically, an abstract isn’t really a part of a document, but a separate piece of writing that summarizes everything important in the document. Usually the abstract is written last, after everything in the thesis has been written, so that the most important stuff can be determined. Most readers will never read anything of a document but the abstract.
  • Every paragraph (in technical writing) should start with a topic sentence, and the remaining sentences in the paragraph should support and expand that topic sentence. If you drift away from the topic, start a new paragraph! The lack of coherent paragraphs is probably the most common writing problem I see in senior theses.
  • I don’t mark every error I see in student writing. It is the student’s responsibility to learn to recognize problems that I point out and to hunt down other instances themselves. Students need to learn to do their own copy editing (or copy edit each other’s work)—I’m not interested in grading my own copy editing on subsequent drafts of the thesis.
  • Every draft of every document that is turned in for a class or to a boss should have a title, author, and date as part of the document.  Including this meta-information should be a habitual action of every engineer and every engineering student; I shouldn’t be seeing last-minute hand-scrawled names and titles on senior thesis drafts.
  • Page numbers!  Every technical document over a page long should have page numbers. If you don’t know how to get automatic page numbers with your document processor, either stop using it or learn how!
  • Earlier this quarter I said that I did not care what reference and citation style you used, as long as it was one of the many standard ones. I’ve decided to change my mind on that—I do care somewhat what style you use for the reference list. Use a reference style that contains as much information as possible: full author names, full journal name, dates, locations of conferences, URLs, DOIs, … .  You may format it in any consistent manner, but provide all the information.
  • Use kernel density estimates instead of histograms when showing empirical probability distributions. My previous post explains the reasons.
  • Avoid using red-green distinctions in graphics. About 6% of the male population is red-green colorblind. There are color-blindness simulators on the web (such as http://www.color-blindness.com/coblis-color-blindness-simulator/) that you can use to check whether your color images will work.  Modern gene-expression heat maps use red for overexpression, blue for underexpression, and fade to white in the middle.  This scheme has the advantage of having the strong signals in saturated colors and the weak ones in white or pastels, blending into the white background.
  • Comma usage continues to be a problem for many students. I discussed three common comma situations in English:
    • Comma splices. Two sentences cannot be stuck together with just a comma—one needs a conjunction to join them. If a conjunction is not desired, an em-dash can be used (as in the previous sentence). Sometimes a semicolon can be used, but never a bare comma.
    • Serial comma. There are two different conventions in English about the use of commas before the conjunction in a list of three or more items. In American English, the comma is always required, but in British English the comma is often omitted. I strongly favor the American convention (also known as the serial comma or the Oxford comma), and I will insist on it for the senior theses—even for those students raised in the British punctuation tradition.
    • When using “which” to introduce a relative clause, the clause should be non-restrictive. That is, omitting the clause beginning with “which” should not change the meaning of the noun phrase that is being modified by the relative clause. Non-restrictive relative clauses should be separated from the noun phrase they modify with a comma. If you have “which” without a comma starting a relative clause, then check to see whether you need a comma, or whether you need to change “which” to “that”, because the clause is really restrictive. Note: “which” is gradually taking over the role of “that” in spoken English, but this language change is still not accepted in formal writing, which is more conservative than speech.
  • The word “however” is a sentence adverb, but it is not a conjunction. You can’t join two sentences with “however”. You can, however, use it to modify a separate sentence that contrasts with the previous one.
  • Colons are not list-introducers. Colons are used to separate a noun phrase from its restatement, and the restatement is often a list. The mistaken notion that colons are list-introducers comes from the following construction: the use of “the following” before a list. The colon is there because the list is a restatement of “the following”, not because it is a list. Note that two sentences back, I used a colon where the restatement was not a list. Similarly, I don’t use a colon when the list is
    • the object of a verb,
    • the object of a prepositional phrase,
    • or any other grammatical construct that is not a restatement or amplification of what came before the colon.
  • Most students in the class use “i.e.” and “e.g.” without knowing the Latin phrases that they are abbreviations for. I suggested that they not use the abbreviations if they wouldn’t use the Latin, but use the plain English phrases that they would normally use: “that is” and “for example”. If they must use the Latin abbreviations, they should at least punctuate them correctly—commas are needed to separate the “i.e.” and “e.g.” from what follows, just as a comma would be used with “that is” or “for example”.
  • Some students use the colloquial phrase “X is where …”, when what they mean is “X is …”. The “where” creeps in in some dialects of English to serve as a way of holding the floor while you think how to finish the sentence—it doesn’t really belong in formal technical writing.
  • “First”, “second”, and “last” are already adverbs.  They don’t need (and can’t really take) an “-ly” suffix. It grates on me the same way that “nextly” does. “Next” has exactly the same dual status as an adjective and an adverb, but for some reason does not often suffer the indignity of being draped with a superfluous “-ly”.
  • I recommend that students not use the verb “comprise”, as few use it correctly. You can say that “x, y, and z compose A”, “A is composed of x, y, and z”, or “A comprises x, y, and z”. The construction “is comprised of” is strongly frowned on by most grammarists—avoid it completely, and avoid “comprise”, unless its usage comes naturally to you.  “Compose” and “is composed of” are less likely to get you in trouble.
  • “Thus” does not mean “therefore”; “thus” means “in this manner”. Note that “thus” is an adverb, so there is no “thusly”.
  • “Amount” is used for uncountable nouns (like “information”), while “number” is used for countable nouns (like “cells”). There are many distinctions in English that depend on whether a noun is countable or not (the use of articles, the use of plural, “many” vs. “much”), but “number” vs. “amount” seems to be the one that causes senior thesis writers the most difficulty.
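
For the heat-map color scheme in the list above (red for overexpression, blue for underexpression, fading to white in the middle), here is a minimal sketch of one way to compute such colors. The function name `diverging_rgb` and the `limit` parameter are made up for illustration; plotting libraries provide their own diverging colormaps, and this is not a reproduction of any of them.

```python
def diverging_rgb(value, limit=1.0):
    """Map a value in [-limit, limit] to an (r, g, b) tuple with components in 0..1.

    Hypothetical helper: strong positive values are saturated red,
    strong negative values are saturated blue, and zero is white.
    """
    # Clip to [-1, 1] so out-of-range values saturate instead of overflowing.
    t = max(-1.0, min(1.0, value / limit))
    fade = 1.0 - abs(t)  # 1 in the middle (white), 0 at either extreme
    if t >= 0:
        return (1.0, fade, fade)  # white fading to saturated red
    return (fade, fade, 1.0)      # white fading to saturated blue


for v in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(v, diverging_rgb(v))
```

Because weak signals map to near-white, they blend into a white page background, and the red-blue contrast does not depend on distinguishing red from green.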

2015 January 23

Dress like it’s 1965 Winner

In Dress Like It’s 1965, I showed the clothes that I wore for UCSC’s “Dress Like It’s 1965” Day on Thursday, 15 Jan 2015, to help celebrate the 50th birthday of UCSC (including the marvelous shoes my wife painted). Today I found out that I won 1st place in the men’s category! Pictures of the other winners can be found at http://50years.ucsc.edu/kick-off/.

Here is the picture they took of me, which was used for the judging:

Copied from http://50years.ucsc.edu/css/assets/images/kick-off/winners/1-guy.jpg
Sorry, I can’t find the photographer’s name on the 50th anniversary website to give proper photo credit.

I feel like I cheated a bit, as I was reproducing what I wore in 1969–1971, not 1965. Also I’m wearing a modern digital watch, since I no longer own any analog ones and forgot to take the watch off. But the judges obviously weren’t too fussy.

2015 January 22

Kernel density estimates

Filed under: Uncategorized — gasstationwithoutpumps @ 22:29

In the senior thesis writing course, I suggested to the class that they replace the histograms that several students were using with kernel density estimates, as a better way to approximate the underlying probability distribution.  Histograms are designed to be easy to make by hand, not to convey the best possible estimate or picture of the probability density function. Now that we have computers to draw our graphs for us, we can use computational techniques that are too tedious to do by hand, but that provide better graphs: both better looking and less prone to artifacts.

The basic idea of kernel density estimation is simple: every data point is replaced by a narrow probability density centered at that point, and all the probability densities are averaged.  The narrow probability density function is called the kernel, and we are estimating a probability density function for the data, hence the name kernel density estimation.  The most commonly used kernel is a Gaussian distribution, which has two parameters: µ and σ. The mean µ is set to the data point, leaving the standard deviation σ as a parameter that can be used to control the estimation.  If σ is made large, then the kernels are very wide, and the overall density estimate will be very smooth and slowly changing. If σ is made small, then the kernels are narrow, and the density estimate will follow the data closely.
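
The averaging described above can be sketched directly in numpy. This is only an illustration of the idea, not the code used for the plots: the function name `gaussian_kde_1d`, the exponential test data, and the two σ values are all assumptions chosen for the example.

```python
import numpy as np


def gaussian_kde_1d(data, grid, sigma):
    """Average of Gaussian kernels, one centered at each data point."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # z[i, j] = distance from grid point i to data point j, in units of sigma
    z = (grid[:, None] - data[None, :]) / sigma
    kernels = np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))
    # Averaging over the data points keeps the total area equal to 1.
    return kernels.mean(axis=1)


rng = np.random.default_rng(0)
sample = rng.exponential(scale=10.0, size=500)  # made-up data
grid = np.linspace(-30.0, 200.0, 2301)
dx = grid[1] - grid[0]

smooth = gaussian_kde_1d(sample, grid, sigma=5.0)  # wide kernels
rough = gaussian_kde_1d(sample, grid, sigma=0.5)   # narrow kernels

print("area under wide-kernel estimate:  ", smooth.sum() * dx)
print("area under narrow-kernel estimate:", rough.sum() * dx)
```

Both estimates integrate to about 1, but the narrow-kernel curve wiggles far more as it follows individual data points.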

The scipy Python package has a built-in function for creating kernel density estimates from a list or numpy array of data (in any number of dimensions). I used this function to create some illustrative plots of the differences between histograms and kernel density estimates.
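
As a minimal usage sketch of that function, `scipy.stats.gaussian_kde`, on one-dimensional data: the normal test sample and the bandwidth value 0.05 are assumptions for the example. Note that scipy does not take σ directly. It takes a bandwidth factor, which it scales by the spread of the data to get the kernel width.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=0.0, scale=1.0, size=1000)  # made-up data

# With no bw_method, the bandwidth factor comes from Scott's rule.
kde_default = stats.gaussian_kde(sample)
# A scalar bw_method is used directly as the bandwidth factor.
kde_narrow = stats.gaussian_kde(sample, bw_method=0.05)

grid = np.linspace(-4.0, 4.0, 9)
density = kde_default(grid)  # the estimator is callable on an array of points
total = kde_default.integrate_box_1d(-50.0, 50.0)

print("default factor:", kde_default.factor)
print("narrow factor: ", kde_narrow.factor)
print("integral over [-50, 50]:", total)
```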

This plot has two histograms and two kernel density estimates for a sample of 100,000 points. The blue dots are a histogram with bin width 1, and the bar graph uses bins slightly narrower than 5. The red line is the smooth curve from using Gaussian kernel density estimation, and the green curve results from Gaussian kernel density estimation on transformed data, ln(x+40). Note that the kde plots are smoother than the histograms, and less susceptible to boundary artifacts (most of the almost-5-wide bins contain 5 integers, but some have only 4). The rescaling before computing the kde causes the kernels to be effectively wider for large x values, where there are fewer data points.
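
The boundary artifact is easy to reproduce. In this small sketch the bin width of 4.8 and the integer range 0 to 95 are hypothetical values picked so that no bin edge lands on an integer; they are not the bins from the plot.

```python
import numpy as np

values = np.arange(0, 96)    # one instance of each integer 0..95 (made-up data)
bin_width = 4.8              # hypothetical width "slightly narrower than 5"
edges = -0.5 + bin_width * np.arange(21)  # 20 bins covering -0.5 .. 95.5

per_bin, _ = np.histogram(values, bins=edges)
print(per_bin)
```

Even though the underlying data are perfectly uniform, the bins alternate between holding 5 integers and holding 4, so the histogram shows a 20% dip that is purely an artifact of where the bin edges fall.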

With only 1000 points, the histograms get quite crude, but kde estimates are still quite good, particularly the “squished kde”, which rescales the x axis before applying the kernel density estimate.

With even more data points from the simulation, the right-hand tail can be seen to be well approximated by a single exponential (a straight line on these semilog plots), so the kernel density estimates are doing a very good job of extrapolating the probability density estimates down to the region where there is only one data point every 10 integers.

Here is the source code I used to create the plots. Note that the squishing requires a compensation to the output of the kernel density computation to produce a probability density function that integrates to 1 on the original data space.

#!/usr/bin/env python3

""" Reads a histogram from stdin
and plots smoothed probability density functions
using Gaussian kernel density estimation

Input format:
  # comment lines are ignored
  First two columns are numbers:
        value   number_of_instances
  remaining columns are ignored.

Output:
  a semilog plot of the histograms and the kernel density estimates
"""

from __future__ import division, print_function

from scipy import stats
import numpy as np
import sys
import itertools
import matplotlib
import matplotlib.pyplot as plt

# values and counts are input histogram, with counts[i] instances of values[i]
values = []
counts = []
for line in sys.stdin:
    line=line.strip()
    if not line: continue
    if line.startswith("#"): continue
    fields = line.split()
    counts.append(int(fields[1]))
    values.append(float(fields[0]))

counts=np.array(counts)
values=np.array(values)

squish_shift = 40. # amount to shift data before taking log when squishing

def squish(data):
    """distortion function to make binning correspond better to density"""
    return np.log(data+squish_shift)

def dsquish(data):
    """derivative of squish(data)"""
    return 1./(data+squish_shift)

# expand the histogram into one array entry per instance
instances = np.repeat(values, counts)
squish_instances = squish(instances)
num_points = len(squish_instances)

# print("DEBUG: instances shape=", instances.shape, file=sys.stderr)

min_v = min(values)
max_v = max(values)

squish_smoothed = stats.gaussian_kde(squish_instances)
smoothed = stats.gaussian_kde(instances)

step_size=0.5
grid = np.arange(max(step_size,min_v-10), max_v+10, step_size)

# print("DEBUG: grid=",grid, file=sys.stderr)

plt.xlabel("Length of longest ORF")
plt.ylabel("Probability density")
plt.title("Estimates of probability density functions")

plt.ylim(0.01/num_points, 0.1)
plt.semilogy(values, counts/num_points, linestyle='None',marker=".", label="histogram bin_width=1")
plt.semilogy(grid,squish_smoothed(squish(grid))*dsquish(grid), label="squished kde")
plt.semilogy(grid,smoothed(grid), label="kde")
num_bins = int(5*num_points**0.25)
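# density=True replaces the normed=True argument, which newer matplotlib versions no longer accept
plt.hist(values, weights=counts, density=True, log=True, bins=num_bins,
         label="histogram {} bins".format(num_bins))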

plt.legend(loc="upper right")
plt.show()
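
The compensation in the script (multiplying the squished estimate by dsquish) is the usual change-of-variables rule for densities: if y = squish(x), then p_x(x) = p_y(squish(x)) · squish′(x). The sketch below checks this numerically with a hand-rolled Gaussian kde rather than scipy. The test data, the bandwidth of 0.1 in squished space, and the helper name `kde` are assumptions for the illustration.

```python
import numpy as np

SHIFT = 40.0  # same shift as squish_shift in the script above


def squish(x):
    return np.log(x + SHIFT)


def dsquish(x):
    """Derivative of squish(x)."""
    return 1.0 / (x + SHIFT)


def kde(data, points, sigma):
    """Plain Gaussian kde: average of kernels centered at the data."""
    z = (points[:, None] - data[None, :]) / sigma
    return (np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))).mean(axis=1)


rng = np.random.default_rng(2)
x = rng.exponential(scale=30.0, size=200)  # made-up positive data
sigma = 0.1                                # hypothetical bandwidth in squished space

grid = np.linspace(-30.0, 1000.0, 4001)    # grid on the ORIGINAL axis
dx = grid[1] - grid[0]

raw = kde(squish(x), squish(grid), sigma)  # density per unit of squished axis
compensated = raw * dsquish(grid)          # density per unit of original axis

area_raw = raw.sum() * dx
area_compensated = compensated.sum() * dx
print("area without compensation:", area_raw)
print("area with compensation:   ", area_compensated)
```

Without the dsquish factor the curve is not a probability density on the original axis at all (its area is far larger than 1). With the factor the area comes out close to 1.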

2015 January 19

More on freshman design projects

Filed under: freshman design seminar — gasstationwithoutpumps @ 22:01

In Student project ideas for freshman design seminar, I listed some of the ideas students had brought up as possibilities for the freshman design seminar. For homework, I had them each comment on at least two other projects on the class e-mail list. Today, I counted commenters on the various proposed projects for the freshman design seminar:

5       temperature sensor (IR? I may have been lumping a couple of different temperature sensor ideas together—I did not follow all the links.)
4       PCR machine
4       blood pressure monitor
3       photogate
3       pH meter
2       colorimeter
2       EKG
2       fume extractor
2       sound level meter
2       centrifuge
1       microbial fuel cell
1       photospectrometer
1       general sensor
1       pulse monitor
1       function generator
1       lensless microscope
1       voltmeter
1       LED color mixer
1       motion sensor
1       stir station

I notice that no one picked the bacterial incubator project this year, which is too bad, as I have a dozen styrofoam shipping boxes that would be ideal for the project, and I spent a fair amount of time figuring out how to do the control loops (Temperature-control project for freshman design seminar, PWM for incubator, More on incubator design, Thermal models for power resistors, Thermal models for power resistor with heatsink, PWM heater and fan, PWM heater and fan continued, Controlling the heater and fan, Putting the heater in a box, Improving feedback for fan, Thermal control loop working (sort of)) and how to teach them.  Of course, given how much time it has been taking to teach reading a simple photodiode, I don’t know that I’d actually have been able to get through teaching proportional-integral control with anti-windup provisions, so maybe it’s just as well.

We’ll discuss possible projects on Wednesday.  I think that I can cross out a few as being unsuitable for the lab facilities we have available (no sinks, so wet-lab stuff is not reasonable) or as pointless, and one or two as too ambitious, but most of the projects are reasonable.  They vary a lot in difficulty, though, so I’ll have to help the students match their ambition with their willingness to work.
