Saturday, December 31, 2005

Programming languages as options

Two related posts about programming languages, which, together with operating systems and programming style guidelines, are the Holy Trinity of "Things Most Likely To Cause Computer People To Turn Into Religious Fanatics":

- Joel Spolsky's post about the perils of JavaSchools
- The QofW take on Joel's post

My personal take on this is that there's no perfect programming language: you should always pick the language most appropriate for your task, and knowing more languages and programming paradigms just increases the number of tools in your toolbox. Expanding on this relatively content-free platitude and applying it to the question at hand:

I took the version of the Penn CSE course that Joel describes, where you're taught Scheme and ML before C/C++. It was definitely hard to wrap your head around Scheme when you were already used to an imperative style of programming, and lots of people said "Never mind" and dropped out of the major. For years afterwards, I reflexively recoiled from anything to do with AI because I'd been told that Scheme/Lisp was extensively used in AI and I never wanted to go near anything with that many parentheses again. And I've never had to use Scheme or ML again. So, on one hand, I agree with the argument that, for the most part, any reasonable language has all the facilities you're likely to need, that Java is a perfectly reasonable thing to teach and that teaching people Lisp [and, to a much lesser extent, C] is a lot like forcing people to learn Latin.

On the flip side, though, I do think that, right now, starting out with languages like Scheme and C is actually the better way to go in the long run, though for somewhat different reasons than the ones Joel gives:

- Scheme, and functional languages in general, are just a totally different way of doing things from imperative languages. You think about your data structures differently, you manipulate them differently, and functional languages have facilities that make them an easier fit for certain tasks than imperative languages [the same, of course, is also true the other way around]. So learning a functional language increases your awareness of alternative approaches to tackling a problem, which, I claim, is always a Good Thing.

- C, more than anything else, just forces you to get closer to the machine and, in the process, be careful and be aware of what you're doing. It's been argued that being close to the machine [eg having to do your own memory allocation, treat strings as null-terminated character arrays etc] is also becoming obsolete because it promotes buggy code and reduces programmer productivity, so it's much better to rely on innovations like garbage collection and the ready availability of class libraries etc. That's a reasonable argument, for the most part, but it's subject to the Law of Leaky Abstractions -- sometime, somewhere, something is going to break in the abstraction layer you're sitting on top of, and if you don't know how to go down to the appropriate level, figure out what's going on and fix it, you're hosed. Also, if you ever get into really hardcore systems hacking/optimization, chances are that you're going to have to break through your abstraction layer and dive down into the guts of the system. So while using languages like Java that relieve you of some of the more mundane aspects of programming is the right thing to do a lot of the time, you're better off learning something lower-level, like C/C++, first.
You could, of course, extend this argument and say that everybody should learn assembly language; while not entirely unreasonable ;-), my general rule of thumb would be to stop one level down: learn something that's one level lower than the currently most-used level of abstraction [eg C instead of Java, at the moment]. That way, you have the option of choosing the right tool for the job.
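To make the first bullet a little more concrete, here is one small task (summing the squares of the even numbers in a list) written both ways. It's a sketch in Python rather than Scheme or ML, chosen only for readability, and the function names are hypothetical. The imperative version walks the list and keeps mutating an accumulator; the functional version describes the answer as a filter, then a map, then a fold, without reassigning any variable along the way.

```python
from functools import reduce


def sum_even_squares_imperative(numbers):
    """Imperative style: loop over the items, updating a running total in place."""
    total = 0
    for x in numbers:
        if x % 2 == 0:
            total += x * x
    return total


def sum_even_squares_functional(numbers):
    """Functional style: keep the evens, square them, then fold them into a sum."""
    evens = filter(lambda x: x % 2 == 0, numbers)
    squares = map(lambda x: x * x, evens)
    return reduce(lambda acc, x: acc + x, squares, 0)


print(sum_even_squares_imperative(range(1, 11)))  # prints 220
print(sum_even_squares_functional(range(1, 11)))  # prints 220
```

Neither version is "right". In Scheme the second shape is the natural one to reach for, in C the first one is, and knowing both is the point.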

So, in the end, I agree with Joel that it's better to start off with Scheme and C than with Java, but not for the reasons he gives -- it's not about pointers and recursion specifically, it's about having a wider range of options than the least common denominator.

- Thanks to Cosma for pointing me at Ook.
- Another reason it's really not about recursion specifically: in practice, ie when building a real-world system, using recursion isn't that great an idea anyway, at least not unless it's tail recursion, which, in languages that eliminate tail calls [like Scheme], is just iteration by another name. Otherwise you blow your stack space, and that's no fun for anybody.
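A minimal sketch of the stack-space point, in Python and with hypothetical function names. One caveat: CPython does not eliminate tail calls, so in Python the tail-recursive version overflows just like the plain one. The loop at the end shows what "iteration by another name" means once the rewrite is done by hand: the function's parameters simply become loop variables.

```python
def total_recursive(n):
    """Plain recursion: each call must wait for the next one to return
    before it can add n, so the stack depth grows with n."""
    if n == 0:
        return 0
    return n + total_recursive(n - 1)


def total_tail(n, acc=0):
    """Tail recursion: the recursive call is the very last thing done and the
    running total travels along in acc. CPython still pushes a frame per call."""
    if n == 0:
        return acc
    return total_tail(n - 1, acc + n)


def total_loop(n):
    """total_tail rewritten as the loop it really is: the parameters (n, acc)
    become loop variables, and each 'call' becomes one pass through the loop."""
    acc = 0
    while n != 0:
        acc += n
        n -= 1
    return acc


print(total_loop(10**6))  # prints 500000500000

try:
    total_tail(10**6)
except RecursionError:
    print("tail recursion still overflowed the stack")
```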

Wednesday, December 21, 2005

The end of larnin'

... or at least the end of having to larn' stuff and answer pointed questions about it.

As of noon today, my semester is over. And, with that, I'm pretty much done with the classes I'm required to take. So, no more professors asking me invasive personal questions about things like max-margin classifiers, or what the picture of an electrophoresis gel allows me to conclude about the position of restriction sites on the plasmid etc. At least, not unless I choose to take more classes. [which I probably will because, hey, somebody else is paying for it]. Or until my qualifying exam, which I'll have to take sometime between June and August next year [and whose nature is still rather nebulous -- it's never been administered because my PhD program is brand new ...].

My only other required contact with classes will be when I have to act as a teaching assistant next semester. That's something I'm looking forward to about as much as a root canal, given that I have zero aspirations of ending up as a professor or teacher in any official capacity. That's not to discount the beneficial effects of the fact that it forces one to understand the material better, or to suggest that it's not an important job. It's just one of those important jobs and/or forcing functions that I wish I didn't have to deal with ;-)

Of course, now I don't have any more excuses for not making progress on actual research ...

Monday, December 19, 2005

Labeling considered important


You are in possession of: two identical bottles containing the same volume of colorless liquid. Bottle 1, labeled "A", contains liquid A; bottle 2, labeled "B", contains liquid B [ie the only distinguishing marks are the labels].
And you are not in possession of: a photographic memory that remembers little details like the spatial ordering of the bottles and labels
And you want to: rewrite the labels to make them more legible
Do not: rip off both labels at once, wad them up and throw them into the trash
Because you will: not have any clue which bottle contains which liquid. And you will have to remake said liquids, which may or may not be a huge pain in the posterior. Especially if you have, 10 minutes previously, spilled most of your supply of one of the necessary components, all over your lab bench.

This concludes today's lesson in "Short Non-Protocols in Molecular Biology". Thank you for your attention. You may now return to your regularly-scheduled cloning.

Sunday, December 18, 2005

Spy Vs Spy

MIT uses SpamAssassin to help filter spam email and, for the most part, it does a good job. However, it seems SpamAssassin is sometimes outwitted by another assassin: Jason Bourne. I say this because the last couple of spam emails that have made it into my inbox have contained segments from Robert Ludlum's Jason Bourne series of novels, like:

slumped in the rear seat waiting to hear the words. The nun comes out, monsieur! cried the driver. She enters the first taxi! Follow it, said Jason, sitting up. On the avenue Victor Hugo, Laviers taxi slowed down and pulled up in front of one of Pariss few exceptions to tradition-an open plastic-domed public telephone. Stop here, ordered Bourne, who climbed out the instant the driver swung into the curb. Limping, the Chameleon walked swiftly, silently, to the telephone directly behind and unseen by the frantic nun under the plastic dome. He was not seen, but he could hear clearly as he stood several feet behind her. The Meurice! she shouted into the phone. The name is Brielle. Hell be there at noon. ... Yes, yes, Ill stop at my flat, change

I guess the inclusion of random innocuous text like this in addition to the usual pitches for Cialis etc is enough to throw off the spam filter, at least for a while.

Is the use of segments from this particular series of novels, with their theme of sneakiness of various sorts, a knowing wink on the part of the spammers ? Who knows, but I found it amusing all the same.
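The guess that innocuous filler is enough to throw off the filter is easy to illustrate with a toy scorer that adds up per-token log-odds, loosely in the spirit of the Bayesian classifiers many spam filters use. This is a sketch, not SpamAssassin's actual algorithm: the token probabilities, the NEUTRAL default and the spam_score name are all made up for illustration, and the padding is just words lifted from the Bourne excerpt above.

```python
import math

# Made-up probabilities that a message containing each token is spam.
# A real filter learns these from training mail; these are illustrative only.
TOKEN_SPAM_PROB = {
    "cialis": 0.99,
    "cheap": 0.9,
    "buy": 0.8,
    "the": 0.4,
    "taxi": 0.1,
    "nun": 0.1,
    "jason": 0.1,
}

# Tokens the filter has never seen count as neutral (log-odds of 0).
NEUTRAL = 0.5


def spam_score(text):
    """Sum the per-token log-odds: above 0 leans spam, below 0 leans ham."""
    score = 0.0
    for token in text.lower().split():
        p = TOKEN_SPAM_PROB.get(token, NEUTRAL)
        score += math.log(p / (1 - p))
    return score


pitch = "buy cheap cialis"
padding = "follow the taxi said jason the nun enters the taxi"

print(round(spam_score(pitch), 2))                  # prints 8.18
print(round(spam_score(pitch + " " + padding), 2))  # prints -1.83
```

A real Bayesian filter weighs its evidence more carefully than a straight sum, but the spammers' hope is the same: enough ham-looking words can outvote a few spammy ones.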

Friday, December 16, 2005

Finals, my favorite part of the year

Classes are over, and all that stands between me and some downtime is two finals. Unfortunately, these finals illustrate two of the extremes encountered in classes.

- For my statistics final, we're not allowed to bring any sort of notes, so I have to memorize what amounts to ~13 pages of formulae. The entire class was basically an exercise in chug-and-plug: figure out what the right formula is, stick in the appropriate numbers/symbols and, voila, you get an answer. Very little emphasis on a deeper understanding of the material, so the only thing that really gets tested is how well you can pattern-match between what you've memorized and what the problem demands.

- For my machine learning final, we can bring whatever we want, short of a computer or another person. Unfortunately, if the midterm, past problem sets and the final practice problem set [which explicitly says "Some of these problems are very hard"] are any indication, that's not going to help much because this class actually requires a pretty deep understanding of the material and the professor has no qualms about making us squirm. An issue that is not helped by the fact that we covered lots of material, but generally not in very much depth.

Somewhere, there has to be a happy middle ground.

All whining aside, I thought both classes were [or will be] useful, which, in the end, is what counts.

Wikipedia: good, bad or indifferent ?

Two somewhat opposing viewpoints on Wikipedia:

- Nature magazine says that Wikipedia's science entries are, on average, not bad [but not great either]. That squares with my experience with Wikipedia, which has been mostly confined to looking up science-type stuff.
- The Penny Arcade take on Wikipedia, which I find pretty funny [especially the "quantum encyclopedia" bit] and also agree with, for the most part.

I can definitely see how there can be lots of editing and re-editing and inchoate debate about entries on subjects that are, well, open to debate. That's something that Wikipedia's science entries probably have to contend with less -- the equation is either right or wrong; there isn't a gray area -- which may lead to higher-quality entries.

I suppose the results of the Nature survey are a good sign for OpenWetWare (OWW), which is a Wiki-based attempt at allowing folks doing research in all areas of biology to easily share information. [Full disclosure: I'm a big fan of OWW, have used it a ton myself, and am a member of a team of MIT folks that just got an iCampus grant to spread the word about OpenWetWare and get more labs/people to join up, so my sympathies are definitely with the Wikipedia style of doing things.]

Friday, December 09, 2005

Virtual laziness

Apparently, playing videogames is too much work for some people, so they're outsourcing it. I'm sure it's only a matter of time before somebody makes a videogame in which your character plays somebody sitting on a couch playing a videogame. Because, y'know, all those "active" videogames are too physically demanding.

How lazy can you get ? We're well on the way to finding out.

Tuesday, December 06, 2005


The abstract for a scientific paper is kind of like the elevator pitch for the paper: it should make you want to read the whole paper or, failing that, at least give you an idea of what the paper is about, and the major results contained in the paper. Either way, it's supposed to increase the amount of information you have.

However, sometimes there are abstracts like this:

"We describe the use of the matrix eigenvalue decomposition (EVD) and pseudoinverse projection and a tensor higher-order EVD (HOEVD) in reconstructing the pathways that compose a cellular system from genome-scale nondirectional networks of correlations among the genes of the system. The EVD formulates a genes x genes network as a linear superposition of genes x genes decorrelated and decoupled rank-1 subnetworks, which can be associated with functionally independent pathways. The integrative pseudoinverse projection of a network computed from a "data" signal onto a designated "basis" signal approximates the network as a linear superposition of only the subnetworks that are common to both signals and simulates observation of only the pathways that are manifest in both experiments. We define a comparative HOEVD that formulates a series of networks as linear superpositions of decorrelated rank-1 subnetworks and the rank-2 couplings among these subnetworks, which can be associated with independent pathways and the transitions among them common to all networks in the series or exclusive to a subset of the networks. Boolean functions of the discretized subnetworks and couplings highlight differential, i.e., pathway-dependent, relations among genes. We illustrate the EVD, pseudoinverse projection, and HOEVD of genome-scale networks with analyses of yeast DNA microarray data."

This one achieves almost the opposite effect; it reads like something generated by SCIgen. It's like a finely-crafted mind virus that enters your brain via your optic nerve, scribbles over some perfectly good empty memory cells and fills them with gobbledy-gook. After reading it several times, trying to figure out what it meant, I feel like I actually know less now. It's sort of like Snow Crash, but for scientists.

Things like this shouldn't be called abstracts, they should be called obfuscats.

[Disclaimer: I haven't actually read the paper. For all I know, it might be a masterpiece of clear exposition and cutting-edge science, but with an abstract like that I suspect not many people will ever find out.]
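For what it's worth, and assuming the "genes x genes network" is a symmetric N x N matrix A of gene-gene correlations (N genes), the abstract's first claim appears to be the ordinary eigenvalue decomposition of a symmetric matrix. That decomposition writes A as a sum of rank-1 matrices built from orthonormal eigenvectors, which is presumably what "decorrelated and decoupled rank-1 subnetworks" refers to. This reading hasn't been checked against the paper itself.

```latex
A = V \Lambda V^{\mathsf{T}}
  = \sum_{k=1}^{N} \lambda_k \, \mathbf{v}_k \mathbf{v}_k^{\mathsf{T}},
\qquad
\mathbf{v}_j^{\mathsf{T}} \mathbf{v}_k = \delta_{jk}
```

Here V has the eigenvectors v_k as its columns and Λ = diag(λ_1, ..., λ_N). Each term λ_k v_k v_k^T is itself an N x N matrix of rank 1 (whenever λ_k is nonzero), ie one "subnetwork".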

Sunday, December 04, 2005

The apocalyptic week in film

Here's an idea for a new Netflix feature: move past the one-at-a-time recommendation service and start to evaluate a user's queue with an eye towards warning people if their queue could do with a little diversification in terms of genre etc. If they had such a feature, for example, they could have given us a warning along the lines of "Unless you're feeling really, really chipper or thinking about suicide and just need that extra little bit that'll push you over the edge, you may want to consider inserting a few more light-hearted movies into the top of your queue."

I say this because we've just, unintentionally, had a bit of a depressing week when it comes to movies:
- It started off innocently enough with "War of the Worlds" which, while fictional, wasn't exactly a feel-good movie [and had some plot points I took issue with].
- Next came "War Photographer", a documentary about James Nachtwey, a man who has spent the last 25 years chronicling pretty much every depressing thing known to man -- wars, famine, the AIDS epidemic etc; check out his website for some very powerful pictures. According to his colleagues, he's been able to take pictures of suffering and misery for a quarter-century without becoming cynical or detached, as many other war photographers have. In the interview clips in this documentary, he's an amazingly calm and soft-spoken man, so he's either figured out how to let off all the pressure that must come from what he's seen, or he's kept it all bottled up and when he does explode, he's going to take several city blocks with him.
- The third movie was "The Sea Inside", a movie inspired by the story of Ramon Sampedro, a quadriplegic who fought for several years to be allowed to die. Talk about feeling powerless when you're so disabled that you can't even kill yourself ...
- ... and to cap it off, we watched "Hotel Rwanda" last night, the story of Paul Rusesabagina, a hotel manager in Rwanda who sheltered over a thousand Tutsis during the 1994 massacres. That man should be canonized [if he's a Catholic] and given the Nobel Peace Prize without any discussion. The movie doesn't dwell much on the cowardice of the UN, the Europeans and the US in failing to deal with what was going on; read "We Wish To Inform You That Tomorrow We Will Be Killed With Our Families" if you want to get really pissed-off about that.

After all that, we're ready for some lighter fare, like, say, Babe.