Friday, September 30, 2005

"Is our machines learning ?"

This is, of course, a question that is rarely asked, except by certain people, and then only with respect to children.

In any case, the answer, so far, appears to be "yes, they is". I'm referring to my machine learning class -- not only are we being asked to work with real data [and not just crank through a bunch of mathematical gobbledy-gook] on the homework, I just came back from a recitation that was entirely devoted to giving us an intuitive feel for why the math works the way it does, with the barest minimum of equations.

I'm liking this.

Wednesday, September 28, 2005

Art is cool, but not that cool

Last weekend, there was an art fair of some sort near our place, which Christina and I utterly failed to go to. However, we walked by after it had wrapped up and I noticed a flier trying to lure people to avail themselves of the services of one of the artists at the fair, by getting a portrait drawn in 10 minutes, for $40.

Now, I can kind of see the appeal of the portrait bit: you only have to sit still for 10 minutes and the artist [if he/she knows what's good for him/her] will leave out all the unflattering bits. And you'll get a warm glow of self-satisfaction for being a patron of the arts, helping independent artists survive etc. The thing I totally don't get, though, is: $40 ?? For 10 minutes of low-stress, low-risk work, with the only cost-of-business overhead being a sheet of paper and some pencil lead ? That's a billing rate of $240 an hour. The only way I'd pay somebody that much for a 10-minute portrait is if they were somehow able to paint my soul and highlight all the bits that need improvement in order for me to attain enlightenment and escape the otherwise-endless cycle of rebirth. [Side note: should death then also be called "redeath" ?]

I guess I'm just a boor who doesn't truly appreciate art.

*Update: it was gently pointed out to me that the artist was offering portraits, not self-portraits.

Tuesday, September 27, 2005

More samples from statistics class, not necessarily random or independent

If you're going to give an example, pick a good one: For many mathematical methods, there's an easily-understood example that illustrates why the method is important ie why anybody bothered to come up with it in the first place. Unfortunately, many math professors seem to either not know about those examples, or forget about them when introducing a new method. For instance, compare these two introductions to Bayesian parameter estimation [to an audience that's never heard of it]:

1) "Suppose I have a probability distribution for the parameter p in a Bernoulli distribution, based on, well, just about anything. If I then observe some data drawn from the distribution parameterized by p, how should I update my probability distribution for p ? Bayesian parameter estimation is one way of doing this".

2) "Suppose I want to figure out the chances of a particular coin coming up heads. I could toss it a bunch of times and count the number of times a head appears. However, suppose I toss it 3 times and I always get tails. Does that mean the coin will never come up heads ? Our intuition says no, and Bayesian parameter estimation is one way of formalizing that intuition and taking into account the fact that we have some ideas about what the chances of coming up heads should be before we even toss the coin."

I know my reaction to approach #1, as my first introduction to the topic, would have been "Wha' happen ?"; #2, on the other hand, made sense as soon as I heard it. Unfortunately [for people who hadn't heard it before], #2 is not the way this particular topic was introduced in class today -- approach #1 was the one chosen and I don't think many people really grokked the whole point of it.

Somebody really should pull together intuitive examples like this from all areas of math and professors should be required to memorize them. The chances of that actually happening are, of course, arbitrarily close to zero [a prior belief I will be happy to update should I get some data ;-)].
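To make approach #2 a bit more concrete, here's a minimal sketch of the coin example in Python. It assumes a Beta(2, 2) prior [my choice for illustration, meaning "I mildly believe the coin is roughly fair"; nothing in the lecture specified one] and uses the standard fact that the Beta is a conjugate prior for the Bernoulli, so updating just means adding the observed counts to the prior's parameters. The names `update_beta` and `beta_mean` are made up for this example.

```python
from fractions import Fraction


def update_beta(alpha, beta, heads, tails):
    """Conjugate update: Beta(alpha, beta) prior + coin tosses -> Beta posterior."""
    return alpha + heads, beta + tails


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, used as the estimate of P(heads)."""
    return Fraction(alpha, alpha + beta)


# Assumed prior (not from the lecture): Beta(2, 2), a mild belief in a fair coin.
prior_alpha, prior_beta = 2, 2

# The data from the example: 3 tosses, all tails.
heads, tails = 0, 3

# Counting alone says the coin never comes up heads.
naive_estimate = Fraction(heads, heads + tails)

# The Bayesian estimate blends the prior belief with the observed tosses.
post_alpha, post_beta = update_beta(prior_alpha, prior_beta, heads, tails)
bayes_estimate = beta_mean(post_alpha, post_beta)

print("naive estimate of P(heads):", naive_estimate)
print("Bayesian estimate of P(heads):", bayes_estimate)
```

With this prior, the naive count gives 0 while the posterior mean is 2/7: lower than a fair coin, but not "never". A different prior would give a different number, which is kind of the point.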

"Periodic sampling coupled with lazy updates": This is a phenomenon exhibited by a poor sleep-deprived undergraduate student: she'd fall asleep for five minutes, return to a semi-awake state, look at what had been added to the board since her last period of wakefulness, scribble down the new stuff and then fall back asleep.

Given the fact that her notes looked pretty chicken-scratchy and that her memory of the class is probably a series of 10-second sound samples along the lines of "... and the Bayes estimator for this is ...", "... the Beta is a conjugate prior distribution ...","... the mean of the posterior evaluates to ...", I'm not sure that I would advocate this as an effective learning strategy.

Tuesday, September 20, 2005

You may be in a math class if ...

- Your professor is [probably] younger than you [for a certain value of "you" =)]: something about mathematicians doing their best work at a young age.
- The only actual numbers you ever see are 0 and 1 [and maybe infinity, though that's not really a number], everything else is symbols: connections to real-world data are left as that most dreaded of all things -- an "exercise for the reader".
- You are frequently subjected to unprovoked proofs: the professor says something like "And now we need a result that you should all remember from previous classes in X", everybody in the class stares back with a perfect poker face, betraying neither knowledge nor ignorance and the professor then says "... but if you don't, let's prove it". Sometimes, it'd be really nice to be able to say "Dude, no, really, I trust you on this one, let's just move on."
- Even the sentences that have no mathematical content sometimes don't make sense: after assaulting you with a proof you really didn't need, the professor says "And even if this is confusing, the meaning should be clear". Right, that clears it all up, then.
- There are one or two magic principles that, like fairy dust, can be used to explain everything and you're screwed if you don't really understand them: examples are the Central Limit Theorem and the Law of Large Numbers.

These observations were sparked by a statistics class I'm taking. In his defense, the professor does explain the material quite well, without resorting to the low-down-and-dirty "... and it should be obvious that ..." trick very often. Wiser heads have suggested that I acquire statistics through self-study of an apparently very good book, but for now I'll stick with the class and see how it goes. Thankfully MIT lets you drop a class until ridiculously late in the semester, so I'll have plenty of time to flee should I be over- [or under-]whelmed in a few weeks.

Monday, September 19, 2005

This bank is protected by highly-trained professionals

[Random observation]
Apparently, the security guard motto is "observe and report", not "kick @$$ and take names, if necessary". Maybe that explains why the security guard standing outside my bank weighs on the order of 300lbs, with about 150lbs of that weight solidly concentrated in a massive gut. He'd be pretty much useless if there were any need to move a distance of, say, 20 feet in under 60 seconds since there's lots of inertia and not much accelerative force in that body, but as an immovable watch-tower I suppose he does the trick.

Then again, maybe he's actually hiding a dwarf underneath his tightly-stretched white polo shirt, ready to leap out and start swinging some serious ax ...

Tuesday, September 13, 2005

Custom genomes: coming soon to an organism near you

The first Endy lab paper has just been published: "Refactoring bacteriophage T7". That's quite a milestone for freshly-minted professors [I think].

However, quite apart from the symbolic significance, the contents of the paper are pretty damn important too [even accounting for my obvious bias =)]. The short version: Leon, Sri and Drew rearranged and changed 30% of the genome of a [harmless] virus called T7 that infects E. coli bacteria. Why did they do this ? To test how much we really know about this virus, and make it easier to do experiments on it. While that's a laudable goal in and of itself, what's so cool about this is that it's the first time anybody has really made such extensive and [more importantly] human-designed changes to the DNA of an organism [without killing it]. Like the associated "News and Views" article says:

[...], the restructuring of the T7 genome by Chan et al represents an important step toward the intentional design and construction of artificial living systems.

In other words: we're starting to be able to create custom-designed lifeforms. That should give you some pause for thought.

Thursday, September 08, 2005

Gettin' classy with it

Somewhat like other people [like being a bit more, ahem, mature than most of my classmates and having some experience outside the Academy bubble], it's time for some more formal edju-ma-cation, in the form of classes. So far, I've checked out a machine learning class, two statistics classes ["Statistics for Applications" and "Statistics for Scientists and Engineers"; still no idea what the difference is ...] and a genetics class. I expect I'll take the machine learning class and one of the statistics classes for credit and just sit in on the genetics class. [After all, there's only so much opportunity a man can handle, especially if said man wants to also get some research done.]

And, in the span of two days of class, there have already been a couple of funny moments. The first one occurred during the genetics lecture, when the professor was talking about the Shi gene in fruit flies. It turns out that there is a version of this gene that's what's called "temperature-sensitive", which means that the protein made by that gene only works properly at certain temperatures. Well, the nomenclature for temperature-sensitive genes is to add a "ts" to the end of the name of the gene. Now, what do you get when you add "ts" to the end of "Shi" ? That's right. You'd think this would have at least gotten a titter out of a class of about 300 sophomores, but other than seeing a couple of people smirk, I observed nobody with a similarly juvenile reaction.

The second and third moments happened during one of the statistics lectures. After about 45 mins of the (substitute) professor going through a review of elementary probability theory, he suddenly got bored and said "And, for those of you who are bored by this, let's jump to something a bit more advanced, which will use some concepts that will be covered in future lectures" and then proceeded to solve Buffon's Needle Problem. In context, that's a bit like a teacher going from reading "Spot is a dog. See Spot run. Spot runs around the house." to saying "Ok, now that we've got this reading thing sort of down, let's read Moby Dick" -- not what I'd call a smooth transition.

But it got better. Solving the simple version of this problem took about 40 minutes, and he really wanted to get to the more complicated version, so in the last 5 minutes of class he set up the more complicated version, said "Ok, I'll leave actually doing this to you, but I'll give you the answer" and then proceeded to pull out a little black address book in which he had written the answer. Try as I might, I really can't come up with a scenario in which I would write down the answer to something like that in an address book in which I presumably keep more ... pragmatic things, like phone numbers and addresses. I mean, is that algebraic expression something he has to refer to a lot ? Does he really like the way it looks and so he pulls it out occasionally to marvel at it ? Is it something he pulls out at cocktail parties ? I'm just at a loss.

On a more "meta" level, I feel like the last couple of years have been somewhat ironic in that I'm now embracing all the academic areas I ran away from previously. I couldn't wait to be done with biology in high school, steered clear of probability and statistics in college and had no idea why anybody might want to know something about a subject as boring as "machine learning". In the meantime, I've switched to thinking biology is the most interesting stuff to work on, I actually enjoyed learning probability this summer and I'm looking forward to my statistics and machine learning classes this semester. I guess I've finally grasped that statistics and probability really are essential tools for making sense of a complicated world. [Bill, as usual, already knows this.]

With age comes wisdom =)

Monday, September 05, 2005

The real MSPoll

Microsoft has an annual employee poll, called the MSPoll, in which everybody is asked to rate how they feel about their work environment, their managers, the company strategy etc. Nobody is forced to take this [anonymous] poll, but everybody is strongly "encouraged" to do so by their managers [who can see how many of the people reporting to them have already filled out the survey], by dint of indirect methods like group emails to the effect of "Only 25% of you have filled out the poll; c'mon, please fill it out ..." or more direct approaches, like asking people point-blank whether they've filled out the survey yet [technically, not allowed, but still done].

Out of this poll comes a set of neat little numbers that purport to capture how employees are currently feeling; the theory is that managers will look at these numbers and then use them to make [positive] changes. In practice, I never saw the company-wide numbers really change -- efficiency of cross-group cooperation always got a poor rating, employee satisfaction tended to hover around 70%, a substantial fraction of people didn't understand/agree with overall company strategy etc. So, from that perspective, I never saw much value in the poll.

However, the written comments actually were useful, because they represented individual voices behind the abstract numbers, and talked about what really bugged people. In that vein, I really hope that people at MS who can make a difference are reading what's being posted to Mini-Microsoft, especially the comments, thinking about how to address all the dissatisfaction being voiced there, and not just dismissing it as a few malcontents pissing and moaning. The reason I think the comments there are so valuable is because they're often not one-off comments, but an actual discussion thread, which is something you don't get out of the MSPoll comments.

Why do I care ? Because I still have a soft spot for the company, know that there are a bunch of really smart, dedicated people there and hope that it can go back to being the fun sort of place it was when I first worked there. Granted, the stock is never going to be what it used to be, but hopefully it can still be a place where the emphasis is on writing and, more importantly, shipping cool software, not all the bureaucracy and wheel-spinning that it seems to have devolved into.

Friday, September 02, 2005

What I did last summer

Well, summer is over, heralded mainly by the fact that it's suddenly "buzzier" everywhere, what with all the undergrads and their stressed-out, nervous vibe re-infesting the campus. With that comes the inevitable "How was your summer ?" question; looking back, the last few months have actually been reasonably busy:

- I picked an advisor
- I decided I'd made the right choice in leaving the MS mothership
- I turned 31
- I finally wrote up the story of our honeymoon from hell
- I dislocated my little finger [and it's still not back to normal]
- I looked for a thesis topic
- Christina and I went back to the promised land
- I got to watch a MotoGP race live
- Christina had a photoshow [together with her classmates]
- Christina turned 32
- I picked a different advisor [and am still looking for a thesis topic ...]
- We went to Philly for my brother's housewarming party
- Christina became even more of a photography fiend, taking lots of cool pictures.
Side note: The "Surfer in the big city" pictures earned us a run-in with a few renta-cops who were very concerned about their building being used as a backdrop because of, y'know, security. Obviously, Bad Guys trying to take pictures of a building for nefarious purposes would set up a tripod directly opposite the building [after all, there's no such thing as a telephoto lens] and really take the time to get the right picture. Only by dint of me obsequiously agreeing that they were doing a very important job, that they were right to be suspicious of us, promising not to include their building in the shot etc did they finally unclench. Lesson: No More Taking Pictures In Downtown Boston -- Our Collective Safety Demands It !
- I learned probability theory by dint of reading through a very good book used in an MIT probability course and actually doing all the exercises in the book, as well as all the problem sets and quizzes that students who took the course had to do. Now I can actually understand questions that involve fancy phrases like "random variable" and don't involve coins, cards or dice =)

On to year II in The Academy and Beantown.

Bibles, yeah, that's just what they need.

From a NYT article about the relief efforts in Louisiana, specifically the people evacuated to Houston:

"But in Houston, there were hot showers, crates of Bibles and stacks of pizzas, while in New Orleans, many refugees scrounged for diapers, water and basic survival."

Bibles ? Are you f!cking kidding me ? Whoever brought those "crates of Bibles" should first be smacked around the head with them and then dropped off on top of a flooded house in New Orleans, together with his/her Bibles, since, clearly, a Bible is what refugees from this kind of disaster need most. Because, y'know, now would be a great time to thank the Lord for what he's done for you. Or to find him, so he can protect you from future disasters like this, since he has such a great track record for protecting people who believe in him.

Complexity ? No, I didn't order that. Send it back.

In various earlier posts [like this], I made a passing reference to the fact that biologists are finding out that RNA, "the other nucleic acid", plays a much bigger role in biological regulation than we've thought for the last few years, with our focus on DNA and proteins. That role continues to expand -- the current issue of Science magazine [subscription required, unfortunately] is devoted to highlighting the many processes that non-coding RNA [ie RNA that doesn't code for proteins] is involved in.

The main takeaway, for me at least, is this: things just got a whole lot more complicated. We were already struggling with trying to figure out how all the genes/proteins we know about work together, and the bits of DNA encoding those genes only represent about 2% of the human genome ie we thought only about 2% of the human genome did much of anything. However, a recent paper showed that 10% of the human genome is "active" [and a whopping 62% of the mouse genome is active, as described in papers published in this week's issue of Science], and the extra "stuff" is all these non-coding RNAs whose function we really have no clue about. In other words, we may have to worry about 5 times as much stuff as we thought we had to worry about [in humans], and chances are that this means things are [a lot] more than 5 times as complicated as we already thought they were.

TK has a funny way of describing the difference between engineers and scientists: scientists, when presented with something complicated, say "Cool ! I can spend lots of time figuring out how it works !". Engineers, on the other hand, like simplicity, so their reaction is "Who ordered that ?". I fall into the engineer camp, so the depressing part about all this to me is that there's no obvious method for figuring out how it all works other than to just painstakingly do all the required lab work.

The idea/hope/theory is that computational methods will help to speed up the rate of discovery, but it's not clear to me how much they've really helped so far -- they're certainly good at generating lots of hypotheses, but at the end of the day you end up having to do the experiment anyway to verify the hypothesis, and that's what takes a long time. To some extent, it seems like computational methods have mainly made it clearer how little we really know, which is useful in and of itself, but probably not quite what most people are hoping for.

All that said, it's pretty clear that without developing more and better computational [and experimental, of course] methods for generating and examining data, we're never going to have a decent handle on how Mother Nature works. We might as well grit our teeth, get on with it and hope that we don't discover even more complexity.