Tuesday, December 21, 2004

First semester braindump

A recap of what happened over the last 3.5 months as far as my edu-ma-cation goes.


CSB 100: "Topics in Computational and Systems Biology". This was a "literature-based" class ie we read a bunch of papers and discussed them. Good class in terms of looking at a very broad range of topics eg DNA microarrays, high-throughput protein phosphorylation measurements, Bayesian networks, robustness and modularity in biological systems etc. My main problem with it was that it wasn't entirely clear what we were supposed to get out of each paper ie what the take-home message from each one was supposed to be.

Biological Engineering 420: "Biomolecular Kinetics and Cellular Dynamics". This was a class that really had two parts. The first part concentrated on applying the principles of chemical kinetics [ie how fast chemical reactions occur] and equilibria ["given compounds A and B, how much compound C do I have once the reaction has settled down to equilibrium ?"] to the chemical reactions that occur in biological systems. Examples of such reactions are molecules (like drugs) binding to receptors on cells, the chemical reactions that occur inside cells to turn genes on and off etc.
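To make the kinetics-vs-equilibrium distinction a bit more concrete, here is a minimal Python sketch of the simplest case I can think of: a ligand binding reversibly to a receptor, with the ligand in large excess so its concentration stays constant. The rate constants and function names are made up for illustration; they don't come from the course or from any real receptor.

```python
import math

# Hypothetical rate constants, chosen only for illustration.
K_ON = 1.0e6    # association rate constant, 1/(M*s)
K_OFF = 1.0e-2  # dissociation rate constant, 1/s


def fraction_bound_at_equilibrium(ligand, k_on, k_off):
    """Equilibrium question: what fraction of receptors ends up occupied?"""
    kd = k_off / k_on  # dissociation constant, in M
    return ligand / (ligand + kd)


def fraction_bound_at_time(ligand, k_on, k_off, t):
    """Kinetics question: how quickly is that equilibrium approached?

    Solves dC/dt = k_on * L * (1 - C) - k_off * C with C(0) = 0,
    assuming the free ligand concentration L stays constant.
    """
    k_obs = k_on * ligand + k_off  # observed relaxation rate, 1/s
    c_eq = fraction_bound_at_equilibrium(ligand, k_on, k_off)
    return c_eq * (1.0 - math.exp(-k_obs * t))


if __name__ == "__main__":
    ligand = 1.0e-8  # M, equal to Kd for the constants above
    for t in (0.0, 10.0, 50.0, 100.0, 500.0):
        bound = fraction_bound_at_time(ligand, K_ON, K_OFF, t)
        print(f"t = {t:6.1f} s   fraction bound = {bound:.3f}")
```

The equilibrium function only cares about the ratio of the two rate constants, while the time course depends on their absolute sizes, which is pretty much the whole difference between the two halves of that sentence.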

The second, and in some ways more valuable, part was taking scientific papers that contained a mathematical model of some biological process and getting us to reproduce the results of the model, as well as extend the model in some way. Having to do this really forces you to understand where all the equations and graphs in a paper come from so you can reproduce them [by writing a bunch of Matlab code]. In that respect, it was a good way of instilling the skills required to analyze a model, figure out what assumptions it makes and decide for yourself whether you think it's a good model or not.

The main downside to this course was the amount of work required -- I spent an average of 15-20 hours a week on the homework for it.

7.81: "Systems Biology". Another model-building class, but what made this different is that it covered models at various scales, starting from modeling what occurs inside a single cell all the way up to how different cells 'talk' to each other in order to produce a whole animal. It also relied very heavily on analyzing these models via something called "linear stability analysis" -- basically, you write down a set of differential equations that you think the system obeys, find its steady states and then work out whether a small nudge away from a steady state dies out or grows. For example, you can have three genes that interact in such a way that gene A turns off gene B, gene B turns off gene C and gene C turns off gene A. Depending on how quickly the genes turn each other off, you can get a continuous oscillation ie the amount of gene A goes up, goes down, goes back up, goes down etc. Linear stability analysis allows you to figure out how fast the reactions have to go for this to be the case. [For the detail-oriented: yes, I'm playing fast and loose here with the distinction between a gene and the protein it encodes. Deal.] Overall, a great class; even the problem sets were interesting, once you actually figured out what the hell the question meant =)
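For the curious, here is a rough Python sketch of the three-gene loop. It is not the model from the class: I'm assuming a deliberately stripped-down, dimensionless version in which each gene product x is made at rate alpha / (1 + r^n), where r is the level of its repressor, and decays at rate 1. The parameters alpha and n and all the function names are my own choices for illustration. For this symmetric system the linearization around the steady state has a leading pair of eigenvalues with real part -1 + b/2, where b is the steepness of the repression curve at the steady state, so the steady state gives way to oscillations once b > 2.

```python
def production(repressor, alpha, n):
    """Synthesis rate of a gene whose repressor is at the given level."""
    return alpha / (1.0 + repressor ** n)


def fixed_point(alpha, n):
    """Symmetric steady state x* solving alpha / (1 + x^n) = x, by bisection."""
    lo, hi = 0.0, alpha
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if production(mid, alpha, n) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def leading_real_part(alpha, n):
    """Largest real part among the eigenvalues of the linearized system.

    The Jacobian is circulant, with eigenvalues -1 - b * w for the three
    cube roots of unity w, where b = |d(production)/dx| at the steady state.
    The complex pair has real part -1 + b / 2.
    """
    x = fixed_point(alpha, n)
    b = alpha * n * x ** (n - 1) / (1.0 + x ** n) ** 2
    return -1.0 + 0.5 * b


def derivatives(state, alpha, n):
    a, b, c = state
    # A turns off B, B turns off C, C turns off A.
    return (production(c, alpha, n) - a,
            production(a, alpha, n) - b,
            production(b, alpha, n) - c)


def rk4_step(state, dt, alpha, n):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = derivatives(state, alpha, n)
    k2 = derivatives(shifted(k1, 0.5 * dt), alpha, n)
    k3 = derivatives(shifted(k2, 0.5 * dt), alpha, n)
    k4 = derivatives(shifted(k3, dt), alpha, n)
    return tuple(s + dt / 6.0 * (p + 2.0 * q + 2.0 * r + w)
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4))


def late_amplitude(alpha, n, t_end=100.0, dt=0.01, window=20.0):
    """Peak-to-trough swing of gene A over the last `window` time units."""
    state = (1.0, 1.2, 1.4)  # arbitrary, slightly asymmetric starting levels
    steps = int(round(t_end / dt))
    first_recorded = int(round((t_end - window) / dt))
    lowest, highest = float("inf"), float("-inf")
    for i in range(steps):
        state = rk4_step(state, dt, alpha, n)
        if i >= first_recorded:
            lowest = min(lowest, state[0])
            highest = max(highest, state[0])
    return highest - lowest


if __name__ == "__main__":
    for n in (2, 4):
        print(f"n = {n}: leading real part = {leading_real_part(10.0, n):+.3f}, "
              f"late amplitude of A = {late_amplitude(10.0, n):.4f}")
```

In this toy version, the sign of the leading real part should predict whether the simulated levels settle down or keep cycling, which is the whole point of doing the stability analysis before (or instead of) running the simulation.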

Historical side note from this class: Alan Turing, one of the fathers of modern computer science, actually did some theoretical work on pattern formation in biological systems [eg the stripes on a zebra's skin], way back in the 1950s. That was one smart man ...

So, basically it's been a semester of model building -- lots of differential equations, both ordinary and partial, and finding mostly numerical solutions to them. I must admit that I'm a bit disappointed, in an "Is that all there is to these models ?" sort of way ... I'm not sure what I expected, but somehow this wasn't quite it. Haven't quite put my finger on what exactly is bothering me yet.


As described elsewhere, I did one rotation in Drew Endy's lab, building some Biobrick parts. My current rotation is in Doug Lauffenburger's lab, where I'm working on applying Bayesian networks to analyzing some T-cell signaling data. Sounds fancy, but so far, it's pretty much consisted of figuring out how to get some existing code running and massaging the data from an Excel spreadsheet into a suitable XML format. In other words, pretty mundane computer stuff. Some of that is an artifact of the timing of the rotation -- it started a week and a half before Thanksgiving [just enough time to start ramping up], then there was Thanksgiving, after which came the last week of classes [ie last set of homework the professors needed to cram in], finals week and now the holiday season. In other words, I haven't exactly devoted a lot of time to it, or to figuring out how to make it more interesting. We'll see how far I get before my third rotation starts in the middle of January ...

Things I've figured out this semester:

1. I'm an engineer, not a scientist

This means that I want to actually build something, not just understand how it works. If, in order to build what I want to build, I have to figure out how something works, great, but that's not my main motivation. What this means is that a lot of the research going on in MIT's biology department leaves me rather cold, because it's very much "pure science" -- figure out how system X in organism Y works, in excruciating detail, with no obvious practical application other than the oft-invoked "... and this research may help us understand how the process works in humans, which will lead to treatments for [insert your favorite disease here]". I definitely understand that argument -- working on model organisms like E. coli, mice, fruit flies etc has been a great way to figure out how many biological systems work, but where things fall down for me is in not going beyond just understanding the system, and actually trying to manipulate it.

My statement that I like to build stuff may strike some of you who know about my lack of interest in really hands-on things like tinkering with engines etc as rather funny. I've thought about that apparent disparity a lot, and the conclusion that I've come to is that I like building abstract entities -- computer code is the perfect example. You can construct really complex artifacts, tinker with them and have the satisfaction of seeing your creations run, all without getting your hands dirty or requiring much hand-eye coordination =) More about this later on.

2. I'm not much of a [mathematical] theorist

I already kind of knew this based on my likes and dislikes in computer science. I was never very interested in algorithm design and coming up with things like the absolutely quickest sorting algorithm; instead, what's always interested me more has been "systems"-level stuff ie tinkering with operating systems and networks. Essentially, figure out the base-level algorithm needed for your system to not totally suck, build the system, measure it and then improve it based on what you've measured. That's in contrast to the approach of first coming up with a really sophisticated mathematical model that allows you to prove all kinds of cool things and only then trying to write code that actually implements your insanely-complicated algorithm.

From personal observation, I know what happens when you take the second approach: the software is pretty much impossible to get right, when it breaks only the person who wrote it can fix it and you keep discovering corner cases that your fancy algorithm doesn't handle very well. But the real killer is that as soon as your code encounters the real world, you find out that your system doesn't work as anticipated because the real world doesn't match all those nice simplifying assumptions you made. And then you're right back to where you would have been if you'd taken the first approach -- measuring what's going on and trying to fix your code accordingly, except that you're actually worse off: now you have to try to fix something really complicated.

In terms of biology, this realization was reinforced by my reaction to some of the papers I've had to read that contain mathematical models of biological processes. Basically, beyond a certain level of mathematical complexity, my mind just switched off and I skipped that section. Part of this is likely due to the fact that sometimes I didn't actually understand the math, but I also think that a lot of the theoretical models in papers go a bit too far -- at a certain point, it feels like they're just making stuff up so that their model fits the experimental data, with little physical justification for the extra terms.

I suppose I'm a bit disillusioned about mathematical models of biological processes in general. When I started the semester, my stance was something along the lines of "Models should be as detailed as possible" and I thought that if you built in enough complexity, you could predict just about anything. Now, it's more like "Models should be as detailed as needed, and as allowed by experimental evidence ... and in the end you're going to have to do the experiment anyway, so don't overdo it". In other words:

- only try to model something at, say, the level of individual molecules if what you really want is knowledge at the molecular level; don't do it if what you're really interested in is something higher-level, like the behavior of a whole tissue. Borges' "Of Exactitude In Science" (thanks to Drew Endy for pointing this out) makes the point more poetically =)
- only put things into your model for which you have experimental evidence; don't just make up some terms so that your model produces pretty graphs which look like the experimental data
- ultimately, you're going to have to perform the experimental work to verify your model anyway, so don't go insane building in lots of fancy math

In thinking about the issue of mathematical models of biological processes, I've come across a couple of links that I haven't fully digested yet, but that I present here for anybody who is interested in thinking about this some more:

Eugene Wigner [a Physics Nobel laureate] wrote an essay on "The Unreasonable Effectiveness of Mathematics in the Natural Sciences", the gist of which is:
a) beyond elementary arithmetic and geometry, math is just something that somebody made up; in other words, there's nothing in the real, physical world that corresponds to mathematical constructs like imaginary numbers, non-Euclidean geometry etc.
b) despite the fact that a lot of math is "invented", math is amazingly good at allowing us to describe and predict physical phenomena
c) what's up with that ?

Apparently, Richard Feynman had some thoughts about the correspondence between models and nature as well, according to Werner Vogels' post about Feynman & REST.

What's next ?

As I said earlier, this semester was mostly about learning how to build mathematical models. Next semester, my main focus is going to be what most people think of when they hear "bioinformatics" [to the extent that they think of anything ;-)]: analyzing various sorts of biological data. I plan to take:

BE 490 "Foundations of Computational and Systems Biology": this deals with questions like "I have the sequence of a gene or protein, is it similar to any known genes/proteins ? If not, how different is it ?", "I know the sequence of a protein, can I predict into what 3-dimensional shape it will fold ?", "I have the equivalent of the pieces of a jigsaw puzzle of DNA fragments, how can I put them back together ?"

6.874 "Computational functional genomics": concentrates on analyzing genomic data via statistics, machine learning etc.

7.56 "Foundations of Cell Biology": the title says it all. My first actual biology course in 16 years ;-)

From a research perspective, I still think the notion of engineering biological systems to perform computation is the most interesting thing going on; this includes not only things like synthetic biology [on which there is another article in the Jan 2005 edition of Wired; you know stuff is cool when it's in Wired ;-)], but also other efforts that can be grouped under "How to build a computer out of DNA" [see also the "International Meetings on DNA-based computers"]. I think these efforts have incredible long-term potential for radically changing our technological abilities. They also appeal to my "build something" instincts mentioned earlier -- it's basically the biological equivalent of writing code. In contrast, other research efforts to engineer biological systems, like the various tissue engineering efforts going on, are already a bit too hands-on/physical for me.

The main problem with this set of technologies is that they're still very, very early in their development; even though there are already some efforts that are close to being practically relevant [like Jay Keasling's work on engineering bacteria to produce malaria drugs], I doubt the technology as a whole will become practically relevant in the next 10 years [although I'd be happy to be proven wrong]. This matters to me because I want to go into industry, not into academia, so I need to acquire a skillset that's relevant to industry. A quick-and-dirty survey of job postings looking for "computational biologists" reveals that most of the positions are for people who know about analysis of large biological data sets, basically the sort of stuff that will be covered in my courses next semester. I don't expect that that will really change over the next few years -- the glut of data is just going to get worse, so people who know how to extract something useful out of it will continue to be in demand. And just taking a course in this area doesn't exactly count as expertise -- you need to have actually used the techniques in your research to be able to claim that. So if I want to pursue a thesis topic centered around synthetic biology, I need to figure out a way to tie in some computation-intensive data analysis bits so I can say something more intelligent than "Well, I know how to spell 'microarray' ... " when a prospective employer asks me what I know about analyzing gene expression data. Still trying to figure that one out. [In other words, I'm still pretty much where I was 4 months ago. Somewhat depressing.]

And that's all I have to say about that.


Anonymous said...

You know, "...I want to actually build something, not just understand how it works...," contains a dead horse of an assumption I've been beating for some years now.

Understanding, in the rather limited sense we biologists have deluded ourselves into imagining it, is superfluous and often misleading when you're engineering complex systems with emergent properties.

And also: told ya so. Years ago. -- Tozier

12:52 PM  
Alex said...

Ah, yes, the question of whether you can ever really "understand" a system with emergent properties. From an engineering perspective, I want to understand the system to the extent that I can get it to do what I want and know what the extremes of its behavior are. If I can't put those sorts of boundaries and numbers on its operation, it's not much use to somebody who wants to use it in the real world.

And, yes, you may have told me so, but that was, what, 12 years ago ? I was barely 18 then ;-)

1:01 AM  
