Thursday, May 19, 2005

First year recap: classes

Booyakasha ! My first year of grad school, at least the "taking classes" part, is over. This is, of course, cause for both celebration and some reflection [ie long, rambly posts that only family members will read, more out of obligation than anything else ;-)]. As I try to mentally summarize the past 9 months, stuff falls into 3 big buckets:

1) Classroom education -- what I learned from the classes I took
2) What I want to get out of my PhD overall, in terms of specific training and the area[s] of research I'm deeply interested in
3) The overall question of whether making this move was a Good Thing or not

We'll start with #1, because it's the easiest, most concrete topic, because #2 [my lab choice] isn't settled yet and because #3 depends on the first two.

As I described earlier, fall semester was devoted to building lots of detailed models of biological systems ie lots of computational stuff. This semester was evenly split between "straight" biology and computational biology. My molecular biology and cell biology classes, for all my complaining about them, were actually my favorite classes because I feel like they were the ones in which I started to learn how biologists think ie the classes taught me more than just facts. The other two [computational] classes were certainly not short on material covered, but they mostly expanded my repertoire of computational tools; they didn't really show me a totally new way of looking at a problem.

So, what's this big insight I got from my biology classes ? In the end, it's embarrassingly trivial: biology is analog and physical, not digital. "Well, duh, no sh!t, Sherlock", you say. Allow me to expand.

When you're trained as a computer scientist, you're trained to solve problems using layers of abstraction. Suppose you're given a problem that you don't know offhand how to solve. Start out with some "black boxes" that magically do what you want, without necessarily knowing how that black box does its thing. Then, take each black box in turn and subdivide it into another set of smaller black boxes that work together to perform the overall function of the bigger black box, again without worrying exactly about how each smaller black box does what it does. Keep doing this and eventually you get to a list of small things that you -do- know how to do and now, when you combine all of them, they build on each other to solve the overall big problem that you started out with. For a computer scientist, getting to "something you know how to do" basically means that each sub-problem is small enough that you can write some fairly simple code to do it, and be reasonably sure that your code does the right thing.
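To make that concrete, here's a tiny sketch of the black-box approach in Python. The problem [finding the most common words in a piece of text] and every function name are made up purely for illustration; the point is only that the top-level function is written entirely in terms of smaller black boxes, each of which ends up small enough to just write.

```python
# Top-down decomposition: each function is a "black box" until we open it.
# The problem and all names here are hypothetical, chosen only for illustration.

def most_common_words(text, n):
    # Top-level black box: built entirely out of smaller black boxes.
    words = split_into_words(text)
    counts = count_words(words)
    return top_n(counts, n)

def split_into_words(text):
    # Small enough to just write: lowercase, then split on whitespace.
    return text.lower().split()

def count_words(words):
    # Tally how many times each word appears.
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts

def top_n(counts, n):
    # Sort by descending count, breaking ties alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]

print(most_common_words("the cell senses damage and the cell repairs the damage", 2))
# prints [('the', 3), ('cell', 2)]
```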

Here's the point, though, that I never really thought about too much: even when you get to the point where you're writing code, you're still really, really far up in the abstraction hierarchy. Underneath the code you write, there's a boatload of other black boxes that all have to work together to make your code work: your operating system, the microcode controlling your computer's CPU, the logic gates the CPU is made of, the transistors each logic gate is made of, etc all the way down to electrons flowing along tiny little wires. In other words, you're still very far removed from the actual physical reality of what needs to happen in order for your code to have the effect you'd like it to have.

What does all this have to do with biology ? It has to do with the way I used to think about biological processes, or rather, the way I didn't. Before this semester, when I read a sentence like "The cell senses DNA damage, activates protein A and protein A in turn activates proteins B and C, which move into the nucleus and cause production of proteins that can repair the DNA damage", I didn't really think much about it beyond what was said -- seemed like a reasonably straightforward process, what else is there to ask ? For most computer scientists, that sentence above is a perfectly good explanation of how the cell handles DNA damage. Biologists, however, start asking all kinds of uncomfortable questions, like: "The DNA [for E. coli] is about 4 million letters long, how exactly does the cell sense DNA damage ?", "How does protein A, which is just diffusing around in the cell, actually find proteins B and C, which are also just randomly floating around in the cell ?", "How do the repair proteins find the damaged DNA ?" In other words, they start asking about the physical and chemical details of how exactly something happens, which is so far down the abstraction hierarchy that computer scientists get vertigo from looking down.

This is what I mean when I say biology is analog and physical, not digital: when you're digital, you're sitting on top of a pile of abstractions, whereas biology occupies the [unfinished] basement, physical reality. In some ways, this actually makes biology easier to think about because you can construct simple mental models based on your knowledge of how physical objects interact. For example, you can visualize "protein A displaces protein B from being bound to protein C" as "ball A knocks ball B off ball C". And if you can't form a consistent mental picture of a biological process in physical terms, you probably don't really understand it or the proposed model of the process is wrong. There's nothing magical about it -- it has to obey the laws of physics [in contrast to the unfettered nature of computer code].

So, this semester served to refine my mental layer cake, and give me an even greater appreciation for the amazing machine that a cell is. All it has to work with are totally "dumb" materials and physical concepts: molecules that float around & bump into one another, concentration gradients of chemicals, physical barriers like membranes etc. There's no "master controller" that magically knows what to do eg how to react to a virus invading the cell, or how to duplicate DNA so the cell can divide -- it all has to happen by itself, just by following the laws of physics and chemistry. That's a pretty humbling thought when you consider that for all the expressive power of modern computer programming languages and runtime systems ie lots of "smarts" not constrained by physical reality [to some extent], we're still nowhere near being able to construct a software system nearly as flexible, versatile and fine-tuned as a cell.

More concretely, when I see/read a description of a biological process, I now start thinking about questions like "Where does the energy for it come from ?", "What gives the interaction between protein A and protein B its specificity ie why doesn't protein B also interact with protein C ?", "How do you make sure protein A doesn't interact with protein B all the time, but only during a specific phase of the cell lifecycle ?". In other words, not just "What's the logic flow in this process ie what are the data inputs, what are the key decision points, what happens at each decision point, what are the outputs ?" but also "How is this process actually implemented physically, at the level of dumb molecules ?" That's what I mean when I say that I've started to learn how biologists think -- what questions they ask, why they're important and how they go about answering them.

In summary, at this point I feel like I have a pretty solid array of computational techniques I can work with, as well as a good foundation in basic biology. There are still a few gaps to fill [like taking a probability course so I don't have to try to appear inconspicuous each time the phrase "random variable" is mentioned for fear that somebody will ask me something about random variables], but my areas of ignorance are not the sucking voids they used to be. The next step is picking an actual research problem to work on, so I can start to apply all I've learned in the classroom, and figure out what actually doing science means, as opposed to just learning about it.

[No, I didn't forget the other two areas I said I'd been thinking about. Those will get covered in subsequent posts.]


Anonymous philipj said...

Hi Alex!

I'm also a grad student at the great nexus of the sciences that is physics/chemistry/biology, though I'm coming from the physics side of things. I've also just completed my first year of grad school, so a lot of the things you're thinking these days also seem to apply to me. :)

There doesn't seem to be much in the way of a biophysics/systems biology/etc presence on the web, so I will add a link to a friend's site, Biocurious, that you may enjoy.

Anyway, looking forward to the next couple of posts!


3:45 AM  
Blogger Alex said...

Hi Philip - Thanks for the pointer, will take a look at the site. And thanks for actually reading my rambles =)


12:58 PM  
Blogger Corey said...

I think your description is worthy of comment. You're right - as a software engineer, I'm well above the inner workings of the hardware - and development environments and protocols for that matter.

You basically just said biology can't be plotted in a workflow diagram. I can't work without workflow. =) I continue to be baffled - check out the big brain on Alex!

3:40 PM  
