Tuesday, November 22, 2005

Lessons from AI, maybe

Disclaimer: My knowledge of the history and practice of AI is pretty sketchy, so all that follows may be totally wrong.

In thinking about the current state of computational biology, specifically its application to the real world, it seems like there are certain parallels to be drawn with the development of artificial intelligence [AI].

Lots [all ?] of the first AI systems were based on formal/logical reasoning of the type "A is a bird; birds can fly; therefore A can fly", i.e. they relied on having very tightly specified knowledge, and rules about how to apply that knowledge, to allow them to make decisions/predictions. This approach ran into problems for a number of reasons.

For one, there are always exceptions to rules; for example, penguins are birds, but they can't fly. So any rule like the one above had to be modified to say something like "A is a bird; birds, except for penguins, can fly; if A is not a penguin, A can fly". But that way lies madness -- there are lots of other flightless birds, and so you have to keep making the rule more and more complicated.
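Here's a rough sketch of what that patching looks like in code. The bird list, the exception list and the `can_fly` function are all made up for illustration; no real AI system is being quoted here.

```python
# Hypothetical knowledge base: every name here is invented for illustration.
BIRDS = {"sparrow", "penguin", "ostrich", "kiwi"}

# Each newly discovered exception has to be added to the rule by hand.
FLIGHTLESS = {"penguin"}


def can_fly(animal, flightless=FLIGHTLESS):
    """Rule: A is a bird; birds, except the listed exceptions, can fly."""
    if animal not in BIRDS:
        return None  # the rule says nothing about non-birds
    return animal not in flightless


print(can_fly("sparrow"))   # True
print(can_fly("penguin"))   # False
print(can_fly("ostrich"))   # True, which is wrong until someone patches the rule
print(can_fly("ostrich", FLIGHTLESS | {"ostrich", "kiwi"}))  # False
```

The rule is only ever as good as the exception list, and the exception list is never finished.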

Another problem was the sheer number of rules and amount of knowledge required to reason about even the simplest things. For example, in order to make a rule about birds, you first have to specify what is and isn't a bird, and to know that a "house sparrow" is the same as "an animal of species Passer domesticus", and that a Kentucky Fried Chicken isn't actually a bird, etc., or else you won't be able to reason correctly in all cases.

The upshot of all this is that these formal systems [also known as "Good Old-Fashioned AI"] were pretty much only usable for "toy problems", the kind that are easy to find in a research lab but non-existent in the real world, and this led to some disenchantment with AI. [That hasn't stopped some people from continuing to pursue that approach, via the brute-force method of building a database with lots and lots of rules ...].

Then, in the 1990s, statistical/machine learning approaches to AI started to become popular. These approaches don't have all their knowledge rigidly encoded; instead, they analyze past data to come up with fuzzier notions like "What's the probability that X will happen, given what has happened in the past ?" e.g. "In the past, this coin came up heads 90% of the time; what are the chances that it'll come up heads again on the next toss ?". This approach more accurately reflects the messiness of the real world -- you can't know everything in advance and freak occurrences do happen, so the best thing to do is make [almost] no assumptions, learn from history, never say something can't happen and make your best guess based on what you've learned. Software systems using these sorts of algorithms have enjoyed some spectacular successes, like completion of the 2005 DARPA Grand Challenge, a contest to build a vehicle that could successfully navigate roughly 130 miles of desert terrain on its own, i.e. without a driver.
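To make the coin example concrete, here's a minimal sketch of "learn from history, never say something can't happen". The add-one (Laplace) smoothing is my choice of technique for illustration, and the function name and toss record are invented.

```python
def estimate_heads(tosses, pseudo_count=1):
    """Estimate P(heads) from a string of past tosses such as "HHTH".

    pseudo_count=0 gives the raw frequency (and needs at least one toss).
    pseudo_count=1 is add-one (Laplace) smoothing, an assumption made
    here for illustration: it keeps the estimate away from exactly 0 or 1.
    """
    heads = tosses.count("H")
    total = len(tosses)
    return (heads + pseudo_count) / (total + 2 * pseudo_count)


history = "HHHHHHHHHT"  # made-up record: 9 heads in 10 tosses
raw = estimate_heads(history, pseudo_count=0)   # plain frequency
smoothed = estimate_heads(history)              # hedged estimate
all_heads = estimate_heads("HHHHHHHHHH")        # still below 1.0

print(raw, smoothed, all_heads)
```

Even after ten heads in a row the smoothed estimate stays below 1, so tails is never ruled out entirely.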

So, to summarize: in AI, formal/fully-specified systems: not so good, "informal"/statistically-based systems: pretty good, in terms of being able to handle the real world.

In computational biology, I think of mechanistic, differential-equation-based models of biological processes as the analogue of formal AI systems. You have to know all the proteins involved, specify which ones interact and how strongly, etc. What makes this difficult is that, in general, you really don't know all the proteins involved, you don't know all the interactions, and you have only a few [not very accurate] measurements of what the reaction rates are. Basically, there's a whole bunch of stuff you don't know and so can't build into the model, and each time something new is discovered, you have to go back and update your model. And measuring some of the stuff you'd need to refine your model is generally lots of drudgery, so nobody does it. The end result is that you can only build these sorts of models for small, very well-understood biological systems, and even then the models aren't very good at capturing what's actually going on. In other words, you're restricted to mostly toy problems, which, while interesting, are unlikely to be useful to anybody who wants to do something like predict how cells will respond to a particular drug.
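For a feel of how much you have to pin down, here's a toy mechanistic model of just two proteins, where X represses the production of Y. The equations, the rate constants and the simple Euler integration are all assumptions picked for illustration, not a model of any real pathway.

```python
def simulate(k_prod_x, k_deg_x, k_prod_y, k_repress, k_deg_y,
             t_end=50.0, dt=0.01):
    """Euler-integrate a made-up two-protein model.

    dX/dt = k_prod_x - k_deg_x * X
    dY/dt = k_prod_y / (1 + k_repress * X) - k_deg_y * Y

    Every rate constant is a number somebody would have to measure.
    """
    x, y = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        dx = k_prod_x - k_deg_x * x
        dy = k_prod_y / (1.0 + k_repress * x) - k_deg_y * y
        x += dx * dt
        y += dy * dt
    return x, y


# Hypothetical rate constants, chosen only so the numbers come out round.
x_final, y_final = simulate(k_prod_x=1.0, k_deg_x=0.5,
                            k_prod_y=1.0, k_repress=1.0, k_deg_y=0.5)
print(x_final, y_final)
```

That's five parameters for two proteins and one interaction. Get `k_repress` wrong, or miss a third protein altogether, and the predicted level of Y changes with it.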

On the other side, there are statistical models of biological systems. These don't make any specific statements about whether protein X interacts with protein Y; instead, they examine data that's relatively easy to generate and make more general statements, like "When there's lots of protein X, there isn't much protein Y but lots of protein Z", and they make those sorts of statements about lots of proteins at once. Because you're looking at so much more data, and aren't constrained by trying to figure out how all the pieces interact in detail, you end up with a much broader picture of what's going on in a cell. And, arguably, that broader picture is more useful when you have real-world applications in mind.
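A statement like "lots of X, not much Y, lots of Z" is basically a statement about correlations. Here's a minimal sketch using plain Pearson correlation over every pair of proteins; the abundance numbers are invented, and real analyses would use far more samples and proteins.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up abundance measurements for three proteins across five samples.
levels = {
    "X": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Y": [5.1, 3.9, 3.2, 2.1, 0.8],
    "Z": [1.2, 2.1, 2.9, 4.2, 5.1],
}

# Compare every pair at once; no claim is made about *why* they co-vary.
for p, q in combinations(levels, 2):
    print(f"corr({p}, {q}) = {pearson(levels[p], levels[q]):+.2f}")
```

Nothing here says whether X actually binds or regulates Y; it only says their levels move in opposite directions in the data.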

So, similar to what happened with AI, my guess is that mechanistic models in computational biology will remain an academia-only topic for the foreseeable future, whereas the more statistically-based models will be [and already are, I suppose] the most useful "real-world" ones for the next 10-20 years [and maybe forever, depending on how much more effort is expended on the more mechanistic models ...].

1 Comment:

Anonymous Tozier said...

Astute.

One extra step, though: Statistical models and machine learning aren't (typically) for building stuff, just predicting it and classifying it. The AI approach is for building stuff.

Only very, very few people in the learning world are willing to build stuff based purely on machine learning models, on autonomous stuff, without at least making a pass at proving how good it should be. Proving convergence of a heuristic, proving lower bounds on error, that sort of thing.

As a result, even statistical data-miningish models are considered a starting point, and from that point you revert back to the old planning-and-making-and-testing mode.

There's another way, but a scarce one. Something I used to call "escape from design", but which I'm now not sure is. Whatever it should be called, it will be the way people successfully engineer living systems, and it will be the way next-gen or next-next-gen computer systems operate as well.

But, in general, you're on the track :)

12:03 PM  
