Back in 2003, the National Research Council commissioned me to write a chapter about “systems biology” for a report they were doing on the relation between biology and information technology. Since the report, which eventually appeared as Catalyzing Inquiry at the Interface of Computing and Biology (2005), was radically reorganized after my assignment was done, and since my text wound up being scattered, I thought I would post the original version here. It’s a little dated, and none of the references are recent, but it still gives a pretty good overview of the issues. Enjoy…
Systems Biology
On 14 April 2003, not quite 50 years to the day after James Watson and Francis Crick first published the structure of the DNA double helix,[1] officials announced that the Human Genome Project was finished. [2] After thirteen years and $2.7 billion, the international effort had finally given us a virtually complete listing of the human genetic code: a sequence some 3 billion base-pairs long.[3] Along the way, moreover, scientists had begun to compile similar genetic sequences for a rapidly expanding list of other organisms, from bacteria to fruit flies to mice, thereby laying the foundations for a new science of comparative genomics.[4] They had begun to map out individual variations in the genetic code, thereby laying the foundations for a new practice of genomic medicine, in which physicians would be able to calibrate each individual patient’s disorder, and devise treatments for it, with molecular precision. And they had begun to open up a Pandora’s box of potentially explosive social issues, ranging from the possibility of genetic discrimination, to the role of genetics as a determinant of race, ethnicity, and human behavior. In short, as genome project director Francis Collins and his colleagues declared a few weeks later in the journal Nature, the completion was “a landmark event.”
But now, they added, as they outlined a research agenda for turning the visions into reality,[5] the real work begins.
After all, knowing the complete sequence of base pairs in the human genome is a bit like knowing the complete sequence of 1s and 0s that make up a computer program: by itself, that information doesn’t tell you anything whatsoever about what the program does, or how it’s organized into functional units such as subroutines. In the case of DNA, it’s true, biologists have devised algorithms that can go through the sequence and (sometimes) identify regions that comprise individual genes. And it’s also true that (some of) those individual genes encode the instructions for making protein molecules: the “nanomachines” that serve as enzymes, transporters, gateways, structural building blocks, and a myriad other roles in the cell. But the fact remains that few, if any, biological functions can be assigned to a single gene or a single protein. A cell’s metabolism, its response to chemical signals from the outside, its cycle of growth and cell division-all these functions and more are carried out and controlled by elaborate webs of interacting molecules. Indeed, in what has to be the most astonishing display of self-reference in nature, the protein products of the genome actually react back on the DNA (and with each other) to regulate their own creation. Understanding these networks-“systems [that] are far more complex than any problem that molecular biology, genetics or genomics has yet approached,” as Collins and his coauthors put it-is critical to realizing genomics’ promise.[6]
Taking on the challenge of “systems biology,” as it’s come to be called, promises to be a major opportunity for cross-fertilization between IT and biology. There are many reasons for that, as discussed below. But perhaps the most fundamental is the simple fact that biology and computer science are the only two disciplines that have at their core the concept of information.
Biology as an Information Science
If anything, having a full listing of all 3 billion base pairs in the human genetic code has only made the genome more mysterious. Roughly half of it consists of highly repetitive sequences that don’t seem to be encoding much of anything. And most of the rest consists of non-repetitive sequences that seem equally useless. Why these sequences are there remains an open question. But in the meantime, most or all of the biological action appears to be confined to the tiny fraction that’s left over. It’s in this fraction that we find the two fundamental types of information in the genome: coding sequences, and regulatory sequences.
Coding Sequences: Molecular Blueprints
The first type features DNA in its classic role as a blueprint for nature’s nanomachines, the proteins. The sequences here are organized into functional units-genes-that follow the fundamental dogma described in every basic biology textbook: one gene, one protein. The encoding relies on a three-letter code, in which each triplet of bases picks out one of 20 molecular building blocks known as “amino acids.” The full sequence of triplets within a single gene thus determines a corresponding sequence of amino acids, which will eventually be linked together to make the protein like so many beads on a string.
Among the early surprises of the genome project was that these blueprint sequences comprise no more than 1-2% of human DNA.[7] Another surprise was that the human sequences encode no more than 30-40,000 different proteins in toto-an almost humiliatingly small number, considering that the simple nematode worm C. elegans has roughly 20,000. In any case, only about a third to a half of the proteins in the genome are actually manufactured in any given cell type (a muscle cell, say, or a liver cell); the genes that specify the rest are suppressed by the regulatory apparatus discussed below.
Also included under the DNA-as-blueprint category are a few thousand genes that encode various types of RNA molecules. RNA is a kind of single-strand version of DNA, right down to the information-encoding bases. It plays a variety of roles in the cell, the most famous being in the protein synthesis process that is also described in every biology textbook. In outline, that process begins with transcription, in which the content of a given gene is copied from the DNA to a strand of RNA. This “messenger” RNA, or mRNA, then moves out into the cell cytoplasm, where it encounters an RNA-based structure known as the ribosome.[8] The ribosome grabs onto one end of the mRNA and begins to move along its length like the read-write head of a videotape player scanning the tape. As it goes, it carries out the translation portion of protein synthesis, reading each triplet code in turn and creating a corresponding chain of amino acids: the growing protein molecule. (Each amino acid is brought to the ribosome by a special “transfer” RNA.) When the ribosome reaches the end of the mRNA, the protein is complete.
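To make the transcription and translation steps concrete, here is a minimal Python sketch. It is only an illustration: the codon table is a small subset of the standard genetic code, the example sequence is invented, and the function names (`transcribe`, `translate`) are hypothetical. It also simplifies by copying the coding strand directly, whereas the cell actually reads the complementary template strand.

```python
# Illustrative subset of the standard genetic code (mRNA codon -> amino acid).
# None marks a stop codon. A real table has 64 entries.
CODON_TABLE = {
    "AUG": "Met",  # also the usual start codon
    "UUU": "Phe", "UUC": "Phe",
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "GCU": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
    "AAA": "Lys", "AAG": "Lys",
    "UAA": None, "UAG": None, "UGA": None,
}


def transcribe(coding_strand: str) -> str:
    """Copy a DNA coding strand into mRNA (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Read triplets from the first AUG until a stop codon, like a ribosome."""
    start = mrna.find("AUG")
    if start < 0:
        return []
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        # Codons missing from this partial table are shown as "???".
        amino_acid = CODON_TABLE.get(codon, "???")
        if amino_acid is None:  # stop codon: the protein is complete
            break
        protein.append(amino_acid)
    return protein


dna = "CCATGTTTGGCGCAAAATAAGG"  # made-up example sequence
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)
print("-".join(protein))
```

The loop plays the role of the ribosome: it steps along the mRNA three bases at a time and stops at the first stop codon it meets.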
Regulatory Sequences: Biological Processes and the Cellular Operating System
In addition to the 1-2% of the genome that contains coding sequences, there is a roughly equal percentage that appears to be under considerable selection pressure.[9] That is, similar sequences are found in the genomes of mice and other organisms, suggesting that these particular stretches of DNA are too critical to our survival to change very rapidly over the course of evolution. Relatively little is known about these sequences for sure, although some of them are known to be involved in the basic mechanics of the chromosomes themselves. (Examples include the highly repetitive DNA in the “telomeres,” which are structures that cap off the ends of the chromosomes, and the special DNA sites that tether the chromosomes to the membrane of the cell’s nucleus.) Most likely, however, these sequences contain the bulk of the cell’s regulatory information.
Although there is much still to learn about DNA regulation,[10] the research done to date suggests that a protein- or RNA-encoding gene will typically be controlled by several short stretches of DNA.[11] Often these sites are located just outside the coding region of the gene, near the beginning, but in principle they could be anywhere. Under the right circumstances, a specialized protein will come in and bind to each regulator sequence, latching right onto the DNA. The presence of that protein will either encourage the cell to start “expressing” the gene-transcribing it into mRNA-or block the cell from doing so. The proteins are accordingly known as “transcription factors,” while the binding sites are known as “promoters” or “suppressers,” respectively. The resulting push-pull system allows the cell to shift the balance between expression and non-expression of the gene with exquisite precision, depending on which transcription factors bind, and when.[12]
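One common way to put numbers on this push-pull picture is to treat each binding site as occupied for some fraction of the time and let the expression level depend on those fractions. The Python sketch below does that with Hill functions. The functional form, the parameter values, and the names `occupancy` and `expression_level` are all assumptions chosen for illustration, not something taken from the references.

```python
def occupancy(concentration: float, k_half: float = 1.0, hill: float = 2.0) -> float:
    """Fraction of time a binding site is occupied (Hill function).

    k_half is the concentration giving 50% occupancy; hill sets the steepness.
    """
    c = concentration ** hill
    return c / (k_half ** hill + c)


def expression_level(activator: float, repressor: float, max_rate: float = 1.0) -> float:
    """Toy push-pull model: the gene is transcribed only while the activating
    site is occupied and the repressing site is empty (sites assumed independent)."""
    return max_rate * occupancy(activator) * (1.0 - occupancy(repressor))


for activator, repressor in [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (10.0, 10.0)]:
    level = expression_level(activator, repressor)
    print(f"activator={activator:5.1f} repressor={repressor:5.1f} -> {level:.3f}")
```

Even this crude model shows the key qualitative behavior: more activator pushes expression up, more repressor pulls it down, and the balance between the two sets the final level.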
The transcription factors, in turn, are part of (and are controlled by) the vast web of molecular interactions that comprise a kind of operating system for the cell. In purely biochemical terms, it’s true, the details of the reactions can be exceedingly intricate. A protein might interact with an enzyme that adorns it with a phosphate group, for example, or a sugar molecule, or any of a variety of other appendages, which will then change its shape and activity level. Or several proteins might come together to form a multiprotein complex, and so on. Nonetheless, at a more abstract level, many of these molecular participants can be thought of as carrying information from one reaction to the next,[13] in somewhat the same way that data structures carry information from one computational process to the next. Just as various software modules are specialized for different tasks, moreover, the web of cellular interactions can be seen as a multitude of specialized sub-networks. Signal transduction pathways, for example, are the cascades of reactions that get triggered when the cell encounters some stimulus from the outside.[14] A metabolic network is the web of reactions by which a cell processes food molecules. The cell cycle is the network of reactions and genetic regulation that controls when and how a cell divides. Specialized or not, however, these sub-networks and many others overlap strongly, funneling information back and forth to one another, and to the genetic regulatory apparatus, thus allowing the cell to maintain itself, sense the outside world, and respond like the living thing that it is.
Challenges and Opportunities
The broad outlines of this picture were first sketched more than 40 years ago, as scientists in the late 1950s and early 1960s worked out the fundamentals of the genetic code, protein synthesis, and genomic regulation.[15] Indeed, their findings even inspired a brief vogue for “systems biology,” which in those days meant mathematical models and computer simulations based on such then-fashionable ideas as cybernetics and General Systems Theory.[16] That initial burst of enthusiasm waned fairly quickly, as it became clear that there wasn’t enough data to keep the mathematical abstractions tethered to experiment. But by the turn of the millennium, enthusiasm for a new and more modern form of systems biology was running strong again. Not only were we now in an age of abundant data, thanks in large part to the Human Genome Project, but more and more biologists were embracing the systems approach as one of the inevitable next steps for the post-genome era.[17]
The challenge, in a nutshell, is to understand the cellular information processing system-all of it-from the genome on up. Some of the key issues:
- What is the complete inventory of proteins in any given cell (a subfield often known as “proteomics”[18])? How do these individual protein molecules organize themselves into functional sub-networks-and how do these sub-networks then organize themselves into higher- and higher-level networks?[19] What are the functional design principles of these systems? And how, precisely, do the products of the genome react back on the genome to control their own creation?
- How do these dynamically self-organizing networks vary over the course of the cell cycle, and as the cell responds to its surroundings? How do they encode and process information? And what accounts for life’s robustness-the ability of these networks to adapt, maintain themselves, and recover from a wide variety of environmental insults?[20]
- How do the networks organize and reorganize themselves over the course of embryonic development, as each cell decides whether its progeny are going to become skin, muscle, brain, or whatever?[21] Then, once the cells are done differentiating, how do the networks actually vary from one cell type to the next? What constitutes the difference? And what happens to the networks as cells age, or are damaged? How do flaws in the networks manifest themselves as maladies such as cancer?
- How do the networks vary between individuals? How do those variations account for differences in morphology and behavior? And-especially in humans-how do those variations account for individual differences in the response to drugs and other therapies?
- How do the networks vary between species? Or to put it another way, how have they changed over the course of evolution? Since the “blueprint” genes for proteins and RNA seem to be quite highly conserved from one species to the next, is it possible that most of evolution is the result of rearrangements in the genetic regulatory system?[22]
To call this challenge “immense” would be an understatement; a full accounting of the cellular regulatory networks in every cell type, in multiple species, and over all time-scales, would dwarf the Human Genome Project by many orders of magnitude. Nonetheless, scientists are already organizing themselves to tackle the problem-or at least, significant pieces of it. Among the major initiatives are the Alliance for Cellular Signaling,[23] a university-industry consortium organized by Nobel laureate Alfred Gilman of the University of Texas, Southwestern; the Institute for Systems Biology,[24] a not-for-profit research foundation created in Seattle by Leroy Hood, a pioneer of rapid genome sequencing technology; the Caltech-ERATO-Kitano Systems Biology Workbench Project, a U.S.-Japanese collaboration devoted to computer modeling of biological systems; the U.S. Department of Energy’s Genomes to Life Program,[25] which focuses on identifying the proteins and characterizing the gene regulatory networks in microbial communities, with an eye towards energy production, global change mitigation, and environmental cleanup; and the National Cancer Institute Director’s Challenge: Toward a Molecular Classification of Cancer.[26]
This listing could be extended almost indefinitely; indeed, there seem to be very few university departments, biotech firms, or pharmaceutical companies that haven’t made at least some sort of investment in systems biology. But one common thread in all these initiatives is the critical importance of information technology. Indeed, biologists and information technologists have already created a flourishing cross-discipline known as bioinformatics, which encompasses a wide variety of techniques for archiving biological information and then “mining” it to reveal hidden patterns. Today, for example, it’s routine for biologists to run genomic data through gene-finding software to identify coding sequences-and from there, pipe the results into search programs such as BLAST and HMMer, which go through archives of previously annotated data to find proteins or protein families with similar sequences. With that information, in turn, they can often predict the function of their newly identified proteins.[27]
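The gene-finding step can be illustrated with a deliberately naive Python sketch that scans for open reading frames: stretches that begin with a start codon and run, triplet by triplet, to a stop codon. Real gene finders rely on much richer statistical models, and this sketch does not attempt the similarity search that tools such as BLAST perform. The function name `find_orfs`, the `min_codons` threshold, and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) slices of open reading frames on the forward strand.

    Each ORF starts at an ATG and ends just after the first in-frame stop codon.
    Only the three forward reading frames are scanned; the reverse strand is ignored.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Keep the ORF only if it has enough codons before the stop.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


dna = "CCATGTTTGGCGCAAAATAAGG"  # made-up example sequence
orfs = find_orfs(dna)
for start, end in orfs:
    print(start, end, dna[start:end])
```

In a real pipeline, each candidate region would then be translated and compared against archives of annotated proteins.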
And yet, as powerful as such capabilities are, our current generation of bioinformatics tools is only the beginning. “The heterogeneity, complexity, and dynamic nature of [the data in systems biology] present computer science demands unlike those of any scientific domain before,” noted the organizers of a recent Department of Energy workshop[28] on the systems biology-IT interface. Some of the key challenges:
Decoding the Genome
In the National Human Genome Research Institute’s recently published agenda for research in the post-genome era, Francis Collins and his co-authors repeatedly emphasized how little biologists understand about the data they’ve already got. They are a very long way from knowing everything there is to know about how genes are structured and regulated, for example-and they are virtually without a clue as to what’s going on in the other, non-coding 95% of the genome. That’s why the agenda’s very first Grand Challenge was to systematically endow that data with meaning-that is, to “comprehensively identify the structural and functional components encoded in the human genome.”[29]
The effort to meet this grand challenge may very well produce some scientific surprises, along the lines of, say, “non-coding” sequences that confer biological function in some new and unexpected way. But that effort will definitely produce a demand for major improvements in data-handling technology. Ideally, for example, the data technology for systems biology should be-
- Scalable. The first complete sequencing of the human genome took thirteen years and $2.7 billion. But along the way, that effort gave an enormous boost to the technology of automated gene sequencing. Today, advanced sequencers can analyze DNA at the rate of some 1.5 million base pairs per day; soon, if development goes the way researchers hope, it may be possible to sequence any given individual’s entire 3 billion-base-pair genome within 24 hours, for a cost of a few thousand dollars.[30] The same technologies will be applicable to other species, as well. The result promises to be archives of genomic data that will grow even more explosively than they are growing already. And that, in turn, implies that the architecture, algorithms, and hardware of the genome archives will have to be scalable-meaning that the system will still be able to store and retrieve information efficiently no matter how large those archives get. In particular, it would be very helpful to have algorithms that could do a better and more accurate job of identifying genes and regulatory regions, with less need for humans to proofread the results.[31] Such algorithms might be based on advanced pattern recognition techniques, for example, or sophisticated heuristic reasoning.
- Extensible. Today, when biologists archive a newly discovered gene sequence in, say, GenBank, they have various types of annotation software at their disposal to link it with explanatory data, such as how and by whom the sequence was identified, as well as the function of the protein or RNA it encodes. But next-generation annotation systems will have to do this for many other genome features, such as transcription factor binding sites and single nucleotide polymorphisms (SNPs), that most of today’s systems don’t cover at all. Indeed, these systems will have to be able to create, annotate, and archive models of entire metabolic, signaling, and genetic pathways. At the same time, moreover, they will have to deal with entirely new kinds of data. In recent years, for example, there has been a widespread deployment of DNA microarrays, which can assay any given type of cell and measure the activity level of hundreds or thousands of genes at once: are they or are they not being expressed, and by how much?[32] Even more recently, there has been a parallel deployment of protein microarrays, which can identify protein-protein (and protein-drug) interactions among some 10,000 proteins at once.[33] And of course, there are many promising technologies still in the laboratories.[34] The upshot is that next-generation annotation systems will have to be built in a highly modular and open fashion, so that they can accommodate new capabilities and new data types without anyone’s having to rewrite the basic code.
- Distributed. Given the scope of systems biology, the number of researchers in the field, and the variety of experimental tools being deployed, it seems highly unlikely that all the relevant information will ever be gathered together in one giant data warehouse. Systems biologists will almost inevitably be confronted with archives that are stored in many different locations, in many different formats, and with many different owners. Already, for example, there is a certain inconsistency among genome annotations, simply because biologists have no standardized vocabulary for expressing the relationships. A group known as the Gene Ontology Consortium[35] is developing such a vocabulary. But even if the consortium’s work is universally accepted, there will still be vast swaths of legacy annotations that don’t conform. In any case, the trick is to devise database technology that can access these archives anyway, while hiding the complexities and inconsistencies, and making it seem to the user as if all of the archives were in a single warehouse.
- Visualizable. Biological processes can take place over a vast array of spatial scales, from the nano-scape inhabited by individual molecules, to our everyday, meter-sized human world. They can take place over an even vaster range of time scales, from the nanosecond gyrations of a folding protein molecule to the seven (or so)-decade span of a human life-and far beyond, if we include evolutionary time. And they can be considered at many levels of organization, from the straightforward realm of chemical interaction to the abstract realm of, say, signal transduction and information processing. Yet systems biology has to deal with these processes at every level and at every scale. Thus the need for cutting-edge information visualization systems. Such a system would offer vivid and easily understood visual metaphors to display the information at each level, showing just the right amount of detail. (Such a display would be analogous to, say, a circuit diagram, with its widely recognized icons for diodes, transistors, and other such components.) The system would likewise offer easy and intuitive ways to navigate between levels, so that the user could drill down to get more detail, or pop up to higher abstractions as needed. And it would offer good ways to visualize the dynamical behavior of the system over time-whatever the appropriate time scale might be. Current-generation visualization systems such as BioSPICE[36] and Cytoscape[37] are a good beginning. But, as their developers themselves are the first to admit, only a beginning.
Of course, none of these issues are unique to biology. Scalable, extensible, distributed database architectures are critically important in the corporate sector, as well-as are information visualization capabilities-and the computer industry has put a lot of effort into providing them. To cope with the multiple-archives problem, for example, IBM has developed a “federated” architecture, which does indeed allow users to submit queries and receive answers without worrying about exactly where (and in which format) the data resides.[38] To provide for modular, open software, meanwhile, the industry has recently begun to coalesce around the idea of web services, or the closely related grid protocols.[39] And, of course, developers can draw on at least two decades of research on information visualization,[40] not to mention their own extensive experience with visual programming environments, which allow them to view code at multiple levels of abstraction.[41] Nonetheless, adapting these technologies to the needs of biologists-and implementing them on the scale required by systems biology-will be a continuing and non-trivial challenge.
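The federated idea is easy to sketch in miniature. In the Python example below, two archives store annotations in different formats, each wrapped in a small adapter that converts its rows into a common record, and a catalog fans a single query out to all of them. Every class name, field name, and data value here is hypothetical; this is a toy illustration of the general pattern, not a description of IBM’s product or of any real genome database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    """Common record format that every adapter converts into."""
    gene_id: str
    organism: str
    function: str


class DictArchive:
    """Hypothetical archive that stores annotations as dictionaries."""

    def __init__(self, rows):
        self.rows = rows

    def lookup(self, gene_id):
        return [GeneRecord(r["id"], r["species"], r["note"])
                for r in self.rows if r["id"] == gene_id]


class TabArchive:
    """Hypothetical archive that stores annotations as tab-separated lines."""

    def __init__(self, lines):
        self.lines = lines

    def lookup(self, gene_id):
        records = []
        for line in self.lines:
            gid, organism, function = line.split("\t")
            if gid == gene_id:
                records.append(GeneRecord(gid, organism, function))
        return records


class FederatedCatalog:
    """Presents many archives to the user as if they were one."""

    def __init__(self, sources):
        self.sources = list(sources)

    def lookup(self, gene_id):
        results = []
        for source in self.sources:
            results.extend(source.lookup(gene_id))
        return results


catalog = FederatedCatalog([
    DictArchive([{"id": "G1", "species": "human", "note": "kinase"}]),
    TabArchive(["G1\tmouse\tprotein kinase", "G2\tmouse\ttransporter"]),
])
hits = catalog.lookup("G1")
for record in hits:
    print(record)
```

The user only ever sees `GeneRecord` objects. Where each one came from, and how it was stored, stays hidden behind the adapters, which is also where any vocabulary mismatches would have to be reconciled.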
Understanding the Proteome
The central dogma of molecular genetics-the classic progression of gene to mRNA to protein-would seem to suggest that a roughly one-to-one correspondence exists between the genome of a cell and its “proteome”: the overall collection of proteins it contains. In fact, the proteome is vastly more complex than the genome. For one thing, a single gene can sometimes produce many proteins. In eukaryotes, for example, mRNA can’t be used as a blueprint until special enzymes first cut out the introns, or non-coding regions, and splice together the exons, the fragments that contain useful code. But in some cases, the cell can splice the exons in different ways, producing a series of proteins with various pieces added or subtracted. Or the cell’s translation machinery might introduce an even more radical change by shifting its “reading frame,” meaning that it starts to read the three-base-pair genetic code at a point displaced by one or two base pairs from the original. The result will be an utterly different sequence of amino acids, and thus, an utterly different protein. Furthermore, even after the proteins are manufactured at the ribosome, they undergo quite a lot of post-processing as they enter into the various regulatory networks. Some might have their shapes and activity levels altered by the attachment of a phosphate group, for example, or a sugar molecule, or any of a variety of other appendages, while others might come together to form a multi-protein structure.
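Two of these mechanisms, alternative splicing and reading-frame shifts, are simple enough to demonstrate in a few lines of Python. The sketch below is an illustration only: the exon sequences are invented, the names `codons` and `splice_variants` are hypothetical, and the splicing rule (always keep the first and last exon, keep any subset of the middle ones in order) is a simplifying assumption.

```python
from itertools import combinations


def codons(seq: str, frame: int = 0) -> list[str]:
    """Split a sequence into complete triplets, starting at offset `frame`."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def splice_variants(exons: list[str]) -> list[str]:
    """List every transcript that keeps the first and last exon plus any
    subset of the middle exons, preserving their original order."""
    first, *middle, last = exons
    variants = []
    for k in range(len(middle) + 1):
        for kept in combinations(middle, k):
            variants.append(first + "".join(kept) + last)
    return variants


mrna = "AUGGCUAAAGGU"  # made-up example sequence
print(codons(mrna, frame=0))  # the intended reading frame
print(codons(mrna, frame=1))  # shifted by one base: entirely different triplets

exons = ["AUG", "GCU", "AAA", "UAA"]  # made-up exons
variants = splice_variants(exons)
print(variants)
```

With only two optional exons the gene already yields four distinct transcripts, and the count doubles with each additional optional exon. A one-base shift in the reading frame, meanwhile, changes every codon downstream.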
If nothing else, this complexity implies a massive escalation of the database challenge discussed in the previous section: a typical human cell has 30,000 to 40,000 genes, but at least 300,000 different proteins, all of which have to be tracked and accounted for. However, the proteome also poses an entirely new computational challenge, which is to determine the structure of the regulatory networks using the available data.
Those data can be obtained in a variety of ways. In the two-hybrid technique, for example, a cell is genetically manipulated so that the protein-protein interaction of interest will cause a marker gene to be expressed; if the gene product is detected, then the interaction has presumably taken place, and vice versa.[42] First developed in 1989 to test for interactions between two proteins at a time, the two-hybrid approach has recently been employed to search for interactions en masse.[43] Another technique is co-immunoprecipitation,[44] in which a molecular tag is attached to the protein of interest, and the cell is then treated with an antibody to that tag. The antibody binds to the tag, precipitates out, and pulls down the protein along with anything bound to it; the bound species can then be identified via standard techniques such as mass spectrometry. Yet another approach is to perturb the cell in some fashion-say, by deleting a specific gene-and then use DNA microarrays to determine how the genome responds.[45] The genes that show a significant increase or decrease in expression rates will presumably have protein products that belong to the same regulatory pathway as that of the deleted gene.
The result, in every case, is a list of protein-protein and/or protein-DNA interactions. Unfortunately, having such a list is not the same as having a good model of the regulatory pathway itself. Data from both the two-hybrid technique and co-immunoprecipitation tend to be noisy, for example, meaning that they contain a substantial fraction of false positives and false negatives; one big reason is that they are looking at proteins that have been chemically modified with tags and such, which can potentially change the proteins’ behavior. Meanwhile, data from the microarray techniques give the “nodes” of a pathway-that is, the proteins being expressed-but they have little or nothing to say about the “wires”: the interactions among those proteins. Indeed, the same microarray data can usually be accounted for by any number of networks.[46]
For all of the difficulties, researchers have been able to work out quite a few networks anyway.[47] Still, there’s a vast amount left to learn, and plenty of room for better computational tools. Ideally, for example, the systems biologist’s suite of analytical software would offer easy-to-use algorithms that integrated all the different forms of data and produced a most-likely guess as to the structure of the networks, with each reconstructed link assigned a confidence level. It would also offer algorithms that helped researchers plan new experiments-say, by suggesting which perturbations might do the most to resolve the ambiguities in the microarray data.[48] And it would offer algorithms that allowed them to compare networks across different cell types and different species, in much the same way that BLAST now allows them to find homologous sequences in different genomes.
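As a rough illustration of what “integrating different forms of data” with a confidence level per link might look like, here is a minimal Python sketch. It combines evidence with a simple “noisy-OR” rule: each assay that reports a link is assumed to be right with some fixed probability, independently of the others. The reliability numbers, the independence assumption, and the name `link_confidence` are all hypothetical, and real methods are considerably more sophisticated.

```python
from collections import defaultdict

# Hypothetical reliabilities: the chance that a link reported by each assay is real.
RELIABILITY = {"two_hybrid": 0.5, "co_ip": 0.7, "microarray": 0.3}


def link_confidence(observations):
    """Combine reports of the same link from several assays.

    observations: iterable of (protein_a, protein_b, assay) tuples.
    Returns {frozenset({a, b}): confidence}, where the confidence is
    1 - product(1 - reliability) over all assays that reported the link.
    """
    doubt = defaultdict(lambda: 1.0)  # probability that every report is wrong
    for a, b, assay in observations:
        doubt[frozenset((a, b))] *= 1.0 - RELIABILITY[assay]
    return {pair: 1.0 - d for pair, d in doubt.items()}


observations = [
    ("A", "B", "two_hybrid"),
    ("B", "A", "co_ip"),       # same link, reported in the other order
    ("B", "C", "microarray"),
]
confidence = link_confidence(observations)
for pair, value in confidence.items():
    print(sorted(pair), round(value, 2))
```

A link seen by two independent techniques ends up with a higher score than either technique could give alone, which is the basic intuition behind combining noisy data sets.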
That last capability could be an extremely powerful one, if experience with comparative genomics is any guide. Of course, such “comparative proteomics” won’t reach its full potential until researchers have accumulated a lot more network data than they have now. Nor will it go anywhere until they’ve solved some key theoretical and computational challenges. After all, two sub-networks may contain homologous proteins, but do different things. Or they may have utterly different proteins, yet carry out virtually the same function. Or they may have similar proteins and similar overall functions, but a different interaction structure. So precisely what does it mean to say that one (piece of a) regulatory network is homologous to another?
Still, it should eventually be possible to look across species and understand how regulatory networks have changed through evolution. It may likewise be possible to identify modular sub-circuits that nature has used again and again-or conversely, to find pathways in bacteria, say, that are not shared by humans, and that are therefore attractive targets for new pharmaceuticals.
Modeling the Networks
Ultimately, of course, data yields insight only when it’s been codified into theory-which, in the case of cellular regulatory networks, means computer simulation. Indeed, since paper-and-pencil calculations are pretty much hopeless in systems of this complexity, a good computer model is the only feasible way to see if a tentative reconstruction behaves like the real network. And that, in the classic, experiment-theory-experiment cycle of the scientific method, is a critical step toward better reconstructions. As the simulations improve, moreover, they could provide a foundation for what some have called “cellular engineering”-a discipline in which practitioners could predict, control and design cellular networks as confidently as traditional engineers create, say, a new aircraft.[49] Certainly the simulations will be exceptionally useful for assessing and predicting drug actions,[50] and especially, drug interactions. (It’s a rare pharmaceutical that binds to just one cell-surface receptor, and triggers just one signaling network; thus the ubiquity of side effects.[51])
Not surprisingly, cell-network researchers have developed any number of simulation development packages already; examples include BioSpice, DBSolve, E-Cell, VCell, Gepasi, StochSim, and Caltech ERATO.[52] Nonetheless, this is still very much a field in flux-even (or especially) when it comes to such fundamental questions as ontology: how do we go about understanding the cellular networks? What kind of conceptual framework will best help us make sense of how they work, and what they are doing? And precisely what is the most effective way to represent the networks in a computer?[53]
There are almost as many answers to those questions as there are researchers in the field-not least because the “right” answer so often depends on the phenomenon they are looking at, and on critical factors such as time scale. Take metabolic networks and signal transduction pathways, for example, which can respond to environmental changes considerably faster than the genome itself can. (They operate on a physiological time scale of milliseconds to a minute or so, whereas transcription and translation take a minute or longer.[54]) In 1995, writing in the journal Nature, Dennis Bray of Cambridge University forcefully made the case for an information-processing view of these pathways: “Many proteins in living cells appear to have as their primary function the transfer and processing of information, rather than the chemical transformation of metabolic intermediates or the building of cellular structures,” he declared.[55] In particular, Bray argued, a simple enzyme protein could be viewed as a computational element that takes an input-the concentration of its “substrate,” the molecule it interacts with-and produces an output: the catalyzed reaction product. Likewise, an enzyme that becomes active only when it binds with two separate regulator molecules will function something like a Boolean AND gate,[56] and so on. Just as in an electrical engineering lab, moreover, circuits formed from these elements can be as simple as a switch or an oscillator, or as complex as a bacterium’s chemotaxis[57] response. Indeed, the cell even possesses a kind of short-term, “random-access” memory, in the sense that events in its environment have profoundly shaped the concentration and activity of many thousands of molecules in the cell. In short, Bray concluded, these protein-based circuits comprise a kind of nervous system for the cell, providing it with much of what it needs to control its behavior.[58]
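The enzyme-as-computing-element idea can be sketched in a few lines of Python. The example below uses the textbook Michaelis-Menten rate law for the input-output behavior of a simple enzyme, and models the two-regulator enzyme as active only when both regulators are bound. The choice of rate law, the assumption that the two regulators bind independently, the parameter values, and the function names are all assumptions made for illustration; Bray’s paper should be consulted for his actual treatment.

```python
def michaelis_menten(substrate: float, v_max: float = 1.0, k_m: float = 1.0) -> float:
    """Reaction rate (output) as a function of substrate concentration (input)."""
    return v_max * substrate / (k_m + substrate)


def bound_fraction(ligand: float, k_d: float = 1.0) -> float:
    """Fraction of enzyme molecules with a given regulator bound."""
    return ligand / (k_d + ligand)


def and_gate_activity(regulator_a: float, regulator_b: float) -> float:
    """Fraction of enzyme that is active when activity requires BOTH regulators
    to be bound (independent binding assumed)."""
    return bound_fraction(regulator_a) * bound_fraction(regulator_b)


print(f"rate at substrate = K_m: {michaelis_menten(1.0):.2f}")
for a, b in [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]:
    print(f"A={a:6.1f} B={b:6.1f} -> activity {and_gate_activity(a, b):.2f}")
```

The printed table reads like a truth table: activity is near 1 only when both regulators are abundant. Unlike a digital gate, though, the response is graded, which is one reason the circuit analogy has to be handled with care.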
Of course, that left the other, slower half of the cellular control system: the genetic regulatory networks that govern responses on a time-scale of minutes or longer. As it happened, however, in a paper[59] that was published at almost exactly the same time as Bray’s, Stanford University’s Harley McAdams and Lucy Shapiro showed that genetic networks could also be modeled via the electrical circuit analogy.[60] Indeed, McAdams and Shapiro not only tackled the complexities of an actual regulatory network (the decision circuit that governs the course of a λ-phage infection in E. coli), but they gave careful consideration to such real-world factors as time delays, which are critical in biological networks (gene transcription and translation are not instantaneous, for example) and indeed, in electrical networks, as well.[61] Along the way, and in later work with colleagues such as Adam Arkin, now at the University of California, Berkeley, they clarified some of the ways in which regulatory networks are not like electrical circuits. Because critical molecules are often present in the cell in extremely small quantities, to take the most notable example, certain critical reactions are subject to large statistical fluctuations, meaning that they proceed in fits and starts, much more erratically than their electrical counterparts.[62] Nonetheless, as McAdams and Arkin emphasized in a review paper a few years later, so long as the differences are kept clearly in mind, the cell circuit-electrical circuit analogy can be a deep and powerful one. Indeed, they wrote, nature’s designs for the cellular circuitry seem to draw on any number of techniques that are very familiar from engineering: “the biochemical logic in genetic regulatory circuits provides real-time regulatory control [via positive and negative feedback loops], implements a branching decision logic, and executes stored programs [in the DNA] that guide cellular differentiation extending over many cell generations.”[63]
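The point about small molecule counts can be illustrated with a short stochastic simulation. The sketch below is an assumption-laden toy, not the model used by McAdams, Shapiro, or Arkin: it applies Gillespie's stochastic simulation algorithm to a single hypothetical species that is produced at a constant rate `k_make` and decays at rate `k_decay` per molecule. The function name and all parameter values are invented for illustration.

```python
import random


def gillespie_birth_death(k_make, k_decay, t_end, seed=0):
    """Simulate production and decay of one species, event by event.

    Returns the time-weighted mean and standard deviation of the copy
    number. The run starts at the deterministic steady state
    k_make / k_decay so that no warm-up period is needed.
    """
    rng = random.Random(seed)
    n = round(k_make / k_decay)
    t = 0.0
    sum_n = 0.0
    sum_n2 = 0.0
    while True:
        rate_make = k_make
        rate_decay = k_decay * n
        total_rate = rate_make + rate_decay
        dt = rng.expovariate(total_rate)      # waiting time to next event
        last = t + dt >= t_end
        if last:
            dt = t_end - t                    # truncate the final interval
        sum_n += n * dt
        sum_n2 += n * n * dt
        if last:
            break
        t += dt
        # Choose which reaction fired, in proportion to its rate.
        if rng.random() * total_rate < rate_make:
            n += 1
        else:
            n -= 1
    mean = sum_n / t_end
    variance = max(sum_n2 / t_end - mean * mean, 0.0)
    return mean, variance ** 0.5


if __name__ == "__main__":
    for k_make in (5.0, 500.0):
        mean, sd = gillespie_birth_death(k_make, 1.0, 200.0, seed=1)
        print(f"mean copies = {mean:7.1f}   relative noise = {sd / mean:.3f}")
```

For this simple process the relative noise scales roughly as one over the square root of the mean copy number, so a species present in a handful of copies fluctuates far more, proportionally, than one present in hundreds.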
At still longer time-scales of, say, hours, one finds comparatively slow-moving processes such as the cell cycle. In this regime, modelers can safely describe the cell’s dynamics with a straightforward series of chemical rate equations; that is, equations in which nothing matters but the concentration of each chemical species at any given moment. All the complications due to time delays in gene expression, statistical fluctuations, membrane transport, and the like have simply gone away; they happen so rapidly on this time scale that they are effectively instantaneous.[64] Of course, the equations for any real regulatory network are still dauntingly complex, and can only be solved by computer. But even so, much of their behavior can be understood via the tools of non-linear systems dynamics, often referred to as “chaos theory.” For example, a stable “point attractor” in the equations (that is, a solution in which the variables don’t change with time) might correspond to a cell that was at a stable “checkpoint” of its cycle: a kind of waiting state brought on by factors such as DNA damage, or a lack of nutrients. Likewise, a “bifurcation” in the equations, in which the system suddenly changes from, say, a point attractor to a periodic oscillation, might correspond to an egg cell that’s been fertilized, and must now start to go through cycle after cycle of growth and division. Indeed, such dynamical systems have now been implemented in dozens of biological simulations.[65]
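The difference between a point attractor and a periodic oscillation is easy to see numerically. The sketch below does not use a cell-cycle model from the literature cited here; it substitutes the Brusselator, a generic two-species textbook oscillator, purely as a stand-in. The parameter names `a` and `b`, the starting concentrations, and the fixed-step Runge-Kutta integrator are all assumptions made for illustration. With `a = 1`, the steady state is stable for `b` below 2 and gives way to sustained oscillation above it.

```python
def brusselator(state, a, b):
    """Rate equations for the Brusselator: only concentrations matter."""
    x, y = state
    return (a - (b + 1.0) * x + x * x * y,
            b * x - x * x * y)


def rk4_step(f, state, dt, *params):
    """Advance `state` by one classical fourth-order Runge-Kutta step."""
    k1 = f(state, *params)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), *params)
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), *params)
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)), *params)
    return tuple(s + dt / 6.0 * (p + 2.0 * q + 2.0 * r + w)
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4))


def late_amplitude(b, a=1.0, dt=0.01, t_end=100.0):
    """Peak-to-peak swing of x over the second half of the run.

    Near zero means the system settled onto a point attractor;
    a large value means it is oscillating.
    """
    state = (1.2, 1.2)          # arbitrary starting concentrations
    steps = int(t_end / dt)
    xs = []
    for i in range(steps):
        state = rk4_step(brusselator, state, dt, a, b)
        if i >= steps // 2:
            xs.append(state[0])
    return max(xs) - min(xs)


if __name__ == "__main__":
    for b in (1.5, 3.0):
        print(f"b = {b}: late peak-to-peak amplitude of x = {late_amplitude(b):.4f}")
```

Sweeping `b` through 2 and watching the amplitude jump from essentially zero to a finite value is a small-scale version of the bifurcation analysis described above.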
And so it goes: ontology and modeling for systems biology have made encouraging progress. But there are still a great many challenges.[66] To mention just a few:
- Building in spatial structure. The cytoplasm isn’t just a uniform mixture of all the biomolecules that exist in a cell; proteins and other macromolecules are often bound to membranes, or are isolated inside of various cellular compartments (especially in eukaryotes). A full account of the regulatory networks has to take this compartmentalization into account, along with such spatial factors as diffusion, and the transport of various species through the cytoplasm and across membranes.
- Modeling multicellular biology. Although bacteria and single-celled eukaryotes can certainly be modeled as more or less isolated entities, a full account of multicellular creatures such as humans will have to include an account of intercellular signaling, cellular differentiation, cell motility, tissue architecture, and many other “community” issues.
- Interoperability. Despite the developers’ best efforts, none of the simulation packages today offers everything a systems biologist might need. Nor is any of them likely to do so in the foreseeable future; covering the entire range of size- and time-scales requires so many simulation techniques that no one package can hope to offer the best-of-breed tools in everything. And in any case, the field itself is evolving far too rapidly for any single package to keep up. So a better solution is to get the various packages working together.[67]
This is trickier than it sounds. Ideally, for example, it would mean packages that took advantage of a cluster of emerging technologies known variously as web services,[68] grid protocols,[69] and peer-to-peer computing.[70] Among other things, these technologies would allow for simulations to be run across the Internet on dozens, hundreds, or even thousands of machines in parallel, thus allowing researchers to bring vast computational power to bear. And they would likewise allow for networked simulations to be assembled on the fly from self-contained software modules, which could be mixed and matched by other systems as needed.[71]
In the meantime, however, an equally important goal is to develop an easy way for the models (and the modelers) to share and communicate their results. And indeed, a consortium of leading developers led by the Caltech-ERATO Kitano group has taken a significant step in that direction by developing the Systems Biology Markup Language (SBML): an open, extensible representation scheme, based on XML, that gives developers a common format for describing their models. SBML, in turn, provides a foundation for the Systems Biology Workbench: a software framework that allows interaction among models created by different groups, even if the models are written in different programming languages and running on different machines.[72] A parallel effort is the Physiome Project,[73] which was launched in 2001 by the International Union of Physiological Sciences, and which is headquartered at the University of Auckland, New Zealand. In addition to offering its own modeling tools, the Physiome Project has developed a number of representation languages for higher-level biological systems, including CellML, Cell Modeling Language, and AnatML, Anatomy Markup Language.
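The idea of a shared XML representation can be sketched in a few lines of Python. To be clear, the element and attribute names below (`model`, `speciesList`, `reactionList`, and so on) are hypothetical and are not the real SBML or CellML schema; the sketch only shows the general principle that a model written out as XML by one tool can be read back by another that knows nothing about the first tool's internals.

```python
import xml.etree.ElementTree as ET


def model_to_xml(name, species, reactions):
    """Serialize a toy reaction model to an XML string.

    species:   dict mapping species id -> initial concentration
    reactions: dict mapping reaction id -> (reactants, products, rate)
    The tag names are invented for this sketch, not real SBML.
    """
    root = ET.Element("model", {"name": name})
    species_list = ET.SubElement(root, "speciesList")
    for sid, conc in species.items():
        ET.SubElement(species_list, "species", {"id": sid, "initial": repr(conc)})
    reaction_list = ET.SubElement(root, "reactionList")
    for rid, (reactants, products, rate) in reactions.items():
        rx = ET.SubElement(reaction_list, "reaction", {"id": rid, "rate": repr(rate)})
        for r in reactants:
            ET.SubElement(rx, "reactant", {"species": r})
        for p in products:
            ET.SubElement(rx, "product", {"species": p})
    return ET.tostring(root, encoding="unicode")


def xml_to_model(text):
    """Parse the XML produced by model_to_xml back into plain Python data."""
    root = ET.fromstring(text)
    species = {s.get("id"): float(s.get("initial")) for s in root.iter("species")}
    reactions = {}
    for rx in root.iter("reaction"):
        reactions[rx.get("id")] = (
            [r.get("species") for r in rx.findall("reactant")],
            [p.get("species") for p in rx.findall("product")],
            float(rx.get("rate")),
        )
    return root.get("name"), species, reactions


if __name__ == "__main__":
    text = model_to_xml(
        "toy_pathway",
        {"S": 10.0, "P": 0.0},
        {"convert": (["S"], ["P"], 0.25)},
    )
    print(text)
    print(xml_to_model(text))
```

A real exchange format adds much more (units, compartments, rate laws, versioning), which is exactly why an agreed-upon standard is more valuable than each group inventing its own tags as this sketch does.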
Reference List
“Caltech ERATO Kitano Systems Biology Workbench Development Group.” Web page available at http://www.sbw-sbml.org/index.html.
“Gene Ontology Consortium.” Web page available at http://www.geneontology.org/.
“The Institute for Systems Biology.” Web page available at http://www.systemsbiology.org/.
Bray, Dennis. 1995. “Protein molecules as computational elements in living cells,” Nature 376:307-12.
Caltech-ERATO-Kitano Systems Biology Workbench Development Group. “Repository.” Web page available at http://www.sbw-sbml.org/repository.html.
Cohen, Jon. “The Proteomics Payoff,” Technology Review, Oct 2001. p. 55-60.
Collins, Francis S., et al. 2003. “A vision for the future of genomic research,” Nature 422:835-47.
Couzin, Jennifer. 2002. “BREAKTHROUGH OF THE YEAR: Small RNAs Make Big Splash,” Science 298(5602):2296. Available online at http://www.sciencemag.org.
Csete, Marie E., and John C. Doyle. 2002. “Reverse Engineering of Biological Complexity,” Science 295(5560):1664. Available online at http://www.sciencemag.org/cgi/content/abstract/295/5560/1664.
Davidson, Eric H., et al. 2002. “A Genomic Regulatory Network for Development,” Science 295(5560):1669. Available online at http://www.sciencemag.org/cgi/content/abstract/295/5560/1669.
DeFrancesco, Laura. 2002. “Probing Protein Interactions,” The Scientist 16(8):20.
DeFrancesco, Laura, and Deborah Wilkinson. 1999. “The Two-Body Problem,” The Scientist 13(8):21. Available online at http://www.the-scientist.com/yr1999/apr/profile2_990412.html.
Frazier, Marvin E., et al. 2003. “Realizing the Potential of the Genome Revolution: The Genomes to Life Program,” Science 300(5617):290. Available online at http://www.sciencemag.org/cgi/content/abstract/300/5617/290.
Gannis, Mike. “Alliance for Cellular Signaling Decodes Complex Messages of Cells,” NPACI & SDSC Online, 28 Nov 2001. Available online at http://www.npaci.edu/online/v5.24/cell.sig.html.
Gwynne, Peter, and Guy Page. 1999. “Microarray Analysis: The Next Revolution in Molecular Biology,” Science 285. Available online at http://www.sciencemag.org/feature/e-market/benchtop/micro.shl.
Hartwell, Leland H., et al. 1999. “From Molecular to Modular Cell Biology,” Nature 402:C47-52.
Hood, Leroy, and David Galas. 2003. “The digital code of DNA,” Nature 421:444-8.
Hunter, Philip. “Putting Humpty Dumpty Back Together Again,” The Scientist, 24 Feb 2003.
Ideker, Trey, et al. 2001. “A New Approach to Decoding Life: Systems Biology,” Annu. Rev. Genomics Hum. Genet. 2:343-72.
Ideker, Trey, et al. 2001. “Integrated Genomic and Proteomic Analyses of a Systematically Perturbed Metabolic Network,” Science 292(5518):929.
The Institute for Systems Biology, The Whitehead Institute, and The Memorial Sloan-Kettering Cancer Center. “Cytoscape.” Web page available at http://www.cytoscape.org/.
The International Human Genome Sequencing Consortium. 2001. “Initial sequencing and analysis of the human genome,” Nature 409:860-921.
Jacob, François, and Jacques Monod. 1962. “On the regulation of gene activity,” pp. 193-209 in Symposium on cellular regulatory mechanisms. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory.
Kitano, Hiroaki. 2002. “Systems Biology: A Brief Overview,” Science 295(5560):1662. Available online at http://www.sciencemag.org/cgi/content/abstract/295/5560/1662.
Lawrence Berkeley Laboratory. “BioSpice: Open-Source Biology.” Web page available at http://biospice.lbl.gov/home.html.
MacBeath, Gavin, and Stuart L. Schreiber. 2000. “Printing Proteins as Microarrays for High-Throughput Function Determination,” Science 289(5485):1760.
Maher, Brendan A. “The People’s Biology: Cellular signaling alliance puts a socialist spin on systems biology,” The Scientist, 24 Feb 2003. p. 22. Available online at http://www.the-scientist.com/yr2003/feb/feature1_030224.html.
McAdams, Harley H., and Adam Arkin. 1998. “Simulation of Prokaryotic Genetic Circuits,” Annu. Rev. Biophys. Biomol. Struct. 27:199-224.
McAdams, Harley H., and Lucy Shapiro. 1995. “Circuit Simulation of Genetic Networks,” Science 269:650-656.
Miller, Karl. “Metabolic Pathways of Biochemistry.” Web page available at http://www.hfni.gsehd.gwu.edu/~mpb/.
The Mouse Genome Sequencing Consortium. 2002. “Initial sequencing and comparative analysis of the mouse genome,” Nature 420: 520-562.
National Cancer Institute. “NCI Director’s Challenge: Toward a Molecular Classification of Cancer.” Web page available at http://dc.nci.nih.gov/.
National Human Genome Research Institute. “Comparative Genomics.” Web page available at http://www.genome.gov/11006946.
National Human Genome Research Institute. “The ENCODE Project: ENCyclopedia Of DNA Elements.” Web page available at http://www.genome.gov/10005107.
National Human Genome Research Institute. 2002. “DNA Microarray Technology.” Web page available at http://www.genome.gov/10000533.
National Human Genome Research Institute. 2003. “International Consortium Completes Human Genome Project.” Web page available at http://www.genome.gov/11006929.
Noble, Denis. 2002. “Modeling the Heart–from Genes to Cells to the Whole Organ,” Science 295(5560):1678. Available online at http://www.sciencemag.org/cgi/content/abstract/295/5560/1678.
Shi, Leming. 2002. “DNA Microarray (Genome Chip)–Monitoring the Genome on a Chip.” Web page available at http://www.gene-chips.com/.
Smith, Lloyd M., et al. 1986. “Fluorescence detection in automated DNA sequence analysis,” Nature 321:674-79.
Taubes, Gary. “The Virtual Cell,” Technology Review, Apr 2002. p. 63-70.
Travis, John. “Biological Dark Matter,” Science News, 12 Jan 2002. Available online at http://www.sciencenews.org/20020112/bob9.asp.
Tyson, John J., et al. 2001. “Network Dynamics and Cell Physiology,” Nat Rev Mol Cell Biol 2(12):908-16.
U.S. Department of Energy. “Genomes to Life: Biological Solutions for Energy Challenges.” Web page available at http://doegenomestolife.org/.
U.S. Department of Energy. 2003. “Report on the Computer Science Workshop for the Genomes to Life Program (March 6-7, 2002).” Web page available at http://www.doegenomestolife.org/pubs/ComputerScience-10.pdf.
University of Auckland Bioengineering Institute. “The IUPS Physiome Project.” Web page available at http://www.bioeng.auckland.ac.nz/physiome/physiome.php.
von Bertalanffy, Ludwig. 1969. General Systems Theory: Foundations, Development, Applications. New York: George Braziller.
Waldrop, M. Mitchell. 2002. “Grid Computing,” Technology Review 105(4):31-37.
Watson, James D., and Francis H. C. Crick. 1953. “Molecular structure of nucleic acids: A structure for deoxyribose nucleic acid,” Nature 171:737.
Webopedia. “Grid Computing.” Web page available at http://www.webopedia.com/TERM/g/grid_computing.html.
Webopedia. “Peer-to-Peer Architecture.” Web page available at http://www.webopedia.com/TERM/p/peer_to_peer_architecture.html.
Webopedia. “Web Services.” Web page available at http://www.webopedia.com/TERM/W/Web_services.html.
Whitehead Institute. 2001. “Scientists Find New Class of Genes Implicated in Protein Regulation.” Web page available at http://www.wi.mit.edu/nap/2001/nap_press_01_dbmrna.html.
Wiener, Norbert. 1961. Cybernetics, or Control and Communication in the Animal and the Machine. 2nd ed. Cambridge, MA: MIT Press.
Wolkenhauer, Olaf. 2001. “Systems biology: The reincarnation of systems theory applied in biology?,” Briefings in Bioinformatics 2(3):258-70.
[1] Watson and Crick (1953).
[2] The “completion” of the project had actually been announced once before, on June 26, 2000, when U.S. President Bill Clinton and British Prime Minister Tony Blair jointly hailed the release of a preliminary, draft version of the sequence with loud media fanfare. However, while that draft sequence was undoubtedly useful, it contained multiple gaps and had an error rate of one mistaken base pair in every 10,000. The much-revised sequence released in 2003 has an error rate of only 1 in 100,000, and gaps in only those very rare segments of the genome that can’t be reliably sequenced with current technology. http://www.genome.gov/11006929.
[3] DNA in its natural state takes the shape of a twisted ladder: two parallel strands winding around and around one another in the famous double helix. Each strand consists of a backbone of endlessly repeating sugar-phosphate molecules, which form one side of the ladder, plus a sequence of “base” molecules attached to each sugar. There are four types of bases, usually abbreviated as A, T, C, and G. (The full names are adenine, thymine, cytosine, and guanine, respectively.) In the complete double helix, each base links with its counterpart on the opposite strand to form one step of the ladder; thus the term “base-pairs.” The pairing always links A with T and C with G, which makes each strand the exact complement of the other. But in any case, the precise sequence of bases along either backbone is what encodes the genetic information.
[4] http://www.genome.gov/11006946
[5] Collins et al. (2003). Formulated over the course of two years, through more than a dozen workshops that involved hundreds of scientists and members of the public (see http://www.genome.gov/About/Planning), the agenda is organized into three themes: “genomics to biology,” which focuses on the kind of systems biology issues discussed in this chapter; “genomics to health,” which focuses on the role of genomics in health, disease, diagnosis, and treatment; and “genomics to society,” which focuses on hot-button issues such as genetic discrimination, and the genetic basis of race, ethnicity, and kinship.
[6] Collins et al. (2003).
[7] The International Human Genome Sequencing Consortium (2001).
[8] In the bacterium E. coli and other such prokaryotes (one-celled organisms that lack a nucleus) this is exactly what happens. But in amoebas, yeast, plants, humans, and all other eukaryotes (organisms whose cells do have a nucleus, as well as mitochondria and many other organelles) there is an intermediate step. For reasons that are still not clearly understood, the coding regions of eukaryotic genes are typically broken up by long stretches of non-coding DNA. So after each gene is transcribed, the resulting mRNA is set upon by a whole series of specialized enzymes that edit out the non-coding regions, or “introns,” and splice together the useful parts, known as “exons.” Only then does the edited mRNA move out into the cytoplasm for its encounter with the ribosomes.
[9] Collins et al. (2003); The Mouse Genome Sequencing Consortium (2002).
[10] Witness the recent discovery that a hitherto obscure class of “small” RNAs seems to be playing a major regulatory role in a wide variety of organisms, including humans. In December 2002, Science magazine declared this to be the “breakthrough of the year.” (Couzin (2002), and references therein.) The small RNAs are typically only about 25 bases long, but the genes that encode them comprise an estimated 1% of the entire genome, making them roughly as numerous as the protein-encoding genes. http://www.wi.mit.edu/nap/2001/nap_press_01_dbmrna.html; Travis (2002).
[11] “Several” and “short” are relative terms. In prokaryotes such as E. coli, a typical gene has only four or five regulatory sites, the sites themselves are only about 15 base pairs long, and the transcription factors that bind to them are comparatively simple. (Or at least, that’s the case in the handful of genes whose regulation has been studied in detail.) In eukaryotes, however, the binding sites are large, numerous, and widely scattered, and the transcription factors are correspondingly complex.
[12] Actually, the protein production seems to be regulated not just at the start, but at every step along the way. For example, certain of the small RNAs mentioned in a previous footnote can regulate protein production by attacking the mRNA and destroying the data tape, so to speak, before it’s even read. Other types can shut off the translation process at the ribosome, as can certain regulatory proteins.
[13] Bray (1995).
[14] Think of a photon of light hitting a chloroplast in a green plant cell, or a hormone molecule locking onto a protein receptor molecule embedded in the cell membrane. Depending on the cell type, signaling pathways can be triggered by chemicals, light, heat, temperature, ion gradients, or even mechanical contact with another cell.
[15] As early as 1961, François Jacob and Jacques Monod, who had recently discovered the regulatory regions in DNA, for which they would share the 1965 Nobel Prize, wrote a report that emphasized the importance of regulatory feedback; talked about regulatory “circuits”; and suggested that cancer was triggered by the breakdown of regulatory control. Jacob and Monod (1962). These are all key ideas in systems biology today. McAdams and Arkin (1998).
[16] Wiener (1961); von Bertalanffy (1969). This history was recently summarized in Wolkenhauer (2001). See also Hunter (2003).
[17] Ideker, Galitski, and Hood (2001); Hood and Galas (2003); Collins et al. (2003).
[18] Cohen (2001).
[19] The hierarchy of levels obviously doesn’t stop at the cell membrane. Although deciphering the various cellular regulatory networks is a huge challenge in itself, systems biology ultimately has to deal as well with how cells organize themselves into tissues, organs, and the whole organism. One group that is trying to lay the groundwork for such an effort is the Physiome Project at the University of Auckland in New Zealand. http://www.webopedia.com/TERM/W/Web_services.html
[20] Among the known mechanisms for biological robustness are negative feedback, which maintains stability; redundancy, which allows for multiple backups; and modularity, which tends to isolate failures, rather than allowing them to spread. (Kitano (2002).) These mechanisms are also widely used by human engineers, a fact that some researchers regard as no accident, arguing that the parallels between biology and large-scale engineering may actually be quite deep. (Csete and Doyle (2002).)
[21] Physiological processes such as metabolism, signal transduction, and the cell cycle take place on a time scale that ranges from milliseconds to days, and are reversible in the sense that an activity flickers on, gene expression is adjusted as needed, and then everything returns to some kind of equilibrium. But the commitments that the cell makes during development are effectively irreversible. Becoming a particular cell line means that the genetic regulatory networks in each successive generation of cells have to go through a cascade of decisions that end up turning genes on and off by the thousands. And unless there is some drastic intervention, as in the cloning experiments that created Dolly the Sheep, those genes are locked in place for the lifespan of the organism. (Davidson et al. (2002).) Of course, the developmental program does not proceed in an isolated, “open-loop” fashion, as a computer scientist might say. Quite the opposite. Very early in the process, for example, the growing embryo lays out its basic body plan (front versus back, top versus bottom, and so on) by establishing embryo-wide chemical gradients, so that the concentration of the appropriate compound tells each cell what to do. Similar tricks are used at every stage thereafter: each cell is always receiving copious feedback from its neighbors, with chemical signals providing a constant stream of instructions and course corrections.
[22] After all, even very small changes in the timing of events during development, and in the rates at which various tissues grow, can have a profound impact on the final outcome.
[23] http://www.cellularsignaling.org/; Gannis (2001); Maher (2003).
[24] http://www.systemsbiology.org/
[25] http://doegenomestolife.org/; Frazier et al. (2003).
[26] http://dc.nci.nih.gov/
[27] http://www.doegenomestolife.org/pubs/ComputerScience-10.pdf
[28] http://www.doegenomestolife.org/pubs/ComputerScience-10.pdf
[29] Collins et al. (2003). To help achieve this Grand Challenge, the institute has launched the ENCODE project, a public research consortium dedicated to building an annotated encyclopedia of all known functional DNA elements. http://www.genome.gov/10005107.
[30] Smith, Hunkapiller, and Hood (1986); Hood and Galas (2003).
[31] http://www.doegenomestolife.org/pubs/ComputerScience-10.pdf
[32] Although there are many variations, the basic idea starts with the “chip”: a glass slide containing an array of artificial DNA molecules that correspond to the genes of interest. To assay a particular cell type, the researchers extract all the mRNA molecules, label each of them with a fluorescent dye, and wash the resulting concoction across the chip so that each type of mRNA can bind to its complementary DNA sequence. Then the researchers just have to measure how brightly each spot fluoresces to gauge how much of the corresponding mRNA was present, which in turn gives an estimate of how actively the gene was being expressed. A superb overview of microarray technology is available on a private web site created by Chinese researcher Leming Shi: http://www.gene-chips.com/. See also http://www.genome.gov/10000533 and Gwynne and Page (1999).
[33] MacBeath and Schreiber (2000).
[34] Collins et al. (2003).
[35] The consortium already has standard vocabulary lists, or “ontologies,” in three areas: Molecular function (e.g., TKTKTK); biological process (e.g., TKTKTK); and subcellular structures (e.g., TKTKTK). http://www.geneontology.org/.
[36] http://biospice.lbl.gov/home.html
[37] http://www.cytoscape.org/
[38] TK
[39] TK
[40] TK
[41] TK
[42] DeFrancesco and Wilkinson (1999).
[43] DeFrancesco (2002).
[44] TK
[45] Ideker et al. (2001).
[46] Kitano (2002).
[47] See, for example, a compendium of major metabolic pathways posted by Karl Miller of TK:
[48] Kitano (2002).
[49] One harbinger of this hypothetical new discipline is the recent, successful effort to design and implement (through genetic engineering) artificial networks in cells. TKTKTTK.
[50] The U.S. Food and Drug Administration has already used computer models to help assess drugs for factors such as cardiac safety. (Noble (2002).)
[51] Noble (2002).
[52] Links to each of these sites have been collected at the Caltech ERATO site, which also offers a repository of the various software packages. (http://www.sbw-sbml.org/repository.html)
[53] Among representations in current use are Boolean models, Bayesian networks, generalized logical networks, Petri nets, rule-based systems, fuzzy logic, and both stochastic and deterministic ordinary differential equations. See http://www.doegenomestolife.org/pubs/ComputerScience-10.pdf
[54] McAdams and Arkin (1998).
[55] Bray (1995).
[56] The analogy isn’t perfect, since real proteins rarely respond in a completely binary, yes-no fashion. But the analogy can be useful nonetheless.
[57] That is, the propensity of certain bacteria, such as E. coli, to swim towards higher concentrations of nutrients.
[58] Bray was quite explicitly not claiming that the cell processes information the way a modern digital computer does. The organizations are radically different, starting with the fact that there’s no clean separation between the data store and the central processing unit: the cell’s memory is the same protein reaction network that does its processing. In that sense, the cell’s information processing architecture is organized more like that of a neural network. And indeed, Bray’s 1995 paper made much of that analogy.
[59] McAdams and Shapiro (1995).
[60] Indeed, the electrical circuit analogy is almost irresistible, as can be seen from a glance at any of the known regulatory pathways: the tangle of links and nodes could easily pass for a circuit diagram of Intel’s latest Pentium chip. But the fact that the most fruitful analogies tend to come from engineering, as opposed to, say, physics, chemistry, or pure mathematics, may have deeper reasons, as well. Although molecular biology is obviously rooted in physics and chemistry, for example, the very notion of “function” takes it a long, long way from those roots. (Hartwell et al. (1999).) Organisms exist to survive and reproduce, a purpose endowed by natural selection, whereas atoms and molecules just are; they have no purpose whatsoever (except, possibly, in a religious context). So for that reason alone, the concepts needed to understand network function are more likely to resemble the concepts already developed for “synthetic” disciplines, of which engineering and computer science are prime examples.
On a more pragmatic level, meanwhile, the engineering disciplines have already had a long history of systems-level thinking, and indeed, have already produced artifacts that are approaching biological levels of complexity. A Boeing 777 jetliner contains about 150,000 subsystem modules, including 1000 computers, a number that’s impressively close to the estimated 300,000 different proteins in a typical human cell. Just as in the cell, moreover, these subsystems are linked into an immensely complex “network of networks”: a control system that just happens to fly. And, just as in the cell, those networks exhibit an intricate interplay between complexity, feedback regulation, robustness, fragility, and cascading failures, all of which indicate that engineering and biology may have much more in common than their superficial differences might suggest. (Csete and Doyle (2002).)
[61] McAdams is an electrical engineer by training. Taubes (2002).
[62] Actually, statistical fluctuations in the current flow can make electrical circuits noisy, too, but usually at a much lower level.
[63] McAdams and Arkin (1998).
[64] Tyson, Chen, and Novak (2001).
[65] Tyson, Chen, and Novak (2001).
[66] http://www.doegenomestolife.org/pubs/ComputerScience-10.pdf
[67] Kitano (2002), pp. 1663-4.
[68] http://www.webopedia.com/TERM/W/Web_services.html
[69] http://www.webopedia.com/TERM/g/grid_computing.html
[70] http://www.webopedia.com/TERM/p/peer_to_peer_architecture.html
[71] Waldrop (2002).
[72] http://www.sbw-sbml.org/index.html
[73] http://www.bioeng.auckland.ac.nz/physiome/physiome.php