My philosophy of modelling.
Here, I link to some longer documents that I made in the
past, for classes and research groups:
- A syllabus for the class in biological
modelling that I taught. A
substantial section presents my philosophy.
- A writeup I made for the Jornada
Long-Term Ecological Research group, trying to get them oriented in the
fundamentals of modelling.
- An interim report on modelling the photosynthesis
and transpiration of pecan orchards.
A section lays out my modelling philosophy. You're welcome to browse the rest, too.
- My
1987 book, Functional Ecology of Crop Plants (Croom Helm, London / Timber
Press, Beaverton, OR; sorry, no full PDF available yet), where an
engineering view of how plants work or "should" work is blended
with an evolutionary view of how they were naturally selected (and why
agriculture overlooks the discrepant objectives of natural and artificial
selection at its peril, as does biomedicine).
Quick summary:
- I
eschew verbal/conceptual models (block and arrow diagrams) until they lead
to an explicit mathematical form, which forces one to put down what one
really knows (or does not know).
Some concepts don't lend themselves to explicit formulation, such
as the role of biodiversity in some ecosystem functions - a much deeper
and more careful formulation is needed before we design more bad
experiments.
- Models
can be used in many ways. One gross
classification is
- for
prediction (if one really knows the system), or
- for
developing hypotheses (if the system parts all work as we think they do, the
system will show all these behaviors; let's go test them; models of this
type are potent ways to design experiments so that we attend to what's
most important, rather than incorporating 10 to the googol treatments),
or
- for
synthesis of our knowledge (e.g., let's put together what we know about
how soil and atmospheric environments control plant water use; we can do
a much better job now).
- I am
heavily in favor of mechanistic models vs. statistical models. Mechanistic models capture real
understanding of causation (with any luck and effort). They can also be applied readily to new
conditions and times. Statistical
models are inherently biased, in the practitioner's eye, toward linear
models. Most natural phenomena are
linear only over short ranges (e.g., the stomatal conductance of leaves).
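The point about linearity over short ranges can be illustrated with a minimal Python sketch. It assumes a hypothetical saturating response of conductance to some driver; the function `response`, the constants `G_MAX` and `K_HALF`, and all numbers are illustrative assumptions, not a real stomatal model. A straight line is fitted by ordinary least squares over a short range and then extrapolated well beyond it.

```python
# Hypothetical saturating response; not a calibrated stomatal model.
G_MAX = 0.4     # assumed asymptotic conductance (mol m-2 s-1)
K_HALF = 200.0  # assumed driver level that gives half of G_MAX


def response(x):
    """Saturating (hyperbolic) response of conductance to driver x."""
    return G_MAX * x / (x + K_HALF)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# "Measure" only over a short range of the driver, then fit a line.
xs = [10.0 * i for i in range(11)]  # 0 to 100
ys = [response(x) for x in xs]
slope, intercept = fit_line(xs, ys)


def linear(x):
    """The fitted statistical (linear) model."""
    return slope * x + intercept


# Largest disagreement inside the fitted range.
in_range_error = max(abs(linear(x) - response(x)) for x in xs)

# Apply both models far outside the fitted range.
x_new = 1000.0
extrapolated = linear(x_new)
true_value = response(x_new)

print(f"max error inside fitted range: {in_range_error:.4f}")
print(f"at x = {x_new:.0f}: linear {extrapolated:.3f} vs. saturating {true_value:.3f}")
```

Under these assumptions the line is a close match inside the fitted range and overshoots badly outside it, which is the sense in which a linear fit cannot be carried to new conditions.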
- I go
for models that are the least data-hungry, which means finding the most
robust formulation of processes.
Example: models of leaf photosynthetic rates are abundant and
varied in form; most have limited accuracy and their numerous parameters
must be redone for every new plant and every new condition. The model of Farquhar, von Caemmerer,
and Berry
(1980 ff.) captured all the complexity of the biochemistry in 3 parameters
(for light-saturated rate), and 2 of these are essentially universal among
all vascular plants. Can't do
better than that!
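A minimal sketch of a light-saturated (Rubisco-limited) rate in the Farquhar, von Caemmerer, and Berry form follows. Reading the three parameters as the maximum carboxylation rate, an effective Michaelis constant, and the CO2 compensation point is an assumption of this sketch, as are the function name and the numbers: the kinetic constants are approximate 25 °C values of the kind commonly quoted in the literature, and the `vcmax` value is invented for illustration. Day respiration is left out.

```python
def rubisco_limited_rate(ci, vcmax, km_eff, gamma_star):
    """Carboxylation-limited assimilation (umol m-2 s-1), respiration omitted.

    ci         : intercellular CO2 (umol mol-1)
    vcmax      : maximum carboxylation rate (umol m-2 s-1), plant-specific
    km_eff     : effective Michaelis constant, Kc * (1 + O / Ko) (umol mol-1)
    gamma_star : CO2 compensation point without respiration (umol mol-1)
    """
    return vcmax * (ci - gamma_star) / (ci + km_eff)


# Assumed, approximate kinetic constants near 25 C (illustrative only).
KC = 404.9          # Michaelis constant for CO2 (umol mol-1)
KO = 278.4          # Michaelis constant for O2 (mmol mol-1)
O2 = 210.0          # ambient O2 (mmol mol-1)
GAMMA_STAR = 42.75  # CO2 compensation point (umol mol-1)

# The two near-universal parameters in this reading: km_eff and gamma_star.
KM_EFF = KC * (1.0 + O2 / KO)

# The one plant-specific parameter; the value here is hypothetical.
rate = rubisco_limited_rate(ci=250.0, vcmax=60.0, km_eff=KM_EFF,
                            gamma_star=GAMMA_STAR)
print(f"effective Km: {KM_EFF:.1f} umol mol-1")
print(f"assimilation at ci = 250: {rate:.2f} umol m-2 s-1")
```

In this reading, moving to a new plant means re-estimating only `vcmax`, which is what keeps the formulation from being data-hungry.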
- Still,
many realistic models grow to become complex. Beyond requiring that data and
parameters be obtainable, I have a rule of thumb: I stop adding processes when
I forget what I was doing in the large.
Naturally, this happens to all modellers.
- Models
have to be testable and tested, and not just against other models or for a
few predictions among thousands. Remote sensing models covering millions
of pixels, where measurements can never be made, must have many
other features in their favor. Predicted spatial patterns can't be
eyeballed as "looking like the observations" but must be analyzed with
tough spatial statistics.
- Models
can take a lot of time and effort, during construction and/or during
execution, because of complexity on any of several levels:
- Conceptual:
the concepts are elaborate in form or are numerous.
- Mathematical:
even with simple concepts, the mathematical solutions may be very
difficult to obtain (e.g., for coupled nonlinear equations).
- Computational:
similar to mathematical complexity, but not identical to it: a few
equations may have a beastly amount of computation involved, as in
optimizing several physiological traits simultaneously (leading to
the need for simulated annealing or genetic algorithms).
- Data-hungriness:
plant growth models or patch-dynamic models may require only a few
descriptors of plant resource use and/or dispersal, but for many
individuals or species. Can we
really get all this info?
I like solutions in closed form
(i.e., one can write an explicit mathematical form for the answer), but I'll
accept numerical solutions, even those that are truly black-box, such as neural
networks, when necessary.
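The difference between a closed-form and a numerical solution can be seen in a small coupled nonlinear problem. The sketch below assumes a hypothetical pair of equations for leaf CO2 exchange: a diffusive supply, A = g (ca - ci), and a saturating demand, A = v (ci - gamma) / (ci + k). Equating them gives a quadratic in ci with an explicit root, which is compared against a bisection search. All names and numbers are illustrative assumptions.

```python
import math


def supply(ci, g, ca):
    """Diffusive supply of CO2 through the stomata."""
    return g * (ca - ci)


def demand(ci, v, k, gamma):
    """Saturating biochemical demand for CO2."""
    return v * (ci - gamma) / (ci + k)


def ci_closed_form(g, ca, v, k, gamma):
    """Positive root of g*ci**2 + (v + g*(k - ca))*ci - (g*ca*k + v*gamma) = 0."""
    a = g
    b = v + g * (k - ca)
    c = -(g * ca * k + v * gamma)
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


def ci_bisection(g, ca, v, k, gamma, tol=1e-10):
    """Numerical alternative: search [0, ca] for supply == demand."""
    lo, hi = 0.0, ca
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if supply(mid, g, ca) > demand(mid, v, k, gamma):
            lo = mid  # supply still exceeds demand: the root is at higher ci
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical values: conductance, ambient CO2, capacity, Km, compensation point.
args = dict(g=0.2, ca=400.0, v=60.0, k=710.0, gamma=42.75)
closed = ci_closed_form(**args)
numeric = ci_bisection(**args)
print(f"closed form: ci = {closed:.6f}")
print(f"bisection:   ci = {numeric:.6f}")
```

The explicit root shows directly how ci depends on each parameter. The bisection gives the same number, and would still work if the demand function were replaced by one with no algebraic solution.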
- There
are distinct elements in models that must be regarded carefully. Considering the common
differential-equation models (or related difference-equation models) such
as for plant growth, water movement, etc., we need to distinguish
- state
variables (the responses we want to track),
- parameters
(constants of physical, physiological, developmental, or ecological
origin that occur in the equations describing the state variable
changes),
- driving
variables (external to our system, and simply prescribed, such as wind
and precipitation),
- boundary
conditions (in space - what happens at the limits),
- initial
conditions (where we start, in time), and
- process
equations (equations describing how everything changes in space and
time). There's often a lot of
sloppy thinking on these, to the extent that no real models are ever
attempted (e.g., some prominent desertification 'models').
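The elements listed above can be kept visibly separate in code. The following is a minimal sketch, assuming a hypothetical single-layer soil water "bucket" written as a daily difference equation; the names, the evapotranspiration rule, and all numbers are illustrative assumptions. Because the bucket is lumped in space, no spatial boundary conditions arise here; a spatially distributed model would need them as a further explicit input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parameters:
    """Constants of the system (hypothetical values are set below)."""
    capacity_mm: float  # most water the soil layer can hold


def step(water_mm, precip_mm, pet_mm, params):
    """Process equation: one daily update of the state variable."""
    # Assumed rule: actual ET scales with relative soil wetness.
    et = min(pet_mm * water_mm / params.capacity_mm, water_mm)
    water = water_mm + precip_mm - et
    runoff = max(0.0, water - params.capacity_mm)  # excess leaves the bucket
    return min(water, params.capacity_mm), et, runoff


def simulate(initial_water_mm, precip, pet, params):
    """Advance the state variable from its initial condition."""
    water = initial_water_mm
    trajectory = [water]
    for precip_mm, pet_mm in zip(precip, pet):
        water, _, _ = step(water, precip_mm, pet_mm, params)
        trajectory.append(water)
    return trajectory


# Parameters: fixed properties of the system.
params = Parameters(capacity_mm=100.0)

# Driving variables: external to the system and simply prescribed.
precip = [0.0, 20.0, 0.0, 0.0, 50.0, 0.0]  # mm per day
pet = [5.0] * 6                            # potential ET, mm per day

# Initial condition: where the state variable starts in time.
initial_water_mm = 60.0

# State variable: soil water (mm), tracked through time.
trajectory = simulate(initial_water_mm, precip, pet, params)
print([round(w, 2) for w in trajectory])
```

Keeping the parameters, driving variables, and initial condition as separate arguments makes it hard to confuse them, and makes each one easy to replace when the model is applied to a new site.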
- Design
practices:
- I
insist on documentation, as narratives outside the code, and as copious
comments inside the code.
- I
debug all code for every step - many nasty surprises lurk in even simple
codes. Some people don't debug;
don't touch their codes or rely on their results.
- I
admire the aims of object-oriented programming, but I don't do it. I find the interfaces tedious,
especially if one really wants to avoid having to customize them for each
purpose, defeating the whole intent.
Truly big models do need OOP.
- I don't
like modelling packages that enforce certain structures. An example is Stella, which basically
forces you to do stock-and-flow models and dissuades you heavily from
doing spatially distributed models (unless you want to program in a ton
of levels and lose track of what you're doing). Excel is sort-of general purpose for
quick answers to simple problems, but its tool for solving nonlinear equations
(Solver) is awkward to use within larger iterative schemes. Matlab, Mathematica, and Maple are good
general-purpose packages, but you want to use scripts that you save, so
that you don't forget all the changes you made. For these reasons, I prefer Fortran (an
updated warhorse, now quite powerful); C, C++, etc. are good alternatives.
My models are not big in the sense of needing
supercomputers. Some of the
co-organizers' models are that big. I'd like us
to discuss 'bigness' or complexity on the several levels I outlined above.