Sometimes one reads discussions of causality in academic papers: expressions like "this gene causes that" or "Alzheimer's Disease (AD) is caused by XYZ", or of course "We don't know what causes X". Recently I found myself thinking about these three statements:

  • We don't know what causes Alzheimer's yet (eg here at NIA)
  • Old age does not cause Alzheimer's, but it is the most important risk factor for the disease (also here at NIA)
  • Aging causes Alzheimer's (eg here at Fightaging)

What does causality mean there? How can something be a risk factor but not a cause? Intriguing!

People (philosophers excluded perhaps) usually don't think much when they throw around the word cause, but in biology, from the way the word is used, "X causes Y" tends to mean "X is necessary and sufficient for Y". Hence in Alzheimer's case, as I discussed here, one might make the claim that amyloid beta plaques don't cause AD because one can find older adults with plaques but no dementia. One can then counter that there are other factors at play. In the NIA definition above, "old age does not cause Alzheimer's" means that being old doesn't guarantee Alzheimer's.

But with early onset Alzheimer's, where the disease follows from a small number of mutations (usually in the PSEN1 gene), people do speak of "those mutations causing early onset AD". If you have them, you will get the disease with ~100% likelihood; if you don't, you won't. This is a clear example of the "sufficient and necessary" criterion that seems pervasive.

With cancer, we can make the broad claim that "cancer is a genetic disease" even though there is no one gene that uniquely causes cancer (no necessary set); rather, one can find combinations of mutations (say a p53 mutation coupled with a KRAS mutation) that do lead to the disease (many sufficient sets).

But this is obviously too simplistic and breaks down, especially when we try to understand complex diseases or processes like aging.

If you take a car and ask what causes the car to move, there is no one element that causes it. If you take the tyres or the engine away, or the car has no fuel, the car won't move. All of these are necessary for movement, but none of them is a sufficient "cause" on its own. The folks over at NIA, if assessing a car, would then say that "We don't know what causes cars to move, but fuel, tyres, and the engine are known factors that are involved".
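To make the "necessary" versus "sufficient" distinction concrete, here is a minimal sketch that treats an outcome as a boolean function of its factors and checks both properties by brute force. The function names and the `mutation_c`/`mutation_d` placeholders are my own inventions for illustration, and the cancer rule is a cartoon, not a claim about real tumour genetics.

```python
from itertools import product

def all_states(factors):
    """Yield every present/absent combination of the given factors."""
    for values in product([False, True], repeat=len(factors)):
        yield dict(zip(factors, values))

def is_necessary(outcome, subset, factors):
    """True if the outcome never happens unless every factor in `subset` is present."""
    return all(all(s[f] for f in subset)
               for s in all_states(factors) if outcome(s))

def is_sufficient(outcome, subset, factors):
    """True if the outcome always happens when every factor in `subset` is present."""
    return all(outcome(s)
               for s in all_states(factors) if all(s[f] for f in subset))

# The car: every part is needed, no single part is enough.
CAR = ["fuel", "tyres", "engine"]
def car_moves(s):
    return s["fuel"] and s["tyres"] and s["engine"]

# Toy cancer rule (assumption): either of two mutation pairs leads to disease.
# "mutation_c" and "mutation_d" are hypothetical placeholders.
CANCER = ["p53", "KRAS", "mutation_c", "mutation_d"]
def cancer(s):
    return (s["p53"] and s["KRAS"]) or (s["mutation_c"] and s["mutation_d"])

for f in CAR:
    print(f, "necessary:", is_necessary(car_moves, [f], CAR),
          "sufficient:", is_sufficient(car_moves, [f], CAR))
print("p53+KRAS sufficient:", is_sufficient(cancer, ["p53", "KRAS"], CANCER))
print("p53 necessary:", is_necessary(cancer, ["p53"], CANCER))
```

Under the strict "necessary and sufficient" reading, neither the car nor the toy cancer has a single cause, even though both are fully determined by their factors.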

How did this come to be? Probably because biological experiments have historically proceeded one blunt thing at a time. It's easier to KO a gene to see what happens than to downregulate 10 different genes by 50%, but the effects of those two sets of changes may be the same.

But in the case of Alzheimer's, consider this alternative explanation. We know what causes Alzheimer's: Proximately, the symptoms we observe are due to neurons dying. Most of this death is due to the action of hyperphosphorylated tau aggregating into neurofibrillary tangles (NFTs). Amyloid is not required for NFTs to form (there are other tauopathies), but AD is an amyloid-driven tauopathy. We know microglia (and neurons too to some extent, via autophagy) can clear up these two aggregates, and we know that babies don't have these. So what's happening is that either the rate of production of the aggregate has increased (dysfunctional neurons) or that of degradation has decreased (say dysfunctional microglia), or both of course. This model has to be true a priori if we want to fit the basic facts we know about the disease. But the model doesn't say that one particular thing or gene is a cause of the disease. For example, one might have an ApoE4 allele that causes reduced clearance of amyloid beta, so the brain hits the threshold to disease progression much earlier. Or one might produce a lot more Abeta (the case with the PSEN1 mutations) and reach AD much, much earlier. If you think of these as equations, the form of the system is the same but the coefficients are different in each disease and in each person. We can perhaps think of early and late onset as the same disease, one caused by faster generation and the other by reduced clearance of aggregates.
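The "same equations, different coefficients" idea can be sketched as a toy production/clearance model. Everything here is an assumption made for illustration: the linear form, the exponential decline of clearance with age, the threshold, and all the numbers. None of it is fitted to data; it only shows how changing one coefficient shifts the age at which the same system crosses the same threshold.

```python
import math

def age_at_threshold(production, clearance0, clearance_decline=0.03,
                     threshold=10.0, dt=0.01, max_age=120.0):
    """Integrate d(load)/dt = production - clearance(age) * load with Euler steps.

    clearance(age) = clearance0 * exp(-clearance_decline * age) is an assumed
    stand-in for clearance getting worse with age. Returns the age at which
    the aggregate load first reaches `threshold`, or None if it never does.
    """
    load, age = 0.0, 0.0
    while age < max_age:
        clearance = clearance0 * math.exp(-clearance_decline * age)
        load += (production - clearance * load) * dt
        age += dt
        if load >= threshold:
            return age
    return None

# Hypothetical coefficient sets: same equation, different people.
baseline = age_at_threshold(production=1.0, clearance0=1.0)
more_production = age_at_threshold(production=2.0, clearance0=1.0)   # PSEN1-like
less_clearance = age_at_threshold(production=1.0, clearance0=0.6)    # ApoE4-like

print("baseline:", baseline)
print("more production:", more_production)
print("less clearance:", less_clearance)
```

In this sketch no single coefficient "causes" the crossing: aging (the declining clearance) is needed in every case, and production and baseline clearance only move the date.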

But then there can be many reasons why the coefficients in the equations change: infections, accidents, different genetics, or other comorbidities. In this sense it's very much like cancer: no sufficient-and-necessary mutation but rather a series of little causes that nudge the brain towards the diseased phenotype.

One person, for example, may have really good autophagy and so stay disease-free for longer. Another is overweight and has more inflammation, and that leads to earlier progression. We would be wrong to say that "reduced autophagy causes AD" or that "inflammation causes AD" or even that "PSEN1 causes early onset AD", as even this requires some amount of aging to proceed.

To make things worse, in biology all these causes are connected: it's likely that, say, increasing inflammation impairs autophagy, or that at some point more amyloid leads to more inflammation and damage, and that damage leads to more amyloid. At its core, DNA makes RNA makes proteins, but then these proteins affect how DNA is transcribed.

If we go back to a mechanical analogy, imagine there is a little tear on an airplane's wing that causes some vibrations; over time these vibrations affect the engine, which breaks down, and the airplane falls. Likewise, imagine the turbine's shaft is not well compensated and generates vibrations; the wing starts vibrating and falls apart. Not that this would happen often in commercial aviation, because we've gotten quite good at aircraft safety, but this highlights how the same phenotype (the crashed airplane) can be reached through two mutually-affecting routes.

What are the implications of all of this? That if we want to make progress in biology we have to move beyond the "gene for X trait/disease" idea and rather think of systems where we can still meaningfully speak of causation, but in a more realistic way. ML will help there because these systems are devilishly complex, and while sometimes we can make simplifications that are good enough to derive therapies (like LDL cholesterol -> cardiovascular disease), elsewhere (Alzheimer's) we cannot (we can clear Abeta, but that doesn't cure or radically slow down AD).

As for aging in particular, if you think of the state of a cell as a dot in a high-dimensional space, then young cells start near each other and then disperse into various trajectories of dysfunction in a sort of Anna Karenina principle of disease: all healthy people are the same whereas all sick people are different. It follows then that almost every small random nudge to a cell won't be enough (there are so many dimensions) to make the cell young in a consistent way. By young I mean more functional, as it was when young, so this applies to disease in general. We are not going to fix the aging brain by upregulating or knocking out genes, at least not one gene at a time. For example, if we see a reduction of lysosomal function in microglia and then we see that 10 enzymes that are key there are downregulated, are we going to throw in all those 10 mRNAs? And there are probably more things that are wrong than those 10 anyway. Note that this doesn't mean single drugs can never help. Statins work and target a specific pathway.
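A rough way to see why random nudges fare badly in many dimensions: pick a random direction "back to young" and a random nudge, and measure how well they line up. This is only a geometric cartoon under my own assumptions (Gaussian random directions, cell state as a plain vector, the hypothetical `mean_alignment` helper); it says nothing about real cell biology beyond the dimensionality argument.

```python
import math
import random

def mean_alignment(dim, trials=200, seed=0):
    """Average |cosine| between a random nudge and the direction back to 'young'.

    Both vectors are drawn from a standard Gaussian (an assumption), so each
    points in a uniformly random direction in `dim` dimensions.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        to_young = [rng.gauss(0, 1) for _ in range(dim)]
        nudge = [rng.gauss(0, 1) for _ in range(dim)]
        dot = sum(a * b for a, b in zip(to_young, nudge))
        norms = (math.sqrt(sum(a * a for a in to_young))
                 * math.sqrt(sum(b * b for b in nudge)))
        total += abs(dot) / norms
    return total / trials

for dim in (2, 10, 100, 1000):
    print(dim, round(mean_alignment(dim), 3))
```

The alignment shrinks roughly like one over the square root of the number of dimensions, so in a space with thousands of relevant variables a single arbitrary perturbation is almost orthogonal to the direction you actually want to move in.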

The sort of thinking outlined here, it seems to me, should be a central part of what it means to "work on aging research": to seek to deeply understand complex interactions across genes, cells, and tissues, and across contexts.


  • 2023-11-29: I removed a reference to the central dogma of molecular biology at the suggestion of Per Kraulis, see here. Though I think it doesn't affect the thrust of the argument (I mostly used it as a textual embellishment), it was nonetheless wrong.