Metascience: invariants and evidence
In mathematics and physics there's this notion of invariance, or relatedly, conserved quantities. One can take a system, measure the quantity, then come back later, observe the system, and no matter what it looks like, the quantity should still be the same. If we know this we can immediately deduce other quantities in the system. Conservation of energy is one example: If we observe a marble atop a 1 m block, sitting still, we can say that relative to the ground it has a potential energy of \(E_p=mgh\) (where m is the mass, g is the acceleration due to gravity, and h=1 is the height). Now we return and observe that the marble has slid down a ramp and is now rolling on the floor. In an ideal world where we disregard friction, the potential energy will be zero, but because energy is conserved, we should see all that energy as kinetic energy: \(E_k=\frac{mv^2}{2}\). This quantity is equal to the one we had initially. Setting both equal we get that the speed must be \(v=\sqrt{2gh}\). The same is true if we drop a ball from a height of 1 m: when it hits the ground, the speed will be precisely the same.
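As a quick numerical check, here is a minimal sketch of that calculation; the only inputs are the 1 m height from the example and standard gravity:

```python
import math

g = 9.81  # acceleration due to gravity, m/s^2
h = 1.0   # height of the block (or of the drop), m

# Conservation of energy: m*g*h = (1/2)*m*v^2, so the mass cancels out
v = math.sqrt(2 * g * h)
print(f"Speed at the bottom: {v:.2f} m/s")  # ~4.43 m/s, regardless of the path taken
```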
Can we do the same thing when studying science?
In a recent essay, The trouble in comparing different approaches to science funding, one of the sections is Non-stationarity, referring to the idea that the underlying distribution from which, in a simplified way, we are drawing discoveries can change, and so learning about how science works today may not be a useful guide for understanding how science works tomorrow. An example of this in the essay is the Fosbury flop,
We remarked earlier that the nature of high jump completely changed between the 1968 and 1972 Olympics, thanks to the introduction of the Fosbury flop. That is, the underlying distribution was non-stationary: the way high jumpers trained and jumped was changing very rapidly.
It is unlikely that, had we tried to predict the dominant style of high jump in 1960, we could have come up with the Fosbury flop, until Dick Fosbury came up with what was initially perhaps seen as just a quirky way of jumping.
But this impossibility is not intrinsic: there are some things that did not change significantly in that time period: the laws of physics, human abilities (with the exception of improved nutrition), and the rules of the game. The best way to jump may have changed, but the underlying constraints have not, and the best way to jump is determined by those three things. And, a stronger claim: with sufficient compute we could have actually derived the Fosbury flop ahead of time and furthermore (informally, by simulation) proved its optimality, which would amount to the even stronger claim that high jump has been perfected and no further improvement is possible. It could be, however, that there exists a Pareto frontier of techniques that do better in some contexts; perhaps there is another way to jump that is very risky and rarely done well, but that, if done well, improves on the Flop. But the point still stands: if we know the invariants of the problem at hand we can make assertions about the best way to solve that problem now and in the future.
What are the constraints of the activity of doing science?
Some things that come to mind:
- The underlying laws of nature remain unchanged. But the objects that science studies have changed, and these are not guaranteed to have timeless laws that we can discover. However, we can make some approximations: for example, proteins evolve, and even across human populations one finds multiple variants. A protein we are studying may be radically different if we go far enough back into the past, or far enough forward into the future, at which point we may as well call it something else. But on the shorter timescales relevant to planning science funding, we can take proteins, and biology broadly, to have remained unchanged. The social sciences have it harder, but in this essay I am mostly thinking of the life sciences and materials science.
- The "rules of the game" have clearly changed: Science used to be done by weird or wealthy individuals working alone and now it's done by teams of individuals of modest income. In the past you could just fuck around and find out, now you need to apply for NIH grants, go through IRBs/IACUCs, publish the sort of papers that will get published in journals deemed prestigious and lead to tenure, etc.
- Human abilities can be taken to be unchanged. The human brain has not radically changed the way it works in the last 500 years. The composition of the scientific workforce has changed, though, and so on average various traits will be differently represented among the current population of scientists vs. that of seventeenth-century Britain.
- But we have now expanded those abilities: there's Google Scholar, online publishing, etc.
Invariants can be put together into invariant heuristics: If what makes for a great scientist or inventor is being creative, intelligent, hard working, able to work with others, etc., and furthermore if empirically those traits are distributed as a normal distribution, it follows that the greats will be an easily recognizable minority. That doesn't mean that one can chop off the distribution at the right tail, but it's a starting point: maybe this small elite relies on lots of exploratory work from everyone else. A discussion of this is in my Newton hypothesis post, where I do find that most highly impactful (by citations) authors cite work by other highly impactful authors. But a serious investigation of this would include a sampling of unquestionably impactful specific papers, interviewing the authors if needed. What I would like to see next is an examination of specific seminal papers in various fields, taking the most relevant references they cite, and seeing if the heuristic "Fund only the top n%" would have led to the exclusion of those. Appendix B contains the start of one such exercise.
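To illustrate the kind of invariant heuristic I mean, here is a toy simulation (not an estimate: the choice of three traits, the multiplicative combination, and the 1% cutoff are all assumptions of mine) of how concentrated "impact" ends up being when normally distributed traits combine multiplicatively:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # simulated scientists

# Three normally distributed traits (say creativity, intelligence, conscientiousness)
traits = rng.normal(size=(n, 3))

# Toy assumption: impact is multiplicative in the traits, which makes it heavy-tailed
impact = np.exp(traits.sum(axis=1))

top_1pct = np.sort(impact)[-n // 100:]
share = top_1pct.sum() / impact.sum()
print(f"Share of total impact coming from the top 1%: {share:.0%}")
# A sizable chunk of the total, despite being only 1% of the people
```

Whether real-world impact behaves anything like this multiplicative toy is, of course, precisely the empirical question.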
What do we need from meta-science?
Gwern notes that the importance of a statistical criticism is the probability that it would change a hypothetical decision based on that research, and the same is true for meta-science. While unfettered exploration is valuable (no one chose to fund Francisco Mojica because his research could one day lead to advances in gene editing), we can still ask what kind of decisions we want to make using models and evidence collected by meta-scientists, and what sort of evidence we would need to make those decisions one way or another. Here are some of those decisions. By no means is this a comprehensive list, but it aims to cover a varied number of areas and levels of abstraction.
- Should mandatory retirement be re-introduced in academia? How much should labs weigh age when making hiring decisions?
- Should NIH R01 grants
- Be reduced in size, but increased in number
- Be limited to a maximum N per lab
- Should grants be so generous and long-term that labs can pursue their research without thinking about funding?
- Should we just try to find a small number of researchers in a field deemed to be excellent and fund them particularly well?
- Should science be funded to a larger proportion by lottery?
- Should we start a new institution (or fund existing efforts) to
- Replicate existing studies
- Produce summaries of the literature
- Create datasets or tools that are broadly useful to one or more fields
- Accelerate a field in particular
- Award grants in a matter of weeks
- Should science funding be guided by general rules ("No more than 3 grants per lab") or should it be guided by plenty of discretion ("Trust me, lab X is really promising")?
Analogies from other fields
It recently occurred to me that the situation with science is similar to what we find in the domain of geopolitics. Picture yourself as Kennedy during the Cuban Missile Crisis. You learn there are missiles in Cuba; what do you do? Or in military strategy. There is a body of knowledge out there telling generals how they're supposed to think and act, but this is not based on incontrovertible evidence; these domains rely on analogy and case studies. There are even books that try to teach how one is supposed to pattern-match a current situation to an appropriate historical case study to decide a course of action. Consider a particular book, The Kill Chain (I claim no particular expertise in that topic). The book's main thesis is that the US military, despite its numbers and advanced technology, lacks the right sort of systems needed for conflicts to come, and that there should be increased investment in missiles, autonomous systems, and information processing capabilities. When I read the book, it made sense to me. Two years after the publication of the book, the Russian invasion of Ukraine seems to be confirming the scenarios the book describes. Now imagine one reads the book and begins to think in terms of first-principles arguments and solid evidence vs. persuasive "reasonable" arguments. One could go and try to gather more evidence, or one could try to design computer-based models to run simulations or something to that effect. Or one could have just accepted the thesis and started a company to solve the problem. Maybe that approach doesn't 100% solve the problem, but if it gets 80% of the way there, that seems preferable to spending the next 5 years thinking and not doing anything about it.
Back to meta-science. Brian Nosek wrote a series of papers back in 2012 ("Scientific utopia") describing some problems with science as currently practiced and some ways forward. The three "Scientific utopia" papers do not justify in an extremely meticulous way (with models and quantitative evidence, that is) that, e.g., scientists should embrace open access. Be that as it may, Nosek didn't just write those papers; he started an organization, the Center for Open Science, and secured funding for it to work on precisely those same issues. Would it have been better for Nosek to try to continue to refine his views before starting CoS? Probably not; at some point the expected value of collecting more information is lower than that of trying to solve the problems identified so far. On paper, Nosek could have chosen to fundraise for, say, an institute to develop better software for the life sciences, but his prior experience in social psychology meant he was intimately acquainted with the ongoing replication crisis in the field, and not so much with what computational biologists are up to. This is similar to what happened with the new research & funding institutions that launched in 2021: if one knew the founders upfront and what their interests were, one could have guessed what the institutions would end up looking like.
Back to evidence generation!
This post was initially supposed to be about finding invariants in science and whether those could be used for funding, but as I started writing about that, I came to think that doing so is not what's most helpful. The search for generality can lead to an unnecessary search for trivialities that are not actionable. Instead, we could see what could be of help to science here and now. Those recommendations and analyses and essays will be different from the ones that will be written in 2030 and the ones that were written in 1980, but that's fine. What meta-scientists should do is really talk to scientists across multiple fields, host workshops, brainstorm ideas, and publish the results of doing that, as I called for in my previous post. Such an effort wouldn't provide definite answers, but I expect it would be sufficient to motivate actions in the right directions, which is what ultimately matters.
Appendix A: The mining heuristic
This is something I started to write but then decided was not the right path to explore, but you can still have a read. The point of this section is trying to reason about the heuristic that "fields that have recently produced useful knowledge are more likely to keep doing so", which is what got me interested in biology in the first place.
One way of thinking about science is as one big pile of knowledge waiting to be discovered. Importantly, this pile is finite. There is some minor quibbling one could engage in here: Anatomy may be a complete discipline as the human body has been thoroughly explored, but as humans evolve, maybe we'll have new organs and so eventually anatomy may have new things to study. I alluded to that heuristic near the end of this post. This seems pretty obvious to me, and I suspect that if pressed, even arch-optimist David Deutsch would concede that his idea of knowledge as an endless frontier ends up cashed out as successively more precise approximations to the underlying truth. We have more digits of pi than ever before but that is of no use to anyone, and the same could be true for other fields of knowledge.
The mining heuristic means that for a given field, there will be an initial growth phase, and then a gradual exhaustion as the field's body of knowledge is progressively completed. The extraction of ideas from the idea pit is stochastic, so one cannot just assume that a temporary blip in the trend means permanent exhaustion. For example, fundamental physics has not made any progress in the last 50 years, but what if the next breakthrough is around the corner? A toy simulation of this dynamic is sketched below.
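Here is that sketch: a minimal model of mining a finite idea pit, where the pool size, the seed discoveries, and the "unlock rate" are all arbitrary assumptions, and each known idea makes the remaining ones a bit easier to find. The yearly rate of discovery grows, peaks, and then tapers off, with noisy blips along the way:

```python
import numpy as np

rng = np.random.default_rng(1)

POOL = 1_000          # total ideas the field contains (assumed finite)
found = 10            # seed discoveries that open up the field
rate_per_pair = 2e-4  # chance per year that a known idea unlocks a given remaining one

yearly = []
for year in range(150):
    remaining = POOL - found
    # Each remaining idea's chance of being found grows with what is already known
    p = 1 - (1 - rate_per_pair) ** found
    new = rng.binomial(remaining, p)
    found += new
    yearly.append(new)

peak = int(np.argmax(yearly))
print(f"Discoveries peak around year {peak}, then taper off as the pit empties")
```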
In contrast, today we are living through the golden age of molecular biology. The probability that there will be a major breakthrough there this year is substantially higher than the probability of finding something that will upend the Standard Model of particle physics (note that here I am not comparing fundamental theoretical biology with fundamental physics, but rather unfairly comparing the life sciences in general to one area of physics).
This again seems trivially true, and it's no surprise that one can find articles discussing the stagnation of physics but not the stagnation of the life sciences. But the mining heuristic appears to run into trouble when considering the case of AI: in the mining heuristic you're supposed to get a Gaussian-shaped rate of progress, but AI has seen successive cycles of booms and busts; progress has been far from linear. At the tail end of the first winter, should one have concluded that AI was dead, or rather that a particular kind of AI, at least in isolation, was not promising? In hindsight, the latter. At a given time there are fields that are being actively worked on and fields and paradigms yet to be discovered. Very narrowly defined areas of science would behave in this way, but as long as we keep opening up new fields for exploration, we can continue to have progress as we move from one exhausted paradigm to a yet-to-be-explored one.
In AI, a current paradigm or driving idea is the scaling hypothesis, or briefly, the idea that just adding more layers to deep neural networks and training them on larger datasets will yield increasing performance. Relatedly, but not necessarily implied by it, this effect seems large enough that it can outperform the effects of, say, choosing the right architecture or trying to mimic a specific region of the brain (neural networks are, after all, universal approximators). Given this, and given the observed progress of this approach, it makes sense to continue to work on it. Compare this effort to Doug Lenat's Cyc. Cyc is a holdover from the days when symbolic AI ruled the world, and I thought it had been discontinued, but then the podcast linked there came out: after 38 years they keep going at it. I tried to search for any publicly available benchmarks of common sense reasoning or natural language understanding but could not find anything. This is a big red flag: it is common practice in the field to show what your model can do. I'm not the only one who thinks Cyc should be more open. If you were a funder, you would be faced with a field that is improving fast and another that has not made any publicly verifiable progress for longer than I have been alive. It's clear where your money should go.
Or is it? It's clear to me, but my intuition is a black box that is weighing what I know (which is not everything) in a very obscure way. Maybe if I saw a demo of Cyc I would change my mind, though I doubt it. What we'd want to compare is what we gain by funding the approach that is obviously working vs. funding something that might work if given another 10 years or so; but how would we know that? If one wanted to use a formal model that has some interpretability built in, we'd have to assume that paradigms are drawn from some distribution (that probably changes over time) and then, within each paradigm, that the coefficients determining the shape of the resulting S-curve are drawn from some other distribution. Being reasonable, I expect doing this would not be directly useful for science funding.
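For what it's worth, a minimal version of that formal model could look like the sketch below. Everything in it is an assumption: paradigms arrive at random years, within-paradigm progress is logistic, and the ceilings, speeds, and midpoints are drawn from made-up distributions:

```python
import numpy as np

rng = np.random.default_rng(2)
YEARS = 200

def paradigm(start):
    """One paradigm: an S-curve whose ceiling, speed, and midpoint are random."""
    ceiling = rng.lognormal(mean=0.0, sigma=1.0)  # how much the paradigm can yield
    speed = rng.uniform(0.1, 0.6)                 # how fast it gets mined out
    midpoint = start + rng.uniform(10, 40)        # when it is roughly half exhausted
    t = np.arange(YEARS)
    return ceiling / (1 + np.exp(-speed * (t - midpoint))) * (t >= start)

# Each year has a 5% chance of spawning a new paradigm
arrivals = np.flatnonzero(rng.random(YEARS) < 0.05)

total = np.zeros(YEARS)
for start in arrivals:
    total += paradigm(start)

print(f"{len(arrivals)} paradigms arrived; cumulative progress at year {YEARS - 1}: {total[-1]:.2f}")
```

Fitting something like this to real bibliometric data is possible in principle, but, as I say above, I doubt the exercise would pay for itself for funding decisions.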
Appendix B: A quick exercise in reference checking
One can be more or less bullish on partial reprogramming, but without question Ocampo 2016 is a foundational paper in that subfield of aging research. What enabled that paper? I list here, as direct quotes, the references the paper cites as work leading up to it, along with their authors, plus one paper that contains a key method the paper relies on. I look at both lead authors and senior authors, as it is customary in the life sciences to have the head of the lab (principal investigator) listed at the end.
- The notion that cells undergo a unidirectional differentiation process during development was proved wrong by the experimental demonstration that a terminally differentiated cell can be reprogrammed into a pluripotent embryonic-like state (Gurdon, 1962, Takahashi and Yamanaka, 2006).
- Gurdon and Yamanaka are both Nobel Prize winners
- Cellular reprogramming to pluripotency by forced expression of the Yamanaka factors (Oct4, Sox2, Klf4, and c-Myc [OSKM]) occurs through the global remodeling of epigenetic marks (Buganim et al., 2012, Buganim et al., 2013, Hansson et al., 2012, Polo et al., 2012).
- Buganim is the first author of a paper coming from the Jaenisch lab. Rudolf Jaenisch is a well-known scientist who has been awarded multiple prizes
- Hansson is the first author of a paper from the Krijgsveld lab; I did not know of this lab before but Jeroen Krijgsveld has 18280 citations and an h-index of 62.
- Polo is the first author of a paper coming from the Hochedlinger (34355 citations, h-index 80, student of Jaenisch) and Ramaswamy (36385 citations, h-index 70) labs.
- Importantly, many of the epigenetic marks that are remodeled during reprogramming (e.g., DNA methylation, post-translational modification of histones, and chromatin remodeling) are dysregulated during aging (Benayoun et al., 2015, Liu et al., 2013b, Pollina and Brunet, 2011).
- Benayoun's paper was done at the Brunet lab; in turn Anne Brunet has been awarded multiple prizes and awards (47909 citations, h-index 78)
- Liu's paper has Brunet and Tom Rando as senior authors. Rando is another well known aging researcher (33091 citations, h-index 87)
- Several groups, including ours, have observed an amelioration of age-associated cellular phenotypes during in vitro cellular reprogramming (Lapasset et al., 2011, Liu et al., 2011, Mahmoudi and Brunet, 2012, Rando and Chang, 2012).
- Lapasset and coauthors did their work at Jean-Marc Lemaitre's lab. Lemaitre is one exception to this pattern because his citation metrics are substantially lower (2791 citations, h-index 27) and, as far as I can see, he has not received prestigious awards. Their most cited paper is precisely the Lapasset paper, which is one generally cited as foundational for the field
- Reprogramming of cells from centenarians or patients with Hutchinson-Gilford progeria syndrome, (HGPS) a disorder characterized by premature aging, resets telomere size, gene expression profiles, and levels of oxidative stress, resulting in the generation of rejuvenated cells (Lapasset et al., 2011, Liu et al., 2011, Zhang et al., 2011).
- Zhang and coauthors worked at the Colma (13839 citations, h-index 56) and Stewart (44931 citations, h-index 105) labs
- Breakthrough studies led by the Serrano and Yamada groups have shown that cellular reprogramming to pluripotency, although associated with tumor development (e.g., teratoma formation), can be achieved in vivo in mice by the forced expression of the Yamanaka factors (Abad et al., 2013, Ohnishi et al., 2014).
- Abad (who just joined Altos Labs) worked at the Serrano lab (another pioneer of the field; 64476 citations; h-index 96)
- Ohnishi comes from the Yamada lab (4606 citations, h-index 27)
- To enable inducible expression of the Yamanaka factors upon doxycycline treatment, LAKI mice were crossed to mice carrying an OSKM polycistronic cassette (4F) and a rtTA trans-activator (Carey et al., 2010), thereby generating LAKI 4F mice.
- Carey's work was done at the Jaenisch lab
There is one paper not cited here, the first to actually propose the idea of partial reprogramming, which didn't get many citations; instead, other authors who later proposed the same idea (like Rando) tend to get cited as the early pioneers. This is just to highlight that good ideas can be discovered multiple times, and the winners we remember are not necessarily the first ones, or the only ones.
Out of the papers that are cited as direct inspiration, all of them come from what you might call top labs, with the exception of the Lapasset and Ohnishi papers. The Ohnishi paper is cited jointly with Abad's, so in the counterfactual where we only had the top labs, it's likely that the claim these two papers support would still have been in the minds of the authors of Ocampo et al. We can't say that for Lapasset! But the distribution of top labs vs. the rest matches what I concluded in my Newton hypothesis post. It does seem like keeping the very top labs would have preserved all the relevant work leading up to this one.
In practice, however, how is this knowledge to be used? More concretely, if you have a $500M grants program, do you go seek the top N labs in each field of interest and just hand each a proportional amount of the money in grants? This seems like a good idea, but maybe it's not the best one. In practice what one would do is find someone who would then find a number of domain experts and ask them for ideas, and the outcome of that would determine the result of that particular round of grants. As a domain expert, one probably knows what the distribution of good papers looks like, whether they are spread out or concentrated in a handful of labs; also, as a domain expert, one probably knows underrated talent that is unlikely to be surfaced via publicly available information. It's the tacit (and private) knowledge problem all over again.
Citation
In academic work, please cite this essay as:
Ricón, José Luis, “Metascience: invariants and evidence”, Nintil (2022-03-17), available at https://nintil.com/invariants-metascience/.