In 2020-21 I wrote a series of blogposts on science funding, examining the meta-science/science of science literature, which deals with questions like how well peer review works, the effects of age on the productivity of scientists, or whether a minority produces most of scientific progress. Unsurprisingly, these questions are hard to even start to answer, because translating "good science" into numbers that one can then plug into various models invariably requires leaving out some of what we might call "good science". There are areas of meta-science that I did not cover in depth in my posts, like precisely how good those metrics are; there's a discussion of those in the recently published The Science of Science (2021) by Dashun Wang and Albert-László Barabási.

Given this body of knowledge, there's then another question: How can we use it to inform reform in science, both to improve existing institutions and to propose new ones? A first step is to understand how meta-science is used in practice to argue for various kinds of reforms. Here are some thoughts on that:

But what is meta-science

According to the book,

These practitioners of science of science use the scientific methods to study themselves, examine projects that work as well as those that fail, quantify the patterns that characterize discovery and invention, and offer lessons to improve science as a whole.

One example of a meta-science paper is the one I discuss here, comparing the Howard Hughes Medical Institute to the NIH. One could take that paper, use it to argue for the need for long term grants, and go start a longer-term grants program.

But one could also, before 2011 when the paper was published, have observed what HHMI investigators were up to and liked it; it doesn't take lots of analysis to see that a key difference from other funding programs is funding for longer (and with more money). It is not hard to see why this would probably be good. This second point is more of a casual observation or a collection of anecdotes, especially if obtained by talking to scientists, and so whether it counts as meta-science is unclear. Similarly, the AAAS compiles various charts and tables on R&D spending patterns in the US which could be useful as input to various decisions.

How Life Sciences actually works is a collection of useful information on various aspects of academia, obtained by talking to scientists. You could think of this as a Latourian-style anthropology of science (whether this is meta-science or not is up to you!). The essay contains points that are completely absent from the meta-science literature. The fact that a grant to study topic X is used in part to fund preliminary data in topic Y to get another grant to study Y is common practice in the life sciences, but not something you see studied by the meta-scientists. One of the points in the essay, that big labs may be good or bad and that we just don't know, has been studied with quantitative methods, as I described here. Some of the work I cite prima facie seems to support the view that smaller labs are better, but having spent time poking at the papers, this is not clear at all. Moreover, these papers did not seem clear enough to push forward an NIH reform that would have capped how big a lab can be (by limiting the grants a lab can hold concurrently).

It has been known since forever that as we age, general decline ensues; and we also know that there is substantial variance between individuals in their cognitive faculties. It is useful to know broadly the shape of this decline, and how it may affect scientific productivity. I wrote an essay on that here that ended up with a lot of "on the one hand-on the other hand-ing"; there is data there you could use to support helping younger scientists, or to impose or remove mandatory retirement. The fact that scientists are getting older on average is also well known and beyond dispute. This fact alone finds its way into policy discussions without any need to dress it in econometrics.

Thus we may speak of "quantitative meta-science" as trying to make claims based on statistical analysis, "qualitative meta-science" as trying to do the same based on a judiciously put together collection of anecdotes and case studies, and "reform meta-science" as work that makes concrete proposals for how to improve science, drawing from the two former. An example of the latter is the Center for Open Science's work on reproducibility or preregistration, which is meta-science, but with a focus on how to do science rather than on how best to fund it.

Meta-science informing new science institutions

Last year saw a number of new science institutions being announced: New Science, Arcadia, the Arc Institute, Focused Research Organizations (one of which I designed), PARPA, Impetus Grants. Of course before that we had Fast Grants.

I know the people involved in these organizations and I asked some of them about the genesis of the organizations: what was the reasoning behind them, did they rely on any meta-science work? These are very different organizations: Impetus Grants is a grants program to specifically fund longevity research. FROs have concrete deliverables and are not meant to be permanent institutions. Had this community of institutional engineers (including myself!) been completely unaware of the meta-science literature, would these institutions exist in the form they do?

The answer, after a few conversations, is largely yes; meta-science was not particularly helpful here. This does not mean that we acted blindly to that existing body of knowledge, or more broadly to the current state of scientific practice. It's just that the kind of evidence that ended up being useful was of a different kind from the one that tends to populate the literature.

What was useful was talking to people: New Science could be traced back to this essay, Focused Research Organizations to this whitepaper (and in turn, to prior experiences trying to secure funding for a FRO-like project years ago), Arcadia to a founder's main research interest (novel model organisms) and taking that to scale, PARPA to these two essays, the Arc Institute to precursors (HHMI, Broad, Salk, CZ Biohub, the Crick Institute), Impetus Grants to Fast Grants, and Fast Grants to the simple observation that grants take forever to be paid, but that they don't have to.

To the extent that meta-science has been used here, it has been to marshal evidence in support of preexisting conclusions. Patrick Collison (who founded and funded Arc) cites the Azoulay HHMI paper from time to time, but I suspect that had the paper not existed, Arc would still have come to be in roughly its current form.

Ok but these are all new institutions. It may be more difficult to apply meta-science to the design of new institutions than to the reform of existing ones with an established track record. Has meta-science driven reform in relatively well studied institutions, like NIH?

Meta-science in NIH reform

Take these Five suggestions for substantial NIH reforms. Other than at the start, to frame the context of the paper, it has no citations, but for anyone who is either in the system or has skimmed the basics of the debates referenced therein, there is little controversial in what the paper says. This paper links to 10 previous papers from various authors that have titles like Rescuing US biomedical research from its systemic flaws or A vision and pathway for NIH. Do those proposals and analyses rely on (quantitative) meta-science? Not really. They sometimes don't even cite any reference for the claims they make (does anyone doubt that scientists actually spend 30% of their time applying for grants?), which may be a bit of a stretch of good practice, but in the context of these debates no one disputes the overall point that "scientists spend a lot of time applying for grants, acceptance rates are low and getting lower, and that's all bad". By and large, the references given are from other scientists (again from within the NIH world) who report what they see, sometimes making reference to relatively simple figures, as here, where the authors point to the fact that the share of Principal Investigators aged 36 or younger has fallen from 18% (1984) to 3% (2010), but without trying to disentangle whether that is the product of the natural aging of the population, bias against young PIs, or something else (for that see here).

This small body of work also shows how science reform actually happens: scientists point out problems, using a combination of their direct awareness of them and some basic statistics, and then convene workshops to exchange potential solutions.

Some of them do cite some (quantitative) meta-science literature, in particular the work of Azoulay on the HHMI and Paula Stephan's, but these are cited to add flavor to the text rather than to directly make a major point.

Two of these do leverage meta-scientific work to make their points: Bourne (2013), for example, questions a series of "axioms", one of which is that bigger labs are better, citing the same body of work I reviewed here. Unlike Berg (2012), however, he does not advocate for capping labs at a given size; rather, he uses those papers to make the point that big doesn't mean better and that scientists should critically evaluate their own institutions. Maybe some big labs work great and others don't, and so we can't escape from a case-by-case study by those who are closer to the labs in question:

Start with those parts of your department or institution that you know best and explore which efforts deserve, on grounds of quality, to be expanded, which should be maintained at their current levels, and which should be reduced (or scrapped altogether). Test your conclusions by asking more questions, gathering further information and making comparisons with other parts of the department or institution (see Box 1 for more details). And when you have convinced yourself that you know how to improve the overall quality of research, find allies and make it happen.

Taken together, these and other reform proposals make heavy use of personal experience and anecdotes coupled with relatively simple numbers that apply to the system as a whole, woven together into narratives that make sense as arguments for the proposed reforms, but that are neither knockdown arguments for them nor arguments that would not be made in the absence of quantitative meta-science.

But why?

Why doesn't meta-science readily find its way to the reform of science? Because it's hard. In the presence of hard problems we can either black-box the domain and run RCTs or have a model or simulation of the domain that tells us what to do. For the former, it's illustrative to take the kinds of studies that can be readily used to drive policy or action more directly. Those studies tend to be

  1. Randomized Controlled Trials (that are designed carefully upfront)
  2. Large-ish samples
  3. Some grounds to believe that the study has external validity
  4. The endpoint being measured is clear (e.g. did the intervention raise income n years after)
  5. A relatively closed system where there is no interaction between treated and control (e.g. isolated villages)

In the context of science reform or meta-science, we rarely have RCTs; when we have them they are small one-offs that may be applicable only to that specific setting, and worst of all, the endpoint being measured can be very unclear: Even if we show beyond all doubt that a science reform leads to more citations, is that all we need? And what's worse, scientists as knowledge workers exist in a world that's far from closed. If they don't get a given grant, maybe they will get another grant instead. Maybe someone else's reform will impact the effects of your study. RCTs also don't measure systemic changes, which is what some of these proposed reforms are.

Take for example GiveWell's review of bednets for malaria prevention. Compare this fine-grained level of detail and the clearly important outcomes being measured (mortality, income) with what one usually finds in the meta-science literature. This is not a failure of the meta-scientists; it truly is a wicked problem!

If we can't really extract much that is meaningful out of RCTs, we have to fall back to actually having to think about the domain. With a good RCT we can get some numbers, do cost-benefit analysis or things of that sort. With a good model we would also be able to do this; take something like AlphaFold: protein sequences in, really good predictions that match reality out. If we had an AlphaScience there would be no more science, as it would just figure out everything there is to be figured out. Instead we have imperfect models:

  • How specific institutions work
    • What their budgets are
    • How they have changed over time
    • What they tend to fund
    • What people (grantees, grantmakers, policymakers,...) like and dislike about them
    • The specific processes they use to allocate grants or manage research
  • How human cognition works and how people vary in these skills (a harder one!)
    • IQ
    • Memory
    • Creativity
    • Energy/stamina
    • Concentration/deep work
  • How individual knowledge workers work together (an even harder one!)
    • "Scenius"
    • Agglomeration effects
  • The historical record of each field
    • What sort of discoveries get produced
    • How often they make it into innovations
    • Whether the field is regarded as fast moving or stagnant (either by the field itself or outsiders)

None of this tells you exactly how to reform the NIH or how to start your scientific institution of choice; for that you have to think carefully!

Taking meta-science into production

If you've read some of my latest posts (Talent, Tacit Knowledge, Notes on 2021), you will have noticed my move from putting more weight on data-driven decision-making (I was very optimistic about RCTs here) to trusting qualitative data and case studies more. In domains like science funding or entrepreneurship, there is not enough data to draw conclusions based on large samples of past experiences that are taken to be informative of the future. Instead, by building a mental model of the domain, informed by unrelated areas (cognitive psychology, say), one can try to put together enough evidence to drive decisions.

This is my first recommendation: We need more qualitative evidence (like this). More analysis of what is going on in academia, and this should be rich in details. This may mean a compilation of—perhaps structured—interviews with scientists asking them about their daily life, and how they would change either their most immediate environment (lab, university) or science funding in general. It is time for us to think of someone other than Latour when the idea of going to labs to observe scientists is mentioned.

Meta-science also covers a broader set of areas: The Science of Science mentions multiple ways to compute indices similar to the h-index that could be used to compare papers. But have you ever seen platforms making these easily available to enable policymakers or scientists to play with them to see if they match their intuitions of "good science"?
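To make "playing with" such indices concrete, here is a minimal sketch of two of the simplest ones: the h-index (the largest h such that h papers have at least h citations each) and the g-index (the largest g such that the top g papers have at least g² citations in total). The function names and the two citation records are hypothetical and made up purely for illustration; this is not code from the book or from any existing platform.

```python
from typing import Iterable


def h_index(citations: Iterable[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # counts are sorted, so no later rank can qualify
    return h


def g_index(citations: Iterable[int]) -> int:
    """Largest g such that the top g papers total at least g**2 citations."""
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g


# Two made-up citation records with the same total (30 citations each).
steady = [10, 8, 5, 4, 3]      # several solid papers
one_hit = [26, 1, 1, 1, 1]     # one very highly cited paper

print(h_index(steady), g_index(steady))    # 4 5
print(h_index(one_hit), g_index(one_hit))  # 1 5
```

In this toy case the h-index separates the two records (4 versus 1) while the g-index does not (5 for both), which is exactly the kind of disagreement one would want to check against one's own intuitions of "good science".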

Similarly, there are newer ML-based methods that could be useful to understand citation patterns in more sophisticated ways than just counting immediate citations. By considering more nodes in the graph of citations, Weis & Jacobson (2021) developed a model that identifies work that will come to be regarded as impactful. This could be used for all sorts of things, but one use that comes to mind is detecting early talent and accelerating their careers. This may be handwavy, but suppose that you just want to go ahead and run a grants program based on their DELPHI algorithm. Ok, so you go to their GitHub page and... there we learn that we need to obtain a dataset, but it doesn't say how or which. I assume it's this one ("Scholarly Bulk Data"), which one has to pay for. Once again, we can't easily play around with it (given a few hours we can probably get it to work, I guess, and then do some development to make it pokeable).

That would be my second recommendation: To take novel models and metrics and make them easier to work with, directly from a website. This would allow users to get a researcher's Relative Citation Ratio (as in here!), as well as other adjusted h-indices, and to ask questions like "What is relevant in X literature right now", or "What will be relevant in the future", or even "Who is underrated who could benefit from more funding". There is always the question of whether this would be really helpful, because if someone is obviously good and this can be known purely from their publication record, they will be hard to miss for those reading those papers. Or not! If we had a list of underrated candidates, then one could run those by various experts to see what they think. Instead of guessing, we could just have this and give it a try.

Many cancer studies fail to replicate, not necessarily because the original results were sketchy, but also because there isn't enough public information and/or because the original authors are unwilling to engage and help other groups redo the experiments. A number of casual observations like this one would be grounds enough for some future institution (like... my proposed Adversarial Research Institute, which would take on some of the roles of the Center for Open Science, the organizers of the cancer reproducibility study).

One may say: But isn't the Center for Open Science literally taking meta-science into production? Yes! And we can see why. The CoS has been focusing on robustness, and robustness is easier than exploration.

When trying to replicate a paper, there is a clear-ish way to examine whether that was successful: were the methods similar enough to the original paper, and did the results agree with the original results. When calling for pre-registration or fighting QRPs, it doesn't take a lot of RCTing to understand why falsifying data or squeezing data until it says something would be bad and worthy of getting rid of.

So a third recommendation: More adversarial research (or whatever one wants to call it) to establish what's really there. It can't be that when one talks to top PIs one gets "But I can't trust the literature, I need to replicate everything in my own lab". Having independent parties producing replications could save everyone time by letting them take more for granted.

Appendix: But what is really meta-science

Good question lol. The field asks itself the same thing. I roughly buy the idea that "It's studying how science works"; and from there you could split it into theoretical and applied, where applied is proposing concrete reforms vs noticing patterns or modeling phenomena. This lack of clarity (I admittedly haven't put much thought into it) runs through the essay where "institutional engineering" runs side by side with "open science" and "anthropology/sociology of science". Is this all meta-science? I don't know; for me what is relevant is that all of these have something to say about the same interrelated domains.