Limits and Possibilities of Metascience

Jose Luis Ricon

2022-12-05; Last updated: 2022-12-05

Summary

The prospect of improving the way scientific institutions work by rational analysis and experimentation (or metascience) is alluring, but there are limits to what metascience can achieve as a discipline. The design space for new institutions and reforms is large but not vast. The key bottleneck to metascientific progress is the difficulty of experimentation and comparing alternatives. Some areas of science are amenable to improvement via experiment and analysis, particularly those involving replicating work. Research and execution of metascience entrepreneurship should be tightly coupled.

Introduction

Michael Nielsen and Kanjun Qiu (N-Q) recently published a lengthy essay (30k words) on metascience. Over the past year or two I have been publishing here on metascientific topics as well, totaling around 80k words (at this point these are more books than essays!), going into the minutiae of various foundational papers and talking points. The points I make throughout my work are rarely found elsewhere in writing, though they are often discussed in person in the metascience community. (It's generally agreed that the Azoulay 2011 paper wouldn't change someone's mind, as I noted e.g. here, but you wouldn't know this was an established point if not for me writing that!)

After all that time reading and thinking about the topic and attending and presenting at meta-science conferences, what conclusion do I come to? What comes next?

N-Q's vision is one where we develop metascience into an engine driving improvement in the way humanity understands the world. It's an optimistic vision, proposing exploring an enormous design space for social processes, leading to a far more structurally diverse set of environments for doing science; and ultimately enabling crucial types of work difficult or impossible within existing environments.

There is, however, another optic from which all we have done is interesting but ultimately unhelpful theorizing that will not lead to any meaningful change.

This perspective is embodied by a response N-Q got as feedback and that they gesture at in their essay: "all that matters is to fund good people doing good work"; implying that trying to think deeply about the way science works will lead to minor improvements, making all that effort a waste of time. Just find good scientists and give them money and time, everything else is a footnote to this central dogma of science funding.

I too have heard this view and, I will admit, I have lots of sympathy for it. When thinking whether to write this very essay, I found myself thinking that it's quite possible that a single post I've written on US immigration will be more helpful than all the time I've spent on metascience, with the exception of my work on bottleneck analysis in general and of aging in particular, as that led to a concrete project, Rejuvenome. The time it took to write these other pieces was less than the time spent reading countless books, papers, and conference recordings on metascience, trying to squeeze a dry lemon for what ends up being not much juice. So that got me thinking: What are the returns to spending time on meta-scientific work? If someone spends 1000 hours on meta-science, is that a good use of their time, compared to some alternative? Broadly, what should someone with the high level mission of accelerating science do?

In this particular essay I want to explore this negative case: that the space of possible institutions is not as vast as it might seem, that the benefits we can gain are often, but not always, marginal, and that these benefits being hard to measure makes it difficult to even know how to improve the system or have a general set of prescriptions. However, metascience can learn from "metaentrepreneurship", the heuristic, case-study based body of knowledge routinely leveraged in daily practice in another domain that has so far proven resistant to useful scientific theorizing: entrepreneurship.

As this essay is written as commentary on the N-Q essay, you should read that first! It will also be useful to read my previous writing on the topic.

Some case studies of metascientific entrepreneurship

Here I discuss some of the examples mentioned in the N-Q essay, using them to make various points about the difficulty of deriving scientifically grounded prescriptions for improving science.

Example 1: The replication crisis as a metascience success story

The replication crisis is the main example of success that is discussed in the N-Q essay. Are there more metascientific victories like this one awaiting? This example is an existence proof of progress: If one thinks nothing can be done, one can always point to this as a credible counterexample.

This particular episode in the history of psychology started with researchers noticing, over many decades, that the statistical methods and publication practices being followed in the field were leading to results that were probably not true. Eventually, a reform movement spearheaded by Brian Nosek and the Open Science Collaboration, built on top of explicit attempts at replication, induced field-wide change in research practices.

It's a good example of meta-scientific work because the work involved not necessarily new, more powerful, statistical methods with interesting properties (that would be scientific work), but rather:

deep new ideas, such as Registered Reports, as well as the toolbuilding and infrastructure necessary to make them work. It required partnerships with journals and other organizations, so that Registered Reports had a chance to be widely adopted. It required branding and marketing and narrative-crafting, to get widespread adoption and to begin a change in the internalized values of scientists. It required the building of yet more tools and infrastructure to store code and data and materials, to enable easier replication. And, as we'll see, it required institution-building.

Importantly, post-facto everyone sees this as a success. No one says "Well, absent the replication crisis, who knows if things would have changed anyway!" That papers fail to replicate was something shown repeatedly and unquestionably.

Could this happen elsewhere? This particular episode was possible because it was possible to drive broad agreement that the methods that were being used were weak and didn't replicate, and it was possible to show this by actually trying to replicate multiple studies, in parallel with lots of pedagogical work and theoretical arguments grounded in statistics.

Suppose, for instance, that no one had tried to run a large scale replication effort. There wouldn't have been a replication crisis. Not much would have changed. What mattered here was an experiment (a large scale replication attempt) that showed a field that their methods were not good. Indeed, as N-Q point out, one can find the same theoretical arguments being made many decades prior (by Meehl and others) but not much changed in the field until the actual experiments were done.

What would this look like in, for example, some areas of physics where experiments are currently unfeasible and some are calling for post-empirical methods of theory confirmation? What does the equivalent of the replication crisis look like for string theory? Is that even possible or just a fancy speculation? Lee Smolin can keep writing about string theory being a cult, but that won't have, I predict, an effect on the field as strong as the replication crisis had in psychology. This is not to say that string theory is right or wrong or a cultish or openminded field, just that seeking change, sociological or otherwise, in domains untethered to clear experiments is hard.

Currently, the situation there looks like the pre-replication crisis era where for many decades scientists had debated the merits of their methods and approaches (Is beauty a good guide to truth, in the case of physics) but I doubt these arguments will sway the thoroughly convinced: we need experiments with clear answers.

Other areas of science are, in contrast, ripe for this sort of disruption, areas the janitoring of which will be more useful to the average person, like the life sciences. Here it is harder to replicate studies because the knowledge required to even do the experiment is very specialized: failing a replication may just mean the replicator was not skilled enough. So while there was a recent project to replicate cancer papers, some of this work was met with skepticism. Quoting from an earlier Nintil post:

The Center for Open Science tries to reproduce a bunch of cancer papers, they are not able to reproduce most (59%) of them. This doesn't mean all the effects were not real! It means they were not able to either obtain data/materials to replicate them or that they attempted it and failed. It's unclear of course if this failure to replicate was improper protocols on CoS's side (tacit knowledge!) or if the original research was actually flawed. That all said though, it looks really bad for scientific practice. The STAT News articles adds more color to this story:

“Human biology is very hard, and we’re humans doing it. We’re not perfect, and it’s really tricky,” he said. “None of these replications invalidate or validate the original science. Maybe the original study is wrong — a false positive or false signal. The reverse may be true, too, and the replication is wrong. More than likely, they’re both true, and there’s something mundane about how we did the experiment that’s causing the difference.”

This is a good opportunity! Surely we can improve on trying to make biological experiments more reproducible and results more solid. There is a metric for success (Does doing the same thing result in the same outcome?) which is clear and there are experiments one can do in either how scientists are trained, or protocols recorded, and see how that impacts reproducibility.

Hence this is probably the most promising and probably highest reward area of experimental metascience: Studying why studies fail to replicate and improving fields until reproducibility is the norm. From this goal we can derive prescriptions for individual scientists (How to do science), funders (funding replications), and entrepreneurs (Is there any product that needs to be built to help this goal, like lab automation?).

But science has areas that are not like this: if we can't do clean experiments where we measure how different approaches work, and N-Q argued in an earlier essay that this is the case for funding, then how are decisive results (N-Q's term) to be attained to drive change? Our optimism about areas of science with simpler metrics of success (replication) should not transfer to reform of scientific institutions in general. N-Q of course admit to the difficulty of advancing metascience for this reason:

We won't solve the general problem of how to obtain such [metascientific] results: that will take thousands of people many decades. One small indicator of the difficulty of the problem is that the decisive results of the replication crisis weren't obtain by an RCT; rather, they were obtained in a bespoke fashion, in response to a particular problem. Still, despite the challenges, we are optimistic that further work will develop approaches strong enough to routinely provide decisive metascientific results, completing the metascience learning loop.

I'd like to hear more about that optimism because I lean the other way!

Example 2: Empowering young scientists

Scientists are getting older, and cognitive abilities tend to decline later in life. Does the aging of the scientific workforce constitute a problem? In the aggregate, probably, as I concluded here. This result is not one that is backed only by looking at scientific output; it's one grounded in human biology as well. The fact that cognitive abilities decline with age, as a regularity, is one of the invariants we can use when thinking about science: human cognitive ability declines regardless of whether one is a mathematician or a biologist, regardless of whether one works in the XXI century or the XVIII century, regardless of whether one is a wealthy independent scientist or an R01-holding PI. But this has not been enough to drive change. The truth remains that there are older scientists who remain very productive. Moreover, when considering teams of scientists or labs where the old scientist is more of a manager, it becomes even less clear: It may well be that having one mentor-type figure and a number of students works better than if everyone in the room was under 25.

Institutions in theory allow young individuals to have their own labs (thanks to e.g. the NIH Director's Early Independence Award) but there is not much of an impetus or desire for all of science to be like this. It's altogether unclear if bringing back mandatory retirement for older academics would be a good thing.

Despite this, if someone proposed as an institutional experiment my Young Researchers Research Institute,

But what if we radically empower the young?

The Young Researchers Research Institute (YRRI) would take in students that have just finished their PhDs and would put them in an environment with other younger scientists to pursue any kind of research they want. Given that it would be a weird career move, it might dissuade some researchers, but if you truly believe in your idea, and your PI doesn't want you to do that, then the will to try might be stronger than the uncertainties involved in joining the YRRI.

Freshly minted ~~NFTs~~ PhDs, it could be objected, are not yet ready to lead research of their own. This is nonsense. Sure, maybe some of them are, but there are many historical figures that achieved professorships or published groundbreaking results very early in life (e.g. Pasteur becoming professor for the first time at the age of 26, or Einstein publishing his Annus Mirabilis papers at the same age). We don't need everyone to be ready for independent research by that age for YRRI to work, we just need some scientists that are, and would like to embark on a project of their own design.

This wouldn't mean that they would be on their own: There would be an advisory board of established scientists to offer coaching and scientific advice to YRRI members; and YRRI wouldn't be focused on one particular topic, you could have under the same roof someone researching doubly-special relativity, RNA computation, gene circuits or new simulation algorithms for mechanical systems.

I would be very excited! Perhaps if I fully fleshed out a proposal for something like this it would become clear that this would not work but perhaps it would survive that planning stage. Regardless, having the YRRI as an actually existing institution would be a cleaner test of the hypothesis that a younger scientific workforce across the board would lead to more (interesting? Useful? Novel? Different? Riskier?) science than what we currently have. As a cleaner experiment, it would be easier, though still hard, to draw conclusions from it.

Why should this work? Younger scientists on average know less than older scientists, especially in tacit-knowledge-heavy areas like the life sciences. But they can also work harder and longer hours. Whether they will work on 'more interesting/riskier' research is not a priori given. One could imagine an alternative institute set up, in parallel to YRRI where instead of selecting for age, one selects for 'weird research'. Don Braben might have done this: his Venture Research fellows were not particularly young. His younger fellows were not more successful in any obvious sense compared to the older ones.

At that point we are back to the null hypothesis: "fund good people doing good work" regardless of whether they are old or young. But to this end, is it better to have dedicated institutions? For example, in the current system, implicitly one has to be old to start a lab by default unless one makes the cut for a small number of special fellowships and grants every year. If one believes there are many bright students that could be running their own labs and doing great research and that they are currently not able to do so for lack of opportunity, then starting a YRRI would make sense. Whether a given person will start a YRRI will be heavily conditioned, I expect, by their personal knowledge of specific young scientists they want to empower, as opposed to being swayed by abstract arguments about why it may be a good idea.

Example 3: The Thiel Fellowship

The Thiel Fellowship, while not an example of an attempt to improve science, is a good example of what successful institutional entrepreneurship looks like, and is discussed as such in the N-Q essay.

When they launched back in 2011, the Fellowship did something quite unique. Built on the strong conviction that certain young individuals could be doing something better with their lives by dropping out and starting a company (or writing a book, or doing research), the Thiel Foundation committed to give $100k to each Fellow if they dropped out. They were mocked as a "misdirected piece of philanthropy" and made fun of; it was pointed out that some Fellows ended up pursuing uninspiring goals and that instead of flying cars the Fellows got us caffeine sprays (no joke). Some even went back to school. But the Fellowship also boosted and enabled the careers of people like Laura Deming, Vitalik Buterin, Dylan Field, Boyan Slat, or Austin Russell. Eleven years of the Fellowship have cost perhaps $30M, which by venture standards is not a large amount, especially when compared to what the Fellows as a whole have accomplished. What can we learn from this? Suppose Thiel had decided instead to donate that money to Harvard. Without running an RCT, my hunch is that it wouldn't have had the same impact.

And indeed Peter Thiel (and the early Fellowship team, people like Danielle Strachman and Michael Gibson) did not run RCTs before starting this, nor did they probably conduct preliminary scientific research of any sort to design the Fellowship. They noticed a market opportunity: talent locked up in universities that could be unlocked with a small amount of money and a network. Having noticed that opportunity, they designed a program to fix that, and arguably they succeeded.

This I think should be the general playbook for metascientific work; it should rather be institutional entrepreneurship where a concrete issue is noticed and then a fix is proposed and instantiated through the right channels (sometimes policy, sometimes new institutions or reform of existing ones). Importantly, this fix is, as with entrepreneurship, very context dependent, and not amenable to scientific (intersubjective, general, crisp, testable) theorizing.

Are there other Thiel Fellowships one could build?

Lee Smolin, for example, argues that only a small number of theoretical physicists are really pushing the field forward, which arguably is a bit of a hot take. Hot takes are what entrepreneurship is built on, so it's a promising start. He suggests that these few remarkable scientists should get funded to do their research. This would be very cheap! And the results could be transformational.

“But when it comes to theoretical physics, we are not talking about much money at all. Suppose that an agency or foundation decided to fully support all the visionaries who ignore the mainstream and follow their own ambitious programs to solve the problems of quantum gravity and quantum theory. We are talking about perhaps two dozen theorists. Supporting them fully would take a tiny fraction of any large nation’s budget for physics.”

I don't know how I would choose these two dozen theorists. I wouldn't be able to start this institution in particular; I lack the required knowledge and a strong enough belief in Smolin's hot take to actually go ahead and do it. And it's harder to argue why this particular initiative would be the best use of a funder's money, but it seems to me that if we did as Smolin says, we would see more interesting physics results than otherwise, at a small cost to a wealthy philanthropist that's interested in physics in particular. Definitely cheaper than building larger colliders!

Example 4: Diversifying funding

Not present in the N-Q essay but discussed by N-Q at various in-person events is a specific proposal from NIH to cap funding per lab using something called the Grant Support Index, which I discussed here and here. Back in 2017 some research came out pointing to the fact that, by some metrics based on citation counts, labs become less efficient as they get bigger. That implies that instead of having one lab with 27 grants, having 9 smaller labs with 3 grants each, which would cost the same, might be a better use of the money.

Should this be done? This is less clear! As I showed in my own post, the conclusion that large labs are less productive depends on how you measure productivity. There are reasons why we may want large labs: if a PI is a good manager and is good at picking and managing good students, then why not let them have their large lab? Once again we are back to picking case-by-case. This is sadly what tends to happen with metascience work: the actionable insights get lost in a sea of "on the one hand" and "on the other hand".

The N-Q essay has a unified conclusion section where they show some tentative optimism about developing metascience into something that can be an 'engine driving improvement in the way humanity understands the world'. If there's one recommendation that recurs throughout the essay, it is a call for more structural diversity because:

Such structural diversity is a core, precious resource for science, a resource enlarging the range of problems humanity can successfully attack. The reason is that different ambient environments enable different kinds of work. Problems easily soluble in one environment may be near insoluble in another; and vice versa. Indeed, often we don't a priori know what environment would best enable an attack on an important problem. Thus it's important to ensure many very different environments are available, and to enable scientists to liquidly move between them.

However, someone wanting to reform science in some way could read this and still think that it's better to double down on what works, as opposed to more diverse initiatives that might fail. One counter to this reasoning made in their essay is diminishing marginal returns:

One way to think about it is this: if you donate $10 million to Harvard or a similar incumbent, then a reasonable rough model is that that money will go to the best thing Harvard didn't already fund. But Harvard has an enormous budget. And so it will fund something far down Harvard's priority list: something Harvard thought wasn't good enough to fund with its first several billion dollars in spending. By contrast, startup ventures striking out in new directions are doing the best they can in those directions. Diminishing marginal returns haven't yet set in. Structural diversity may be chaotic and confusing, but it also offers a chance to escape the tyranny of diminishing marginal returns, and to enable new kinds of creative work. [...] Perhaps, if outlier successes are what matter, then it's sufficient to have many small but very different cultures. In this view, scaling may not matter so much. Indeed, maybe scale is even undesirable.

But how big should each institution be?

The fundamental question is: what sets the scale for any such program? Is it fashion and fickle perception and politics and highly legible stories? Or is it the scientific contribution, and the value of the program as a part of the wider ecosystem? At present, it is almost entirely the former, and only incidentally the latter. Of course, fashion and fickle perception and politics often do not present as such. They present as enthusiastic articles by well-meaning people in Nature and Science and The New York Times. They present as enthusiastic and brilliant founders of new institutions, with their pet ideas about how things could be better. This is excellent for generating new social processes, but a terrible basis for evaluation. The net result is a natural monoculture, an oligarchy in which there's either far too much or not nearly enough of something. This is a disaster for science.

I worry that making this distinction may not matter much in practice. If one takes, say, the newly launched and well funded Arc Institute, it exists in its current form not because it has been shown to be successful, but rather because of the existence of legible stories of success of institutions that preceded it, like the Broad Institute or HHMI Janelia, which helped shape its design. We can say that there are reasons to believe it will be successful (if it attracts the right talent), and that this justifies the funding it started with. But after that, evaluating the performance of the institution becomes less clear, for reasons that Nielsen and Qiu enumerate throughout a previous essay.

Just like they (N-Q) had to leverage highly legible stories of struggle and success in science (Kariko's, Church's) to get their points across, so will future evaluators of Arc's performance have to do with discoveries awarded highly legible awards like the Nobel Prize. It's hard for it to be otherwise: science has intersubjectivity at its core, and if I cannot show you in a convincing way why my institution or researcher of choice is a good funding target then we're just arguing vibes. As much as the day to day of science, and arguably the practice of entrepreneurial metascience, requires working in illegible gray areas, the practice of metascience qua science requires at least some legibility of the resulting metascientific body of work.

In contrast to the first paragraph I extracted (pro-diversity of approaches), elsewhere in the essay they seem more in favor of flexibility:

However, in the science policy world interventions are often focused on what is practical within existing power structures. We shall be concerned with more a priori questions of principle, and enabling decentralized change, i.e., change that may occur outside existing power structures. For all these reasons we think of the essay as simply part of metascience. [...]

What we want is a flourishing ecosystem of people with wildly imaginative and insightful ideas for new social processes; and for those ideas to be tested and the best ideas scaled out.

Perhaps in their mind diversity and flexibility are closer than in mine. To me arguably what matters more is flexibility: There is only one major search engine (Google) which works well enough compared to the alternatives, but anyone is free to challenge them. One can simultaneously think the system doesn't need that many more experiments in funding structures and that more people should be empowered to make decisions about that. If they happen to decide the same, that's also a valid outcome.

Take the Soviet Union: in some old posts I argued that the problem with the USSR, and with central planning in general, is not chiefly that private property is abolished; you could imagine someone owning the country outright and they would still have the same planning problems. The problem is that there is no fine-grained and fluid control over resources to let better decision makers make decisions about them (this is why markets are good). Network effects, as I discuss in those posts, can complicate this basic story, but at first the theory seems right: if we add a few more decision makers to science and they decide that they should fund the NIH instead of an Institute for Traveling Scientists (a proposal in the original text), I don't think this is an obviously wrong choice.

This all said, on net I favor, relative to what we have today, having more independent sources of control of funds and resources in science. Maybe in the future we'll have a monoculture that's optimal, but I am not convinced the NIH's is that.

CRISPR vs a Theory of Everything

Perhaps a way to understand where N-Q are coming from and where I'm coming from is this:

Picture a massive warehouse full of slot machines. Each slot machine is a field being funded by a specific mechanism and this warehouse is science as a whole. Each stops working after a while. Each pull of the lever in each slot machine gives you a lognormally distributed payoff. The mean of any given machine that is made anew is drawn from a half-Cauchy distribution (which has an undefined mean and sometimes produces very large outliers due to its fat tails).

You can choose between copying an existing machine or randomly spawning one into place. What do you choose? This is the same as the choice between doubling down on a given funding mechanism or starting a new one. For different choices of parameters in the toy example above the right choice is one or the other.
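The toy model can be sketched in a few lines of Python. This is only an illustration under assumptions of my choosing, not a calibrated model: the function names, the `scale` of the half-Cauchy, the lognormal spread `sigma`, and the number of existing machines are all hypothetical, and the detail that machines stop working after a while is left out. It compares one narrow thing: how often a freshly spawned machine has a higher mean than the best machine already in the warehouse.

```python
import math
import random


def draw_machine_mean(scale, rng):
    """Mean payoff of a freshly spawned machine, drawn from a half-Cauchy."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so the result is strictly positive
    return scale * math.tan(math.pi * u / 2)  # inverse CDF of the half-Cauchy


def pull(mean, sigma, rng):
    """One lognormal payoff whose expected value equals `mean`."""
    mu = math.log(mean) - sigma ** 2 / 2  # so that exp(mu + sigma^2 / 2) == mean
    return rng.lognormvariate(mu, sigma)


def prob_new_beats(best_mean, scale):
    """Exact P(new machine's mean > best_mean) under the half-Cauchy."""
    return 1 - (2 / math.pi) * math.atan(best_mean / scale)


def spawn_win_rate(n_existing, scale=1.0, trials=20000, seed=0):
    """Fraction of trials where a new machine's mean beats the best of n_existing."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        best = max(draw_machine_mean(scale, rng) for _ in range(n_existing))
        if draw_machine_mean(scale, rng) > best:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    for n in (1, 9):
        print(n, spawn_win_rate(n))
```

If the existing machines were themselves drawn from the same distribution, a new machine beats the best of n of them with probability 1/(n+1), so in a crowded warehouse spawning usually loses. What the win rate hides is the size of the rare wins: because the half-Cauchy has fat tails, the occasional new machine can be far better than anything existing, which is exactly where the explore-versus-exploit disagreement lives.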

In a world where most of science is ahead of us, it makes sense to focus on exploring, whereas in a world where most useful fields are already established, it makes sense to exploit those instead. I'm more of this latter opinion, whereas N-Q might argue not necessarily that science is truly an endless frontier, but at the very least that we can't know that and thus we should err on the side of maximizing discovery.

Put forcefully: would you rather have a better CRISPR or the next quantum mechanics? This is probably the crux question. I say obviously give me the next CRISPR. At the same time, it's easy for me to imagine someone with different priorities and aesthetics who will say "may I have the next QM please". Here's a quote from Michael showing his aesthetic preference for field-building, as opposed to pushing forth an already "hot" field:

DEVON: When is competition productive and when is it more toxic?

MICHAEL: I think I deny the premise of the question. Toxic, well okay. All right. Let me lean into the question, let me accept the premise of the question.

You can probably all guess — but maybe people listening to this can't — my opinion.[...]

If there's a thousand people all competing to do the same thing, actually a lot of them could go and do other things. They could invent other games to play, some of which would be unique games. Games that only they in the world were playing, and it would both be more meaningful for them, it'd be more meaningful for their family and for their friends, and just better for the world if they were to go and do those other things. [...]

He [Hamming] was a more competitive kind of a person. He wanted to win quite often. Even his framing of asking, "What are the most important problems in your field and why are you not working on them?" That seems to me like kind of a silly framing. It's accepting the consensus social reality of what the field is, when in fact it is much better to figure out what are the problems that nobody in your field has understood are important yet. Sort of trying to invent new problems, and maybe even new fields. That strikes me as a more enjoyable, and ultimately, more meaningful activity. (Nielsen, in Experts gather to talk about the impact of Richard Hamming)

Whereas to the contrary, I find scientific competition extremely valuable. We wouldn't have gotten as fast as we did to gene therapies without friendly (and sometimes not so friendly) competition between the relevant labs. Had they chosen to go do something else that no one was doing we might understand more, but be able to do less. I say might because I can't prove to you that doubling down on a field, as opposed to starting a new one is better. Just that the latter is not a priori better than the former in all cases.

Here's a concrete funding decision one could make. At a cost of $100k per year for two years, would you rather

Fund a promising student to work at Arc Institute on the next step of Hsu's lab work on serine recombinases (who wouldn't otherwise get to work on that)
Pick an unfunded grant to a small lab that received highly variable feedback from reviewers, and fund it.

The pro-diversity heuristic would encourage us to fund the latter. Arc Institute is already well funded; an additional student will do one unit of "Arc Institute work" whereas the second option will do something unusual. How much weight should we put on this guiding principle versus looking at this particular decision case by case? From an entrepreneurial point of view, there is no making general plans; there is only acting here and now with the means and circumstances available to the entrepreneur.

Institutional design is more constrained than you think

Restaurants as a platonic form, or whether aliens would have PhDs

Source: https://lexica.art/prompt/632bb078-94d8-42f4-9567-a8e05469626b

At the beginning of their essay, N-Q ask whether aliens would also have an NSF or a Harvard. Surely, one may think, they will agree that they should experiment and share results, but would their institutions look like ours? This seems preposterous to them but not so to me! The space of possibilities for institutional design seems more constrained to me than it does to them. In turn, this reduces the value of experimenting with new institutional forms as opposed to doubling down on what we know works, on the margin.

One analogy is this: Most restaurants are very similar. They have kitchens, they have seating, they have processes for procuring groceries and disposing of leftovers and garbage. They hand out checks and they have menus. Some might follow Escoffier's brigade system to structure their kitchen teams, others do not. Some serve gourmet food, others fast food.

The convergence on this concept, 'the restaurant', can be explained by underlying facts about human physiology and customs (we need to eat, we eat with friends and family, we have a taste for variety) and economics (the tradeoff between money and time, economies of scale in cooking). Differences in restaurant style can be explained by more context-dependent factors (different preferences and purchasing power to be catered to imply the existence of both McDonald's and the French Laundry). If aliens are in some relevant ways like us, we can predict they too will have restaurants.

The reason for this convergence is not that there is a single restaurant chain that happens to do things in the same way, squelching experimentation; there are millions of restaurant owners throughout the entire world. The reason is that restaurants are what best fits our given context. The platonic form of the restaurant emerges out of physics and context just like calcium-potassium-sodium neurons do over and over in evolution.

In parallel with this convergence, there have been innovations: food delivery, cloud kitchens, more variance in menu length, or Japanese-style payment, where there is no dance of waiters carrying cards and bills back and forth but a single interaction at the exit. This last one has not yet taken over western restaurants, suggesting potential for improvement of the core concept.

Some restaurants are strikingly different: maid cafes, hot pot restaurants where you, rather than a chef, cook your own food, restaurants where one eats in the dark, or restaurants where food is ordered by phone and it arrives in a mini train. I just came up with a new restaurant idea as I was writing this paragraph. Fun, but don't expect it to take over anytime soon.

If one were to open a restaurant, what kind of restaurant should one open? Should you open a spot that does for burgers what Tartine did for bread? Or should you start a quirky restaurant where only foods that start with 'B' like bread, beans, butter, or bananas are used in cooking?

This is part of my reaction when reading the N-Q essay: Monoculture does not necessarily equal stagnation, there can be really good reasons for convergence on a given pattern (as with restaurants) and science is more like restaurants than N-Q suggest.

The second part of my reaction is agreeing with the fact that we are still far from science being fully figured out and that currently there are reasonable changes one could effect to improve it. The reason for this is, to some extent, that decision making in science is indeed very centralized so there's less experimentation than what one would find in a market. And unlike markets, there is no obvious transmission chain that connects institutional performance to norm dissemination. It would indeed be useful if successful institutions could be scaled, but how do we know, scientifically, what a successful scientific institution is?

Natural law as an example of what metascience could be

Social institutions are not directly dictated by the laws of physics. The Soviet Union, for a while, experimented with various degrees of central planning. While, as I argued, they underperformed a counterfactual market economy, the continued existence of the USSR didn't "defy" economics for the same reason a market economy is not implied by the body of work of "positive" economics. What political economy can tell us is that given some goals (like human welfare broadly construed) and some background facts about the way human beings behave, some systems will be more likely to achieve those goals than others.

Studying legal systems could be a fruitful activity for metascientists: like science, law is a complex social phenomenon that exists in every modern society. Unlike science, law has been present almost since the beginning of human history. Every society had to solve similar problems, learning to deal with power and foster coordination, and had the chance to develop different solutions for them.

Today, there are two main systems that are widespread: the civil and common law systems. Each system addresses the same problems but they do so in slightly different ways (one heavily involves juries, the other does not). Which is better? Is one (or both) unfair? Are there better legal systems we could have that are not just tweaks on the existing ones but radically different instead?

The legal systems we observe today are the result of centuries of evolution but are also entangled with the particular political histories of the countries where they exist. It's not unreasonable to assume that the systems we have are good enough but also probably capable of improvement.

An exercise one could do is to try to derive a legal system from first principles and see how that compares with what we have now. For example, given what we know about human beings, having someone be a judge in a case where they are involved seems like a bad idea because they'll be biased. Because of this, every society has moved towards a principle of third-party adjudication of justice. The exceptions to this have to do with societies ruled by a single individual (like a medieval king) who was exempted from their own laws (though note the exception applied to one person, and most justice was still third-party mediated).

In the book The Structure of Liberty, Randy Barnett tries to do exactly that: deriving an ideal legal system from high level considerations, which (spoilers) ends up being similar to what we have now. This idealized system he calls "natural law". Though I read his book many years ago, the analogy he uses to introduce the concept directly influences the way I think about institutions in general and science in particular:

When we think of the disciplines of engineering or architecture, the idea of a natural law is not so mysterious. For example, engineers reason that, given the force that gravity exerts on a building, if we want a building that will enable persons to live or work inside it, then we need to provide a foundation, walls, and roof of a certain strength. The physical law of gravity leads to the following "natural law" injunction for human action: given that gravity will cause us to fall rapidly, if we want to live and be happy, then we had better not jump off tall buildings. The principles of engineering, though formulated by human beings, are not a product of their will. These principles must come to grips with the nature of human beings and the world in which human beings live, and they operate whether or not they are recognized or enforced by any government. And though they are never perfectly precise and always subject to incremental improvements and sometimes even breakthroughs, they are far from arbitrary, and we violate them at our peril.

Besides Barnett, I have also read examinations of very different legal systems; some examples are Rules for a Flat World, The Invisible Hook (on pirate institutions) or, most comprehensively, Legal Systems Very Different from Ours.

One lesson from that is that the ideal legal system may sometimes depend on context. Rules for a Flat World describes a case where a developing country tried to copy the legal system from the US. That didn't go well: it was too complex for what their society could support. Likewise, customary justice administered by village elders, without lawyers, continues to exist because some societies can't economically justify dedicated lawyers or judges.

But if a society is sufficiently like ours, Barnett would say, then they would end up having a system that is like ours.

What are the implications for metascience? Legal systems deal with complex social phenomena, yet they have tended to converge over time, with the diversity of approaches going down. This happened without the need for randomized controlled trials, though with plenty of learning by doing as legal systems got stress-tested case after case. By analogy, I'd argue, we can reason similarly about metascience and see what science could look like given the objective of producing useful and/or interesting knowledge. How much do those goals, and the way human beings are, constrain scientific institutions?

Would aliens have PhDs?

Or, if we could re-run human history to the current year, would the institution of the PhD re-emerge as we know it today?

Probably.

I asked Twitter this question, inspired by a line in the N-Q essay where they wonder:

Imagine you're a science fiction author writing a story depicting a scientific discovery made by an alien species. In your story you show the alien scientists up close – how they work, how they live. Would you show them working within institutions resembling human universities, with the PhD system, grant agencies, academic journals, and so on? Would they use social processes like peer review and citation? Would the alien scientists have interminable arguments, as human scientists do, about the merits of the h-index and impact factor and similar attempts to measure science?

Almost certainly, the design of our human scientific social processes has been too contingent on the accidents of history for the answers to those questions to all be "yes". It seems unlikely that humanity has found the best possible means of allocating scarce scientific resources! We doubt even the most fervent proponents of, say, Harvard or the US National Science Foundation would regard them as a platonic ideal. Nor does it seem likely the h-index and similar measures are universal measures of scientific merit.

This doesn't mean the aliens wouldn't have many scientific facts and methodological ideas in common with humanity – plausibly, for instance, the use of mathematics to describe the universe, or the central role of experiment in improving our understanding. But it also seems likely such aliens will have radically different social processes to support science. What would those social processes be? Could they have developed scientific institutions as superior to ours as modern universities are to the learned medieval monasteries?

To me, this does not seem likely! Note that to them, like to me, it seems plausible that they would have mathematics or that they would do experiments. Why would aliens do this? Of course, because given certain ends, this works better than the alternatives; philosophers tried hard to think about how the universe works, and while that got us far (I'd argue Einstein's fanciful speculations about falling in an elevator are very philosophical!), it wouldn't have gotten us to the Standard Model by itself.

But in light of what I've seen for legal systems in the past, I am more inclined to disagree about how radically different these other institutions might be. I am also more inclined to hold this opinion because, while not a historical materialist (ideas matter too!), perhaps my reading of economic history makes me appreciate the impact of background material conditions in the development of social institutions. Now we can ask: Are modern scientific institutions superior to learned medieval monasteries?

The relevant comparison is the modern university to the monastery. The former is an intellectual descendant of the latter, in some cases occupying the same buildings centuries later, as with Oxbridge. The purposes were different, so arguing that the modern university is better at science than a monastery is like saying that a car is better for transportation than a sack of potatoes. Unequivocally true, but also a clearly unfair comparison!

For example, medieval monastic schools had regular prayer and universities don't. This practice wasn't there to help advance science, it was there to glorify God, which modern universities don't have the mandate to do. Monasteries and their schools were more focused on teaching than on doing research. We could even say that research was a byproduct of their core activities.

The concept of the PhD is one that we could trace back all the way to guilds and master-apprentice relationships. The modern PhD involves a period of comparatively little pay, intensive learning guided by a mentor figure, and the completion of a project or series thereof that merit the granting of a degree.

In some countries PhD students teach and in others they do not; in some professional fields outside of academia PhDs are socially required (as in biotech) and in others they are not (as in software engineering), but the basic institution is very similar.

Abstracting away, when asking whether there would be PhD students among the aliens, we can assume that aliens very likely age and are born without much knowledge. Just like with us, at some point there is a smart young alien full of energy and devoid of knowledge, and at a later point there is a practicing alien scientist. What happens in between? Well, surely they are learning something. Learning by doing is a particularly effective way of learning something, as opposed to only reading about it (how different would their brains have to be for this not to be true?). Tacit knowledge would most likely also be present there, leading to the need for personal transmission of knowledge. At that point we have hand-waved our way to a proof of the necessity of something like the PhD system.

We could imagine students being paid more; that would be a marked departure from the way the PhD system works. Why is academia not like software engineering, where interns are paid significantly more? Here one could point to one of the bottlenecks identified in the N-Q essay: there's no natural feedback loop driving growth in new institutions. If the research one is preparing for is disconnected from direct application, then it's hard to pay for it in a self-sustaining way, as companies are able to. This makes science reliant on self-funding (as with the early gentleman scientists or endowments) or charity (public or philanthropic funding), leading to difficulties paying more.

Meta-entrepreneurship as a ceiling for what meta-science can do

Entrepreneurship, especially the subset of it that is Silicon Valley-style venture-backed startups, has a lot in common with science. As with science, it exists in a continuum between relatively well-established business models (opening the nth restaurant; scaling the latest -omics technology) and speculative businesses (launching a hitherto academic database into the world; inventing expansion microscopy). As with science, one can point to high level descriptive trends about the field: One can say, for example, that on average entrepreneurs that are a bit older tend to succeed more often. One can also point to seemingly successful institutions like YC (The HHMI of startups?) and as with meta-science one can find debates on whether the accelerator itself is to be credited for the success of its startups or whether on the other hand they are just picking winners that would be successful regardless.

One can also find stories of funds using algorithms to try to predict what companies they should invest in, only to abandon them a few years later. This hasn't been tried at scale in science, but I suspect that if someone tried to fund authors based on how many citations they get, they would abandon that algorithm soon enough.

Unlike science, however, entrepreneurship has the profit motive going for it: there are strong economic incentives to invest in the right companies, start the right companies, find the right ways to hire, manage, and fire. Unlike science there is a clear reward signal to optimize for: is the company making money?

Meta-science has its books, and so does meta-entrepreneurship. We have biographies of both scientists and founders. Meta-entrepreneurship seems more advanced in having books for all the sub-skills involved: how to fundraise, hire, manage. There does not seem to be a similar number of those books for science.

Despite this, there is no agreement on questions like whether you should have a cofounder for your company. If one looks at the data in the aggregate, statistically, the received wisdom that one has to have a cofounder doesn't seem true. From time to time I read articles from founders on why solo founding is actually the way. What should one think here? Maybe some people would do better with a cofounder and others without.

If we can't solve this, can we hope to learn much about meta-science? This is, I think, the main argument against an expansive vision of meta-science. Good founders find their way just as good scientists find theirs.

What should metascience entrepreneurs do?

At first, I thought that "marginal improvements" that just save time wouldn't be that interesting to think about. Doing the same thing but faster or cheaper doesn't seem like how one gets to interesting new thoughts. But if those thoughts are primarily a function of time spent thinking, then buying time is how one gets more thoughts: scientists spend a lot of time not only applying for grants, but also pipetting, cleaning lab equipment, and moving samples in or out of instruments. Spending more time reading papers or doing analysis seems like a better use of their time. Whether we can do better than buying more time is something that we could figure out by surveys of how specific discoveries came to be, for example taking Nobel-winning research, or by talking to scientists directly. (Applied positive meta-science)

Marginal improvements are actually great when compounded. By this I mean taking tasks that we already know how to do, and for which we know what good work looks like, and making that easier or faster. Replicating existing work is one example. But also, as I suggest in that post, anything that will directly buy more time for scientists. We can measure time, and we know science requires time: we can buy more science by buying more time (duh, right?).

I think there is a lot of promise to this! Something as simple as scientists strapping GoPros to their foreheads and sharing how they work, amusing as it may seem, could be very useful for training purposes. I would fund a project aiming to do this immediately! Then one could look at what scientists given access to the recorded library of examples think. One could maybe count the number of days that experiments used to be delayed because someone had to come teach a new skill before/after having access to this. Other ideas I find useful include documenting how stuff happens at labs (How do research questions get decided, how do meetings tend to go, how are reagents managed?), replicating research in areas that are suspected of being in bad shape (like cancer research or Alzheimer's research), starting FROs where they make sense, or maybe funding the Smolin fellowship.

Crucially, I think this should all be done by the same person or team with an entrepreneurial mindset: going out and finding what problems need solving, then doing the research scoped to what problems can be solved by that team, and then doing that. As opposed to there being people that produce research but do not act on it.

Conclusion: What metascience should aspire to be

Metascience is a broad tent that fits pure academics, para-academics, ex-academics, and people with a blog on the internet. A subset of this group has gone on to reform and start new institutions while another subset studies quantitatively how science works. We could also point to a third, less developed and less funded area of qualitative meta-science (think Latour or Harry Collins). The first of these three groups is different from the other two in that its aims include changing the way science works. The other two aim to understand aspects of it. In some idealized scenario there is a strongly bidirectional connection between theory and practice. In practice there hasn't been much of one so far. Empirically, what I have observed in the space of institutional engineering is stylized observations and personal anecdotes, conveyed 1:1, helping shape reforms and founding efforts. These so far have not come coupled with a strong will to study the new efforts themselves. At Rejuvenome I published some blogposts in the Astera blog about decisions the project made, but to date there is no Arc-hivist at Arc Institute, Arcadia, or Convergent Research. Similarly, one rarely sees economists or historians studying how new startups work.

The failure of meta-entrepreneurship to establish deep links with entrepreneurship, despite stronger incentives for improvement, makes me pessimistic about the possibility of these bidirectional linkages manifesting in metascience. Hence I predict metascience and metascience entrepreneurship will continue walking separate paths: the next big NIH reform or new institution started will not be strongly influenced by academic or theoretical metascience.

In 'qualitative metascience' there is a rich body of work to be developed about how to fund science or conduct research. Startups have Getting Things Done, High Output Management, or The Great CEO Within, and a plethora of supporting essays and articles about fundraising, managing, or marketing. Startups do make use of this knowledge, hard-earned through experience. What does science have that's comparable?

I'd say we have examples of this. These, I think, can be helpful for scientists, funders, and policymakers in the same way the startup books are for founders. These works are not scientific, but they are the best we can do.

Warren Weaver's handbook for Rockefeller foundation officials
Lee Smolin's and Sabine Hossenfelder's complaints about modern physics research
Ramón y Cajal's Advice for a Young Investigator
The FRO whitepaper
Polya's How to Solve It
Root-Bernstein's Discovering
Ben Reinhardt's DARPA and PARPA reports
Bottleneck analysis
And many others, on how to become a professor in the current system.

It is the production of this sort of work where (theoretical) metascience can add the most value to (applied) metascience entrepreneurs.

(Thanks to Midjourney)

I intend this post to be my last one on metascience, at least for a while. I expect that if I return to this topic, it will be whitepapers for concrete institutions that need to be built, or analysis of specific fields and technologies.

Citation

In academic work, please cite this essay as:

Ricón, José Luis, “Limits and Possibilities of Metascience”, Nintil (2022-12-05), available at https://nintil.com/metascience-limits/.
