Go One Level Up (And Do The Economics When You Get There)
Tyler Cowen asks when the research paper will disappear in economics. Think about it, he says: AI can now generate, evaluate, and improve papers. Ergo, the paper is no longer the scarce unit of intellectual contribution. You can write as many papers as you want, and each of those can have as many variants as you want.
So what becomes scarce in such a world? The system does. The system (or the box) has components: the dataset, the code and the method of analysis. The code and the method are now solved problems, or close to being so. So folks who can figure out innovative ways to collate interesting, novel and high-quality datasets will become scarce. Tyler himself points to the possibility of synthetic datasets generated with the help of AI, but doesn’t ask the obvious question: if this task is repeatable and verifiable, it can be automated. Which brings us back to square one.
There is a helpful heuristic when it comes to thinking about AI. It is really hard to apply, and it does give you a bit of a headache, but it works. As roon put it on Twitter recently: in the age of AI, whatever level you’re thinking at, go one level up.
Tyler went one level up from the paper publishing game as it is played today. But the reason this heuristic gives you a headache is that it applies recursively.
And here is exactly where principles of economics come into play: each time you go up a level, you don’t just get a new abstraction. You also get a new set of incentives, costs, time horizons, and choice sets for everybody involved. You get, in other words, to update your view of the world. Not because the principles have changed, but because the circumstances to which they’re being applied have changed.
Declaring that we’ve moved to a higher level of abstraction is easy. Doing the economics of what that level actually looks like is harder, but more useful.
The Market Has Already Spoken
Before we go climbing levels, let’s look at the data. Arin Dube points out that the advent of LLMs hasn’t raised the number of NBER working papers above trend. Nor submissions to top journals. His explanation: LLMs substitute for good RAs, but not for good ideas. And RA labour supply was never the binding constraint on economic scholarship. By the way, I’m willing to give credence to the idea that the academic world is just that slow, and again, principles of economics help us understand this: the academic profession isn’t incentivized to speed things up, for many different reasons (shooting itself in the foot would be chief among them).
Tyler’s post implicitly assumes that the bottleneck in economic knowledge production is in the execution phase: writing the paper, running the analysis, doing the robustness checks. If that were true, LLMs should have already caused an explosion in output. But they haven’t, and you should update your priors, at least a little, towards the bottleneck lying elsewhere.
The market for academic papers is talking, and it’s telling us something important. The constraint is upstream of the paper. It’s upstream of the system that produces the paper, too. Building a better system is essentially building a more sophisticated RA. But as Arin Dube points out, this excellent RA still needs someone to tell it what to investigate. Replacing academic papers with systems is essentially a proposal to upgrade the RA layer even further. And Dube’s data suggests that upgrading the RA layer doesn’t matter much at the margin.
So what is the binding constraint? Let’s go find out.
One Level Up, Then Another, Then Another
Let’s try an example of thinking one level up. Here’s one of Tyler’s ideas in his blog post: “How about ‘I am Tyler Cowen, what is it you think I will find interesting in this data set?’” How might you go one level up here?
How about: “I am Tyler Cowen, what data set should I be trying to get, given the most interesting questions I’m thinking of at the moment?”
Remember, this is recursive. How does one go one level up from here?
Try this on for size: “What interesting questions am I not asking that I should be? Use what you know about me to formulate your answer.”
But remember, each change in level also changes what is scarce, what the incentives are, and the associated costs and time horizons. At the first level (the analyzing-the-dataset level), the scarce skill is statistical technique: the ability to extract findings from data. At the second level (the choosing-the-dataset level), it’s judgment about data acquisition — knowing what to look for and where. At the third level (the what-are-my-unknown-unknowns level), it’s imagination — the ability to generate questions that wouldn’t occur to someone embedded in their current paradigm.
And as we stated earlier, each level has its own economics. Technique is a skill that is expensive to acquire, but it has the benefit of being durable: you only need to learn the basics of causal inference once. Data acquisition, by contrast, has ongoing costs: every new dataset requires negotiation, cleaning, contextual understanding. But what of imagination? I’m not even sure how to measure it, and cultivating imagination looks nothing like traditional education. It is about cultivating a disposition or an inclination. It is about inculcating the habit of asking weird questions, and about stubbornly refusing to accept a given framing.
Think of LLMs as a massive supply shock at the technique level. But an infinite increase in the ability to do analysis doesn’t automatically increase the ability to generate interesting ideas in the first place. Or, if you prefer economist-speak: the production function for economic knowledge seems to have near-zero elasticity of substitution between technique and imagination. We’ve flooded the market with automated RAs and nothing much seems to have happened. Why? Perhaps because RAs were never the scarce input to begin with.
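If you like seeing that claim in symbols, here is a minimal sketch using a standard CES form (the notation is mine and purely illustrative): write knowledge output K as a function of technique T and imagination I,

\[
K = A\left[\alpha\,T^{\frac{\sigma-1}{\sigma}} + (1-\alpha)\,I^{\frac{\sigma-1}{\sigma}}\right]^{\frac{\sigma}{\sigma-1}}.
\]

As the elasticity of substitution \(\sigma\) goes to zero, this collapses to the Leontief case \(K = A\,\min\{T, I\}\): pouring in ever more T (automated RAs) leaves K pinned down by whichever input is scarcer, and Dube’s evidence says that input is I.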
I’ve written before about asking my students five random questions at the end of every class. About anything, as long as it’s not about the topic we just discussed in class. Here are my all-time top three: why do cockroaches flip over when they die? Melody itni chocolatey kyon hai? (Why is Melody so chocolatey?) Were Ross and Rachel on a break?
The practice sounds like a pedagogical trick. But one way to think about it is that it’s training for the only skill that Dube’s data says matters: the ability to generate questions that the existing paradigm doesn’t hand you.
The cost structure is interesting. Imagination might be the cheapest skill to exercise: it’s free to ask a weird question. But it is among the hardest to cultivate. Midway through each semester, I have to tell my class that I will not leave until at least five questions have been asked. Trust me on this. This isn’t about fancypants labs or jazzy classrooms. It’s about cultivating environments that reward curiosity: classrooms that invert the syllabus, communities that value questions over answers, and cultures that treat “I don’t know, let’s find out” as a high-status move. Creating that culture, especially in the world in which we live today, is hard.
And it doesn’t scale the way AI does.
The Journal Problem
Tyler asks: “Do we even need the AER any more to certify which are the best papers? Just ask the AIs.” And later on in the post he says, “What if you submit to a journal a data set and some code?”
But let’s do our usual thing and go one level up: who needs the journal?
The journal has always solved two problems simultaneously: quality certification (“this is good work”) and attention routing (“therefore, read this”). In a world where humans couldn’t read everything, bundling these two functions made sense. The AER’s authoritative stamp told you what to pay attention to and that it was worth your time. Quality and signaling, two for the price of one.
AI unbundles them. Quality certification can be handled, say, by a council of frontier models. Such a council can rate papers, check replicability, assess influence. It might well be better at this than three overworked referees with their own agendas. It is certainly a gazillion times faster already, and not that much worse in quality, if at all.
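If you want to make the council idea concrete, here is a toy sketch in Python. Everything in it is hypothetical (the member names, the score_paper stand-in, the median aggregation rule); it is meant only to show the shape of the mechanism, not to describe any existing tool or API.

from statistics import median

COUNCIL = ["model_a", "model_b", "model_c"]             # hypothetical council members
DIMENSIONS = ["quality", "replicability", "influence"]  # what we ask them to rate

def score_paper(model_name: str, paper: str, dimension: str) -> float:
    """Stand-in for a call to a frontier model. Returns a dummy score in
    [0, 1] so the sketch runs end to end; a real version would prompt the
    model with the paper, its data, and its code."""
    return 0.5

def council_verdict(paper: str) -> dict[str, float]:
    """Take the median score across the council on each dimension, so that
    no single model's quirks dominate the certification."""
    return {
        dim: median(score_paper(member, paper, dim) for member in COUNCIL)
        for dim in DIMENSIONS
    }

print(council_verdict("A working paper, with its replication files attached"))

The interesting design choice is the aggregation rule: a median is one cheap way of keeping any single model from playing kingmaker, which is exactly the failure mode we worry about with a single overworked referee.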
But attention routing is a different beast altogether.
In the current world, attention routing is a centralized bureaucracy. There’s a clear hierarchy — AER, QJE, Econometrica — and everyone ‘Schelling Points’ around it. This is a well-trodden path, and we know what I’m going to say next: publication delays, conformity pressure, the distortion of research agendas toward “publishable” questions, yada yada yada. But the benefit is legibility. When a hiring committee looks at a CV, they can read it. When a graduate student looks for important work in trade theory, they know where to start.
Imagine you live in a world where your AI agent reads everything and filters it for you. You don’t have to imagine this, by the way. You are in that world, you just don’t know it yet. Quality certification is a solved problem in this world: your agent will do it for you. How do you distinguish a bad arXiv paper from a great one? Simple: ask your agent to evaluate it for you. And given the speed at which publishing happens on arXiv, human editors at top-notch journals are going to suffer in comparison on the speed-quality Pareto frontier. Why bother waiting for a costly and delayed signal from the top journals if you can already read excellent research at the cutting edge of your field?
And if journals are going to see a fall in status (if speed and quality in publications matter, what do you think is going to happen to the status of journals?), how do we judge the quality of a researcher? How does a hiring committee in a university evaluate a researcher when there’s no shared prestige signal, just millions of personalized feeds? (Are you tempted to ask “Who needs the university?” Congratulations, and welcome to Headache Hotel. But that’s a whole other story.)
The good news is that the researcher’s choice set expands. You can publish anything, anywhere, in any format. But the bad news is that the incentives become bewildering. You’re no longer optimising for “what will impress the AER editor.” You’re optimising for reach across millions of personalized filtering mechanisms. That’s a very different optimisation target, and nobody knows how to think about it.
The cost structure flips too. Producing research gets cheaper, because the AI genie helps. But building a reputation might get more expensive, because there’s no single ladder to climb. In the current world, one publication in the AER buys you legibility across the entire profession. In the new world, legibility is fragmented. You might be famous in one cluster of agent-curated feeds and all but invisible in another.
The Lucas Critique, All the Way Down
Tyler suggests publishing “a method for simulating human behavior, to run AI-simulated experimental economics.” Build the system and don’t worry about the paper.
But go one level up, and you run into the Lucas Critique in a form that should make us quite uncomfortable.
The Lucas Critique says that models estimated under one policy regime break when the regime changes, because agents adjust their behaviour in response to new rules. It’s a warning about the instability of empirical relationships when the rules of the game change.
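One stripped-down way to write it (textbook notation, nobody’s particular model): agents’ decision rules depend on the regime they believe they are operating under,

\[
y_t = f(x_t;\ \theta(\Gamma)),
\]

where \(\Gamma\) is the regime and \(\theta(\Gamma)\) the behavioural parameters it induces. Estimate \(\hat{\theta} = \theta(\Gamma_0)\) under regime \(\Gamma_0\), switch to \(\Gamma_1\), and agents re-optimize, so the old \(\hat{\theta}\) stops describing behaviour. That is the whole critique in one line.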
But think about this: universal AI assistants are a paradigm change for human decision-making itself. Every economic agent now has, or soon will have, a reasoning engine in their pocket. Their choice architecture is fundamentally altered. Price comparisons that took hours take seconds. Heuristics that marketers could count upon will no longer work as well. Contract language that was opaque becomes transparent. Negotiation strategies that once required experience can be generated on demand.
So: any behavioural model estimated on pre-AI humans is already suspect. The price elasticities, the risk preferences, the heuristics and biases — all of these were measured on humans making decisions with their own reasoning capacities. The new agent is a human-AI centaur, and we don’t have stable estimates for how that centaur behaves. Agents will have no problem thinking one level up, I assure you. You could tell me that variance will go down because of this, and I might agree with you. But you could also tell me that variance will go up, and well, who knows for sure?
In any case, the time horizon for any empirical finding shortens. Your carefully estimated parameter was valid when humans were computing intuitively. Now they’re asking Claude. How long does your estimate last? A year? Five? The cost of maintaining a working system of behavioural simulation isn’t just computational. It is also the cost of keeping up with a moving target. Every time the AI improves, the individual agent changes, and your model drifts.
The Lucas Critique in the age of AI says: the system needs to be rebuilt constantly, because the centaurs inside it are constantly updating their beliefs and therefore their actions. That’s not an argument against systems per se — it’s an argument that the system is a treadmill, not a monument. And the economics of treadmills is very different from the economics of monuments.
The Hayekian Disclaimer (Which Is Also the Point)
Tyler asks whether tenure should be given to folks building systems instead of writing papers, and/or to folks who can build capabilities.
You know the drill by now. What is tenure for in a world where time-to-insight has collapsed? Tenure is like insurance. It gives the professor a floor, a measure of income security. Think of it as the premium paid by the university in exchange for an option on long-term research. If the long term gets shorter, the option value drops. And if building capability is going to be the scarce resource that we think it is going to be, those folks aren’t going to be queuing up for tenure. Again, in this world, what is the university for?
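To see why the option value drops, here is a back-of-the-envelope illustration (purely my notation, nothing more than a sketch):

\[
V_{\text{tenure}} \;\approx\; \sum_{t=1}^{H} \delta^{t}\,\mathbb{E}\big[\max\{R_t - c_t,\ 0\}\big],
\]

where \(R_t\) is the payoff to speculative, long-horizon research in period \(t\), \(c_t\) the cost of carrying the professor, \(\delta\) the discount factor, and \(H\) the horizon over which insights arrive. If AI collapses time-to-insight, \(H\) shrinks, the later terms vanish, and the value of guaranteeing someone decades of protected search falls with it.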
But working out the new equilibrium for academic employment and for institutions in the education sector requires a prediction about institutional design and culture, and I don’t have one. Particularly in these times.
I can’t tell you what the new world looks like, and what steps we will take to get to that world from this one. Nobody can. That’s a Hayekian point — the new order for knowledge production will emerge from millions of individual adaptations, not from a blueprint. What I can do is help you think about the transition using principles of economics. At each level of abstraction, ask: what are the constraints? The choice sets? The incentives? The time horizons? The costs? These frameworks of analysis remain the same, even when everything they’re analysing is in flux. They help us understand the territory, even if they can’t draw us the exact path.
Which brings me back to Arin Dube’s finding. The market has already told us where the binding constraint is. It isn’t execution. It isn’t even systems. It’s the question — the weird, generative, paradigm-breaking question that no amount of RA labour, artificial or otherwise, can substitute for.
I’m making my way (very slowly) through the Baroque Cycle trilogy by Neal Stephenson. And one thing I’ve learnt by reading those books is that the Royal Society’s motto is Nullius in verba — take nobody’s word for it. For 350 years, that has been a privilege of the few: membership of the Royal Society. But if what matters is identifying with the society’s founding values, that club has now become far easier to break into.
The tools for inquiry are now available to anyone with a browser and the disposition to ask. We can all be fellows of the Royal Society in spirit. Why, some of us have done important work on dog cancers with nothing more than a browser and an intense incentive to ask. But “can all be fellows” is not the same as “will be”. The AI genie is out of the bottle, yes — but the genie grants wishes, and the hard part has always been knowing what to wish for.
If you aren’t asking weird questions, you aren’t learning. And if you aren’t doing the economics of the weird question, you aren’t thinking like an economist.
Ask weird questions. And then analyze them using principles of economics.

