Coase, Alchian, Demsetz, and the Economics of Training Away Your Own Scarcity
The $200/Month Monitor, Explained
Miles Brundage recently worried that Anthropic has “an org-wide case of AI psychosis”.
In plain English: they’re shipping features faster than they can notice what’s breaking. Seth Lazar piled on with a specific example: Opus now defaults to a million-token context window, and you can’t even opt back down to 200k, even though performance degrades as context grows. (Source: TheZvi’s Substack)
Here’s what caught my eye. Lazar is saying: I want the product to be worse in one dimension (smaller window) so that it works better in another (accuracy). In olden, pre-AI times, I would have said this is an example of preferences on the part of a (human) agent. These days, we call it taste and judgment.
But hiding behind this is a rich way to help economics students understand the world we’ve entered. If you’re learning microeconomics right now, or have struggled with it in the past, you might enjoy this essay.
The reCAPTCHA Move
Here’s the first analogy I’d use to help you start to think this through:
In the mid-2000s, Luis von Ahn had a problem and an insight.
Problem: millions of books needed to be digitised, and OCR software kept getting the hard words wrong. Insight: millions of people were already solving CAPTCHAs every day to prove they were human.
Lightbulb moment for Luis: What if you made the CAPTCHA be the hard word?
So he built reCAPTCHA. You’d see two words: one the system already knew (to verify you were human), and one it didn’t (to get you to do free OCR work). Users thought they were logging in. They were actually labelling training data. Google bought it for an undisclosed sum, and it went on to digitise the entire New York Times archive and large chunks of Google Books. Fun story, if you want to read the full thing.
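The two-word trick can be sketched in a few lines. This is a toy version of my own making, not the historical API — the real reCAPTCHA also randomised which word was the control and required agreement across several users before trusting a transcription — but the economic logic is the same:

```python
def check_recaptcha(known_answer, typed_known, typed_unknown, harvested):
    """Toy reCAPTCHA: one word gates access, the other harvests a free OCR label."""
    # The known word is the actual human test.
    if typed_known.strip().lower() != known_answer.lower():
        return False  # failed the test; discard both answers
    # The unknown word is the hard OCR case; the user just labelled it for free.
    harvested.append(typed_unknown.strip())
    return True
```

The asymmetry is the whole business model: the user only ever experiences the first branch, while the value accumulates in `harvested`.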
Here’s why this gladdens an economist’s heart: Von Ahn didn’t pay anyone to do the work. He didn’t even ask. He made the work identical to something people already wanted to do. The cost of acquiring labels was zero — or more precisely, the cost was already being paid by someone else (the website owner buying CAPTCHA security).
Now think about Claude Code, or Codex. Every time a developer sends a prompt, gets code back, and says “no, that’s not what I meant” — every time they edit the output, reject a suggestion, or restructure what the agent produced — they are generating exactly the kind of signal that makes future models better. This is not a synthetic benchmark signal. This is a real-world, domain-specific signal, from a person who understands what “correct” means in this particular codebase, for this particular business purpose. This is feedback from a person with skin in the game. That’s gold for the AI guys. Or whatever is more valuable than gold. It is hard to keep up these days.
But here’s the bit that makes one’s eyes go all beer-goggly: they’ve got us paying $200 a month to do this… and we’re the ones thinking this is a great deal!
Von Ahn made the security task identical to the annotation task. Anthropic has made the coding task identical to the model improvement task. Users think they’re building software. They are also, unavoidably, training the model on real-world projects.
But Von Ahn Had It Easy
Here’s where that Von Ahn analogy breaks down, and where the economics gets even more interesting.
Von Ahn’s reCAPTCHA was one-way extraction. The user got access to a website. Google got OCR labels. The user had no “residual claim” on the digitised books, and no reason to care.
Claude Code and Codex are different beasts altogether. The developer gets working software, which is a genuinely valuable output. Anthropic/OpenAI get a training signal. Both parties walk away with something they want. Both parties are, in a real sense, monitoring each other. The developer monitors the model’s output (was this code correct? Was it useful? Is it safe?). The model’s provider monitors, in aggregate, the patterns of human correction (where do users reject? What do they edit? What do they choose to restructure?). And note that it does this across levels. You’re using Claude Code, so is your manager. So is her manager! And Claude Code is learning across all of your interactions with it, on the same project, but at different levels. Hold that thought, because this matters.
But for now, let us open our econ textbooks. A guy called Ronald would have gotten a glint in his eye right now.
I’m referring to Coase, of course, and I’m talking about the theory of the firm here. You using Claude Code or Codex is precisely what makes those tools (and the models underlying them) better. To be fair, the direct pipeline from your corrections to model training depends on whether you opted in, and many enterprise users are excluded entirely. But the subtler signal doesn’t require opt-in. Everything that you choose to try again, every session that you abandon in frustration, and every editing exercise across millions of users shapes what gets built next. It is not just the explicit corrections that matter, but also what can be inferred from the choices you make, and every thumbs-up (explicit or implicit) that you give. Or don’t give! Everything is a signal. Read Aparna’s blog post on this; I’ll link to it again. Note this sentence in particular from their blog: “Every click, tab, press, accept shapes not only the user experience but improves the model intelligence.”
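To make “everything is a signal” concrete, here is a toy sketch of how a session’s implicit choices could be scored as feedback. The event names and weights are entirely my invention for illustration, not any lab’s actual schema:

```python
# Hypothetical event weights: how much approval each user action implies.
EVENT_WEIGHTS = {
    "accept": 1.0,         # suggestion taken as-is
    "accept_edited": 0.5,  # taken, then modified: partial approval plus a diff
    "retry": -0.5,         # "try again" without explanation
    "abandon": -1.0,       # session dropped in frustration
}

def session_signal(events):
    """Aggregate implicit feedback from a session's event log."""
    scored = [EVENT_WEIGHTS[e] for e in events if e in EVENT_WEIGHTS]
    return sum(scored) / len(scored) if scored else 0.0
```

The point is not the particular numbers; it is that no explicit thumbs-up is needed for the session to carry information.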
Anyone who has used a Windows laptop will immediately know and appreciate the point I’m trying to get at: using Windows did not make Windows better. You had to raise an issue or a ticket, and eventually, at some point down the line, a new Windows version might remove the bug that broke your heart. Eventually. One day. Maybe.
That is not this world, and all those transaction costs have now been internalized, and not just inside the firm, but inside the product itself. This is Coase on steroids! But we’re deep in the transaction woods now, and for this territory, we need a heavier machete than Coase alone can give us.
Paging Alchian-Demsetz!
A Quick Detour Through 1972
The Patel Bros. recently spoke about Alchian, albeit about another of his many awesome ideas (the Alchian-Allen effect). We’re talking about a different Alchian paper today.
If you’ve taken a course in the economics of organisations, you’ve probably encountered Alchian and Demsetz (1972). If you haven’t, here’s the core idea, and I promise it’s relevant.
Imagine two people (or four, if you’re a Friends fan) carrying a heavy sofa up a flight of stairs. The output — sofa successfully delivered — is joint. You can’t easily separate out how much each person contributed. If one person slacks off a little, the sofa still gets there, just more slowly, and you can’t tell who was shirking.
This is the metering problem in team production. When individual contributions are hard to measure, people tend to free-ride. Alchian and Demsetz’s solution: appoint a monitor. Give someone the job of watching everyone else’s effort. And to make sure the monitor doesn’t slack off, make the monitor the residual claimant — the person who keeps whatever surplus is left over after everyone else is paid.
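The metering problem is easy to see in a toy simulation. This is my own illustration, not anything from Alchian and Demsetz: the observer sees one number per delivery, and ordinary noise can swamp the shirker’s missing effort.

```python
import random

def deliver_sofa(efforts, noise=0.5, seed=None):
    """Joint output: total effort plus noise. Individual shares are unobservable."""
    rng = random.Random(seed)
    return sum(efforts) + rng.uniform(-noise, noise)

# Same noise draw, so the two runs are directly comparable:
honest  = deliver_sofa([1.0, 1.0], seed=42)  # both pull their weight
shirked = deliver_sofa([1.0, 0.7], seed=42)  # one slacks off a little
# Across many deliveries with fresh noise, the missing 0.3 of effort is
# hard to attribute: all the observer ever sees is the one joint number.
```

This is why the solution has to be a monitor who watches inputs (effort), not just outputs — which is exactly where Alchian and Demsetz go next.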
That’s their theory of why firms exist. The boss isn’t the boss because of some mystical authority. The boss is the boss because someone needs to watch the team, and the person who watches the team should be the one who benefits most from watching carefully.
Now apply this to Claude Code.
Who Monitors Whom?
A developer working with Claude Code is engaged in team production. The developer contributes intent, domain knowledge, judgment, taste. The model contributes speed, pattern-matching, and broad knowledge of code patterns. The output — working software — is genuinely joint. Neither party could produce it as efficiently alone.
Alchian and Demsetz (AD) would ask: who should monitor this team? Their answer: whoever has the lowest cost of observing and evaluating the other party’s contribution.
And the answer is obvious. The developer monitors the model. Nobody else can. Only the developer knows whether the generated code actually does what the business needs. Only the developer can evaluate whether the architecture is maintainable, whether the edge cases matter, whether the abstraction is right. This isn’t just checking syntax. It’s taste and judgment — the thing that’s hardest to automate and most expensive to acquire.
So far, so standard. The monitor monitors. Classic AD.
But here’s the twist that AD didn’t anticipate: the monitor is paying the entity being monitored.
In the classic firm, the residual claimant pays workers. Here, the developer pays Anthropic (via a subscription), and then does the monitoring work on top of that. The developer is simultaneously the team-production partner, the quality monitor, and the paying customer.
Why does this work? Because the developer is also the residual claimant on the immediate output. They keep the working code. That’s valuable enough to justify both the subscription fee and the monitoring effort.
But Anthropic is also a residual claimant — on a different residual. They keep the training signal. The patterns of correction, rejection, and editing, aggregated across millions of users, are enormously valuable for improving future models.
So both parties are residual claimants. Both are monitoring. And both think they’re getting a good deal. The metering problem doesn’t just solve itself — it solves itself twice, in opposite directions. Non-zero sum games are the best games to play, and this is a great example. Anthropic/OpenAI win, but so do the users.
One absolutely should invoke Coase and non-zero sum games here, but one needs to go deeper to appreciate the true institutional innovation. This is not “just” clever pricing (20/100/200 dollar plans). This is a structural arrangement where the transaction cost of gathering the high-quality training signal has not just gone lower: it has gone negative. Users pay to provide the signal because the immediate output is valuable enough to justify the cost.
So Who Owns What?
Roon says you should go one level up. That works while coding, but while analyzing, one should aim to go one level deeper. And this is the point where AD stops being sufficient. (Interested readers should ask their LLMs about Grossman and Hart: paste the link to this blog post into your LLM of choice, choose the best model available to you, and tell it that the blog’s author asked you to ask about Grossman and Hart. Try it!) What follows is Grossman-Hart territory, but without me referencing the actual papers; I’m outsourcing that bit to the LLMs.
AD explains the present equilibrium rather well. But what happens over time? My answer: something uncomfortable is happening.
Every correction the developer makes — every “no, I meant this, not that” — partially reveals their judgment. Not just what was wrong, but how they think about what’s right. Their evaluative schema. Their taste. The thing that currently makes them irreplaceable as the monitor.
There are really three distinct assets being produced in every Claude Code session:
The code itself: The developer owns this. Clear property right.
The iterative process: The pattern of accepts, rejects, edits, and corrections. Anthropic captures this (subject to its data policies). The property right here is contractual — it depends on the terms of service.
The implicit rubric: The deeper structure of judgment that the corrections partially reveal, and that stays in nebulous form inside the developer’s head. This is the asset that matters most, and the one whose eventual ownership is most ambiguous.
Most discussions lump assets 2 and 3 together. But they’re not the same. A correction log is not a theory of judgment. “User rejected this function” is not the same as understanding why — what principle of software design, what business context or what aesthetic preference drove the rejection.
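The gap between assets 2 and 3 is a textbook underdetermination problem: the same correction log is consistent with very different theories of judgment. A toy illustration (the log and both rubrics are made up):

```python
# One tiny correction log: which suggestions a developer accepted or rejected.
log = [("short_but_unsafe", "reject"), ("long_but_safe", "accept")]

def rubric_safety(code):
    """Candidate theory 1: the developer rejects anything unsafe."""
    return "reject" if "unsafe" in code else "accept"

def rubric_length(code):
    """Candidate theory 2: the developer rejects anything short. Absurd, but it fits."""
    return "reject" if "short" in code else "accept"

# Both rubrics explain the observed log perfectly...
for rubric in (rubric_safety, rubric_length):
    assert all(rubric(code) == verdict for code, verdict in log)

# ...yet they disagree on the next suggestion, so the log alone cannot
# recover the developer's actual theory of judgment.
assert rubric_safety("long_but_unsafe") != rubric_length("long_but_unsafe")
```

With only two observations the log pins down almost nothing; the race described below is a bet that millions of observations pin down almost everything.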
The whole race that is being run today is about whether enough feedback from enough people lets the platform understand the implicit rubric. And not just any broad rubric. The specific invisible rubric that is inside the head of the person the model is currently working with. If the model succeeds at this, what does that person do next?
We are, in a real sense, training away our own scarcity.
The Pirsig Question
This is where I want to move away from econ, and go over to one of my favorite authors.
In Zen and the Art of Motorcycle Maintenance, Pirsig makes a distinction that I think matters here. The mechanic who cares about the motorcycle produces better work than the one who doesn’t. And the caring isn’t reducible to any specification or checklist. Quality, for Pirsig, lives in the relationship between the person, the artifact, and the purpose. It’s not a property of the output alone.
AD and Grossman/Hart both assume that what matters can be treated as an economically legible object — contribution, control, surplus, bargaining power. Pirsig asks a different question: is quality the kind of thing that can be extracted, standardised, and transferred?
If yes, then every correction you make is just training data with a lag. Given enough corrections from enough caring humans, the model learns to simulate caring effectively. The market doesn’t require metaphysical replacement. Functional substitution is enough.
If no… that is, if quality exists only in the act of a particular person caring about a particular artifact for a particular purpose — then there remains an irreducibly situated human role that no amount of aggregated feedback can replicate.
I don’t pretend to know the answer. My guess is that the current answer is “yes for most tasks, no for some, and the boundary keeps moving.” But the question about where quality resides matters, because it determines whether the current equilibrium is stable or transitional.
TMKK?
If you’re a developer using Claude Code or Codex: You are the monitor, and you are good at your job. But understand that your monitoring work has dual value — to you (working code) and to the platform (training signal). The more corrections you provide, the more you’re improving a system that may eventually need fewer corrections. This isn’t a reason to stop using these tools. It is a reason to think about what makes you irreplaceable beyond the act of correction itself. What do you, and will you, really own? Choosing which problem to work on? The customer relationship? The deployment context? The private data? The taste that can’t be inferred from accept/reject patterns alone? Something else?
If you’re thinking about AI strategy for an organisation: The Alchian-Demsetz lens tells you that the developer is currently the natural monitor because they have the lowest cost of evaluating output quality. Now, this is true today. But the model provider is also monitoring, in aggregate, across most of your competitors’ developers too, and as we discussed earlier, across different levels inside your own organization. The real question isn’t who monitors better right now. It’s who gets to keep learning from the monitoring. Your developers learn one codebase, but the model learns from all of them: developers, project managers, their managers — and this is across departments, across firms, and therefore across multiple levels of abstraction.
If you’re an economist or a policy person: The property rights question is going to become one of the defining issues of AI governance. It’s not about data privacy in the traditional sense. It’s about who owns the judgment signal that users generate as a byproduct of using AI tools. Today, I don’t think we have good answers to this question. I have had enough trouble in trying to figure out if this is the right question!
Whether this is intentional or emergent, the labs have built a system where users pay to generate the most valuable training signal in existence — real-world, high-stakes, domain-specific human judgment on AI outputs. Von Ahn would be proud.
If this is really the deal on the table, is it a fair one?
Yes, for now, is my answer. The working code is genuinely valuable. The subscription price is reasonable. The monitoring is something the developer would do anyway, because they need the code to be right.
But “for now” is doing a lot of work in that sentence.
I chatted with both Claude (Anthropic) and ChatGPT (OpenAI) to refine my ideas before writing this essay, and Claude suggested a lot of edits, most of which I have incorporated. The irony of using both models to think about who captures value from human-AI interaction, and to make this essay more readable, is not lost on me. But it is more than fair to say that both provided genuine intellectual contributions. Both, presumably, learned something from the exchange. As did I, of course. As did you, by reading this post, hopefully.
Who got the better deal? Ask me again in five years.

