Ethan and Ben Explain Sutton’s Bitter Lesson
Have you heard of the bitter lesson?
Let’s say you are on the left-hand side of this island that has been created by Gemini. You need to figure out a way to get to the other side, where better climes and delicious food await. Past generations have made the trip, and have learnt that the best way to make it is to skirt the volcano, avoid the waterfall, and swim part of the way.
The current generation of people on that island now have access to AI, and they use it to build tools: one that will help you skirt the volcano, one that will make sure you avoid the waterfall, and one that will help you with the swim.
OK, so what is the bitter lesson?
The bitter lesson says that this is the wrong way to go about it. Why assume, it asks, that the best way to make the trip is to skirt the volcano, avoid the waterfall, etc., etc.?
Well, because generations of our ancestors have learnt over thousands of years that this is the best way to do it, you might reply. Plenty of folks over time have tried other routes, but trust us on this, this is the best. Sure, we can make the path easier by taking AI’s help at each stage of our journey, no problem at all.
But our journey must comprise these stages, and in this order, and that’s just how it has always been, and that is how it will always be.
The Bitter Lesson
The bitter lesson says this is just flat out wrong.
The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore’s law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation. There were many examples of AI researchers’ belated learning of this bitter lesson, and it is instructive to review some of the most prominent.
That’s the opening paragraph of a lovely 2019 essay by Rich Sutton, titled “The Bitter Lesson”.
Just in case you missed it (he states the bitter lesson in the very first sentence, and names it only in the very last one):
General methods that leverage computation are ultimately the most effective, and by a large margin.
Most effective in comparison to which other methods? Well, those that do not leverage computation.
Understanding the Bitter Lesson
What do they rely on instead? Well, human intuition, heuristics, and a thicket of hypotheses. But instead of talking in generalities, let’s take a look at a specific example, covered by Rich Sutton in his essay, and by Ethan Mollick in a more recent one (the excerpt is below).
Let’s talk about chess:
Computer scientist Richard Sutton introduced the concept of the Bitter Lesson in an influential 2019 essay where he pointed out a pattern in AI research. Time and again, AI researchers trying to solve a difficult problem, like beating humans in chess, turned to elegant solutions, studying opening moves, positional evaluations, tactical patterns, and endgame databases. Programmers encoded centuries of chess wisdom in hand-crafted software: control the center, develop pieces early, king safety matters, passed pawns are valuable, and so on. Deep Blue, the first chess computer to beat the world’s best human, used some chess knowledge, but combined that with the brute force of being able to search 200 million positions a second. In 2017, Google released AlphaZero, which could beat humans not just in chess but also in shogi and go, and it did it with no prior knowledge of these games at all. Instead, the AI model trained against itself, playing the games until it learned them. All of the elegant knowledge of chess was irrelevant, pure brute force computing combined with generalized approaches to machine learning, was enough to beat them. And that is the Bitter Lesson — encoding human understanding into an AI tends to be worse than just letting the AI figure out how to solve the problem, and adding enough computing power until it can do it better than any human.
Go look at the last line of the excerpt from Ethan Mollick’s essay:
Encoding human understanding into an AI tends to be worse than just letting the AI figure out how to solve the problem, and adding enough computing power until it can do it better than any human.
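To make that contrast concrete, here is a toy sketch in Python (my own illustration, not anything from Deep Blue or AlphaZero, and tic-tac-toe rather than chess so that it fits in a few lines). One player encodes a scrap of human “wisdom”; the other knows nothing but the rules and spends its compute on exhaustive search:

```python
# Toy contrast: hand-crafted knowledge vs. a general method that leverages compute.
# (Hypothetical example; the function names and the heuristic are made up.)

def winner(board):
    """Return 'X', 'O', or None for a 3x3 board stored as a 9-character string."""
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]
    for a, b, c in lines:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def moves(board):
    return [i for i, cell in enumerate(board) if cell == ' ']

def handcrafted_move(board, player):
    """Encoded human advice: take the centre, then a corner, then an edge."""
    for preferred in (4, 0, 2, 6, 8, 1, 3, 5, 7):
        if board[preferred] == ' ':
            return preferred

def minimax_move(board, player):
    """No game wisdom at all: search every continuation (cheap for tic-tac-toe)."""
    opponent = 'O' if player == 'X' else 'X'

    def score(b, to_move):
        w = winner(b)
        if w == player:
            return 1
        if w == opponent:
            return -1
        if ' ' not in b:
            return 0
        results = [score(b[:m] + to_move + b[m + 1:],
                         opponent if to_move == player else player)
                   for m in moves(b)]
        return max(results) if to_move == player else min(results)

    return max(moves(board),
               key=lambda m: score(board[:m] + player + board[m + 1:], opponent))

print(handcrafted_move("         ", 'X'))  # 4: the centre, because we said so
print(minimax_move("X        ", 'O'))      # 4: the centre, discovered by search alone
```

The minimax player contains none of the advice we would give a beginner, yet it never loses; the hand-crafted player plays the advice and nothing more. Scale that trade-off up by many orders of magnitude of compute, and swap exhaustive search for learning, and you have the shape of Sutton’s argument.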
So no, it is not necessarily true that our journey on that island must comprise these stages, and that these stages must come in this predetermined order. All that we have learnt over time, including how to learn, may be neither the best way to learn nor the best set of things to learn.
Those New Taxis
Ben Thompson makes the same point in his essay from last October about Waymo and Tesla.
Waymo’s method of autonomous driving involves bolting extremely expensive equipment onto its cars, with a remote human ready to take over whenever necessary. Tesla’s method is to generate and record an insane amount of driving data, and have AI learn from that data.
So which approach is more likely to win?
Better equipment that helps the car take the correct decisions live, on the road? This is Waymo’s approach.
Better data and more compute during training, so that the car learns how to take the correct decisions live, on the road? This is Tesla’s approach.
The Sobering Implication
Go back to that island that Gemini drew up for us.
If you have to get from one side of the island to the other, do you figure out a path yourself, and build AI tools to help you along that path? This would be the Waymo equivalent.
Or do you give a lot of data and compute to the AI, and ask it to come up with the best and easiest path for you to take, one on which those hand-built tools are no longer necessary? This would be the Tesla equivalent.
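Here is a toy sketch of that contrast on the island itself (a hypothetical illustration of my own; the grid, the hazards, and the names are made up, and neither Waymo nor Tesla works anything like this). The first approach hard-codes the route our ancestors settled on; the second is handed only the map and some compute, and searches out a route on its own:

```python
# Hypothetical island crossing: an inherited, hand-coded route vs. a general
# search method that is given only the map and leverages compute.
from collections import deque

ISLAND = [
    "S.......",   # 'S' start, '.' passable ground, '#' hazard, 'G' goal
    ".######.",
    "........",
    ".######.",
    ".......G",
]

# Approach 1 (the Waymo equivalent of the metaphor): follow the route passed
# down the generations, detours and all, and build tools to survive each leg.
ANCESTRAL_ROUTE = (
    [(0, c) for c in range(8)] +          # walk east along the shore
    [(1, 7), (2, 7)] +                    # drop down past the volcano
    [(2, c) for c in range(6, -1, -1)] +  # double back to skirt the waterfall
    [(3, 0), (4, 0)] +                    # climb down to the beach
    [(4, c) for c in range(1, 8)]         # swim the final stretch
)

# Approach 2 (the Tesla equivalent): a general method, breadth-first search,
# that knows nothing about volcanoes or waterfalls beyond what the map says.
def search_route(grid):
    rows, cols = len(grid), len(grid[0])
    start = next((r, c) for r in range(rows) for c in range(cols) if grid[r][c] == 'S')
    goal = next((r, c) for r in range(rows) for c in range(cols) if grid[r][c] == 'G')
    frontier = deque([(start, [start])])
    seen = {start}
    while frontier:
        (r, c), path = frontier.popleft()
        if (r, c) == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != '#' and (nr, nc) not in seen:
                seen.add((nr, nc))
                frontier.append(((nr, nc), path + [(nr, nc)]))
    return None

print(len(ANCESTRAL_ROUTE), "cells on the inherited route")   # 26
print(len(search_route(ISLAND)), "cells on the searched route")  # 12
```

On this made-up map the general search finds a twelve-cell crossing while the inherited route takes twenty-six, and nothing about volcanoes or waterfalls was ever encoded into the search.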
And Sutton’s point is that Tesla’s approach is not just better; in the long run, we actually make things worse by telling the AI how we would solve the problem.
This is remarkable, and worth internalizing. Giving AI the “seed” of how a human would start to think about the solution to the problem isn’t just not helpful… it is actually harmful(!):
The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.
TMKK?
This is how Ethan ends his essay about how to best restructure organizations in the age of AI:
We’re about to find out which kind of problem organizations really are: chess games that yield to computational scale, or something fundamentally messier. The companies betting on either answer are already making their moves, and we will soon get to learn what game we’re actually playing.
What he is saying is this:
Are organizations created by humans so impossibly complex that AI will never be able to get a grip on them? That is, AI will not be able to “solve” organizational complexity the way it could solve chess.
In the context of the little island that Gemini drew for us, there are an effectively infinite number of potential ways to get to the other side, and the search space is so vast that brute force and computational power may never find a better route than the one multiple generations have evolved as an answer. Maybe that’s true of organizational complexity also. And Waymo is certainly betting on it being true for the task of truly autonomous driving.
But what if they’re wrong? What if these problems become amenable to optimization with AI? And not by refining the optimization path that humans have chosen, but by choosing to go down an entirely new path, like AI did with chess?
What if this is true for discoveries in biological research? What if this is true for international finance? What if this is true for the provisioning of services by local government? What if this is true for <insert your choice here>?
We’re about to find out the answers to all of these questions in very short order. Buckle up!
P.S. I loved this little line in Ethan’s piece:
If AI agents can train on outputs alone, any organization that can define quality and provide enough examples might achieve similar results, whether they understand their own processes or not.
Robert Pirsig says hi, and that gives me some hope.