Monty Hall on Richard Sutton (On Dwarkesh's Podcast)
First off, watch the whole episode. It is, as they say, a banger:
There are three major reasons for knowing who Richard Sutton is:
He’s the guy who came up with the idea of Temporal Difference learning1
He’s the author of The Bitter Lesson
He’s convinced that LLMs are a dead end, and today’s blog post is worth reading only if you are curious about why
Here’s my explanation for why you should be curious about his opinion on LLMs:
As Dwarkesh explains early on in the podcast, Sutton is one of the OGs in AI research, and he has the field’s equivalent of the Nobel Prize. This is a person whose opinions on AI are worth thinking about2.
He’s got a contrarian take on the technology du jour in AI.
Think about it this way: it doesn’t matter much what I think about T20s and the future of cricket. But if Sachin Tendulkar says that T20s are not the best way to think about the future of cricket, I’ll want to know more.
So What Does Richard Sutton Have To Say About LLMs?
Read the first 23 minutes of the transcript, or watch up until that point (I did both in parallel).
To me, the crux of the argument is this bit:
…some people will say that imitation learning has given us a good prior, or given these models a good prior, of reasonable ways to approach problems. As we move towards the era of experience, as you call it, this prior is going to be the basis on which we teach these models from experience, because this gives them the opportunity to get answers right some of the time. Then on this, you can train them on experience. Do you agree with that perspective?
And the short answer is that no, Richard Sutton does not agree with that perspective. The rest of those twenty-odd minutes explains why he doesn’t.
In what follows, I’ll attempt to give you my understanding of their debate, why it matters, and what my own take on the issue is. You should, of course, form your own take after going through all of your “homework”.
What Are LLMs?
Don’t worry, I’m not going to put you through yet another explainer on what LLMs are from a technical perspective. This is not 2022. No, we will instead go with the super simple explainer:
We throw an insanely large amount of human-generated text at a very powerful computer, and we tell it to get better at spotting patterns in this mountain of human-generated text. Let’s call this part of the process “baking the cake”. Once the cake has been baked, we apply some frosting on top. We do this to make the cake better. And then we send the cake out into the world for folks to consume.
In this analogy, baking the cake is what AI scientists call the training process. Applying the frosting is called reinforcement learning. And folks consuming the cake is called inference.
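If you’d like the analogy with the sugar scraped off, here’s a purely illustrative Python sketch of the same three stages. Every name in it is a placeholder of my own invention; none of this is a real API, and each comment stands in for an enormously expensive real-world process.

```python
# A toy sketch of the cake pipeline. All names are hypothetical placeholders.

def bake_the_cake(mountain_of_text):
    """Pre-training ("baking"): learn to predict the next token in a huge corpus."""
    model = {"params": "billions of numbers"}  # stand-in for the real thing
    # ...many GPU-months of next-token prediction would happen here...
    return model

def apply_the_frosting(model, human_feedback):
    """Post-training ("frosting", e.g. RLHF): nudge the model toward preferred answers."""
    # ...fine-tuning against a learned reward signal would happen here...
    return model

def serve_the_cake(model, prompt):
    """Inference ("consuming the cake"): the finished model responds to a prompt."""
    return "a completion sampled from the model"

model = apply_the_frosting(bake_the_cake("all of human text"), "thumbs up / down")
print(serve_the_cake(model, "Explain the Monty Hall problem"))
```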
Go over the excerpt that I said is the crux of the argument with my analogy in mind, and see if it makes sense. “Imitation Learning” is the way to bake the cake, and “then on this, you can train them on experience” is the frosting.
The reason it is called imitation learning is that these models learn to imitate what humans have produced over millennia. And training them on experience is what the industry calls Reinforcement Learning from Human Feedback (RLHF).
And again, Richard Sutton is saying no, this is a bad way to bake a cake called Artificial Intelligence. You should check out the rest of the transcript (at least until the 23-minute mark) to get why he disagrees, and what Dwarkesh’s take is on Sutton’s disagreement.
My Way Of Thinking About It
Imagine that all of humanity has spent its entire existence in a cave. When I say all of humanity, I mean every single human life, at any point in our history. And when I say cave, I am speaking of an infinitely large cave. So large that all of us put together will never be able to explore all of it. It’s just too large for us to traverse, and that’ll always be true3.
Along with us in this cave are all the other animals and plants that are (or have ever been) on this planet. They too, like us, explore the cave endlessly, because what else is there to do?
Now, imagine that humanity, by dint of its collective effort across millennia, finally locates a door called “LLM”. This door shows us a way out of the cave. We don’t yet know where it leads, but hey, a door! That takes us out of the cave!! Woohoo!!!
The folks who’ve helped humanity locate this door are naturally very excited. This is exactly what we’ve been searching for all this while, they excitedly tell us - a way out of the cave. Let’s go!
But just then4 a game show host appears. “Hello everybody, my name is Monty!” he says. Before anybody has a chance to react, he continues with his monologue.
“Congratulations on having chosen Door LLM! But before all of you decide to go through that door, I ask you to consider my offer. What if I told you that there are probably a million other doors in this cave? I can’t guarantee that they exist, but it is almost certain that they do, and if anything, there are more than a million of them. And I tell you this: if they do exist, one of them almost certainly takes you out of this cave, and faster than the LLM door will.”
Humanity looks around, unsure of how to react to this offer. Before it can come up with a response, Monty Hall continues his torrent of words.
“These doors (if they exist) can be found by animals and humans both. That is to say, dear humans, you can learn how animals search this cave and use those methods, you can use your own methods, or some combination thereof. You might find these doors this afternoon, next week, next year, or, who knows, maybe never! But if you do find these other doors, one of them will be the fastest ticket to Life, the Universe and Everything.”
While we’re processing this, Monty moves on. There’s no stopping the guy, apparently.
“But there is a problem. If all of you decide to go through the LLM door, it is going to be difficult to find these other doors. Consider how long it took to find this one door, and how difficult it is going to be to get all of you through it. So if you double down on this door, good luck, but you will almost certainly not be able to keep searching for the others.”
He stops, finally, and seems disinclined to continue. He waits there, quietly but expectantly.
Dwarkesh and his team say the answer is clear: we should go through the LLM door (duh!).
Richard Sutton and his team say the answer is clear: we should stay and explore the cave for these other doors (this team also says duh!).
You get the casting vote.
(Please note that this post relies heavily on analogies, and I’m making a slightly different point from the one usually treated as the central insight of the Monty Hall problem - see footnote 5. Also, this is not what either Sutton or Dwarkesh is saying! This is my attempt at understanding why Sutton is saying what he is.)
Try this prompt when you ask your LLM of choice about Temporal Difference: Explain what temporal difference is as simply as possible, and explain to me what this has to do with AI getting better at backgammon and chess. Give me the full story, and tell it to me like a raconteur would - but grounded in actual facts, please.
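Or, if you’d rather watch the idea run than prompt for it: here’s a minimal TD(0) sketch in Python, on the five-state random walk that Sutton and Barto use in their textbook. The step size and episode count are my arbitrary choices; this is an illustration, not anything from the podcast.

```python
import random

# TD(0) on the classic random walk: non-terminal states 1..5,
# terminals at 0 (reward 0) and 6 (reward 1); every episode starts at 3.
ALPHA, GAMMA = 0.1, 1.0
V = [0.0] * 7  # value estimates; the terminal entries stay at 0

for _ in range(10_000):
    s = 3
    while s not in (0, 6):
        s_next = s + random.choice((-1, 1))  # step left or right at random
        r = 1.0 if s_next == 6 else 0.0
        # The temporal-difference update: nudge V[s] toward r + GAMMA * V[s_next],
        # i.e. learn from the *difference* between successive predictions.
        V[s] += ALPHA * (r + GAMMA * V[s_next] - V[s])
        s = s_next

print([round(v, 2) for v in V[1:6]])  # settles near [1/6, 2/6, 3/6, 4/6, 5/6]
```

That same trick of learning from the gap between what you predict now and what you predict a moment later, scaled up with function approximation, is what powered TD-Gammon’s backgammon play.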
And AI, by definition, is worth thinking about.
It’s my thought experiment, so allow me my flights of fancy.
The TMKK of the Monty Hall problem is that it is always advisable to switch, and especially so when the sample space is very large. And so I’m inclined to agree with Sutton. Whether we should be opening any of these doors at all, including the LLM one, is a whole other question. But the LLM door is unlikely to be the best one - that is what the Monty Hall problem tells you.
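For the sceptical, here’s a quick Python simulation of that claim, under the classic assumption that the host opens every losing door except one and then offers you the swap. The function and the numbers are mine, purely for illustration.

```python
import random

# Monty Hall with n doors: the host opens every losing door except one,
# so switching wins exactly when your first pick was wrong.
def play(n_doors, switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        prize = random.randrange(n_doors)
        pick = random.randrange(n_doors)
        wins += (pick != prize) if switch else (pick == prize)
    return wins / trials

for n in (3, 100, 1_000_000):
    print(f"{n} doors -> stay: {play(n, False):.4f}, switch: {play(n, True):.4f}")
```

With three doors, switching wins about two-thirds of the time; with a million doors, it wins essentially always. That is the “very large sample spaces” point, in code.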