Learning to Learn
Sumanth put out a fun tweet yesterday, one just begging to be drilled down into:
For better or for worse, almost all of the reading I do these days is about AI, but of a very non-technical variety. I am far more interested in the philosophical, economic, social and cultural impacts of AI, and almost all of this list is stuff I know nothing about. I was going to say "next to nothing", but even that would be an exaggeration.
And so I called upon my team.
If you did not click on that link, here's a question for you: if I had access to (take a deep breath) 4o, 4o-mini, o1, o3-mini, o3-mini-high, 4.5 (all from OpenAI), Claude 3.7 Sonnet, 3.5 Haiku, 3.5 Sonnet (Oct '24), Opus (all from Anthropic), 2.0 Flash Thinking Experimental, 2.0 Flash and 2.0 Pro Experimental (all from Google's Gemini), and Grok 3 and DeepSeek, not to mention Deep Research (Google and OpenAI), which should I use?
If you didn't understand the question, let alone the answer, please, click on that link.
Because I was about to run out of queries on 4.5, I decided to use o1 instead. I was tempted by Claude 3.7, of course, but I knew that my conversation was going to be a very long one, and so I regretfully decided against it.
This bears repeating: if you haven't understood the last three paragraphs, you don't know enough about where AI is today. That should worry you, and you should do something about this.
Anyway, from among my team members, I called upon o1, and told it to get to work.
I uploaded that picture at the start of this post, and wrote down this prompt:
I would like to work with you to get a sense of each of these. I am not a technical person, nor am I looking to acquire technical expertise in any of these. This is not about looking for a job in this area in AI, nor is it about a college examination. I am simply a curious person with some baseline familiarity with how AI has developed, but that familiarity is almost entirely non-technical. I would like, as I said, a level of familiarity with each of these topics. Enough so that I have a basic grasp of the basic idea, what it implies, how it fits into the larger question of AI development, and what the current constraints are about fully utilizing this idea. I would also like recommendations for what to read/see/listen to next in this topic, should I choose to drill down deeper. For each item on this list, give me a summary in this style: first, a 1 sentence summary. Then, a five bullet point expansion. Finally, a twenty bullet point summary, followed by the five bullet points and the 1 bullet point summary again. This should be followed by those three optional additional resources. Ask, at the end of all this, if I have understood the concept, or would like to learn more. ONLY when I say "Let's move on to the next concept", or something equivalent, should you move on to the next concept. If you've understood, can we please begin with the first one? And if not, please ask whatever doubts you may have.
At the time of writing this, I have not finished going through the list. This is the opposite of light reading, and it will take time. I'm currently at "Attention is all you need".
But feel free to run with this prompt, modify it to suit your purposes, and dive into this list. If you are a reader of this blog, chances are that you too, like me, are not very well versed in the technical side of things - in that case, you may not wish to modify this prompt too much. If, on the other hand, you ARE somewhat more familiar with these topics, please be sure to mention that in the prompt, and Federer will raise his game accordingly.
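If you would rather not retype the prompt every time you tweak it, here is a minimal Python sketch of one way to template it. To be clear, the function name `build_prompt`, its parameters and the condensed wording are all my own assumptions for illustration, not the exact prompt above. It only assembles plain text, which you would then paste into whichever model you prefer.

```python
def build_prompt(background="non-technical", sizes=(1, 5, 20), resources=3):
    """Assemble a condensed version of the learning prompt as plain text.

    Every name and default here is an illustrative assumption, not a
    fixed recipe: change the wording to suit your own way of learning.
    """
    brief, mid, full = sizes  # sentence count, then the two bullet counts
    lines = [
        f"My background in these topics is {background}.",
        "I want a basic grasp of each idea, what it implies, where it fits "
        "in AI development, and what currently constrains its use.",
        "For each item on the list, give me, in this order:",
        f"1. A {brief}-sentence summary.",
        f"2. A {mid}-bullet expansion.",
        f"3. A {full}-bullet expansion.",
        f"4. The {mid}-bullet expansion again.",
        f"5. The {brief}-sentence summary again.",
        f"6. {resources} optional resources to read, watch or listen to next.",
        "Then ask whether I have understood or would like to learn more.",
        "Move on only when I say \"Let's move on to the next concept\".",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Default: the non-technical reader described in this post.
    print(build_prompt())
```

A more technical reader might call `build_prompt(background="comfortable with linear algebra and Python")` instead, which is the code equivalent of telling the model about your familiarity up front.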
What made me happiest about my prompt was how gratifying it was to apply it to "Multi-Scale Context Aggregation by Dilated Convolutions". Having read the output that o1 gave me, I went and asked Claude 3.7 a follow-up question:
There's a deeper point over here - spend some time figuring out your own favorite ways to learn, and you might find that LLMs learn in very similar ways. That doesn't make me special, it just means that we are trying to teach LLMs the same way we learn, and we'll often land upon happy coincidences like these.
The point I am trying to make is that anybody who is in the field of education has "an in" to learning about how LLMs learn, and we would do well to double down on that natural advantage.
And coming back to "my team", I also had a separate tab open, where I had follow-up conversations with Google Gemini (2.0 Flash) about the conversation I was having with o1. So o1 was my prof, so to speak, and 2.0 Flash was sitting next to me in class. Figure out your prof and your fellow student, and you have your own class going!
By the way, it is possible to have Google Gemini "see" the lecture along with you, and that would make it even better. And I really should be setting this up, but the opportunity cost would be giving up on my procrastination skills, alas.
But I gotta admit, I do love it when I get to use the phrase "left as an exercise for the reader".
Haffun!