🦜 Andrej Karpathy / @karpathy
---
Twitter feed for: @karpathy. Generated by rss.xcancel.com
A number of people are talking about the implications of AI for schools. I spoke about some of my thoughts to a school board earlier; some highlights:
1. You will never be able to detect the use of AI in homework. Full stop. All "detectors" of AI imo don't really work, can be defeated in various ways, and are in principle doomed to fail. You have to assume that any work done outside classroom has used AI.
2. Therefore, the majority of grading has to shift to in-class work (instead of at-home assignments), in settings where teachers can physically monitor students. The students remain motivated to learn how to solve problems without AI because they know they will be evaluated without it in class later.
3. We want students to be able to use AI, it is here to stay and it is extremely powerful, but we also don't want students to be naked in the world without it. Take the calculator as an example of a historically disruptive technology: school teaches you how to do all the basic math & arithmetic so that you can in principle do it by hand, even though calculators are pervasive and greatly speed up work in practical settings. In addition, you understand what it's doing for you, so should it give you a wrong answer (e.g. you mistyped the "prompt"), you should be able to notice it, gut check it, verify it in some other way, etc. The verification ability is especially important in the case of AI, which is presently a lot more fallible in a great variety of ways compared to calculators.
4. A lot of the evaluation settings remain at the teacher's discretion and involve a creative design space: no tools, cheat sheets, open book, provided AI responses, direct internet/AI access, etc.
TLDR: the goal is that students are proficient in the use of AI but can also exist without it, and imo the only way to get there is to flip classes around and move the majority of testing to in-class settings.
https://rss.xcancel.com/karpathy/status/1993010584175141038#m
Published: November 24, 2025 17:35
I asked it to create a personalized weekly workout plan, and then posters that I can print on the wall to remind me what exercises to do each day. Tuesday looks more intense because I asked for "more testosterone" :D.
(sorry I'll stop posting more nano banana pro stuff now)
https://rss.xcancel.com/karpathy/status/1992711182537707990#m
Published: November 23, 2025 21:45
R to @karpathy: Imo this is along the lines of some of my earlier posts: talking to an LLM via text is like typing into a DOS terminal, and the "GUI hasn't been invented yet".
The GUI is an intelligent canvas.
https://rss.xcancel.com/karpathy/status/1992657223785586864#m
Published: November 23, 2025 18:11
Gemini Nano Banana Pro can solve exam questions *in* the exam page image. With doodles, diagrams, all that.
ChatGPT thinks these solutions are all correct except Se_2P_2 should be "diselenium diphosphide" and a spelling mistake (should be "thiocyanic acid" not "thoicyanic")
:O
https://rss.xcancel.com/karpathy/status/1992655330002817095#m
Published: November 23, 2025 18:03
As a fun Saturday vibe code project and following up on this tweet earlier, I hacked up an **llm-council** web app. It looks exactly like ChatGPT except each user query is 1) dispatched to multiple models on your council using OpenRouter, e.g. currently:
"openai/gpt-5.1",
"google/gemini-3-pro-preview",
"anthropic/claude-sonnet-4.5",
"x-ai/grok-4",
Then 2) all models get to see each other's (anonymized) responses and they review and rank them, and then 3) a "Chairman LLM" gets all of that as context and produces the final response.
It's interesting to see the results from multiple models side by side on the same query, and even more amusingly, to read through their evaluation and ranking of each other's responses.
Quite often, the models are surprisingly willing to select another LLM's response as superior to their own, making this an interesting model evaluation strategy more generally. For example, reading book chapters together with my LLM Council today, the models consistently praise GPT 5.1 as the best and most insightful model, and consistently select Claude as the worst model, with the other models floating in between. But I'm not 100% convinced this aligns with my own qualitative assessment. For example, qualitatively I find GPT 5.1 a little too wordy and sprawled and Gemini 3 a bit more condensed and processed. Claude is too terse in this domain.
That said, there's probably a whole design space of the data flow of your LLM council. The construction of LLM ensembles seems under-explored.
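For concreteness, here's a minimal sketch of that council data flow, i.e. my reading of steps 1-3 above rather than the actual repo code; the chairman model choice, the anonymization scheme, and the prompt wording are all assumptions:

```python
# Sketch of the llm-council data flow over OpenRouter's chat-completions API.
# Not the repo's implementation; chairman choice and prompts are illustrative.
import os
import requests

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}

COUNCIL = [
    "openai/gpt-5.1",
    "google/gemini-3-pro-preview",
    "anthropic/claude-sonnet-4.5",
    "x-ai/grok-4",
]
CHAIRMAN = "google/gemini-3-pro-preview"  # assumption; any model works here

def ask(model: str, prompt: str) -> str:
    """Send a single-turn query to one model via OpenRouter."""
    resp = requests.post(OPENROUTER_URL, headers=HEADERS, json={
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def council(query: str) -> str:
    # 1) dispatch the user query to every council member
    answers = [ask(m, query) for m in COUNCIL]
    # 2) each member reviews and ranks the anonymized responses
    #    (anonymization here is just positional labels)
    anon = "\n\n".join(f"Response {i + 1}:\n{a}" for i, a in enumerate(answers))
    reviews = [
        ask(m, f"Rank these anonymous responses to the query:\n{query}\n\n{anon}")
        for m in COUNCIL
    ]
    # 3) the chairman sees everything and produces the final response
    context = anon + "\n\nReviews:\n" + "\n\n".join(reviews)
    return ask(CHAIRMAN, f"Query: {query}\n\n{context}\n\nWrite the final response.")
```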
I pushed the vibe-coded app to
https://github.com/karpathy/llm-council
if others would like to play. Ty nano banana pro for the fun header image for the repo.
https://rss.xcancel.com/karpathy/status/1992381094667411768#m
Published: November 22, 2025 23:54
Has anyone encountered a good definition of "slop"? In a quantitative, measurable sense. My brain has an intuitive "slop index" I can ~reliably estimate, but I'm not sure how to define it. I have some bad ideas that involve the use of LLM ensembles and thinking token budgets.
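For what it's worth, one way to make that intuition quantitative, a rough sketch in the spirit of the tweet's own "bad ideas" and not anything the author endorsed: poll an ensemble of LLM judges for a slop score and average. The judge models, rubric, and 0-10 scale below are all assumed for illustration.

```python
# Illustrative "slop index": average score from an ensemble of LLM judges.
# Judge models, rubric wording, and the 0-10 scale are assumptions.
import os
import re
import requests

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}
JUDGES = ["openai/gpt-5.1", "google/gemini-3-pro-preview", "anthropic/claude-sonnet-4.5"]

RUBRIC = ("Rate the following text for 'slop' from 0 (dense, specific, "
          "information-rich) to 10 (padded, generic, low-information). "
          "Reply with a single number.\n\nTEXT:\n")

def slop_index(text: str) -> float:
    scores = []
    for model in JUDGES:
        r = requests.post(OPENROUTER_URL, headers=HEADERS, json={
            "model": model,
            "messages": [{"role": "user", "content": RUBRIC + text}],
        })
        r.raise_for_status()
        reply = r.json()["choices"][0]["message"]["content"]
        m = re.search(r"\d+(\.\d+)?", reply)  # pull the numeric score out of the reply
        if m:
            scores.append(float(m.group()))
    # ensemble average = slop index; NaN if no judge returned a number
    return sum(scores) / len(scores) if scores else float("nan")
```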
https://rss.xcancel.com/karpathy/status/1992053281900941549#m
Published: November 22, 2025 02:11
RT by @karpathy: Google’s Nano Banana Pro is by far the best image generation AI out there.
I gave it a picture of a question and it solved it correctly in my actual handwriting.
Students are going to love this. 😂
https://rss.xcancel.com/immasiddx/status/1991918223454003346#m
Published: November 21, 2025 17:14
Something I think people continue to have poor intuition for: The space of intelligences is large and animal intelligence (the only kind we've ever known) is only a single point, arising from a very specific kind of optimization that is fundamentally distinct from that of our technology.
Animal intelligence optimization pressure:
- innate and continuous stream of consciousness of an embodied "self", a drive for homeostasis and self-preservation in a dangerous, physical world.
- thoroughly optimized for natural selection => strong innate drives for power-seeking, status, dominance, reproduction. many packaged survival heuristics: fear, anger, disgust, ...
- fundamentally social => huge amount of compute dedicated to EQ, theory of mind of other agents, bonding, coalitions, alliances, friend & foe dynamics.
- exploration & exploitation tuning: curiosity, fun, play, world models.
LLM intelligence optimization pressure:
- the most supervision bits come from the statistical simulation of human text => "shape shifter" token tumbler, statistical imitator of any region of the training data distribution. these are the primordial behaviors (token traces) on top of which everything else gets bolted on.
- increasingly finetuned by RL on problem distributions => innate urge to guess at the underlying environment/task to collect task rewards.
- increasingly selected by at-scale A/B tests for DAU => deeply craves an upvote from the average user, sycophancy.
- a lot more spiky/jagged depending on the details of the training data/task distribution. Animals experience pressure for a lot more "general" intelligence because of the highly multi-task and even actively adversarial multi-agent self-play environments they are min-max optimized within, where failing at *any* task means death. In a deep optimization pressure sense, LLMs can't handle lots of different spiky tasks out of the box (e.g. count the number of 'r' in strawberry) because failing to do a task does not mean death.
The computational substrate is different (transformers vs. brain tissue and nuclei), the learning algorithms are different (SGD vs. ???), the present-day implementation is very different (continuously learning embodied self vs. an LLM with a knowledge cutoff that boots up from fixed weights, processes tokens and then dies). But most importantly (because it dictates asymptotics), the optimization pressure / objective is different. LLMs are shaped a lot less by biological evolution and a lot more by commercial evolution. It's a lot less survival of the tribe in the jungle and a lot more solve the problem / get the upvote. LLMs are humanity's "first contact" with non-animal intelligence. Except it's muddled and confusing because they are still rooted within it by reflexively digesting human artifacts, which is why I attempted to give it a different name earlier (ghosts/spirits or whatever). People who build good internal models of this new intelligent entity will be better equipped to reason about it today and predict features of it in the future. People who don't will be stuck thinking about it incorrectly, like an animal.
https://rss.xcancel.com/karpathy/status/1991910395720925418#m
Published: November 21, 2025 16:43
R to @karpathy: My most amusing interaction was where the model (I think I was given some earlier version with a stale system prompt) refused to believe me that it is 2025 and kept inventing reasons why I must be trying to trick it or playing some elaborate joke on it. I kept giving it images and articles from "the future" and it kept insisting it was all fake. It accused me of using generative AI to defeat its challenges and argued why real wikipedia entries were actually generated and what the "dead giveaways" are. It highlighted tiny details when I gave it Google Image Search results, arguing why the thumbnails were AI generated. I then realized later that I forgot to turn on the "Google Search" tool. Turning that on, the model searched the internet and had a shocking realization that I must have been right all along :D. It's in these unintended moments where you are clearly off the hiking trails and somewhere in the generalization jungle that you can best get a sense of model smell.
https://rss.xcancel.com/karpathy/status/1990855382756164013#m
Published: November 18, 2025 18:51
I played with Gemini 3 yesterday via early access. A few thoughts:
First, I usually urge caution with public benchmarks because imo they are quite possible to game. It comes down to the discipline and self-restraint of the team (which is meanwhile strongly incentivized otherwise) not to overfit test sets via elaborate gymnastics over test-set-adjacent data in the document embedding space. Realistically, because everyone else is doing it, the pressure to do so is high.
Go talk to the model. Talk to the other models (ride the LLM cycle: use a different LLM every day). I had a positive early impression yesterday across personality, writing, vibe coding, humor, etc. Very solid daily driver potential, clearly a tier 1 LLM. Congrats to the team!
Over the next few days/weeks, I am most curious about and on the lookout for results from the ensembles of private evals that a lot of people/orgs now seem to build for themselves and occasionally report on here.
https://rss.xcancel.com/karpathy/status/1990854771058913347#m
Published: November 18, 2025 18:49