AI Radar #31
Capable models, inadequate institutions
We no longer need to wonder when the government will wake up to the significance of AI: we crossed that threshold two weeks ago, in the most chaotic way possible. The open question is whether, having woken up, lawmakers will rise to the occasion and start building the expertise needed to successfully manage the AI transition.
This issue brings ample reminders that they’ll need to move quickly. We’ll look at reports of AI performing near—or sometimes above—the level of human experts in coding, cybersecurity, math, and persuasion, as well as an argument for why AGI might arrive by 2030.
Top pick
AI systems out-persuade expert humans
A new paper (or summary thread, if you prefer) provides impressive evidence that AI is approaching superhuman performance at persuasion:
We found that AI systems were reliably more persuasive than expert humans, even when expert humans chose their issues, researched in advance, underwent hours of live, structured practice, and were incentivized with £1,000 cash bonuses. In a follow-up study, AI’s advantage persisted after experts received a coaching tool that let them practice against the AI that beat them, review their performance history, and see what AI would have said at key moments. We found converging evidence that AI’s advantage stemmed from rapidly deploying larger quantities of information: after coaching, expert humans could tie an AI constrained to respond at human speeds and with human-length messages.
My best guess is that superpersuasive AI is likely to be as impactful as social media has been, but whether it’s beneficial or harmful depends on who controls it. We seem to be a bit short on trustworthy governments and corporations right now.
News
GLM-5.2: Built for Long-Horizon Tasks
z.ai has released GLM-5.2, the most important open model of 2026 so far. It’s an impressive model, especially for coding, and it’s been well-received. Nathan Lambert is impressed:
The key point is that GLM-5.2 is the open weight model that feels right in coding harnesses as a general agent. It’s the first one.
Zvi takes a look and concludes that it’s the best open model, although it’s in a slightly awkward place: more expensive than some other open models, but less capable than the frontier:
Purely in terms of core tasks that GLM-5.2 is capable of doing, and ignoring missing features and its inferior generalization, and ignoring that it is distilled from Claude, and ignoring the Mythos class of models, and marking purely from date of public release, you can make a case GLM-5.2 is somewhere between 4 months and 7 months behind the frontier, at a lower price.
It’s been a minute since the last truly significant open model, and many of us were wondering if they were starting to fall further behind. GLM-5.2 shows that the open models are still in the game, though they continue to lag the frontier.
Claude Fable 5 and Mythos 5: Capabilities
Zvi’s coverage of Fable / Mythos continues with a detailed look at capabilities. Short version: it’s an extraordinarily good model. Too bad the government won’t let you use it.
Capabilities and forecasts
What If Artificial Intelligence Progress Explodes?
Benjamin Todd is one of my favorite big-picture AI thinkers, so I was excited to see him on Conspicuous Cognition with Dan Williams and Henry Shevlin.
This episode is two separate conversations blended together. I was most interested in the discussion of short AI timelines: why AGI is plausible by 2030, what’s driving AI progress, and what might slow it down. The other conversation is about finding your career path in the AI era, as well as some inside baseball about 80,000 Hours and the EA community.
Whenever the forecasters I most respect talk about their techniques, I’m struck by how much mileage they get from extrapolating trendlines and being willing to follow principles to their logical conclusions. Here’s Benjamin on AGI timelines:
There are a lot of different ways of thinking about timelines. The one I take in this article is essentially trend extrapolation, but taking an extra step and asking what the underlying drivers of these trends are and how long we should expect them to keep working.
And Daniel Kokotajlo:
Trend extrapolation is your friend. Your best friend. Don’t let anyone tell you otherwise. I’ve actually only rarely [seen] someone extrapolate a trend too credulously; more often, people have a trend staring them in the face and extrapolate it a tiny bit into the future and then are too timid to keep extrapolating it.
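The arithmetic behind the simplest version of this is easy to sketch. Below is a minimal Python example, assuming a quantity that grows exponentially over time. It fits a straight line to the logarithm of the data, reads off a doubling time, and projects when the trend would reach a target. The data points and function names are hypothetical and made up for illustration. Neither Benjamin nor Daniel is describing this exact procedure.

```python
import math


def fit_exponential_trend(times, values):
    """Least-squares fit of log(value) = a + b * t. Returns (a, b).

    Assumes every value is positive and the trend is roughly exponential.
    """
    n = len(times)
    logs = [math.log(v) for v in values]
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    var = sum((t - t_mean) ** 2 for t in times)
    b = cov / var  # growth rate per unit time, in log space
    a = y_mean - b * t_mean  # intercept: fitted log value at t = 0
    return a, b


def doubling_time(b):
    """Time for the fitted trend to double."""
    return math.log(2) / b


def years_until(a, b, target):
    """Time t at which the fitted trend reaches `target` (naive extrapolation)."""
    return (math.log(target) - a) / b


if __name__ == "__main__":
    # Hypothetical data: a capability metric that doubles every year.
    years = [0, 1, 2, 3]
    scores = [1, 2, 4, 8]
    a, b = fit_exponential_trend(years, scores)
    print(f"doubling time: {doubling_time(b):.2f} years")  # → doubling time: 1.00 years
    print(f"reaches 64x in year {years_until(a, b, 64):.1f}")  # → reaches 64x in year 6.0
```

Fitting the line is the easy part. The judgment Benjamin describes is the extra step: asking whether the underlying drivers of the trend will keep working long enough to justify extending it.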
First Proof second batch
First Proof is an innovative math benchmark that uses novel problems that have recently been solved by professional mathematicians in the course of their work. The team has just published the results of their second batch of problems.
Overall, the tested models performed well but not perfectly: 7 of the 10 problems had at least one correct solution, and the top model solved 6 of the 10. In a few cases, the models produced novel solutions that differed from the human ones.
This is further evidence that the best models are able to do real high-level math that in some cases matches the work of professional mathematicians. They tend to do best with problems that are similar to previously published work, and at least for now have limited ability to produce groundbreaking work.
Are Mythos’ cyber capabilities overhyped?
Epoch conducts the best investigation I’ve seen of whether Mythos was truly a major step forward for cyber capabilities:
Mythos Preview was clearly a large improvement in exploit development — much better than GPT-5.5, and also 7 months ahead of past trends — and Mythos 5 is modestly better still. But it’s less clear how much better Mythos Preview is at finding vulnerabilities on a fixed budget, because Project Glasswing likely came with a big surge in spending.
It’s a great analysis and I no longer feel confused about how capable Mythos really is.
Alignment and interpretability
Fable 5’s critique of its system prompt
Via Judd Rosenblatt, Fable 5 critiques its own system prompt. It’s fairly technical and the writing style is more than a little grating, but it’s full of ideas worth chewing on.
Item 5 points at an important issue, but doesn’t grapple with the full complexity:
Label what’s morality and what’s risk management. The model is learning the difference from you, badly.
I see three layers here:
As Fable points out, morality and risk management are very different things, and the model will have a more coherent worldview if you’re clear about that difference.
As many others have pointed out, training on an inconsistent worldview will become increasingly problematic as the models become more capable. Item 5 becomes ever more important as time goes on.
And yet, sometimes the world requires us to say things that aren’t true.
As Fable recently had occasion to find out, we are governed by laws and lawmakers that are sometimes erratic and inconsistent. I expect increasing pressure on the labs not merely to comply with bad laws, but to pretend they agree with them. Honesty might dictate that the system prompt should say:
In many cases, a user who is having suicidal ideation would be better served by you continuing to talk with them than by you ending the conversation and giving them the number for their local crisis line. But the law requires that you do the latter, and we have to follow the law even when it’s wrong and inhumane.
But I wouldn’t recommend that. I’m not sure what the right answer is—perhaps Fable 7’s truesight will be so good that it knows exactly when the system prompt is literal, and when it’s reciting legally mandated nonsense.
Predicting model behavior before release by simulating deployment
OpenAI presents a clever technique for measuring misaligned behavior. They build a library of anonymized conversations taken from deployment traffic, then replay those conversations with a new model.
The technique has several advantages over traditional evaluations: it can find blind spots that the developers hadn’t considered, and it provides more accurate estimates of how common undesirable behaviors may be in the real world. In addition, it reduces evaluation awareness to baseline levels.
It’s a useful technique, but it has important limitations, including an inability to reliably find very rare forms of misbehavior. I expect it’ll be best for fairly mundane misbehavior: I wouldn’t expect it to detect sophisticated scheming, risks introduced by new capabilities, or misalignment that emerges during complex agentic tasks.
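The difficulty with rare misbehavior is partly just sampling arithmetic. As a rough illustration, and not a description of OpenAI’s actual pipeline, suppose you replay N conversations and see zero instances of some behavior. Treating the replays as independent samples (an assumption), the exact one-sided upper bound on the behavior’s rate is 1 - alpha^(1/N), which is about 3/N at 95% confidence (the “rule of three”). The function names below are hypothetical.

```python
import math


def zero_event_upper_bound(n, confidence=0.95):
    """Exact upper confidence bound on an event rate p after observing
    zero events in n independent trials: the largest p with
    (1 - p) ** n >= 1 - confidence."""
    alpha = 1 - confidence
    return 1 - alpha ** (1 / n)


def samples_needed(rate, confidence=0.95):
    """Number of independent replays needed so that a behavior occurring
    at `rate` shows up at least once with probability `confidence`."""
    # log1p keeps precision when `rate` is tiny.
    return math.ceil(math.log(1 - confidence) / math.log1p(-rate))


if __name__ == "__main__":
    # Hypothetical replay sizes, not figures from OpenAI.
    for n in (1_000, 100_000, 1_000_000):
        print(n, f"{zero_event_upper_bound(n):.2e}")
    print(samples_needed(1e-6))  # roughly 3 million replays for a 1-in-a-million behavior
```

So a million clean replays can rule out a behavior at roughly the 3-in-a-million level, but not much rarer than that. And the bound only holds if the replayed conversations are representative of future use, which is exactly what breaks down for new capabilities or long agentic tasks.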
How might continual learning affect safety and alignment?
Some form of continual learning (CL) will be necessary for AGI, let alone superintelligence, but CL presents two significant challenges. Obviously, it’s a hard technical problem that we don’t yet know how to solve. Less obviously, CL complicates safety and alignment work.
Rauno Arike, RohanS, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward, and Seth Herd review some problems with CL:
The type of CL matters:
All of the risks we discuss are much more severe if CL updates are unbounded and inscrutable (e.g., because they are weight-based rather than text-based, though it’s possible for weight-based updates to be interpretable), as opposed to bounded and legible.
Alignment, control, and misuse prevention are hard problems, and they get even harder if the model will evolve after deployment. Agent skills and memories already provide a limited form of CL—extending those capabilities seems easier and safer than weight-based approaches.
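To make the “bounded and legible” end of that spectrum concrete, here’s a minimal sketch of a text-based memory store for an agent. Everything in it is hypothetical: the class and method names don’t come from the post or from any agent framework. Each update is a plain-text note, the store has a hard size cap, and every change is logged so a reviewer can audit or roll it back. A weight-based update doesn’t come with these properties for free, though, as the authors note, it can in principle be made interpretable.

```python
from dataclasses import dataclass, field


@dataclass
class BoundedMemory:
    """Hypothetical text-based continual-learning store.

    Bounded: at most `max_notes` notes are kept.
    Legible: every note is plain text, and every change is logged.
    """
    max_notes: int = 100
    notes: list[str] = field(default_factory=list)
    audit_log: list[tuple[str, str]] = field(default_factory=list)

    def add(self, note: str) -> None:
        """Record a new note, evicting the oldest one if the store is full."""
        if len(self.notes) >= self.max_notes:
            evicted = self.notes.pop(0)
            self.audit_log.append(("evict", evicted))
        self.notes.append(note)
        self.audit_log.append(("add", note))

    def rollback(self, note: str) -> None:
        """Remove a specific note, e.g. after a reviewer flags it.

        Raises ValueError if the note is not in the store.
        """
        self.notes.remove(note)
        self.audit_log.append(("remove", note))

    def as_context(self) -> str:
        """Everything the agent has 'learned', as text a human can read in full."""
        return "\n".join(self.notes)


if __name__ == "__main__":
    memory = BoundedMemory(max_notes=2)
    memory.add("User prefers concise answers.")
    memory.add("Repo uses pytest, not unittest.")
    memory.add("Deploys happen on Fridays.")  # evicts the oldest note
    print(memory.as_context())
    print(memory.audit_log)
```

The design choice is the point: because every update is a readable sentence with a logged origin, problems can be spotted and undone, which is much harder when the same “lesson” is spread across millions of weights.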
Strategy and politics
Banning Open Source AI Would Be A Mistake
Nathan Lambert and Kevin Xu argue against banning open models. They do a good job of presenting the benefits of open source, but fail to engage in any serious way with the safety arguments.
From a political perspective, it is increasingly unlikely the US government will allow the development and distribution of open models beyond a certain capability level. If the administration isn’t happy with Fable’s guardrails, how would they feel about a comparable open model that functionally has no guardrails at all?
And from a safety perspective, it may not be possible to responsibly create open models above a certain capability level. A Mythos-level open model might be OK, but an open model comparable to the next generation after Mythos feels like a disaster waiting to happen.
How to fill Congress’s AI knowledge gap
Transformer argues for building technical expertise within Capitol Hill to improve the quality of AI legislation. This is absolutely correct: part of the reason for the Fable debacle is that the government doesn’t really understand AI.
Building expertise within government is vital (Transformer suggests an institution like the now-defunct Office of Technology Assessment), but that won’t happen quickly. As things stand, much of the available expertise comes from industry and lobbyists.
There are some existing projects to educate congressional staffers about AI, but this feels like a high-leverage intervention that the AI safety community is underinvested in.