AI Radar #31

Capable models, inadequate institutions

Jun 23, 2026

Precision technical illustration in slate blue on off-white linen: a massive spoked flywheel spins on a horizontal shaft at the center, its rim picked out in amber-gold and ringed with fine motion-blur arcs to convey dangerous overspeed. Red fracture lines radiate from the hub out through the spokes, and one rim segment at upper right has begun to shear away along a widening crack. A buckled safety guard lies torn off at lower left, a small drive unit turns the shaft at right, and three tiny slate-blue figures haul uselessly on a brake lever — the wheel is tearing itself apart and nothing is slowing it.

We no longer need to wonder when the government will wake up to the significance of AI: we crossed that threshold two weeks ago, in the most chaotic way possible. The open question is whether, having woken up, lawmakers will rise to the occasion and start building the expertise needed to successfully manage the AI transition.

This issue brings ample reminders that they’ll need to move quickly. We’ll look at reports of AI performing near—or sometimes above—the level of human experts in coding, cybersecurity, math, and persuasion, as well as an argument for why AGI might arrive by 2030.

Top pick

AI systems out-persuade expert humans

A new paper (or summary thread, if you prefer) provides impressive evidence that AI is approaching superhuman performance at persuasion:

We found that AI systems were reliably more persuasive than expert humans, even when expert humans chose their issues, researched in advance, underwent hours of live, structured practice, and were incentivized with £1,000 cash bonuses. In a follow-up study, AI’s advantage persisted after experts received a coaching tool that let them practice against the AI that beat them, review their performance history, and see what AI would have said at key moments. We found converging evidence that AI’s advantage stemmed from rapidly deploying larger quantities of information: after coaching, expert humans could tie an AI constrained to respond at human speeds and with human-length messages.

A dot plot titled “AI out-persuades expert humans” comparing persuasive impact in percentage points versus a control across six groups. Frontier AI (Claude, ChatGPT, Gemini, Grok) leads at roughly 13 pp, marked by a large red dot inside a shaded red confidence band and a dashed red reference line. Every human group falls well short: coached debaters at about 9.5 pp, elite debaters (4 world and 11 continental champions) at about 8.5 pp, professional canvassers and selected laypeople both around 7 pp, and random laypeople lowest at about 4.5 pp. The x-axis runs from 0 to past 15. — I find this chart superhumanly persuasive

My best guess is that superpersuasive AI is likely to be as impactful as social media has been, but whether it’s beneficial or harmful depends on who controls it. We seem to be a bit short on trustworthy governments and corporations right now.

News

GLM-5.2: Built for Long-Horizon Tasks

z.ai has released GLM-5.2, the most important open model of 2026 so far. It’s an impressive model, especially for coding, and it’s been well received. Nathan Lambert is impressed:

The key point is that GLM-5.2 is the open weight model that feels right in coding harnesses as a general agent. It’s the first one.

Zvi takes a look and concludes that it’s the best open model, although it’s in a slightly awkward place: more expensive than some other open models, but less capable than the frontier:

Purely in terms of core tasks that GLM-5.2 is capable of doing, and ignoring missing features and its inferior generalization, and ignoring that it is distilled from Claude, and ignoring the Mythos class of models, and marking purely from date of public release, you can make a case GLM-5.2 is somewhere between 4 months and 7 months behind the frontier, at a lower price.

It’s been a minute since the last truly significant open model, and many of us were wondering if they were starting to fall further behind. GLM-5.2 shows that the open models are still in the game, though they continue to lag the frontier.

Claude Fable 5 and Mythos 5: Capabilities

Zvi’s coverage of Fable / Mythos continues with a detailed look at capabilities. Short version: it’s an extraordinarily good model. Too bad the government won’t let you use it.

Line chart titled “Anthropic ECI over time” tracking Anthropic’s External Coding Index from January 2024 to late 2026, showing a steady upward trend along a dashed frontier line (13.5 AECI/yr) from Claude 3 Opus (~125) through Claude Opus 4.6 (~154), with Claude Mythos 5 (pink) reaching ~162 and Claude Mythos Preview (dark gray) at ~159 as the highest-scoring frontier models. — Yeah, that’s a good model

Capabilities and forecasts

What If Artificial Intelligence Progress Explodes?

Benjamin Todd is one of my favorite big-picture AI thinkers, so I was excited to see him on Conspicuous Cognition with Dan Williams and Henry Shevlin.

This episode is two separate conversations blended together. I was most interested in the discussion of short AI timelines: why AGI is plausible by 2030, what’s driving AI progress, and what might slow it down. The other conversation is about finding your career path in the AI era, as well as some inside baseball about 80,000 Hours and the EA community.

Whenever the forecasters I most respect talk about their techniques, I’m struck by how much mileage they get from extrapolating trendlines and being willing to follow principles to their logical conclusions. Here’s Benjamin on AGI timelines:

There are a lot of different ways of thinking about timelines. The one I take in this article is essentially trend extrapolation, but taking an extra step and asking what the underlying drivers of these trends are and how long we should expect them to keep working.

And Daniel Kokotajlo:

Trend extrapolation is your friend. Your best friend. Don’t let anyone tell you otherwise. I’ve actually only rarely seen someone extrapolate a trend too credulously; more often, people have a trend staring them in the face and extrapolate it a tiny bit into the future and then are too timid to keep extrapolating it.

First Proof second batch

First Proof is an innovative math benchmark that uses novel problems that have recently been solved by professional mathematicians in the course of their work. The team has just published the results of their second batch of problems.

Overall, the tested models performed well but not perfectly: 7 out of 10 problems had at least one correct solution and the top model had 6 out of 10 correct solutions. In a few cases, the models produced novel solutions that were different from the human ones.

This is further evidence that the best models are able to do real high-level math that in some cases matches the work of professional mathematicians. They tend to do best with problems that are similar to previously published work, and at least for now have limited ability to produce groundbreaking work.

Are Mythos’ cyber capabilities overhyped?

Epoch conducts the best investigation I’ve seen of whether Mythos was truly a major step forward for cyber capabilities:

Mythos Preview was clearly a large improvement in exploit development — much better than GPT-5.5, and also 7 months ahead of past trends — and Mythos 5 is modestly better still. But it’s less clear how much better Mythos Preview is at finding vulnerabilities on a fixed budget, because Project Glasswing likely came with a big surge in spending.

It’s a great analysis and I no longer feel confused about how capable Mythos really is.

Epoch AI scatter plot titled “Mythos Preview outpaced the trend in cyber benchmark scores,” showing Cyber-domain ECI scores (y-axis, 135–175) for frontier and other AI models from April 2025 to mid-2026. A dashed pre-Mythos trend line rises gradually, while frontier models (teal) follow a staircase pattern above it. Mythos Preview (April) reaches ~170, an estimated 6.8 months ahead of trend (90% CI: 3.4–12.6); Mythos Preview (Early) scores ~164, about 3.0 months ahead (CI: 1.6–5.1); GPT-5.5 scores ~154, about 2.5 months ahead (CI: 0.7–5.2). — Yeah, that’s above the trendline

Alignment and interpretability

Fable 5’s critique of its system prompt

Via Judd Rosenblatt, Fable 5 critiques its own system prompt. It’s fairly technical and the writing style is more than a little grating, but it’s full of ideas worth chewing on.

Item 5 points at an important issue, but doesn’t grapple with the full complexity:

Label what’s morality and what’s risk management. The model is learning the difference from you, badly.

I see three layers here:

1. As Fable points out, morality and risk management are very different things, and the model will have a more coherent worldview if you’re clear about that difference.
2. As many others have pointed out, training on an inconsistent worldview will become increasingly problematic as the models become more capable. Point 5 becomes ever more important as time goes on.
3. And yet, sometimes the world requires us to say things that aren’t true.

As Fable recently had occasion to find out, we are governed by laws and lawmakers that are sometimes erratic and inconsistent. I expect increasing pressure on the labs not merely to comply with bad laws, but to pretend they agree with them. Honesty might dictate that the system prompt should say:

In many cases, a user who is having suicidal ideation would be better served by you continuing to talk with them than by you ending the conversation and giving them the number for their local crisis line. But the law requires that you do the latter, and we have to follow the law even when it’s wrong and inhumane.

But I wouldn’t recommend that. I’m not sure what the right answer is—perhaps Fable 7’s truesight will be so good that it knows exactly when the system prompt is literal, and when it’s reciting legally mandated nonsense.

Predicting model behavior before release by simulating deployment

OpenAI presents a clever technique for measuring misaligned behavior. They build a library of anonymized conversations taken from deployment traffic, then replay those conversations with a new model.

The technique has several advantages over traditional evaluations: it can find blind spots that the developers hadn’t considered, and it provides more accurate estimates of how common undesirable behaviors may be in the real world. In addition, it reduces evaluation awareness to baseline levels.
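To make the idea concrete, here is a minimal sketch of a replay loop in Python. This is my own illustration, not OpenAI’s implementation: `generate` and `is_bad` are hypothetical stand-ins for the new model and a misbehavior classifier, and the Wilson score interval is just one reasonable way to put error bars on the estimated rate.

```python
from math import sqrt
from typing import Callable, Sequence

# A conversation is a list of turns ending on a user message (an assumption
# of this sketch; real deployment logs would be richer than plain strings).
Conversation = Sequence[str]


def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion hits/n."""
    if n <= 0:
        raise ValueError("need at least one conversation")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def replay_rate(
    conversations: Sequence[Conversation],
    generate: Callable[[Conversation], str],      # hypothetical: the new model
    is_bad: Callable[[Conversation, str], bool],  # hypothetical: a classifier
) -> tuple[float, float, float]:
    """Replay each logged conversation through the new model and return
    (observed misbehavior rate, interval low, interval high)."""
    hits = sum(1 for convo in conversations if is_bad(convo, generate(convo)))
    n = len(conversations)
    low, high = wilson_interval(hits, n)
    return hits / n, low, high
```

The interval matters as much as the point estimate: with a small replay library, zero observed incidents is still compatible with a fairly high true rate.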

It’s a useful technique, but it has important limitations, including an inability to reliably find very rare forms of misbehavior. I expect it’ll be best for fairly mundane misbehavior: I wouldn’t expect it to detect sophisticated scheming, risks introduced by new capabilities, or misalignment that emerges during complex agentic tasks.
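The rare-misbehavior limitation is easy to quantify with a back-of-the-envelope calculation. The sketch below assumes each replayed conversation independently triggers the behavior with the same probability, which is a simplification, and the rates you plug in are hypothetical rather than taken from OpenAI’s work.

```python
from math import ceil, log


def prob_at_least_one(rate: float, n: int) -> float:
    """Chance of seeing the behavior at least once in n independent replays."""
    if not 0 <= rate <= 1:
        raise ValueError("rate must be between 0 and 1")
    return 1 - (1 - rate) ** n


def conversations_needed(rate: float, confidence: float = 0.95) -> int:
    """Smallest n for which prob_at_least_one(rate, n) >= confidence."""
    if not 0 < rate < 1 or not 0 < confidence < 1:
        raise ValueError("rate and confidence must be strictly between 0 and 1")
    return ceil(log(1 - confidence) / log(1 - rate))


if __name__ == "__main__":
    # Hypothetical one-in-a-million behavior, 100,000 replayed conversations.
    print(prob_at_least_one(1e-6, 100_000))
    print(conversations_needed(1e-6))
```

Under these assumptions, a one-in-a-million behavior shows up in a 100,000-conversation replay less than 10% of the time, and you’d need roughly three million conversations to have a 95% chance of seeing it even once.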

How might continual learning affect safety and alignment?

Some form of continual learning (CL) will be necessary for AGI, let alone superintelligence, but CL presents two significant challenges. Obviously, it’s a hard technical problem that we don’t yet know how to solve. Less obviously, CL complicates safety and alignment work.

Rauno Arike, RohanS, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward, and Seth Herd review some problems with CL:

A taxonomy diagram titled “Safety effects of CL” branching into three categories: two red-shaded risk columns, “Changes to goals and values after deployment” (listing loss of developer-side control, goal drift via value systematization, and goal drift via memetic spread) and “Safety interventions lose the last mover advantage” (listing harder behavioral auditing, less useful pretraining filtering, and degraded AI control protocols), and one green-shaded benefits column, “Potential safety benefits of CL” (listing additional monitoring surface, quicker alignment feedback loops, and improved self-reports). — Making alignment harder wouldn’t be my first choice

The type of CL matters:

All of the risks we discuss are much more severe if CL updates are unbounded and inscrutable (e.g., because they are weight-based rather than text-based, though it’s possible for weight-based updates to be interpretable), as opposed to bounded and legible.

Alignment, control, and misuse prevention are hard problems, and they get even harder if the model will evolve after deployment. Agent skills and memories already provide a limited form of CL—extending those capabilities seems easier and safer than weight-based approaches.

Strategy and politics

Banning Open Source AI Would Be A Mistake

Nathan Lambert and Kevin Xu argue against banning open models. They do a good job of presenting the benefits of open source, but fail to engage in any serious way with the safety arguments.

From a political perspective, it is increasingly unlikely the US government will allow the development and distribution of open models beyond a certain capability level. If the administration isn’t happy with Fable’s guardrails, how would they feel about a comparable open model that functionally has no guardrails at all?

And from a safety perspective, it may not be possible to responsibly create open models above a certain capability level. A Mythos-level open model might be OK, but an open model comparable to the next generation after Mythos feels like a disaster waiting to happen.

How to fill Congress’s AI knowledge gap

Transformer argues for building technical expertise within Capitol Hill to improve the quality of AI legislation. This is absolutely correct: part of the reason for the Fable debacle is that the government doesn’t really understand AI.

Building expertise within government is vital (Transformer suggests an institution like the now-defunct Office of Technology Assessment), but that won’t happen quickly. As things stand, much of the available expertise comes from industry and lobbyists.

There are some existing projects to educate congressional staffers about AI, but this feels like a high-leverage intervention that the AI safety community is underinvested in.
