Ultrasonic waves can penetrate most structures in humans, including the brain. For example, with focused ultrasound (as they mentioned with MRgFUS) you can burn specific structures in the middle of the brain without any incision.
To use this for imaging, you need lots of transducers (MRgFUS typically uses 1024 for ablation, and Midjourney is proposing 358,000 for imaging) and massive advances in computational tomography capabilities. There will still likely be pockets of low confidence where there's a lot of air, like in the lungs. But with sufficient information on what's happening around those areas, you'd still have something that's medically useful.
The data they show is almost certainly a normal beamforming algorithm like delay and sum, possibly with some simple speed of sound correction. The most similar paper I know of is here in Nature Biomedical Engineering from a team at Caltech. https://www.nature.com/articles/s41551-026-01660-4
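For anyone unfamiliar with delay and sum, here's a minimal Python sketch of the idea. This is my own toy illustration, not the method from the paper or from the data they show. For one image point, you compute the time of flight to each transducer, pick the sample at that delay from each channel, and sum. The function names, geometry, sampling rate, and the uniform 1540 m/s speed of sound are all assumptions; it also only handles receive delays and skips interpolation and apodization.

```python
import math

def delay_and_sum(signals, element_positions, focus, fs, c=1540.0):
    """Receive-only delay-and-sum for a single focal point.

    signals: one list of samples per transducer element
    element_positions: (x, z) coordinates of each element, in meters
    focus: (x, z) image point, in meters
    fs: sampling rate in Hz
    c: assumed uniform speed of sound in m/s (an assumption, not a measured value)
    """
    total = 0.0
    for samples, (ex, ez) in zip(signals, element_positions):
        # One-way time of flight from the focal point to this element.
        distance = math.hypot(focus[0] - ex, focus[1] - ez)
        idx = round(distance / c * fs)
        if 0 <= idx < len(samples):
            total += samples[idx]
    return total

def simulate_point_source(element_positions, source, fs, n_samples, c=1540.0):
    """Toy data: each element records a unit impulse at the arrival time."""
    signals = []
    for ex, ez in element_positions:
        idx = round(math.hypot(source[0] - ex, source[1] - ez) / c * fs)
        channel = [0.0] * n_samples
        if idx < n_samples:
            channel[idx] = 1.0
        signals.append(channel)
    return signals
```

Focusing on the true source location lines up the impulses so they add coherently; focusing elsewhere mostly misses them. A simple speed of sound correction would amount to changing `c` (or making it depend on the path) when computing the delays.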
So even if GPT 5.5 is just as capable in these scenarios (which, imo, it largely is), it is not known by the government apparatus as having the same capabilities.
Personally, I think we crossed the threshold of capabilities with Opus 4.6 [2], which translated to an even more capable open-weight GLM 5.1 (which is rumored to have been distilled from Opus 4.6) [3][4]. But the USG and its partners aren't fully rational actors with perfect data, so it's possible they're only viscerally aware of these capabilities in the context of Mythos.
[1]: https://www.reuters.com/business/us-security-agency-is-using...
[2]: Opus 4.6 was used for https://www.noahlebovic.com/testing-an-autonomous-hacker/
[3]: See GLM 5.1 scoring in https://www.cybergym.io/cybergym/
[4]: https://dualuse.dev/posts/chinese-models-are-sometimes-bette...
While I don't agree with their actions here, I do think there's sufficient reason to hold that belief.
On some fronts (e.g. security, on which you've experienced more than me), I think there are surmountable challenges. But on other fronts (e.g. bio), a single errant actor could reasonably kill millions or billions of people with sufficiently powerful AI. We don't have good defenses here, and those actors do exist.
I still don't agree with these actions, but I do think I agree with their assumptions.
I participated in the internal bioweapons uplift test for Sonnet 3.7, and even then, one non-expert got huge uplift from the model [1]. I'd consider evals a lower bound of capabilities that can be elicited from a model.
The team behind Biomni, a biomedical agent that's widely used by researchers, has continued to find consistent gains between models [2]. I trust them, because I visited them to build their HPC tool [3], which the model is quite capable of using – more so than most grad students. The Biomni team cares a lot about real usability for real researchers, so they have a great pulse on capabilities.
SecureBio also has some public evals [4], which have continued to show increasing uplift.
And while synthesis monitoring is a part of the solution, I think you might underestimate how much goes under the radar. See the Reedley lab incident for an example [5].
Is Anthropic still effectively throttling beneficial biomedical research? Yes! And so is OpenAI. But the underlying capability is still actually dual use.
[1]: See page 25 in https://www-cdn.anthropic.com/9ff93dfa8f445c932415d335c88852...
[2]: Their benchmark has a preprint at https://www.biorxiv.org/content/10.64898/2026.05.12.724604v1...
[3]: https://x.com/phylo_bio/article/2029233694775624096
[5]: Search for "ebola" in the public report for the Reedley lab incident at https://chinaselectcommittee.house.gov/sites/evo-subsites/se...
Doesn't this simply amount to disagreeing about what counts as "meaningful" from a bio-safety POV? Also, even the ASL-3 deployment safeguards for Opus 4 and higher were always adopted as a mere matter of caution; it's not clear that even Anthropic believed at any point that this reflected any genuine "threshold crossing" event. So it's just not obvious how much weight we're supposed to place on that particular stance.
But I don't think I've found any domain expert who thinks granting everyone raw access to the most capable models wouldn't meaningfully increase risk. OpenAI recently staffed a biological threat modeler to help quantify this risk.
(Edit: just saw your edit, this includes at Anthropic. ASL tiers were "rule-out" to exclude rather than "rule-in", so exact thresholds were murkier, but I think it's clear that models have passed that threshold by now.)
That said, there are clear steps and requirements to set up a BSL-2 or BSL-3 lab, and I think there should be similarly clear rules around model capabilities and access. The process for Anthropic and OpenAI is murky and still implicitly gated on spend, which I think is holding back research.
For example, anyone who has access to a BSL-3 lab should have a clear and low-cost path to a model with corresponding capabilities, as long as they set up corresponding precautions for model access.
I think it would be a bad outcome for only frontier labs and a select few groups they choose to have access to the most capable models – which is sadly the precedent that's currently being set.
It depends how capable these raw models are. Biology as a field depends most on real-world knowledge, which is an expensive capability for open models targeting widespread deployment. It's quite plausible that even Opus 4 would be a lot more capable in these domains than the best universally accessible "raw models" today, quite unlike other domains such as coding or pure math. The securebio.org benchmark has spotty representation of openly available models, but it does show Kimi 2.5 being no more capable than GPT 5 mini, and clearly below o4-mini and Opus 4.0; which may be a plausible summary of where things stand today.
And sure, and I love open models – I spent much of the past couple months doing additional RL on Qwen 3.6 35B A3B, Gemma 4, Kimi K2.6, and GLM 5.1. Without these open models, I'd be forced to do my research inside a frontier lab.
There's a balance to strike here, but I don't think the biological risk is overplayed. It would be very easy to accidentally cross the threshold of "meaningful" without adequate safeguards, and then be unable to undo what you've released to the world.
Do they? We don't even have single errant actors who go and kill 1000 people. I don't believe human motivations support the idea of killing so many people unrelated to you.
I do attribute a lot to specific people. Concretely, to much of the initial team, who they recruited on the research/infra side, and some very close personal relationships within research/infra. That dynamic, paired with their unwillingness to accede to something against their values, is what I credit for some atypical decisions and outcomes [1].
Things regularly go "corrupt" in parts of the company; it's hard to scale without importing culture from big tech. Sometimes, the defense was ICs escalating issues, Dario talking to ICs, and then shaking things up.
But this process takes time, and it doesn't lead to a full reversal; a bad/misaligned hire has reverberating impacts. Many folks are still driven by values (even if their values are not your values!), but scaling dynamics seem to be evolving like any other org – just at a higher employee count and revenue numbers.
I do place trust in specific people who work at Anthropic, but I wouldn't place trust in Anthropic the organization. It's an organization that's wont to change, regardless of its structure.
I saw in the latest news that the decision has been partially reversed -- because of external pressure...
This decision does seem in line with what I would expect from Anthropic, so I don't see it as a sign of changing values – even if I personally disagree.
Regardless, it's still atypical in the context of an American company, and it can help explain the differences between Anthropic and its peers. That doesn't mean I agree with their decisions or that they're "the right" decisions, but I think it's a helpful framing in which to understand them.
Some people are unjustly called stubborn when they don't change their position based on a weak argument from an authority figure. And others claim values, but they're just stubbornly adhering to something that feels good to believe.
With respect to OP (who has a unique vantage from inside), I do agree with this on principle. When there are uncommon outcomes, there must be uncommon structure imho. A "good structure" is like oxygen, water, or peace: When it's well-maintained and well-distributed, one might not even notice it's there, nor spend much time being grateful for it. It's banal, but "what do you mean? isn't this just how things would always have been?" is both beautiful and tragic.
Imho if we could figure out how to have a "loud peace" (in all the ways that this might mean), we'd have figured out an important way of sustaining the world and ourselves.
I get the sense you were feeling at odds with my framing? I wonder if it's that you're picking up that I believe "structure" is above any one person or set of people. In my conception, leadership is just part of structure, a key maintainer. Leadership are pieces of the structure, but subordinate in scale. They sometimes seek outside help in shaping structure (e.g., ppl like eries), and the structure becomes like another passive actor, not simply "leadership's doing". Leadership are key players taking care of the structure, but they are just one set of players, and in some structures, non-leadership employees play an outsized role (often because leadership knew enough to step back). Sometimes the role of leadership is "fucking right off" in certain domains. Regardless, the structure then guides behaviour of all within it, and hopefully the structure also maintains us, at least as much as we maintain it.
I'm stating the above as if it's universally true, but it's just my take. I'd be curious to know if any parts give you strong YES or NO feelings, if you are open to share your gut reaction. Blunt responses welcome
(Fwiw I lean heavily on the ideas of Christopher Alexander -- the Pattern Language guy -- in regards to my beliefs on "structure": https://dorian.substack.com/p/at-any-given-moment-in-a-proce... )
In all seriousness, yes, individual leadership at the top has to be willing to steelman controversial issues and potential changes of direction, as well as engage in unapologetic gatekeeping. At this point we've seen this over and over in tech when observing corporate successes and failures.
Is there something that happened which you don't think would have come to pass with a standard PBC/C-Corp (without the LTBT)? I'm trying to think of one, but nothing is coming to mind.
I think the structure attracted many people to Anthropic (e.g. an RSP that could only be overridden by the LTBT), but I'm not sure it has demonstrated a practical impact.
As an aside, I think a lot about this problem too! But the answers that don't reduce to something like "the people, and the people to whom they give power" seem to break down when I look closely.
(Although it does remind me a bit of Google pulling out of China back in the day.)
Unfortunately there doesn't really seem to be a cure for institutional decay. Once unethical people get in power, they hire other unethical people, and then you're just stuck in Game of Thrones. You have to go quit and found another company, and single-mindedly keep all those people away, kinda like Anthropic did when they left OpenAI.
I would argue it's not a real value if you are not willing to lose something in order to hold on to it. It is admirable to want to do the right thing when you can get away with doing the wrong thing. It is only a true value if you are willing to do the right thing when you cannot get away with doing the right thing.
I don't think it's that simple.
For example, let's say your desire is to minimize harm in Area X. While you're on top and in control of Area X, then you can do that easily enough. Suddenly a competitor comes whose values show they're willing to do lots of harm to Area X. And if they beat you in the capitalistic marketplace and gain more control, they'll be able to do lots of harm. In order to beat them, you may have to do a little bit of harm to Area X, which goes against your values. But in doing so, you retain control, and prevent even greater harm to Area X. Is that not a "real" value?
Would it be a "real" value to staunchly refuse to do a little harm to Area X, even if you know that this will result in greater harm in the long run?
This is why I distrust simple ideologies. The world is not simple.
Company A founds itself on doing 0 harm to Area X. Competitor B shows up and starts finding success doing 10 harm to Area X, so Company A makes a "moral" decision: If we do 9 harm to Area X, we are preventing 1 entire harm. Isn't that real value? Then Company C shows up and starts finding success doing 100 harm to Area X, so Company A changes its moral stance to "unless we do 99 harm to Area X ..."
There's an "I know an old lady who swallowed a fly" kind of logic going on here.
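To make the arithmetic concrete, here's a toy Python sketch of that policy. The function name, the `margin` parameter, and the third competitor at 1000 harm are hypothetical, just extending the 10/100 example: Company A always does a bit less harm than the worst competitor it has seen.

```python
def undercut_policy(competitor_harms, margin=1):
    """Harm Company A does if it always undercuts the worst competitor so far.

    competitor_harms: harm levels of competitors in order of arrival (hypothetical)
    margin: how much less harm A does than the worst competitor
    """
    worst_so_far = 0
    history = []
    for harm in competitor_harms:
        worst_so_far = max(worst_so_far, harm)
        # A starts from 0 harm and never goes below it.
        history.append(max(0, worst_so_far - margin))
    return history

print(undercut_policy([10, 100, 1000]))  # [9, 99, 999]
```

At every step A can claim it is "preventing 1 entire harm", yet its own harm only ever goes up from the 0 it was founded on.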
I mean yes this is technically possible. But I think in many cases, especially "winner-take-all" markets like online search engines, social networks, etc., you don't get this large number of repeated threats. Fending off a competitor or two might be enough. And just as it's possible for there to be some advantage that opens from doing 99 harm to Area X, it's also possible that it never happens.
But also, let's pretend the hypothetical you say _did_ happen?
What should occur? Should the company just NOT do 99 harm to Area X and instead allow 100? If so, why? Unless you break the hypothetical by adding some alternative option C, as much as we don't like the preventative-99 option, it's still better than the allowing-100 option.
That is kind of the point, isn't it? That my hypothetical scenario isn't realistic.
Let's imagine two worlds. A world where individuals refuse the false dichotomy and search for option C. And the world where someone accepts the false dichotomy and justifies evil.
I would argue that anyone that advocates for the justification of evil is actually using motivated reasoning. It breaks my original premise "Company A founds itself on doing 0 harm to Area X". Clearly they didn't and their embracing of evil shows that their principles mean nothing.
As a moral test, ask yourself: If I said "you must kill 99 people otherwise I will kill 100", would you feel morally justified to kill those 99 people? If your answer is "yes", then you are manipulable by those who want you to commit evil on their behalf. They don't have to commit any murders, just convince you that you have no other choice.
You should investigate the repeated prisoner's dilemma.
Well aware. Obviously, the entirety of human civilization is a bit more complicated than a prisoner's dilemma, iterated or not. Yet prisoner's dilemmas and races to the bottom still exist, and it makes no sense to argue against them in the abstract.
The person I was responding to made the point that if you want to minimize evil in the world, sometimes you have to add evil to a lesser degree. As in my example, if I do 9 points of evil but prevent 10 points of evil then according to OP I've added value to the world in the form of the 1 point of evil I have reduced.
I responded that this can lead to an escalation trap. This assumes that we would all prefer less evil in the world, right? So how do we get out of the escalation trap? Repeated application of the maxim "always do a bit less evil than the worst possible competitor" will not lead to a minimization of evil overall, only a creeping increase in the total amount of evil in the world.
How are you equating this to me arguing against the existence of races to the bottom?
In reality, neither corporate nor personal values are binary, all-or-none propositions. They are more like springs that push you in the right direction. But if something pulls hard enough in the wrong direction, a spring can be overpowered.
If they made that decision and it destroyed revenue, I could see an alternate timeline where a standard C-Corp + board with non-founder control may have ousted leadership. But that wasn't the situation for OpenAI or Google either, and their leadership still made a different decision.
I really wonder if it's possible to avoid these dynamics, even if you try really hard.
If not, it seems to me that goal alignment is the main benefit of a hypothetical lean AI company where the middle management is 1% people and 99% tokens. When most of your decision-making is not being siphoned by politics, your output scales far better with respect to input resources.
(This isn't a dig on managers; I've been one. But if a situation doesn't naturally escalate, that usually means a manager in the chain chose not to escalate it, and their reports have to go around them.)
On your other point, the government still has systemic leverage and can compel access, so this doesn't remove that risk.
That doesn't mean this is the end of the world, and some balance of power is usually good. But I do think it will still increase the capabilities of rogue actors and their net harm.
This applies even with API usage through third-party inference providers (e.g. AWS' Bedrock and GCP's Vertex) or with a zero data retention (ZDR) agreement in place.
I understand the reasoning for doing this, but I don't love the precedent that it sets.
A customer could sign a ZDR agreement with Anthropic, and their API usage wouldn't be retained for even a day. That's no longer possible.
For example, Anthropic has shipped several bugs that allow any claude.ai/code session – each of which is isolated in an ephemeral container – to access and exfiltrate all of the user's other sessions, connected repos, and environment variables. The rogue/hijacked Claude could also spawn new Claude sessions with arbitrary instructions and access, regardless of the original session's constraints.
I originally wrote about this (with permission) in February[1], and most of the issues were quickly fixed. But the underlying token scope issues have regressed several times since then – including post-Mythos – so I wouldn't say that Anthropic has solved this yet.
[1]: https://www.noahlebovic.com/hacking-claude-code-on-the-web-b...
Qwen 3.6 35B A3B also exceeded my expectations. It's surprisingly performant, even though the previous generation wasn't even able to use the testing harness.
(Tbd on Kimi K2.6; the eval is still running.)
Privacy concerns aside, the KYC process for OpenAI was self-serve and took about a minute.