Anthropic will nuke your attempt to use AI to build a nuke
Date:
Fri, 22 Aug 2025 23:00:00 +0000
Description:
Anthropic and the federal government will be checking to make sure you're not trying to build a nuclear bomb with Claude's help.
FULL STORY
If youre the type of person who asks Claude how to make a sandwich, youre
fine. If youre the type of person who asks the AI chatbot how to build a nuclear bomb, you'll not only fail to get any blueprints, you might also face some pointed questions of your own. That's thanks to Anthropic's newly
deployed detector of problematic nuclear prompts.
Like other systems for spotting queries Claude shouldn't respond to, the new classifier scans user conversations, in this case flagging any that veer into how to build a nuclear weapon territory. Anthropic built the classification feature in a partnership with the U.S. Department of Energys National Nuclear Security Administration (NNSA), giving it all the information it needs to determine whether someone is just asking about how such bombs work or if they're looking for blueprints. It's performed with 96% accuracy in tests.
Though it might seem over-the-top, Anthropic sees the issue as more than
merely hypothetical. The chance that powerful AI models may have access to sensitive technical documents and could pass along a guide to building something like a nuclear bomb worries federal security agencies. Even if
Claude and other AI chatbots block the most obvious attempts,
innocent-seeming questions could in fact be veiled attempts at crowdsourcing weapons design. The new AI chatbot generations might help even if it's not
what their developers intend.
The classifier works by drawing a distinction between benign nuclear content, asking about nuclear propulsion, for instance, and the kind of content that could be turned to malicious use. Human moderators might struggle to keep up with any gray areas at the scale AI chatbots operate, but with proper
training, Anthropic and the NNSA believe the AI could police itself.
Anthropic claims its classifier is already catching real-world misuse
attempts in conversations with Claude.
Nuclear AI safety
Nuclear weapons in particular represent a uniquely tricky problem, according
to Anthropic and its partners at the DoE. The same foundational knowledge
that powers legitimate reactor science can, if slightly twisted, provide the blueprint for annihilation. The arrangement between Anthropic and the NNSA could catch deliberate and accidental disclosures, and set up a standard to prevent AI from being used to help make other weapons, too. Anthropic plans
to share its approach with the Frontier Model Forum AI safety consortium.
The narrowly tailored filter is aimed at making sure users can still learn about nuclear science and related topics. You still get to ask about how nuclear medicine works, or whether thorium is a safer fuel than uranium.
What the classifier attempts to circumvent are attempts to turn your home
into a bomb lab with a few clever prompts. Normally, it would be questionable if an AI company could thread that needle, but the expertise of the NNSA
should make the classifier different from a generic content moderation
system. It understands the difference between explain fission and give me a step-by-step plan for uranium enrichment using garage supplies.
This doesnt mean Claude was previously helping users design bombs. But it
could help forestall any attempt to do so. Stick to asking about the way radiation can cure diseases or ask for creative sandwich ideas, not bomb blueprints.
======================================================================
Link to news story:
https://www.techradar.com/ai-platforms-assistants/claude/anthropic-will-nuke-y our-attempt-to-use-ai-to-build-a-nuke
$$
--- SBBSecho 3.28-Linux
* Origin: capitolcityonline.net * Telnet/SSH:2022/HTTP (1:2320/105)