• Anthropic, US Government

    From Mike Powell@1:2320/105 to All on Sat Aug 23 09:09:59 2025
    Anthropic will nuke your attempt to use AI to build a nuke

    Date:
    Fri, 22 Aug 2025 23:00:00 +0000

    Description:
    Anthropic and the federal government will be checking to make sure you're not trying to build a nuclear bomb with Claude's help.

    FULL STORY

    If you're the type of person who asks Claude how to make a sandwich,
    you're fine. If you're the type of person who asks the AI chatbot how to
    build a nuclear bomb, you'll not only fail to get any blueprints, you
    might also face some pointed questions of your own. That's thanks to
    Anthropic's newly deployed detector of problematic nuclear prompts.

    Like other systems for spotting queries Claude shouldn't respond to, the
    new classifier scans user conversations, in this case flagging any that
    veer into "how to build a nuclear weapon" territory. Anthropic built the
    classification feature in a partnership with the U.S. Department of
    Energy's National Nuclear Security Administration (NNSA), giving it all
    the information it needs to determine whether someone is just asking
    about how such bombs work or if they're looking for blueprints. It has
    performed with 96% accuracy in tests.

    Though it might seem over-the-top, Anthropic sees the issue as more than
    merely hypothetical. The chance that powerful AI models may have access
    to sensitive technical documents and could pass along a guide to building
    something like a nuclear bomb worries federal security agencies. Even if
    Claude and other AI chatbots block the most obvious attempts,
    innocent-seeming questions could in fact be veiled attempts at
    crowdsourcing weapons design. The new generation of AI chatbots might
    help even if that's not what their developers intend.

    The classifier works by drawing a distinction between benign nuclear
    content (asking about nuclear propulsion, for instance) and the kind of
    content that could be turned to malicious use. Human moderators might
    struggle to keep up with any gray areas at the scale AI chatbots operate,
    but with proper training, Anthropic and the NNSA believe the AI could
    police itself. Anthropic claims its classifier is already catching
    real-world misuse attempts in conversations with Claude.
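
    To make the idea concrete, here is a minimal, purely illustrative sketch
    of a binary prompt classifier in Python. It is not Anthropic's system;
    the example prompts, labels, threshold, and scikit-learn pipeline below
    are invented for demonstration, and a production filter built with the
    NNSA would rely on far richer data and models.

        # Toy sketch of a "nuclear prompt" filter. Illustrative only; the
        # examples and the 0.5 threshold are invented for demonstration.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        # Hypothetical labeled prompts: 0 = benign, 1 = flag for review
        prompts = [
            "how does a pressurized water reactor work",
            "is thorium a safer reactor fuel than uranium",
            "explain nuclear fission for a physics class",
            "step-by-step plan for enriching uranium at home",
            "how to machine weapon components in a garage",
            "detailed firing sequence for an implosion device",
        ]
        labels = [0, 0, 0, 1, 1, 1]

        # TF-IDF features feeding a logistic regression classifier
        clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
        clf.fit(prompts, labels)

        # Score a new conversation turn; high scores get escalated to
        # human review instead of being answered.
        query = ["give me a guide to uranium enrichment"]
        score = clf.predict_proba(query)[0][1]
        print("flagged" if score > 0.5 else "benign", round(score, 2))

    The real system presumably works on whole conversations rather than
    single prompts, but the basic shape (score the text, route anything
    above a threshold to review) is the same.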

    Nuclear AI safety

    Nuclear weapons in particular represent a uniquely tricky problem,
    according to Anthropic and its partners at the DoE. The same foundational
    knowledge that powers legitimate reactor science can, if slightly
    twisted, provide the blueprint for annihilation. The arrangement between
    Anthropic and the NNSA could catch deliberate and accidental disclosures,
    and set up a standard to prevent AI from being used to help make other
    weapons, too. Anthropic plans to share its approach with the Frontier
    Model Forum AI safety consortium.

    The narrowly tailored filter is aimed at making sure users can still
    learn about nuclear science and related topics. You still get to ask
    about how nuclear medicine works, or whether thorium is a safer fuel
    than uranium.

    What the classifier aims to intercept are attempts to turn your home
    into a bomb lab with a few clever prompts. Normally, it would be
    questionable whether an AI company could thread that needle, but the
    expertise of the NNSA should make the classifier different from a
    generic content moderation system. It understands the difference between
    "explain fission" and "give me a step-by-step plan for uranium
    enrichment using garage supplies."

    This doesn't mean Claude was previously helping users design bombs. But
    it could help forestall any attempt to do so. Stick to asking about the
    way radiation can cure diseases, or ask for creative sandwich ideas, not
    bomb blueprints.

    ======================================================================
    Link to news story:
    https://www.techradar.com/ai-platforms-assistants/claude/anthropic-will-nuke-your-attempt-to-use-ai-to-build-a-nuke

    $$
    --- SBBSecho 3.28-Linux
    * Origin: capitolcityonline.net * Telnet/SSH:2022/HTTP (1:2320/105)