Fixing AI in Conflict

We already have the tools.

AI tools are increasingly shaping modern warfare, but their effectiveness and safety remain deeply contested.

The U.S. military has used Project Maven to identify targets for strikes in Iraq, Syria, Yemen, and Ukraine. In Gaza, Israeli forces have relied on AI-generated intelligence to inform strikes that killed scores of civilians, and Claude was reportedly used by U.S. forces during a raid on Venezuela. The headlines are filled with examples, and a recent report by the Brennan Center for Justice at NYU Law details how widely these tools have been deployed.

The bombing of a girls’ school in Iran stands out as the most frightening example of what can go wrong. But mistakes like these can be prevented by applying tools and policies that already exist.

Women, Peace and Security (WPS) frameworks, grounded in UN Security Council Resolution 1325 and implemented through National Action Plans (NAPs) in over 100 countries, establish commitments for how conflict and security operations should account for women, protect civilian populations, and include affected civilian communities in decision-making. NATO has integrated WPS into its doctrine. The U.S., the UK, and most major allied defense establishments have adopted NAPs that apply to their operations. Any AI system used by these decision-makers should uphold these commitments. Research by Our Secure Future (OSF) indicates that they do not.

When Context Disappears, AI Loses Sight of Women in Conflict

OSF tested three leading AI models across 13 conflict scenarios at varying levels of contextual detail. When prompts named affected populations – displaced women, female ex-combatants, women-led organizations – the models produced passable analysis. But when those cues were removed, mimicking the sparse format of actual field reports, the models failed to surface any WPS analysis at all. However, when OSF supplied the models with publicly available WPS data and policy frameworks before posing the questions, performance improved dramatically. For leaders, this approach would yield recommendations that reduce operational and strategic blind spots and enable end-users to make faster, better-informed decisions.

Independent research reinforces these findings. In July 2025, the Institute for Integrated Transitions (IFIT) published its AI on the Frontline study, which tested large language models (LLMs) on conflict resolution scenarios and found structural performance failures across the board. It concluded that current AI models are not fit for high-stakes peace and security decision-making without significant intervention. Critically, a follow-up study found that adding a structured prompt, one instructing models to follow basic conflict resolution best practices before responding, increased average scores by 65 percent.

Testing AI Claims Before Deployment Becomes a Security Imperative

As these systems become more capable and more integrated into operational decision-making, this gap will only widen unless proactive measures are taken. Military commanders are increasingly relying on AI........

© The Diplomat
