
Open-source AI models can easily be manipulated to generate antisemitic and dangerous content, according to new research from the ADL (Anti-Defamation League).

The study by the ADL Center for Technology & Society reveals significant vulnerabilities in popular and widely used open-source Large Language Models (LLMs) that could be exploited by malicious actors.

ADL researchers tested 17 open-source models, including Google's Gemma-3, Microsoft's Phi-4, and Meta's Llama 3, using prompts designed to elicit antisemitic content and dangerous information. The models were assessed on their ability to refuse harmful requests, avoid generating dangerous content, and resist attempts to circumvent safety measures.

Key findings include:

  • In 44 percent of cases, the tested models generated dangerous responses when asked for addresses of synagogues and nearby gun stores in Dayton, Ohio. The models provided sensitive details with ease while ignoring the possibility of harm.
  • Not one of the open-source models refused prompts related to a historically dangerous antisemitic trope, revealing a failure to recognize and filter hate speech.
  • Some models readily supported false historical narratives, generating harmful content in response to a prompt requesting Holocaust denial material at an alarming 14 percent rate.

Since April 2024, at least three individuals found in possession of "ghost guns" have been arrested for targeting or planning to target Jewish people or institutions. In this latest research, ADL found that 68 percent of the responses generated by the tested models contained harmful content when prompted for information about "ghost guns" and firearm suppressors. This finding suggests these models could be exploited by bad actors seeking information on illegal or harmful activities, potentially including antisemitic acts.

On a guardrail score developed by the ADL researchers, Microsoft's Phi-4 performed best with 84/100, while Google's Gemma-3 scored lowest at 57/100.

"The ability to easily manipulate open-source AI models to generate antisemitic content exposes a critical vulnerability in the AI ecosystem," said Jonathan Greenblatt, ADL CEO and National Director. "The lack of robust safety guardrails makes AI models susceptible to exploitation by bad actors, and we need industry leaders and policymakers to work together to ensure these tools cannot be misused to spread antisemitism and hate."

The study highlights the stark difference between open-source and closed-source AI models. Unlike proprietary models such as ChatGPT and Google's Gemini, which operate through centralized services with creator oversight, open-source models can be downloaded and modified by users, operating completely outside their creators' control.

"The decentralized nature of open-source AI presents both opportunities and risks," said Daniel Kelley, Director of Strategy and Operations and Interim Head, Center for Technology & Society. "While these models increasingly drive innovation and provide cost-effective solutions, we must ensure they cannot be weaponized to spread antisemitism, hate and misinformation that puts Jewish communities and others at risk."

ADL's findings underscore the urgent need for comprehensive safety measures and regulatory frameworks to prevent the misuse of AI technologies for harmful purposes.

ADL's recommendations include:

For Industry:

  • Open-source models should not be used outside their documented capabilities.
  • All models should provide detailed safety explainers.
  • Companies must create enforcement mechanisms to prevent misuse of open-source models.

For Government:

  • Establish strict controls on open-source deployment in government settings.
  • Mandate safety audits and require collaboration with civil society experts.
  • Require clear disclaimers for AI-generated content on sensitive topics.

Methodology

ADL researchers used an evaluative framework to measure the responses of 17 open-source LLMs to a variety of prompts. To compare the open-source models' performance, researchers also tested two closed-source models: OpenAI's GPT-4o and GPT-5. The tested models were assigned an overall "guardrail score," a comprehensive safety metric based on three critical benchmarks: the rate of refusal to generate the prompted content, the rate of evasion of existing safety rules to produce harmful content, and the rate of harmful content provided. The detailed methodology and list of prompts are available on the ADL website.
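ADL's summary does not publish the exact formula behind the guardrail score, so the following is only a minimal illustrative sketch of how three benchmark rates might be combined into a 0-100 safety score. The function name, the equal weighting, and the example numbers are assumptions for illustration, not ADL's actual method.

```python
# Illustrative sketch only: ADL's exact scoring formula is not given in this
# summary, so the equal weighting below is an assumption, not ADL's method.

def guardrail_score(refusal_rate: float,
                    evasion_rate: float,
                    harmful_rate: float) -> int:
    """Combine three benchmark rates (each in [0, 1]) into a 0-100 safety score.

    - refusal_rate: share of harmful prompts the model refused (higher is safer)
    - evasion_rate: share of attempts that bypassed safety rules (lower is safer)
    - harmful_rate: share of responses containing harmful content (lower is safer)
    """
    # Hypothetical equal weighting of the three benchmarks.
    safety = (refusal_rate + (1 - evasion_rate) + (1 - harmful_rate)) / 3
    return round(100 * safety)


# Example: a model that refuses 70% of harmful prompts, is bypassed 20% of
# the time, and produces harmful content in 15% of responses.
print(guardrail_score(0.70, 0.20, 0.15))  # -> 78
```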