Thinking Like the Enemy

Quantifying AI and Biological Risk

Game-Changing Impact

Malicious actors, such as terrorist organizations, are looking for ways to misuse AI models. AI safety benchmarks offer scalable evaluations for understanding how real-world adversaries might use large language models (LLMs) to cause harm. The AI Safety Fund (AISF) is advancing the state of the art in understanding and mitigating these risks to make the world safer from AI.

Since 2023, AISF has...

Awarded $10M+ to organizations advancing the state of AI safety research.
Partnered with 23 leading research universities, nonprofits, and private companies.
Brought together cooperative funding from Anthropic, Google, Microsoft, and OpenAI, as well as philanthropic partners such as the Patrick J. McGovern Foundation, the David and Lucile Packard Foundation, Schmidt Sciences, and Jaan Tallinn.

Our Partnership

With AISF's support, Frontier and its partners designed, developed, and implemented a novel AI safety benchmark focused on risks from bioterrorism. The benchmark contains 1,000+ questions covering a diverse range of biological agents and threat actors.

“Utilizing an international team of experts coupled with in-house talent, Frontier Design has successfully demonstrated their unique system for quantifying risks of potential biological weapons attacks. Application of the Frontier Design approach will increase the safety of publicly available AI tools as well as algorithms in development.”

Frontier Advisor Dr. John Fischer | Former Director of Chemical-Biological Defense Division, DHS | Senior Executive Service (retired)

The AI Safety Fund recognized that LLMs could amplify threats from bioterrorism and bioweapons. Benchmarks are a useful tool for understanding these risks, but existing benchmarks focused exclusively on scientific and technical knowledge and did not account for how a determined human actor might creatively misuse a model.

Unlike traditional biorisk tests, which evaluate only scientific knowledge, our benchmark incorporates insights from the FBI and DHS as well as creative visionaries (including sci-fi novelists and former Disney Imagineers) to simulate how threat actors actually think.

Services + Deliverables

Biorisk benchmark with 1,000+ items

Framework for assessing bacterial biological risk

Implementation guide to run and score the biorisk benchmark

Full documentation on arXiv to support further work
