May 15, 2026

The Machine Found It First. The Machine Will Exploit It Next.

Critt Golden

Global Director Pre-Sales - NAM

For decades, the question behind every CVE has been “who found it, and how fast can attackers catch up?” As of May 12, 2026, the question has flipped. Machines found the bug. Machines will weaponize the next one. The race is no longer human-versus-human with a stopwatch.

Discovery crossed the line on May 1.

On May 1, 2026, XBOW — an autonomous offensive security platform built on coordinated AI agents — discovered a critical use-after-free vulnerability in the Exim mail transfer agent. The bug, dubbed Dead.Letter and assigned CVE-2026-45185, sits in Exim’s BDAT body parsing path during GnuTLS shutdown. It is unauthenticated. It is wormable. It received a CVSS score of 9.8. The maintainers acknowledged it on May 5, distros were notified on May 8, and the public disclosure landed on May 12 alongside the patched release of Exim 4.99.3.

That sequence of dates would be unremarkable except for one detail. The discovery was not made by a human security researcher. It was made by an autonomous system that, in its own words, runs “thousands of parallel agents, each starting fresh with a narrowly scoped objective.”

Dead.Letter is the first CVSS 9.8 in the CVE corpus where the discovering entity was not a person.

The same week, on May 11, Google’s Threat Intelligence Group published a report documenting the first in-the-wild zero-day exploit built with material AI assistance: a 2FA bypass in a widely deployed open-source administration tool, intended for mass exploitation by a named cybercrime group. Google worked with the vendor to patch it before the campaign launched. The script bore unmistakable fingerprints of LLM authorship: pedagogical docstrings, a hallucinated CVSS score, textbook-clean Python structure.

One side of the lifecycle crossed the machine line on May 1. The other side crossed it on or before May 11. Both were confirmed publicly in the same week, within 24 hours of each other. The corpus shifted while most security teams were closing tickets from the prior cycle.

  • 9.8: CVSS score, Dead.Letter / CVE-2026-45185
  • 11: days from discovery to public disclosure
  • 1st: CVSS 9.8 entry machine-discovered
  • 1st: AI-built zero-day caught in the wild

XBOW and the Dead.Letter exploit

XBOW’s writeup of Dead.Letter is unusually candid. Federico Kirschbaum, who leads XBOW’s Security Lab, describes the seven-day disclosure window as a deliberate experiment: half the team worked the exploit chain by hand, with an LLM as assistant. The other half handed the vulnerability to a fully autonomous agent and walked away. Both produced working exploits. The autonomous run cracked Exim’s own allocator — not just generic glibc tricks — within the window.

The technical detail matters less than what it represents. A use-after-free in a mail server that has shipped on every Debian-derived distribution since 4.97. A write primitive of one single byte — a newline character into a freed memory region — escalated to unauthenticated remote code execution. The kind of finding that, twelve months ago, would have required a senior memory-corruption researcher and a quiet month. XBOW produced it during routine product testing.

XBOW has been telegraphing this for a year. The company reached #1 on HackerOne’s global leaderboard by submitting over 1,060 vulnerabilities on real-world production targets, all fully automated. In benchmarks, the platform broke a padding-oracle cryptographic implementation in 17 minutes and matched a principal pentester’s 40-hour assessment in 28 minutes. The Dead.Letter entry into the CVE corpus is not the first thing XBOW has found. It is the first thing it has found that everyone has to look at, because the score is 9.8, because the surface is Exim, because the patch advisory carries a vendor name everyone knows.

The Shift, in One Line

Before May 1, the question was “can autonomous systems find real bugs?” After May 1, the question is “how many have they already found that we don’t yet know about?”

Exploitation crossed it days earlier.

Google’s May 11 disclosure is the other half of the story. The GTIG report identifies a zero-day exploit — a Python script that bypassed two-factor authentication on a popular open-source admin tool — and attributes its authorship with high confidence to an LLM. Not Gemini. Not Claude. The signatures GTIG cites are stylistic: educational docstrings on every function, structured textbook-style Python, a CVSS score the model fabricated rather than retrieved.

The bug itself is not a memory corruption flaw. It is a semantic logic flaw — a hardcoded trust assumption a developer left in the authentication path. This is exactly the class of vulnerability that frontier LLMs are unusually well-suited to spot: they read code the way a developer reads code, comparing intent against implementation, and they find the spots where the two diverge. Fuzzers and static analyzers are bad at this class. The model was not.

“For every zero-day we can trace back to AI, there are probably many more out there.” (John Hultquist, Google Threat Intelligence Group)

The reason that sentence matters is that the May 11 case is the first documented instance — not the first instance. Detection capability lags adoption. By the time we have a confirmed corpus, the adversary has already moved through the early-adopter curve.

The Model We Built Defenses Around

Human-driven vulnerability lifecycle

A researcher finds the bug. A vendor patches. NIST enriches the CVE. Threat actors reverse-engineer the patch and build an exploit over days or weeks. Defenders use that window to inventory, prioritize, and patch. The window has been measured in weeks for two decades. Defender economics depend on it.

The Model That Now Applies

Machine-driven vulnerability lifecycle

An autonomous system finds the bug. The patch ships. The same class of system — operated by an adversary — has already done its own search and is building or has built an exploit. NVD enrichment is slower than both. The window is no longer measured in weeks. It is measured in the time between two AI runs.

“Who found it” is changing. “How fast it gets exploited” is changing next. Defender velocity has not changed.

A 14-day timeline that ends an era.

2002 – 2025
Every CVSS 9.8 in the corpus has a human name attached to its discovery.

From SMBGhost to Log4Shell to MOVEit, the discovering entity has been a researcher, a team, or a vendor’s internal security organization. Always human.

May 1, 2026
XBOW autonomously identifies the Dead.Letter UAF in Exim.

Discovered during product testing of a native-code vulnerability detection capability. Reported to the Exim maintainers the same day.

May 5, 2026
Exim maintainers acknowledge and confirm the fix in their private repo.

Four days from discovery to acknowledged patch. The maintainer turnaround is fast. The discovery side is faster.

May 8, 2026
Linux distros are notified ahead of public release.

Debian, Ubuntu, and downstream packagers begin staging updates. The vulnerability remains under embargo.

May 11, 2026
Google GTIG discloses the first AI-built zero-day exploit deployed in the wild.

A 2FA bypass on an open-source admin tool, planned for a mass exploitation campaign. Patched silently with the vendor before launch. The other half of the lifecycle crosses the line.

May 12, 2026
Dead.Letter goes public. CVE-2026-45185 is published. Exim 4.99.3 ships.

First CVSS 9.8 in the corpus with a non-human discoverer. Disclosed within 24 hours of GTIG’s exploitation report. The two-week window is the new shape of the cycle.
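
The intervals in the timeline above can be checked with a few lines of date arithmetic. The sketch below uses only the dates stated in this article; the dictionary keys and the helper name `days_between` are illustrative labels chosen for this example, not identifiers from any advisory.

```python
from datetime import date

# Dates as stated in the timeline above; the keys are illustrative labels.
TIMELINE = {
    "discovery": date(2026, 5, 1),
    "maintainer_ack": date(2026, 5, 5),
    "distro_notice": date(2026, 5, 8),
    "gtig_report": date(2026, 5, 11),
    "public_disclosure": date(2026, 5, 12),
}


def days_between(start: str, end: str) -> int:
    """Return the number of whole days from one timeline event to another."""
    return (TIMELINE[end] - TIMELINE[start]).days


if __name__ == "__main__":
    # Print the elapsed days from discovery to each later event.
    for event in list(TIMELINE)[1:]:
        print(f"discovery -> {event}: {days_between('discovery', event)} days")
```

Measured this way, discovery to maintainer acknowledgment is 4 days, discovery to public disclosure is 11 days, and the GTIG report precedes the Dead.Letter disclosure by 1 day.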

Why the defender side breaks first.

The economics of vulnerability management have always rested on an asymmetry that favored the defender. Finding a high-severity bug was expensive. Writing a reliable exploit was more expensive. Both required scarce human talent. Defenders did not have to be faster than attackers; they had to be faster than the bottleneck on the attacker’s side, which was almost always the human researcher.

That bottleneck is now under serious pressure on both ends. XBOW raised $155 million in Series C funding in early 2026 to scale autonomous offensive testing. Accenture, NVIDIA, Samsung, and SentinelOne are investors. The Microsoft Security Copilot integration shipped in March. The product is generally available. Whatever defenders think they know about the rate at which exploitable bugs surface, that rate is about to change as a function of compute spend, not researcher count.

On the adversarial side, GTIG’s report names the actors. APT45, North Korea’s military hacking group, is sending thousands of repetitive prompts to frontier models to analyze CVEs recursively and validate proof-of-concept exploits. UNC2814, the Chinese cyberespionage group, is jailbreaking Gemini with persona prompts to analyze TP-Link firmware. Russian operators are padding malware with AI-generated chaff to evade analyst review. The model has changed for everyone at once.

The Math Defenders Have to Confront

If discovery and exploitation both run at machine cadence, the only side of the lifecycle still running at human cadence is yours. The patch window does not adjust itself to your sprint schedule.
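
One way to make that arithmetic concrete is to compare how long an exploit takes to appear with how long a patch takes to deploy. The sketch below is a simplified model under stated assumptions: the function name `unprotected_hours` and the sample hour figures are hypothetical and are not drawn from the XBOW or GTIG reports.

```python
def unprotected_hours(hours_to_exploit: float, hours_to_patch: float) -> float:
    """Hours during which a working exploit exists but the patch is not deployed.

    Both arguments are measured from public disclosure. This is a simplified
    model: it ignores compensating controls and partial rollouts.
    """
    if hours_to_exploit < 0 or hours_to_patch < 0:
        raise ValueError("durations must be non-negative")
    return max(0.0, hours_to_patch - hours_to_exploit)


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    human_cadence = unprotected_hours(hours_to_exploit=14 * 24, hours_to_patch=72)
    machine_cadence = unprotected_hours(hours_to_exploit=6, hours_to_patch=72)
    print(f"human-cadence exploit: {human_cadence:.0f} unprotected hours")
    print(f"machine-cadence exploit: {machine_cadence:.0f} unprotected hours")
```

With the same 72-hour patch SLA, the unprotected window in this model is zero when the exploit takes two weeks and 66 hours when it takes six; only the attacker-side number changed.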

What changes for your program — now.

The actions below are not aspirational. They are the floor for any exposure-management program that intends to remain operational through the rest of 2026.

  • Assume parallel discovery. Treat every new CVE — even on obscure components — as if an autonomous adversary either has it or will have it within days. The reliable signal is no longer “is this exploitable in theory” but “what is the exposure window we can absorb.”
  • Stop sorting by CVSS alone. A 9.8 from an AI-discovered bug behaves operationally like any other 9.8, but the volume of 9.8s will rise. CVSS-as-queue-key collapses faster when the queue grows faster than your team. Exploit intelligence, exposure context, and business impact become the prioritization signal — CVSS is metadata.
  • Compress the patch SLA tier above “critical.” If your top tier reads “patch within 72 hours,” that was sized against a human-cadence lifecycle. The new tier — call it “machine cadence” — needs to sit at hours, not days, and should be reserved for the small subset of exposures where the asset, the access path, and the threat intelligence all align.
  • Audit signatureless detection coverage. If your detection still depends on the vendor patch landing in a feed, the AI-built exploit is already past you. Behavior-based detection, exposure-aware EDR, and compensating controls become the difference between known-exposed-and-protected and known-exposed-and-blind.
  • Make threat intelligence operational, not informational. Knowing that APT45 is using LLMs to analyze CVEs at scale is not a finding. Routing that knowledge into prioritization weight on the assets APT45 historically targets is operational. The gap between intel-as-newsletter and intel-as-input-to-your-stack is the gap between survival and audit-pass.
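
The prioritization and SLA points above can be sketched as a small scoring function. This is a minimal illustration, not a description of any product's algorithm: the `Exposure` fields, the weights, the 50-point threshold, and the tier names are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exposure:
    """One vulnerable asset. All fields are hypothetical inputs for this sketch."""
    asset: str
    cve_id: str
    cvss: float                 # kept as metadata, not used as the sort key
    exploit_observed: bool      # exploit intelligence: a working exploit is reported
    internet_facing: bool       # exposure context: a reachable access path exists
    actor_targets_sector: bool  # threat intelligence: a tracked actor targets this asset class
    business_impact: int        # 1 (low) to 5 (critical), set by the asset owner

    def __post_init__(self) -> None:
        if not 1 <= self.business_impact <= 5:
            raise ValueError("business_impact must be between 1 and 5")


def priority_score(e: Exposure) -> float:
    """Combine the three signals into a 0-100 score. Weights are illustrative."""
    score = 0.0
    score += 35.0 if e.exploit_observed else 0.0
    score += 25.0 if e.internet_facing else 0.0
    score += 15.0 if e.actor_targets_sector else 0.0
    score += 5.0 * e.business_impact  # contributes 5 to 25 points
    return score


def sla_tier(e: Exposure) -> str:
    """Reserve the hours-level tier for cases where asset, access path, and intel align."""
    intel_aligned = e.exploit_observed or e.actor_targets_sector
    if e.business_impact >= 4 and e.internet_facing and intel_aligned:
        return "machine-cadence (hours)"
    if priority_score(e) >= 50.0:
        return "critical (72 hours)"
    return "standard"


if __name__ == "__main__":
    # Two hypothetical assets carrying the same CVE and the same CVSS score.
    queue = [
        Exposure("lab-relay", "CVE-2026-45185", 9.8, False, False, False, 1),
        Exposure("edge-mail-gateway", "CVE-2026-45185", 9.8, True, True, False, 5),
    ]
    for e in sorted(queue, key=priority_score, reverse=True):
        print(f"{e.asset}: score={priority_score(e):.0f}, tier={sla_tier(e)}")
```

Both entries carry the same 9.8, yet they land at opposite ends of the queue, which is the sense in which CVSS becomes metadata once exploit intelligence, exposure context, and business impact drive the ordering.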
1/14

One CVE in the corpus, machine-discovered. Fourteen days between when the discovery side crossed the line and when the exploitation side crossed it. The next number is the one that matters — and no one is publishing it yet.

The question moves again.

In February we asked, “how long until your team knows you’re exposed?” In April, the NIST NVD pivot sharpened it to, “if the CVE is never enriched, does your team ever know at all?” In May, the question moves again.

The May Question

When the next CVSS 9.8 lands, and it was found by a machine, and the working exploit was written by a machine — does your program move at the speed of the lifecycle, or the speed of the calendar?

Dead.Letter is patchable. Exim 4.99.3 is available. The 2FA bypass that GTIG reported was caught before launch. The systems worked this time. The next time, and there will be a next time, faster, the systems will need to have already been rebuilt around the assumption that both ends of the vulnerability lifecycle now run at machine speed.

The corpus shifted on May 12. The defender model has to shift before May ends.

Machine-cadence discovery and exploitation require machine-cadence exposure management.

See how Uni5 Xposure correlates exploit intelligence, exposure context, and business impact in a single signal — so your team prioritizes against the lifecycle that actually exists, not the one we used to defend against.
