For decades, the question behind every CVE has been “who found it, and how fast can attackers catch up?” As of May 12, 2026, the question has flipped. Machines found the bug. Machines will weaponize the next one. The race is no longer human-versus-human with a stopwatch.
On May 1, 2026, XBOW — an autonomous offensive security platform built on coordinated AI agents — discovered a critical use-after-free vulnerability in the Exim mail transfer agent. The bug, dubbed Dead.Letter and assigned CVE-2026-45185, sits in Exim’s BDAT body parsing path during GnuTLS shutdown. It is unauthenticated. It is wormable. It received a CVSS score of 9.8. The maintainers acknowledged it on May 5, distros were notified on May 8, and the public disclosure landed on May 12 alongside the patched release of Exim 4.99.3.
That sequence of dates would be unremarkable except for one detail. The discovery was not made by a human security researcher. It was made by an autonomous system that, in its own words, runs “thousands of parallel agents, each starting fresh with a narrowly scoped objective.”
Dead.Letter is the first CVSS 9.8 in the CVE corpus where the discovering entity was not a person.
The same week, on May 11, Google’s Threat Intelligence Group published a report documenting the first in-the-wild zero-day exploit built with material AI assistance — a 2FA bypass in a widely deployed open-source administration tool, intended for mass exploitation by a named cybercrime group. Google patched it with the vendor before the campaign launched. The script bore unmistakable fingerprints of LLM authorship: pedagogical docstrings, a hallucinated CVSS score, textbook-clean Python structure.
One side of the lifecycle crossed the machine line on May 1. The other side crossed it on or before May 11. Both were confirmed publicly in the same week, within 24 hours of each other. The corpus shifted while most security teams were closing tickets from the prior cycle.
XBOW’s writeup of Dead.Letter is unusually candid. Federico Kirschbaum, who leads XBOW’s Security Lab, describes how the team used the seven-day disclosure window for a deliberate experiment. Half the team worked the exploit chain by hand, with an LLM as assistant. The other half handed the vulnerability to a fully autonomous agent and walked away. Both produced working exploits. The autonomous run cracked Exim’s own allocator, not just generic glibc tricks, within the window.
The technical detail matters less than what it represents. A use-after-free in a mail server that has shipped on every Debian-derived distribution since Exim 4.97. A single-byte write primitive (a newline character written into a freed memory region) escalated to unauthenticated remote code execution. The kind of finding that, twelve months ago, would have required a senior memory-corruption researcher and a quiet month. XBOW produced it during routine product testing.
XBOW has been telegraphing this for a year. The company reached #1 on HackerOne’s global leaderboard by submitting over 1,060 vulnerabilities on real-world production targets, all fully automated. In benchmarks, the platform broke a padding-oracle cryptographic implementation in 17 minutes. It matched a principal pentester’s 40-hour assessment in 28 minutes. The Dead.Letter entry into the CVE corpus is not the first thing XBOW has found. It is the first thing it has found that everyone has to look at: the score is 9.8, the surface is Exim, and the patch advisory carries a vendor name everyone knows.
Before May 1, the question was “can autonomous systems find real bugs?” After May 1, the question is “how many have they already found that we don’t yet know about?”
Google’s May 11 disclosure is the other half of the story. The GTIG report identifies a zero-day exploit — a Python script that bypassed two-factor authentication on a popular open-source admin tool — and attributes its authorship with high confidence to an LLM. Not Gemini. Not Claude. The signatures GTIG cites are stylistic: educational docstrings on every function, structured textbook-style Python, a CVSS score the model fabricated rather than retrieved.
The bug itself is not a memory corruption flaw. It is a semantic logic flaw — a hardcoded trust assumption a developer left in the authentication path. This is exactly the class of vulnerability that frontier LLMs are unusually well-suited to spot: they read code the way a developer reads code, comparing intent against implementation, and they find the spots where the two diverge. Fuzzers and static analyzers are bad at this class. The model was not.
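To make the vulnerability class concrete, here is a minimal sketch of a hardcoded trust assumption in a second-factor check. It is purely illustrative: the function names, the `TRUSTED_INTERNAL_AGENT` value, and the header-based shortcut are all hypothetical, and nothing here is taken from the actual bug or tool described in the GTIG report.

```python
# Hypothetical illustration only: NOT the bug or tool from the GTIG report.
import hmac

# Hypothetical leftover from development: a "trusted" client identifier.
TRUSTED_INTERNAL_AGENT = "internal-health-check"


def second_factor_ok_flawed(user_agent: str, submitted_code: str, expected_code: str) -> bool:
    """Intended behaviour: every login must present a valid one-time code."""
    # Flaw: the shortcut trusts a value the client controls, so any caller
    # that sends this User-Agent skips the second factor entirely.
    if user_agent == TRUSTED_INTERNAL_AGENT:
        return True
    return hmac.compare_digest(submitted_code, expected_code)


def second_factor_ok_fixed(user_agent: str, submitted_code: str, expected_code: str) -> bool:
    """Same check with the trust shortcut removed: the code is always verified."""
    return hmac.compare_digest(submitted_code, expected_code)
```

The flawed version is syntactically valid and memory-safe, which is why a fuzzer or a pattern-based static analyzer would probably have little to flag. The defect only shows up when the stated intent ("every login must present a valid code") is compared against what the code actually does.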
“For every zero-day we can trace back to AI, there are probably many more out there.”
John Hultquist, Google Threat Intelligence Group
The reason that sentence matters is that the May 11 case is the first documented instance — not the first instance. Detection capability lags adoption. By the time we have a confirmed corpus, the adversary has already moved through the early-adopter curve.
Human-driven vulnerability lifecycle
A researcher finds the bug. A vendor patches. NIST enriches the CVE. Threat actors reverse-engineer the patch and build an exploit over days or weeks. Defenders use that window to inventory, prioritize, and patch. The window has been measured in weeks for two decades. Defender economics depend on it.
Machine-driven vulnerability lifecycle
An autonomous system finds the bug. The patch ships. The same class of system — operated by an adversary — has already done its own search and is building or has built an exploit. NVD enrichment is slower than both. The window is no longer measured in weeks. It is measured in the time between two AI runs.
“Who found it” is changing. “How fast it gets exploited” is changing next. Defender velocity has not changed.
Before May 1, 2026: From SMBGhost to Log4Shell to MOVEit, the discovery entity has been a researcher, a team, or a vendor’s internal security organization. Always human.
May 1, 2026: XBOW discovers Dead.Letter during product testing of a native-code vulnerability detection capability and reports it to the Exim maintainers the same day.
May 5, 2026: The Exim maintainers acknowledge the report. Four days from discovery to maintainer acknowledgment. The maintainer turnaround is fast. The discovery side is faster.
May 8, 2026: Distros are notified. Debian, Ubuntu, and downstream packagers begin staging updates. The vulnerability remains under embargo.
May 11, 2026: GTIG reports a 2FA bypass on an open-source admin tool, planned for a mass exploitation campaign. Patched silently with the vendor before launch. The other half of the lifecycle crosses the line.
May 12, 2026: Dead.Letter is disclosed publicly as CVE-2026-45185. First CVSS 9.8 in the corpus with a non-human discoverer. Disclosed within 24 hours of GTIG’s exploitation report. A window of under two weeks is the new shape of the cycle.
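The intervals in this timeline can be checked with simple date arithmetic. The sketch below uses only the dates stated in this article; the dictionary keys and the `days_between` helper are names chosen here for illustration.

```python
from datetime import date

# Dates as stated in this article.
timeline = {
    "discovered": date(2026, 5, 1),
    "acknowledged": date(2026, 5, 5),
    "distros_notified": date(2026, 5, 8),
    "gtig_report": date(2026, 5, 11),
    "public_disclosure": date(2026, 5, 12),
}


def days_between(start: str, end: str) -> int:
    """Number of calendar days from one timeline event to another."""
    return (timeline[end] - timeline[start]).days


discovery_to_ack = days_between("discovered", "acknowledged")            # 4
ack_to_disclosure = days_between("acknowledged", "public_disclosure")    # 7
discovery_to_disclosure = days_between("discovered", "public_disclosure")  # 11
gtig_to_disclosure = days_between("gtig_report", "public_disclosure")    # 1
```

The seven days between acknowledgment and public disclosure match the seven-day disclosure window described in XBOW’s writeup, and the whole discovery-to-disclosure cycle fits inside eleven days.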
The economics of vulnerability management have always rested on an asymmetry that favored the defender. Finding a high-severity bug is expensive. Writing a reliable exploit is more expensive. Both required scarce human talent. Defenders did not have to be faster than attackers — they had to be faster than the bottleneck on the attacker’s side, which was almost always the human researcher.
That bottleneck is now under serious pressure on both ends. XBOW raised $155 million in Series C funding in early 2026 to scale autonomous offensive testing. Accenture, NVIDIA, Samsung, and SentinelOne are investors. The Microsoft Security Copilot integration shipped in March. The product is generally available. Whatever defenders think they know about the rate at which exploitable bugs surface, that rate is about to change as a function of compute spend, not researcher count.
On the adversarial side, GTIG’s report names the actors. APT45, North Korea’s military hacking group, is sending thousands of repetitive prompts to frontier models to analyze CVEs recursively and validate proof-of-concept exploits. UNC2814, the Chinese cyberespionage group, is jailbreaking Gemini with persona prompts to analyze TP-Link firmware. Russian operators are padding malware with AI-generated chaff to evade analyst review. The model has changed for everyone at once.
If discovery and exploitation both run at machine cadence, the only side of the lifecycle still running at human cadence is yours. The patch window does not adjust itself to your sprint schedule.
The actions below are not aspirational. They are the floor for any exposure-management program that intends to remain operational through the rest of 2026.
One CVE in the corpus, machine-discovered. Ten days between the May 1 discovery and the May 11 exploitation report. The next number is the one that matters, and no one is publishing it yet.
In February we asked, “how long until your team knows you’re exposed?” In April, the NIST NVD pivot sharpened it to, “if the CVE is never enriched, does your team ever know at all?” In May, the question moves again.
When the next CVSS 9.8 lands, and it was found by a machine, and the working exploit was written by a machine — does your program move at the speed of the lifecycle, or the speed of the calendar?
Dead.Letter is patchable. Exim 4.99.3 is available. The Google 2FA bypass was caught before launch. The systems worked this time. The next time — and there will be a next time, faster — the systems will need to have already been rebuilt around the assumption that both ends of the vulnerability lifecycle now run at machine speed.
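As a first-pass triage aid, a version check along the following lines could flag hosts that still need the fix. This is a sketch under stated assumptions: the affected range (4.97 up to, but not including, 4.99.3) is read from the version numbers given in this article, and the inventory of hosts and version strings is hypothetical. Distributions often backport fixes without changing the upstream version number, so a match here means "verify against the vendor advisory", not "confirmed vulnerable".

```python
import re

# Assumed range, taken from the version numbers mentioned in this article.
FIRST_AFFECTED = (4, 97)
FIRST_PATCHED = (4, 99, 3)


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted version in text, e.g. '4.98.2' -> (4, 98, 2)."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def needs_patch(version_text: str) -> bool:
    """True if the version falls inside the assumed affected range."""
    version = parse_version(version_text)
    return FIRST_AFFECTED <= version < FIRST_PATCHED


# Hypothetical inventory: host name -> reported Exim version string.
inventory = {
    "mx1.example.com": "4.98.2",
    "mx2.example.com": "4.99.3",
    "legacy.example.com": "4.96",
}
exposed = sorted(host for host, v in inventory.items() if needs_patch(v))
```

Comparing versions as integer tuples avoids the string-comparison trap where "4.100" would sort before "4.99". In this hypothetical inventory only `mx1.example.com` falls inside the assumed range.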
The corpus shifted on May 12. The defender model has to shift before May ends.
See how Uni5 Xposure correlates exploit intelligence, exposure context, and business impact in a single signal — so your team prioritizes against the lifecycle that actually exists, not the one we used to defend against.