May 15, 2026

The Machine Found It First. The Machine Will Exploit It Next.

Critt Golden

Global Director Pre-Sales - NAM

For decades, the question behind every CVE has been “who found it, and how fast can attackers catch up?” As of May 12, 2026, the question has flipped. Machines found the bug. Machines will weaponize the next one. The race is no longer human-versus-human with a stopwatch.

Discovery

Discovery crossed the line on May 1.

On May 1, 2026, XBOW — an autonomous offensive security platform built on coordinated AI agents — discovered a critical use-after-free vulnerability in the Exim mail transfer agent. The bug, dubbed Dead.Letter and assigned CVE-2026-45185, sits in Exim’s BDAT body parsing path during GnuTLS shutdown. It is unauthenticated. It is wormable. It received a CVSS score of 9.8. The maintainers acknowledged it on May 5, distros were notified on May 8, and the public disclosure landed on May 12 alongside the patched release of Exim 4.99.3.

That sequence of dates would be unremarkable except for one detail. The discovery was not made by a human security researcher. It was made by an autonomous system that, in its own words, runs “thousands of parallel agents, each starting fresh with a narrowly scoped objective.”

Dead.Letter is the first CVSS 9.8 in the CVE corpus where the discovering entity was not a person.

The same week, on May 11, Google’s Threat Intelligence Group published a report documenting the first in-the-wild zero-day exploit built with material AI assistance — a 2FA bypass in a widely deployed open-source administration tool, intended for mass exploitation by a named cybercrime group. Google patched it with the vendor before the campaign launched. The script bore unmistakable fingerprints of LLM authorship: pedagogical docstrings, a hallucinated CVSS score, textbook-clean Python structure.

One side of the lifecycle crossed the machine line on May 1. The other side crossed it on or before May 11. Both were confirmed publicly in the same week, within 24 hours of each other. The corpus shifted while most security teams were closing tickets from the prior cycle.

9.8: CVSS score, Dead.Letter / CVE-2026-45185

11: Days from discovery (May 1) to public disclosure (May 12)

1st: CVSS 9.8 entry machine-discovered

1st: AI-built zero-day caught in the wild

The Discovery

XBOW and the Dead.Letter exploit

XBOW’s writeup of Dead.Letter is unusually candid. Federico Kirschbaum, who leads XBOW’s Security Lab, describes the seven-day disclosure window as a deliberate experiment: half the team worked the exploit chain by hand, with an LLM as assistant. The other half handed the vulnerability to a fully autonomous agent and walked away. Both produced working exploits. The autonomous run cracked Exim’s own allocator — not just generic glibc tricks — within the window.

The technical detail matters less than what it represents. A use-after-free in a mail server that has shipped on every Debian-derived distribution since version 4.97. A single-byte write primitive, a newline character written into a freed memory region, escalated to unauthenticated remote code execution. The kind of finding that, twelve months ago, would have required a senior memory-corruption researcher and a quiet month. XBOW produced it during routine product testing.

XBOW has been telegraphing this for a year. The company reached #1 on HackerOne’s global leaderboard by submitting over 1,060 vulnerabilities on real-world production targets, all fully automated. In benchmarks, the platform broke a padding-oracle cryptographic implementation in 17 minutes and matched a principal pentester’s 40-hour assessment in 28 minutes. The Dead.Letter entry into the CVE corpus is not the first thing XBOW has found. It is the first thing it has found that everyone has to look at: the score is 9.8, the surface is Exim, and the patch advisory carries a vendor name everyone knows.

The Shift, in One Line

Before May 1, the question was “can autonomous systems find real bugs?” After May 1, the question is “how many have they already found that we don’t yet know about?”

Exploitation

Exploitation crossed it days earlier.

Google’s May 11 disclosure is the other half of the story. The GTIG report identifies a zero-day exploit — a Python script that bypassed two-factor authentication on a popular open-source admin tool — and attributes its authorship with high confidence to an LLM. Not Gemini. Not Claude. The signatures GTIG cites are stylistic: educational docstrings on every function, structured textbook-style Python, a CVSS score the model fabricated rather than retrieved.
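One of the stylistic signals above, a docstring on every function, is simple to measure. The sketch below is a hypothetical illustration and not GTIG's method: the function name `docstring_coverage` and the sample script are assumptions, and a high ratio is at most a weak hint that should never be treated as attribution on its own.

```python
import ast


def docstring_coverage(source: str) -> float:
    """Return the fraction of functions in `source` that carry a docstring."""
    tree = ast.parse(source)
    funcs = [
        node for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    if not funcs:
        return 0.0
    documented = sum(1 for f in funcs if ast.get_docstring(f) is not None)
    return documented / len(funcs)


# Hypothetical sample script: two functions, only one documented.
SAMPLE = '''
def login(user):
    """Send the login request for the given user."""
    return user

def helper(x):
    return x
'''

coverage = docstring_coverage(SAMPLE)
```

A ratio like this would only be useful alongside the other signals the report describes, such as a fabricated CVSS score, since plenty of human-written code is also fully documented.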

The bug itself is not a memory corruption flaw. It is a semantic logic flaw — a hardcoded trust assumption a developer left in the authentication path. This is exactly the class of vulnerability that frontier LLMs are unusually well-suited to spot: they read code the way a developer reads code, comparing intent against implementation, and they find the spots where the two diverge. Fuzzers and static analyzers are bad at this class. The model was not.
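To make the bug class concrete, here is a minimal, hypothetical sketch of a hardcoded trust assumption in a second-factor check. It is not the flaw from the GTIG report, whose details are not given here; every name in it (`LoginRequest`, `TRUSTED_SOURCE`, `verify_second_factor_flawed`, the one-time codes) is invented for illustration.

```python
import hmac
from dataclasses import dataclass

# Hypothetical assumption left in by a developer: "local requests are safe".
TRUSTED_SOURCE = "127.0.0.1"


@dataclass
class LoginRequest:
    user: str
    otp: str
    claimed_source: str  # in this sketch, read from a client-controlled header


def verify_second_factor_flawed(req: LoginRequest, expected_otp: str) -> bool:
    # Intent: always check the one-time code.
    # Implementation: skips the check whenever the client claims to be local.
    if req.claimed_source == TRUSTED_SOURCE:
        return True
    return hmac.compare_digest(req.otp, expected_otp)


def verify_second_factor_fixed(req: LoginRequest, expected_otp: str) -> bool:
    # No caller-supplied value can bypass the comparison.
    return hmac.compare_digest(req.otp, expected_otp)


# A request with a wrong code and a spoofed source value.
EXPECTED = "492817"
spoofed = LoginRequest(user="alice", otp="000000", claimed_source="127.0.0.1")

bypassed = verify_second_factor_flawed(spoofed, EXPECTED)  # True: check skipped
rejected = verify_second_factor_fixed(spoofed, EXPECTED)   # False: code is wrong
```

Neither version crashes or corrupts memory, so there is nothing for a fuzzer to latch onto. The defect only shows up when the stated intent is compared against what the code does, which is the kind of reading the paragraph above describes.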

“For every zero-day we can trace back to AI, there are probably many more out there.” (John Hultquist, Google Threat Intelligence Group)

The reason that sentence matters is that the May 11 case is the first documented instance — not the first instance. Detection capability lags adoption. By the time we have a confirmed corpus, the adversary has already moved through the early-adopter curve.

The Model We Built Defenses Around

Human-driven vulnerability lifecycle

A researcher finds the bug. A vendor patches. NIST enriches the CVE. Threat actors reverse-engineer the patch and build an exploit over days or weeks. Defenders use that window to inventory, prioritize, and patch. The window has been measured in weeks for two decades. Defender economics depend on it.

The Model That Now Applies

Machine-driven vulnerability lifecycle

An autonomous system finds the bug. The patch ships. The same class of system — operated by an adversary — has already done its own search and is building or has built an exploit. NVD enrichment is slower than both. The window is no longer measured in weeks. It is measured in the time between two AI runs.

“Who found it” is changing. “How fast it gets exploited” is changing next. Defender velocity has not changed.
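The difference between the two models comes down to arithmetic on two durations. The sketch below is illustrative only: the function name and every figure in it are assumptions chosen for the example, not measurements from XBOW, GTIG, or NVD.

```python
from datetime import timedelta


def exposure_window(time_to_exploit: timedelta, time_to_remediate: timedelta) -> timedelta:
    """Time an asset is exploitable but not yet remediated (never negative)."""
    return max(time_to_remediate - time_to_exploit, timedelta(0))


# Assumed figures, for illustration only.
PATCH_SLA = timedelta(hours=72)        # a "patch within 72 hours" top tier
HUMAN_EXPLOIT = timedelta(weeks=2)     # exploit built over days or weeks
MACHINE_EXPLOIT = timedelta(hours=6)   # exploit built between two AI runs

human_gap = exposure_window(HUMAN_EXPLOIT, PATCH_SLA)
machine_gap = exposure_window(MACHINE_EXPLOIT, PATCH_SLA)

print("human-cadence gap:", human_gap)
print("machine-cadence gap:", machine_gap)
```

With these assumed inputs, the same 72-hour SLA leaves no gap against a two-week exploit timeline and a 66-hour gap against a six-hour one. The SLA did not change; only the attacker's side of the subtraction did.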

Timeline

A 14-day timeline that ends an era.

2002 – 2025

Every CVSS 9.8 in the corpus has a human name attached to its discovery.

From SMBGhost to Log4Shell to MOVEit, the discovery entity has been a researcher, a team, or a vendor’s internal security organization. Always human.

May 1, 2026

XBOW autonomously identifies the Dead.Letter UAF in Exim.

Discovered during product testing of a native-code vulnerability detection capability. Reported to the Exim maintainers the same day.

May 5, 2026

Exim maintainers acknowledge and confirm the fix in their private repo.

Four days from discovery to acknowledged patch. The maintainer turnaround is fast. The discovery side is faster.

May 8, 2026

Linux distros are notified ahead of public release.

Debian, Ubuntu, and downstream packagers begin staging updates. The vulnerability remains under embargo.

May 11, 2026

Google GTIG discloses the first AI-built zero-day exploit deployed in the wild.

A 2FA bypass on an open-source admin tool, planned for a mass exploitation campaign. Patched silently with the vendor before launch. The other half of the lifecycle crosses the line.

May 12, 2026

Dead.Letter goes public. CVE-2026-45185 is published. Exim 4.99.3 ships.

First CVSS 9.8 in the corpus with a non-human discoverer. Disclosed within 24 hours of GTIG’s exploitation report. The two-week window is the new shape of the cycle.

Economics

Why the defender side breaks first.

The economics of vulnerability management have always rested on an asymmetry that favored the defender. Finding a high-severity bug is expensive. Writing a reliable exploit is more expensive. Both required scarce human talent. Defenders did not have to be faster than attackers — they had to be faster than the bottleneck on the attacker’s side, which was almost always the human researcher.

That bottleneck is now under serious pressure on both ends. XBOW raised $155 million in Series C funding in early 2026 to scale autonomous offensive testing. Accenture, NVIDIA, Samsung, and SentinelOne are investors. The Microsoft Security Copilot integration shipped in March. The product is generally available. Whatever defenders think they know about the rate at which exploitable bugs surface, that rate is about to change as a function of compute spend, not researcher count.

On the adversarial side, GTIG’s report names the actors. APT45, North Korea’s military hacking group, is sending thousands of repetitive prompts to frontier models to analyze CVEs recursively and validate proof-of-concept exploits. UNC2814, the Chinese cyberespionage group, is jailbreaking Gemini with persona prompts to analyze TP-Link firmware. Russian operators are padding malware with AI-generated chaff to evade analyst review. The model has changed for everyone at once.

The Math Defenders Have to Confront

If discovery and exploitation both run at machine cadence, the only side of the lifecycle still running at human cadence is yours. The patch window does not adjust itself to your sprint schedule.

Action Items

What changes for your program — now.

The actions below are not aspirational. They are the floor for any exposure-management program that intends to remain operational through the rest of 2026.

Assume parallel discovery. Treat every new CVE — even on obscure components — as if an autonomous adversary either has it or will have it within days. The reliable signal is no longer “is this exploitable in theory” but “what is the exposure window we can absorb.”
Stop sorting by CVSS alone. A 9.8 from an AI-discovered bug behaves operationally like any other 9.8, but the volume of 9.8s will rise. CVSS-as-queue-key collapses faster when the queue grows faster than your team. Exploit intelligence, exposure context, and business impact become the prioritization signal — CVSS is metadata.
Compress the patch SLA tier above “critical.” If your top tier reads “patch within 72 hours,” that was sized against a human-cadence lifecycle. The new tier — call it “machine cadence” — needs to sit at hours, not days, and should be reserved for the small subset of exposures where the asset, the access path, and the threat intelligence all align.
Audit signatureless detection coverage. If your detection still depends on the vendor patch landing in a feed, the AI-built exploit is already past you. Behavior-based detection, exposure-aware EDR, and compensating controls become the difference between known-exposed-and-protected and known-exposed-and-blind.
Make threat intelligence operational, not informational. Knowing that APT45 is using LLMs to analyze CVEs at scale is not a finding. Routing that knowledge into prioritization weight on the assets APT45 historically targets is operational. The gap between intel-as-newsletter and intel-as-input-to-your-stack is the gap between survival and audit-pass.
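The prioritization and SLA items above can be sketched as a small scoring routine. This is a minimal illustration under stated assumptions, not a description of any product: the field names, weights, tier labels, and placeholder CVE identifiers are all hypothetical and would need calibrating against your own environment.

```python
from dataclasses import dataclass


@dataclass
class Exposure:
    cve_id: str
    cvss: float                # 0.0-10.0, kept as metadata rather than the sort key
    exploit_observed: bool     # exploit intelligence: a working exploit is known
    internet_reachable: bool   # exposure context: an access path exists
    business_critical: bool    # business impact: the asset matters to the business
    actor_targets_asset: bool  # threat intel: a tracked actor targets this asset class


def priority_score(e: Exposure) -> float:
    """Combine the signals into a 0-1 score; the weights are illustrative assumptions."""
    score = 0.2 * (e.cvss / 10.0)
    score += 0.3 if e.exploit_observed else 0.0
    score += 0.2 if e.internet_reachable else 0.0
    score += 0.2 if e.business_critical else 0.0
    score += 0.1 if e.actor_targets_asset else 0.0
    return round(score, 3)


def sla_tier(e: Exposure) -> str:
    """Reserve the hours-level tier for cases where asset, access path, and intel align."""
    intel_aligned = e.exploit_observed or e.actor_targets_asset
    if e.business_critical and e.internet_reachable and intel_aligned:
        return "machine-cadence (hours)"
    if e.cvss >= 9.0:
        return "critical (72 hours)"
    return "standard"


# Placeholder identifiers, not real CVEs.
queue = [
    Exposure("CVE-A", 9.8, False, False, False, False),
    Exposure("CVE-B", 7.5, True, True, True, True),
]
ranked = sorted(queue, key=priority_score, reverse=True)

for e in ranked:
    print(e.cve_id, priority_score(e), sla_tier(e))
```

In this example the 7.5 outranks the 9.8 because exploit intelligence, exposure context, business impact, and actor targeting all point at it, while the 9.8 sits on an unreachable, non-critical asset. Sorting by CVSS alone would have reversed that order.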

1/14

One CVE in the corpus, machine-discovered. Fourteen days between when the discovery side crossed the line and when the exploitation side crossed it. The next number is the one that matters — and no one is publishing it yet.

The New Question

The question moves again.

In February we asked, “how long until your team knows you’re exposed?” In April, the NIST NVD pivot sharpened it to, “if the CVE is never enriched, does your team ever know at all?” In May, the question moves again.

The May Question

When the next CVSS 9.8 lands, and it was found by a machine, and the working exploit was written by a machine — does your program move at the speed of the lifecycle, or the speed of the calendar?

Dead.Letter is patchable. Exim 4.99.3 is available. The Google 2FA bypass was caught before launch. The systems worked this time. The next time — and there will be a next time, faster — the systems will need to have already been rebuilt around the assumption that both ends of the vulnerability lifecycle now run at machine speed.

The corpus shifted on May 12. The defender model has to shift before May ends.

Machine-cadence discovery and exploitation require machine-cadence exposure management.

See how Uni5 Xposure correlates exploit intelligence, exposure context, and business impact in a single signal — so your team prioritizes against the lifecycle that actually exists, not the one we used to defend against.
