Athena · A Treatise on Machine Auditing Vol. I · Ed. 1.4 · MMXXVI

Vision Systems for vulnerability discovery at the speed of code being written.

Vulnerability scanner that thinks.

Athena reads your entire repository for hours and reasons about it from first principles to find vulnerabilities that pattern-matching approaches miss.

A run, unredacted. The system argues with itself — and shows its working.

⸻ Folio XVII
Hypotheses under examination · scan 0x4f2c
Finding — surviving cross-examination confirmed · 03:42
Probatum

Authentication bypass on account deletion

services/auth/verify.py  ·  lines 184–207  ·  severity high
Reasoning: The delete handler short-circuits verification when token.kind == "admin", trusting the claim before verify_signature() runs. Any token carrying that claim passes, whether or not its signature is valid.
Reproduction: POST /v1/accounts/{id}/delete with a forged token of kind admin; see the repro script in the report.
Critique log — “Refutation attempted four ways: mTLS, WAF rule, pre-handler middleware, version history. None are invoked on the delete path. Fault confirmed.”
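The fault above is an ordering bug: a claim is trusted before the signature that protects it has been checked. The minimal Python sketch below shows the pattern and its fix. The token format (a JSON payload plus a hex HMAC-SHA256 tag), `SECRET_KEY`, the `Token` class, the `"user"` kind, and both `authorize_delete*` functions are hypothetical; only `token.kind` and `verify_signature()` come from the finding, and this is not the code in services/auth/verify.py.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass

SECRET_KEY = b"example-signing-key"  # hypothetical; real keys come from a secret store


@dataclass
class Token:
    payload: bytes   # serialized claims, e.g. b'{"kind": "admin"}'
    signature: str   # hex HMAC-SHA256 over payload

    @property
    def kind(self) -> str:
        # Reading a claim verifies nothing by itself.
        return json.loads(self.payload).get("kind", "")


def sign(payload: bytes) -> str:
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def verify_signature(token: Token) -> bool:
    # Constant-time comparison of the presented tag with the expected one.
    return hmac.compare_digest(token.signature, sign(token.payload))


def authorize_delete_vulnerable(token: Token) -> bool:
    # Bug pattern: the admin claim is trusted before the signature is checked.
    if token.kind == "admin":
        return True
    return verify_signature(token)


def authorize_delete(token: Token) -> bool:
    # Fix: verify first, then consult claims from a token known to be genuine.
    if not verify_signature(token):
        return False
    return token.kind in ("admin", "user")
```

A token that claims admin but carries a meaningless tag passes the first function and is rejected by the second. The fix is only a reordering: verification becomes the first statement, before any claim is read.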
Probatum

Cross-tenant invoice access via warm cache

services/billing/invoices.py  ·  lines 42–71  ·  severity high
Reasoning: Invoice lookup checks ownership only on the DB path. Cache hits skip the check entirely, and the cache is keyed by the bare invoice id, so any cached invoice is exposed to anyone who increments the id.
Reproduction: Once /invoices/{neighbor-id} is in the cache (for example, after its owner has viewed it), request it from your own session. The call returns without an ownership assertion.
Critique log — “Considered: signed URL, tenant-scoped cache key, gateway-level ACL. The cache key is plain (id) and the ACL lives below the cache. Exploit holds.”
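The underlying rule is that the authorization check must sit above the cache, not beside the database query. The sketch below contrasts an id-keyed cache that returns hits unchecked with one keyed by (tenant, id) that asserts ownership on every path. It assumes hypothetical in-memory `DB` and cache dictionaries and an `Invoice` record with a `tenant_id`; it illustrates the bug class, not the code in services/billing/invoices.py.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Invoice:
    id: int
    tenant_id: str
    amount_cents: int


class PermissionDenied(Exception):
    """Raised when a tenant asks for an invoice it does not own."""


# Hypothetical stand-ins for the database and the invoice caches.
DB: Dict[int, Invoice] = {
    1: Invoice(1, "tenant-a", 1200),
    2: Invoice(2, "tenant-b", 9900),
}
CACHE_BY_ID: Dict[int, Invoice] = {}
CACHE_BY_TENANT: Dict[Tuple[str, int], Invoice] = {}


def get_invoice_vulnerable(invoice_id: int, tenant_id: str) -> Invoice:
    # Bug pattern: the key is the bare id, and a hit returns before any check.
    if invoice_id in CACHE_BY_ID:
        return CACHE_BY_ID[invoice_id]
    invoice = DB[invoice_id]
    if invoice.tenant_id != tenant_id:  # ownership checked on the DB path only
        raise PermissionDenied(invoice_id)
    CACHE_BY_ID[invoice_id] = invoice
    return invoice


def get_invoice(invoice_id: int, tenant_id: str) -> Invoice:
    # Fix: scope the cache key by tenant and assert ownership on every path.
    key = (tenant_id, invoice_id)
    invoice = CACHE_BY_TENANT.get(key)
    if invoice is None:
        invoice = DB[invoice_id]
    if invoice.tenant_id != tenant_id:
        raise PermissionDenied(invoice_id)
    CACHE_BY_TENANT[key] = invoice
    return invoice
```

After tenant B's own request has cached invoice 2, tenant A can read it through the first function, while the second raises `PermissionDenied`. The tenant-scoped key alone already stops the cross-tenant hit. The repeated ownership check keeps that guarantee if the key scheme later changes.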
Refutatum

SSRF via unrestricted URL fetch — not exploitable

services/preview/fetch.py  ·  lines 88–130  ·  verdict refuted
Hypothesis: The preview endpoint appeared to accept arbitrary URLs and forward them server-side, suggesting SSRF into the internal metadata service.
Why it fails: Outbound requests are routed through an egress proxy that blocks RFC1918 ranges, 169.254.*, and the metadata IP before the socket opens. The check is applied to the resolved address at connect time, so DNS rebinding is closed too.
Critique log — “Attempted four vectors: direct IP, hostname to internal, DNS rebinding, redirect chain. Egress proxy rejects all of them at L4. No exploit.”
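The refutation hinges on one property: the address that is checked must be the address that is connected to. The sketch below shows that property at the application level, as a stand-in for the egress proxy the finding describes, which is not shown here. The blocklist, `is_blocked`, `system_resolve`, and `pick_connect_address` are hypothetical names. The resolver is injectable, so the check can be exercised without DNS.

```python
import ipaddress
import socket
from typing import Callable, List

# The ranges named in the finding: the three RFC 1918 blocks and 169.254.0.0/16,
# which contains the common cloud metadata address 169.254.169.254. A real
# blocklist would also need loopback and IPv6 ranges; this sketch omits them.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("169.254.0.0/16"),
]


def is_blocked(address: str) -> bool:
    ip = ipaddress.ip_address(address)
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:169.254.169.254) so it cannot slip past.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return any(ip in net for net in BLOCKED_NETWORKS)


def system_resolve(host: str) -> List[str]:
    # Every address the system resolver returns for host.
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def pick_connect_address(
    host: str, resolve: Callable[[str], List[str]] = system_resolve
) -> str:
    """Resolve once, reject if any answer is blocked, and return the IP to dial.

    The caller must connect to the returned IP instead of re-resolving host;
    checking one DNS answer and connecting with another is what rebinding exploits.
    """
    addresses = resolve(host)
    if not addresses or any(is_blocked(a) for a in addresses):
        raise PermissionError(f"egress to {host!r} blocked")
    return addresses[0]
```

A redirect chain is covered only if every hop passes through the same check again. That is one reason routing all outbound traffic through a single egress proxy is attractive.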
Probatum

Privilege escalation via mass-assignment on member update

services/workspace/members.py  ·  lines 204–238  ·  severity critical
Reasoning: The member-update handler spreads the request body into the model without a field allowlist. The role field is writable even though the UI never exposes it, so any member can PATCH themselves to owner.
Reproduction: As a regular member, PATCH your own membership with {"role":"owner"}. The server returns 200 and the role is persisted.
Critique log — “Looked for: middleware scrubbing, serializer allowlist, audit-log trigger. None reject the role write. Confirmed on staging replica.”
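Mass assignment is fixed by deciding which fields a caller may write, rather than which fields they may not. The Python sketch below is an analogue of the bug and of an allowlist fix. The `Membership` record, the `SELF_EDITABLE_FIELDS` set, and both update functions are hypothetical, and the handler in services/workspace/members.py may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class Membership:
    user_id: str
    display_name: str
    role: str = "member"


def update_member_vulnerable(member: Membership, body: dict) -> Membership:
    # Bug pattern: every key in the request body is copied onto the model.
    for field, value in body.items():
        setattr(member, field, value)
    return member


# Fields a member may change on their own record (hypothetical allowlist).
SELF_EDITABLE_FIELDS = frozenset({"display_name"})


def update_member(member: Membership, body: dict) -> Membership:
    # Fix: refuse the whole update if it touches any field outside the allowlist.
    disallowed = set(body) - SELF_EDITABLE_FIELDS
    if disallowed:
        raise PermissionError(f"fields not writable: {sorted(disallowed)}")
    for field, value in body.items():
        setattr(member, field, value)
    return member
```

Rejecting unknown fields with an error, rather than silently dropping them, also surfaces clients that send fields they should not.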
Refutatum

Second-order XSS via markdown cache — not exploitable

services/render/markdown.py  ·  lines 156–190  ·  verdict refuted
Hypothesis: The cache appeared to store raw rendered HTML and sanitize only on read, so any read path that skipped the sanitizer would serve a poisoned entry's unsanitized markup to the next reader.
Why it fails: Tracing the code shows the sanitizer runs before the cache write, not after the read, so every cached value is already scrubbed. No write path can place raw markup in the cache.
Critique log — “Traced every writer into the cache. All paths pass through sanitize(). Hypothesis falsified.”
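The refutation rests on an invariant: every path into the cache passes through the sanitizer. One way to make that invariant structural, rather than something to re-trace after each change, is to give the cache a single write method that sanitizes. The class below is a hypothetical Python sketch of that idea. `SanitizedCache` is not from the codebase, and `html.escape` stands in for the project's `sanitize()`.

```python
import html
from typing import Callable, Dict, Optional


class SanitizedCache:
    """Rendered-markdown cache whose only write path sanitizes (hypothetical)."""

    def __init__(self, sanitize: Callable[[str], str]) -> None:
        self._sanitize = sanitize
        self._store: Dict[str, str] = {}

    def put(self, key: str, rendered_html: str) -> None:
        # Sanitize before the write, so raw markup never reaches the store.
        self._store[key] = self._sanitize(rendered_html)

    def get(self, key: str) -> Optional[str]:
        # Reads return stored values as-is; they were scrubbed on the way in.
        return self._store.get(key)


# html.escape neutralizes all markup; a production sanitizer would instead
# keep an allowlist of safe tags and attributes.
cache = SanitizedCache(sanitize=html.escape)
cache.put("post:1", "<script>alert(1)</script>")
```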
⸻ Provenance

Athena has already found critical vulnerabilities in code that runs half the internet, including:

  • Redis · in-memory store
  • libuv · async I/O runtime
  • ffmpeg · media pipeline
  • React · UI framework
  • Next.js · app framework
Reasoning generalizes. Rexion engineers harnesses that extract maximum performance from raw model reasoning and are agnostic to language, framework, and stack. The model is a commodity; the harness is the moat.
§ ii  ·  the method

What a good auditor does — at a scale that doesn't sleep.

fol. i

Read the whole repository.

Every module, every call graph, every control-flow path, every piece of business logic — reconstructed as a working model of behavior.

fol. ii

Form hypotheses.

For each component, Athena reasons about what it should do and what it might do wrong, building an understanding that goes deeper than the author's own.

fol. iii

Argue against itself.

A separate critique agent demands evidence, builds counterexamples, and reads the tests the first agent skipped. Any finding that cannot withstand cross-examination is dropped. This is how the false-positive rate stays near zero.

fol. iv

PoC || GTFO.

Every finding is reported with a working repro — or it isn't reported at all.

The scanner and the scholar.

22% of merged code is now AI-authored. Daily AI users ship 60% more PRs. Pattern matchers were already behind — now they're drowning.

Legacy SAST

Matches shapes it has already seen.

  • Rule-based pattern matching against known CWEs
  • Blind to business-logic flaws — the cause of most real breaches
  • Ships a noisy queue; triage becomes your second job
  • Gets relatively worse as the model frontier advances
Athena

Reads the code and reasons about what it does.

  • Builds a model of behavior, then asks how that behavior fails
  • Catches bug classes no rulebook has named yet
  • Refutes its own findings — near-zero false-positive rate
  • Gets stronger with every frontier-model release
· · · FIN · · ·

Run Athena on your codebase.

Point Athena at a repository and receive, within a day, a short report of every fault that survives internal critique — with reasoning and a working reproduction. Access is limited while we scale.