Trust — ThreatAI

How ThreatAI works

One sentence: you ask for a check, we consult intel sources, we explain what we found.

The whole product is built around this loop:

You give us a target. A URL, a file hash, an IP address, a domain, an email, a password — depending on which scan you run.
We consult intel. Industry-standard sources we name explicitly: VirusTotal, AbuseIPDB, Have I Been Pwned, plus our own structural checks.
We compose a verdict. Risk level, plain-English explanation, the sources that contributed, the confidence we have.
You decide what to do. ThreatAI recommends; you act. We never reach into your accounts.

That's the whole product. The Investigation Agent (live for signed-in users) and continuous monitoring (rolling out) are layered on top of the same loop — they just chain more checks together and surface the reasoning end-to-end.

What data leaves your device

Every scan tells you exactly what we sent to whom. Here's the full table.

Phishing URL

Full URL → VirusTotal (base64-encoded URL ID)

File hash

Hash string only → VirusTotal. We never see your file.

IP reputation

IP address → AbuseIPDB

Network exposure

Hostname → our server (for DNS lookup + port probe). IP → AbuseIPDB.

Password breach

First 5 hex characters of SHA-1(password) → HIBP. Your password itself never leaves your device.

Email breach

Email address → HIBP. We hash the email with SHA-256 before it enters our logs.

AI advisor

Your typed question → Anthropic Claude. We do not include account identifiers.

Live investigations regenerate this same table per-run on the result page — the "What data left your device" section. Walk through the worked example at /agents/example-phishing-001, or sign in and run a real investigation from any scan.
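As a rough illustration of the rows above, the sketch below derives three of the outbound values using only Python's standard library. The function names are hypothetical and this is not ThreatAI's code. The URL identifier follows VirusTotal's documented v3 convention (URL-safe base64 with padding stripped), the 5-character prefix follows HIBP's range API, and normalizing the email before hashing is an assumption of this sketch.

```python
import base64
import hashlib


def hibp_range_prefix(password: str) -> tuple[str, str]:
    """Split SHA-1(password) into the 5-hex-char prefix that is sent
    and the 35-char suffix that stays local for matching."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def virustotal_url_id(url: str) -> str:
    """URL-safe base64 of the URL with '=' padding stripped."""
    encoded = base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")
    return encoded.rstrip("=")


def email_log_key(email: str) -> str:
    """SHA-256 hex digest used in place of the raw address in logs.
    Trimming and lower-casing first is an assumption of this sketch."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Only the first value returned by `hibp_range_prefix` would be transmitted; the suffix is compared locally against the list of suffixes the range API returns.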

How investigations work

An agent chains scans, weighs sources, and produces a verdict you can audit.

● TODAY Investigation Agent

An investigation is not magic. It's a structured loop:

Plan. Given a target and an initial signal, the agent picks the next scan that's likely to discriminate. Cheaper checks run first.
Run. Each step executes a single tool — one of our existing scans, a WHOIS lookup, or a federated-graph query when that ships. Every tool call is logged with input, output, latency.
Synthesize. The agent combines findings into a verdict. The model proposes; a deterministic policy checks the proposal before it's returned. The model never invents tools or sources.
Explain. A second pass writes the "why we reached this verdict" paragraph in plain English, separate from the model's internal reasoning.

Every step is shown on the result page. There is no hidden reasoning, no off-the-record tool call, no "trust us" layer. Walk through a worked example at /agents/example-phishing-001.
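The Run step and its deterministic gate can be sketched roughly as follows. Everything here is hypothetical: the tool registry, the stubbed tool outputs, and the names `policy_allows` and `run_step` are illustrative stand-ins, not ThreatAI's implementation. The point is only that a proposed call is checked against a fixed registry before it runs, and that each executed call is recorded with input, output, and latency.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical registry: tool name -> (accepted target types, callable).
# The lambdas are stubs standing in for real scans.
TOOLS: dict[str, tuple[set[str], Callable[[str], dict]]] = {
    "ip_reputation": ({"ip"}, lambda target: {"abuse_score": 0}),
    "whois": ({"domain"}, lambda target: {"registrar": "unknown"}),
}


@dataclass
class ToolCall:
    tool: str
    target: str
    output: dict
    latency_ms: float


@dataclass
class Trace:
    calls: list[ToolCall] = field(default_factory=list)


def policy_allows(tool: str, target_type: str) -> bool:
    """Deterministic check: the tool must exist and accept this target type."""
    return tool in TOOLS and target_type in TOOLS[tool][0]


def run_step(trace: Trace, tool: str, target: str, target_type: str) -> dict:
    """Execute one proposed tool call if policy allows it, then log it."""
    if not policy_allows(tool, target_type):
        raise PermissionError(f"tool call rejected: {tool} on {target_type}")
    start = time.perf_counter()
    output = TOOLS[tool][1](target)
    latency_ms = (time.perf_counter() - start) * 1000
    trace.calls.append(ToolCall(tool, target, output, latency_ms))
    return output
```

A proposal naming a tool that is not in the registry raises before anything executes, so a rejected call never appears in the trace as a run.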

How confidence scores work

We compose confidence from real signals, then calibrate against actual outcomes.

The model does not get to make up its confidence. We compose it from auditable inputs:

confidence =  
  w₁ · ioc_consensus          // how many independent sources agreed
+ w₂ · source_reputation       // per-source historical accuracy
+ w₃ · historical_accuracy     // agent's past accuracy on similar cases
+ w₄ · (1 − model_uncertainty) // logprob-derived, not self-rated
+ w₅ · graph_support           // corroboration from federated graph (later)
− penalty · novelty            // brand-new indicators get a haircut

The UI shows a bucket — Low / Medium / High / Very High — not a raw percentage. Raw numbers create false precision when calibration is imperfect.
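A minimal sketch of the composition and bucketing, assuming every input is already normalized to the range 0 to 1. The weights, the penalty, and the bucket cut-offs below are placeholders chosen only for illustration; the source does not publish the real values.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    ioc_consensus: float
    source_reputation: float
    historical_accuracy: float
    model_uncertainty: float
    graph_support: float
    novelty: float


# Placeholder values (assumptions of this sketch, not published figures).
W1, W2, W3, W4, W5 = 0.30, 0.20, 0.20, 0.20, 0.10
PENALTY = 0.20
BUCKETS = [(0.85, "Very High"), (0.65, "High"), (0.40, "Medium")]


def confidence(s: Signals) -> float:
    """Weighted sum of the auditable inputs, clamped to [0, 1]."""
    raw = (
        W1 * s.ioc_consensus
        + W2 * s.source_reputation
        + W3 * s.historical_accuracy
        + W4 * (1 - s.model_uncertainty)
        + W5 * s.graph_support
        - PENALTY * s.novelty
    )
    return min(1.0, max(0.0, raw))


def bucket(score: float) -> str:
    """Map a score to the coarse label shown in the UI."""
    for threshold, label in BUCKETS:
        if score >= threshold:
            return label
    return "Low"
```

With these placeholder numbers, a brand-new indicator (novelty of 1) loses 0.20 from its score, which is enough to drop it by roughly one bucket.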

○ ROLLING OUT Calibration loop. Weights are recalibrated weekly against actual outcomes using isotonic regression. Brier score and reliability diagrams gate every prompt or model deploy.
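For readers unfamiliar with the two techniques named here, the sketch below shows a Brier score and a small pool-adjacent-violators fit, which is the standard algorithm behind isotonic regression. It is a pure-Python illustration under the assumption that outcomes are 0 or 1 and already sorted by predicted score. The function names and the deploy gate are hypothetical, and this sketch fits scores rather than weights.

```python
def brier_score(predicted: list[float], outcomes: list[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not predicted or len(predicted) != len(outcomes):
        raise ValueError("inputs must be non-empty and the same length")
    total = sum((p - o) ** 2 for p, o in zip(predicted, outcomes))
    return total / len(predicted)


def isotonic_fit(outcomes_sorted_by_score: list[float]) -> list[float]:
    """Pool adjacent violators: return the non-decreasing sequence closest
    (in squared error) to the input, which must be ordered by score."""
    blocks: list[list[float]] = []  # each block is [sum, count]
    for value in outcomes_sorted_by_score:
        blocks.append([value, 1])
        # Merge backwards while the previous block's mean exceeds this one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            block_sum, block_count = blocks.pop()
            blocks[-1][0] += block_sum
            blocks[-1][1] += block_count
    fitted: list[float] = []
    for block_sum, block_count in blocks:
        fitted.extend([block_sum / block_count] * int(block_count))
    return fitted


def passes_gate(candidate_brier: float, baseline_brier: float) -> bool:
    """Hypothetical deploy gate: the candidate must not score worse."""
    return candidate_brier <= baseline_brier
```

A lower Brier score is better, so a gate of this shape blocks any deploy whose calibration regresses against the current baseline.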

What we store

Concrete tables, concrete fields, all scoped to your account.

Your account's data lives in Supabase Postgres with row-level security. Every table is queryable only by your authenticated user — even our own backend code can't read across users without elevated credentials.

scan_history ● TODAY

One row per scan you run: scan type, target (often hashed), risk level, full result JSON.

usage_counters ● TODAY

Per-month counts of scans you've run, per scan type. Resets monthly.

profiles ● TODAY

Your account profile + subscription tier + Stripe IDs. No payment details — Stripe holds those.

agent_runs ● TODAY

One row per investigation: verdict, steps, sources, data activity, budget. Full trace; per-row delete rolling out in a follow-up.

monitored_assets ○ ROLLING OUT

The assets you opted into watching. Hashed where reasonable (e.g. monitored emails).

agent_alerts ○ ROLLING OUT

Findings the monitor surfaced. Severity, evidence, read/snoozed state.

agent_memory ○ ROLLING OUT

Two classes only in H1: explicit preferences (e.g. domains you own) and corrections (false-positive overrides). Nothing inferred.

We do not store: your raw passwords, raw emails (only SHA-256 prefixes in logs), payment card numbers, scan content from files (only their hashes), or anything inferred about you that you didn't explicitly tell us.

Your data controls

Per-row delete, full export, full wipe. Yours to invoke any time.

● TODAY Account-wide delete. Delete your account from settings; we purge all your rows within 24h.
○ ROLLING OUT Per-row delete. One-tap delete on any scan, investigation, alert, or memory entry. Built into every result page.
○ ROLLING OUT Full export. Download a JSON archive of everything we store about you.
○ ROLLING OUT Per-feature opt-out. Toggle memory, future federated-graph contribution, or analytics off independently, not all-or-nothing.

We avoid retention as a default. Where data is kept, it's for your own audit trail. We do not sell, share, or monetize your scan history.

Retention policy

Time-bounded by tier. Automated, not aspirational.

RETENTION BY TIER (SCOUT / SENTINEL / FORTRESS)

scan_history

30 days / 1 year / 3 years

agent_runs (full trace)

90 days / 1 year

agent_alerts (read)

1 year

agent_memory

indefinite

monitored_assets

n/a / while present

Verdict summaries

indefinite

Retention is enforced by a nightly job that deletes or anonymizes rows past their TTL. Indefinite means kept until you delete. You can delete anything sooner with the per-row controls.
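A nightly TTL check of this kind might look roughly like the sketch below. It covers scan_history only and assumes the three scan_history values (30 days, 1 year, 3 years) map to SCOUT, SENTINEL, and FORTRESS in that order. The row shape and function names are hypothetical, and a year is approximated as 365 days.

```python
from datetime import datetime, timedelta, timezone

# Assumed mapping of the scan_history values to tiers (see lead-in).
SCAN_HISTORY_TTL = {
    "SCOUT": timedelta(days=30),
    "SENTINEL": timedelta(days=365),
    "FORTRESS": timedelta(days=3 * 365),
}


def is_expired(created_at: datetime, tier: str, now: datetime) -> bool:
    """True when a row is older than the TTL for its owner's tier."""
    return now - created_at > SCAN_HISTORY_TTL[tier]


def rows_to_purge(rows: list[dict], now: datetime) -> list[str]:
    """Return ids of rows past their TTL. Each row is a hypothetical dict
    with 'id', 'tier', and a timezone-aware 'created_at'."""
    return [
        row["id"]
        for row in rows
        if is_expired(row["created_at"], row["tier"], now)
    ]
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check deterministic and easy to test.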

AI safety principles

Boundaries on what the AI is allowed to do, why, and how we enforce them.

Recommend, don't act. The AI never reaches into your accounts. If we ever add action-taking integrations, they will be opt-in, scoped, reversible, human-in-the-loop by default.
Tool calls are gated. The model proposes which tool to run; a deterministic policy checks the call (target type, scope, rate limit) before it executes. The model cannot invent tools.
Passwords never reach the model. Password checks use HIBP's k-anonymity API. Only the first 5 characters of a SHA-1 hash leave the browser. The model never sees the password, the full hash, or the breach count.
Redaction at write time. Logs strip any field that looks like a credential (password, token, secret, authorization, cookie) at the logger boundary, before bytes hit disk.
Prompt-injection defense. User-supplied text is wrapped in tags and the system prompt distrusts content inside. Tool arguments are validated post-model-output, never trusted blindly.
Cost ceilings. Every agent run has a hard budget: max iterations, max tokens, max external API calls. Runaway loops terminate with budget_exceeded rather than silently burning tokens.
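Two of the principles above, redaction at write time and cost ceilings, lend themselves to a short sketch. The field markers come from the list in the text; the `[REDACTED]` placeholder, the substring matching rule, and the `Budget` class are assumptions made for illustration rather than a description of the production logger or agent runtime.

```python
from dataclasses import dataclass

SENSITIVE_MARKERS = ("password", "token", "secret", "authorization", "cookie")


def redact(fields: dict) -> dict:
    """Return a copy with credential-looking keys masked, recursing into
    nested dicts. Matching is by case-insensitive substring of the key."""
    cleaned = {}
    for key, value in fields.items():
        if any(marker in key.lower() for marker in SENSITIVE_MARKERS):
            cleaned[key] = "[REDACTED]"
        elif isinstance(value, dict):
            cleaned[key] = redact(value)
        else:
            cleaned[key] = value
    return cleaned


class BudgetExceeded(Exception):
    """Raised when an agent run crosses any hard ceiling."""


@dataclass
class Budget:
    max_iterations: int
    max_tokens: int
    max_api_calls: int
    iterations: int = 0
    tokens: int = 0
    api_calls: int = 0

    def charge(self, iterations: int = 0, tokens: int = 0, api_calls: int = 0) -> None:
        """Record usage and stop the run once any ceiling is exceeded."""
        self.iterations += iterations
        self.tokens += tokens
        self.api_calls += api_calls
        if (
            self.iterations > self.max_iterations
            or self.tokens > self.max_tokens
            or self.api_calls > self.max_api_calls
        ):
            raise BudgetExceeded("budget_exceeded")
```

Calling `redact` on the record before it reaches the log handler is what keeps the raw value from ever being written, and calling `charge` before each model or API call is what turns a runaway loop into an explicit `budget_exceeded` failure.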

Why we don't auto-take actions

Autonomy without earned trust is a liability — even when the user wants it.

The industry trend is to give AI agents your credentials and let them "fix things." That promise sounds great in a demo and breaks badly in production. A few reasons we won't do it without explicit, scoped consent:

Some actions can't be cleanly undone. A credit freeze, a session invalidation that breaks live work, a password rotation that locks dependent apps — these have real human cost when they're wrong.
Confidence is never 100%. Even at 90% confidence, one in ten autonomous actions is wrong, and on a popular product that means thousands of false-positive lockouts per week. That destroys trust faster than any feature can rebuild it.
Liability follows action. The moment ThreatAI changes something on your behalf, we're on the hook for what changed. We won't take that on lightly, and we won't hide it behind a checkbox.

If we ever ship action-taking, it will be:

Per-integration OAuth with a granular consent screen.
Human-in-the-loop by default — even after you've granted permission. Actions need a tap to fire.
Earned autonomy — after N consecutive approvals of the same action class without modification, we'll ask if you want to auto-execute next time. You stay in control.
Reversibility classified — fully reversible / reversible-with-friction / effectively irreversible. The third category is HITL-only, forever.
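The earned-autonomy and reversibility rules above could be expressed along these lines. This is a hypothetical sketch of a feature that has not shipped: the enum names mirror the three classes in the text, while the function name, the approval-history shape, and the choice of N are assumptions.

```python
from enum import Enum


class Reversibility(Enum):
    FULLY_REVERSIBLE = "fully reversible"
    REVERSIBLE_WITH_FRICTION = "reversible-with-friction"
    EFFECTIVELY_IRREVERSIBLE = "effectively irreversible"


def may_offer_auto_execute(
    approvals: list[bool], n: int, reversibility: Reversibility
) -> bool:
    """Decide whether to ASK the user about auto-executing next time.

    approvals: history for one action class, newest last; True means the
    user approved the action without modification.
    """
    # The third category stays human-in-the-loop regardless of history.
    if reversibility is Reversibility.EFFECTIVELY_IRREVERSIBLE:
        return False
    if n <= 0 or len(approvals) < n:
        return False
    # Only the most recent n decisions count, and all must be approvals.
    return all(approvals[-n:])
```

Note that a True result only triggers a prompt; the user's answer, not this function, decides whether anything runs without a tap.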

Until then: we tell you what to do, and you do it. That's the deal.
