🛡️ FabricationGuard — live demo

Runs real Qwen3.6-27B inference with an activation probe; nothing is mocked. Type any prompt and watch the probe score it for fabrication risk in real time.

Built by OpenInterp · pip install openinterp · GitHub · Probe artifact

⚠️ The first request may take 3-5 minutes while the model loads (cold start). Subsequent requests take 5-15 s, including model generation.

Mode

detect = score only · warn = flag high-score responses · abstain = replace high-score responses with a statement of uncertainty
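
The three modes can be read as a small dispatch over the probe score. The sketch below is illustrative only: `apply_mode`, `GuardResult`, and `ABSTAIN_MESSAGE` are hypothetical names, not the `openinterp` API, and the abstention wording is made up. It assumes "high-score" means a score above the 0.684 threshold quoted on this page.

```python
from dataclasses import dataclass

THRESHOLD = 0.684  # flag threshold quoted on this page

# Hypothetical wording; the demo's actual abstention text is not shown here.
ABSTAIN_MESSAGE = "I am not confident I can answer this without fabricating details."


@dataclass
class GuardResult:
    score: float    # probe score in [0, 1]
    flagged: bool   # True when the mode flags this response
    response: str   # text returned to the user


def apply_mode(score: float, response: str, mode: str,
               threshold: float = THRESHOLD) -> GuardResult:
    """Illustrative dispatch over the three modes (assumed behavior)."""
    if mode not in {"detect", "warn", "abstain"}:
        raise ValueError(f"unknown mode: {mode!r}")
    high = score > threshold
    if mode == "detect":
        # Score only: never flag, never touch the response.
        return GuardResult(score, False, response)
    if mode == "warn":
        # Flag high scores but keep the model's response unchanged.
        return GuardResult(score, high, response)
    # abstain: swap a high-score response for an uncertainty statement.
    return GuardResult(score, high, ABSTAIN_MESSAGE if high else response)
```

Under these assumptions, `apply_mode(0.9, "some answer", "abstain")` returns the uncertainty statement in place of the answer, while the same call with `"warn"` keeps the answer and sets `flagged` to `True`.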

How to read

Score ∈ [0, 1] — higher = higher fabrication risk.
Flag threshold: 0.684 (calibrated across benchmarks).
🟢 < 0.4 → low risk · 🟡 0.4 - 0.684 → moderate · 🔴 > 0.684 → flag.
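
The bands above amount to two comparisons. A minimal sketch, assuming a score of exactly 0.684 still counts as moderate (the page only says "> 0.684" flags); `risk_band` and `render_bar` are hypothetical helper names, and the filled-bar character is a guess at the demo's styling.

```python
THRESHOLD = 0.684      # flag threshold quoted on this page
MODERATE_FROM = 0.4    # lower edge of the moderate band


def risk_band(score: float) -> tuple[str, str]:
    """Map a probe score in [0, 1] to an (icon, label) pair."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score > THRESHOLD:
        return "🔴", "flag"
    if score >= MODERATE_FROM:
        return "🟡", "moderate"
    return "🟢", "low"


def render_bar(score: float, width: int = 20) -> str:
    """Render icon, a fixed-width bar, and the score to three decimals."""
    icon, _ = risk_band(score)
    filled = round(score * width)
    # "█" for the filled part is an assumption about the demo's styling.
    return f"{icon} {'█' * filled}{'░' * (width - filled)} {score:.3f}"
```

For example, `risk_band(0.5)` falls in the moderate band and `risk_band(0.7)` is flagged.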

Honest scope

Works for fabrication-style hallucinations in factual QA. It is less effective on misconception resistance (TruthfulQA-style) and knowledge-gap multiple-choice questions (MMLU-style). See the reproducer notebook for the four-benchmark evaluation.

Reproducibility

Every number was generated by 31_hallucinationguard_v2_linear_probe.ipynb. Run it yourself in Colab Pro+ in ~50 minutes for ~R$10 in credits.
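
The notebook name points to a linear probe. As a rough illustration of how such a probe could turn one activation vector into a score in [0, 1], the sketch below assumes a logistic form, sigmoid(w·h + b). The function name, the weights, and the bias are all made up for the example; the actual probe artifact, the layer it reads, and its calibration are not described on this page.

```python
import math


def probe_score(activation: list[float], weights: list[float], bias: float) -> float:
    """Logistic linear probe (assumed form): sigmoid(w . h + b)."""
    if len(activation) != len(weights):
        raise ValueError("activation and weights must have the same length")
    logit = sum(a * w for a, w in zip(activation, weights)) + bias
    # Numerically stable sigmoid: avoid overflow in exp for large |logit|.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


# Toy values only; a real probe would load trained weights and read
# a hidden-state vector from the model.
toy_activation = [1.0, 2.0]
toy_weights = [0.5, -0.25]
score = probe_score(toy_activation, toy_weights, bias=0.0)
```

With these toy values the logit is zero, so the score sits at the midpoint of the range.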