🛡️ FabricationGuard: live demo

Real Qwen3.6-27B inference + probe. No mocks. Type any prompt and watch the activation probe score it for fabrication risk in real time.

Built by OpenInterp · pip install openinterp · GitHub · Probe artifact

⚠️ Cold start may take 3-5 minutes on the first request (model load). Subsequent requests are fast (5-15 s, including model generation).

Mode

detect = score only · warn = flag high scores · abstain = replace high-score answers with a statement of uncertainty
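As a rough illustration of how the three modes differ, here is a minimal sketch in plain Python. It is not the openinterp API: `apply_mode`, `GuardResult`, and the wording of `UNCERTAIN_REPLY` are hypothetical names chosen for this example, and treating "high score" as a score above the 0.684 threshold quoted on this page is an assumption.

```python
from dataclasses import dataclass

THRESHOLD = 0.684  # flag threshold quoted on this page

# Hypothetical wording; the demo's actual abstention text is not shown here.
UNCERTAIN_REPLY = "I'm not confident I can answer this reliably."


@dataclass
class GuardResult:
    text: str      # text shown to the user
    score: float   # probe score in [0, 1]
    flagged: bool  # whether the mode raised a flag


def apply_mode(mode: str, answer: str, score: float) -> GuardResult:
    """Apply one of the three modes to a generated answer and its probe score."""
    high = score > THRESHOLD  # assumption: "high score" means above the threshold
    if mode == "detect":
        # Score only: report the number, never flag or alter the answer.
        return GuardResult(answer, score, False)
    if mode == "warn":
        # Flag high scores but leave the answer untouched.
        return GuardResult(answer, score, high)
    if mode == "abstain":
        # Replace high-score answers with a statement of uncertainty.
        return GuardResult(UNCERTAIN_REPLY if high else answer, score, high)
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `apply_mode("abstain", "Some answer", 0.9)` returns the uncertainty text, while the same call with a score of 0.2 returns the answer unchanged.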

🟢 ░░░░░░░░░░░░░░░░░░░░ 0.000


How to read

  • Score ∈ [0, 1]: higher = higher fabrication risk.
  • Threshold 0.684 (calibrated across benchmarks).
  • 🟢 < 0.4 → low risk · 🟡 0.4 - 0.684 → moderate · 🔴 > 0.684 → flag.
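
The bands above can be sketched as a small lookup function. This is an illustrative sketch only; `risk_band` is a hypothetical helper, not part of openinterp. It assumes a score of exactly 0.684 falls in the moderate band, since the list says only scores above 0.684 are flagged.

```python
LOW_CUTOFF = 0.4   # below this: low risk
THRESHOLD = 0.684  # above this: flag


def risk_band(score: float) -> str:
    """Map a probe score in [0, 1] to the band names used in the list above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score < LOW_CUTOFF:
        return "low"
    if score <= THRESHOLD:  # assumption: the boundary value counts as moderate
        return "moderate"
    return "flag"
```

For instance, `risk_band(0.0)` gives `"low"`, `risk_band(0.5)` gives `"moderate"`, and `risk_band(0.9)` gives `"flag"`.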

Honest scope

Works for fabrication-style hallucinations in factual QA. Less effective on misconception resistance (TruthfulQA-style) or knowledge-gap multiple choice (MMLU). See the reproducer notebook for the four-benchmark evaluation.

Reproducibility

Every number was generated by 31_hallucinationguard_v2_linear_probe.ipynb. Run it yourself in Colab Pro+ in ~50 minutes for ~R$10 in credits.