🛡️ FabricationGuard: live demo
Real Qwen3.6-27B inference plus a real probe. No mocks. Type any prompt and watch the activation probe score it for fabrication risk in real time.
Built by OpenInterp · pip install openinterp · GitHub · Probe artifact
⚠️ Cold start may take 3-5 minutes on the first request (model load). Subsequent requests are fast (5-15 s, including model generation).
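As a rough illustration of what an activation probe computes, here is a minimal sketch. It assumes a logistic linear probe: the notebook name suggests a linear probe, but the logistic form, the function name `probe_score`, and the weights and bias are hypothetical and are not taken from the released probe artifact.

```python
import math


def probe_score(activation, weights, bias):
    """Hypothetical logistic linear probe: sigmoid(w . h + b), a value in [0, 1].

    `activation` stands in for one hidden-state vector read from the model;
    `weights` and `bias` stand in for learned probe parameters (assumed).
    """
    if len(activation) != len(weights):
        raise ValueError("activation and weights must have the same length")

    # Linear read-out of the activation vector.
    logit = sum(w * h for w, h in zip(weights, activation)) + bias

    # Numerically stable sigmoid: never exponentiates a large positive number.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)
```

A zero activation with zero bias gives a score of 0.5; strongly positive read-outs push the score toward 1, strongly negative ones toward 0.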
How to read
- Score ∈ [0, 1]: higher = higher fabrication risk.
- Threshold 0.684 (calibrated cross-bench).
- 🟢 < 0.4: low risk · 🟡 0.4 - 0.684: moderate · 🔴 > 0.684: flag.
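The bands above can be sketched as a small helper. This is an illustration only: the function name `risk_band` and the constant names are hypothetical, and treating a score of exactly 0.4 or exactly 0.684 as "moderate" is an assumption read off the ranges listed above, not confirmed behaviour of the demo.

```python
THRESHOLD = 0.684        # flag threshold stated above (calibrated cross-bench)
LOW_RISK_CUTOFF = 0.4    # upper edge of the low-risk band stated above


def risk_band(score):
    """Map a probe score in [0, 1] to 'low', 'moderate', or 'flag'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score > THRESHOLD:
        return "flag"        # strictly above 0.684
    if score >= LOW_RISK_CUTOFF:
        return "moderate"    # 0.4 up to and including 0.684 (assumed inclusive)
    return "low"             # strictly below 0.4
```

For example, `risk_band(0.0)` falls in the low band and `risk_band(0.9)` is flagged.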
Honest scope
Works for fabrication-style hallucinations in factual QA. Less effective on misconception resistance (TruthfulQA-style) or knowledge-gap MC (MMLU). See the reproducer notebook for the four-benchmark evaluation.
Reproducibility
Every number was generated by 31_hallucinationguard_v2_linear_probe.ipynb.
Run it yourself in Colab Pro+ in ~50 minutes for ~R$10 in credits.