jmm2020/hallucination-detector

Claude Code Guardrails

Three agentic hooks that keep Claude Code honest — catching hallucination, intent drift, and context exhaustion in real time.

LLMs hallucinate, drift from goals, and degrade as context fills up. These hooks detect all three failure modes and inject warnings directly into the conversation so Claude self-corrects before wasting compute.


The Problem

Claude Code is powerful, but it fails in predictable ways:

  • Invents file paths that don't exist, then tries to read them repeatedly
  • Drifts from your goal -- you ask for a bug fix, Claude starts refactoring six files
  • Loops on the same failed action 3, 4, 5+ times expecting different results
  • Gets worse as context fills -- the more tokens consumed, the higher the hallucination rate

By the time you notice, Claude has wasted 10+ tool calls on phantom files, wandered into scope creep, and polluted the context window.

The Solution

Three complementary hooks that form a layered defense:


1. Intent Drift Detector (hooks/intent_drift_detector.py) -- AGENTIC

The headline hook. This is genuinely agentic: an AI watching another AI for drift.

Two-phase operation:

  • Phase 1 (UserPromptSubmit): Captures the user's prompt and calls a local LLM to extract a concise intent summary
  • Phase 2 (PostToolUse): Every 4th tool call, sends the intent + recent actions to the LLM to judge alignment

The LLM returns one of three verdicts:

| Verdict | Meaning | Hook Response |
|---|---|---|
| ALIGNED | Actions serve the goal | No warning |
| DRIFTING | Tangentially related, losing focus | Warning: re-read the original request |
| OFF_TRACK | Actions unrelated to the goal | Alert: STOP and course-correct |
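A minimal sketch of how a verdict could be mapped to a hook response. The message text mirrors the example warning shown in this README; the `hookSpecificOutput.additionalContext` output shape and the function name are assumptions about the injection mechanism, not the hook's actual code.

```python
def response_for_verdict(verdict: str, reason: str, goal: str) -> dict:
    """Map an LLM drift verdict to a Claude Code hook response dict (sketch)."""
    verdict = verdict.strip().upper()
    if verdict == "ALIGNED":
        return {"continue": True}  # aligned: no warning injected
    if verdict == "DRIFTING":
        header = "WARNING INTENT DRIFT"
        action = "Re-read the user's original request before your next action."
    else:  # OFF_TRACK, or an unrecognized verdict treated as the worst case
        header = "ALERT INTENT DRIFT"
        action = "STOP and course-correct."
    context = (
        f"{header}\n"
        f'  Goal: "{goal}"\n'
        f"  Status: {verdict} -- {reason}\n"
        f"  ACTION: {action}"
    )
    # Assumed response shape: continue the session but inject a warning
    return {"continue": True,
            "hookSpecificOutput": {"additionalContext": context}}
```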

Example warning:

WARNING INTENT DRIFT
  Goal: "Fix the authentication bug in the login endpoint"
  Status: DRIFTING — Recent actions focus on refactoring logging configuration,
    not the auth endpoint.
  Recent actions: 12 tool calls since last prompt

  ACTION: Re-read the user's original request before your next action.
  Ask yourself: does what I'm about to do directly serve the goal?

How it works:

  • Calls a local LLM (llama.cpp, Ollama, or any OpenAI-compatible endpoint) for both intent extraction and drift judgment
  • Configurable endpoint via INTENT_DRIFT_LLM_URL (defaults to http://localhost:8080)
  • Only checks every 4th tool call to minimize LLM overhead
  • 3-minute cooldown between warnings
  • Resets intent tracking on each new user prompt (short prompts like "yes" or "ok" are skipped)
  • Per-session state in /tmp/ survives across hook invocations
  • Pure Python stdlib -- no pip dependencies (uses urllib.request for LLM calls)

Without a local LLM: The hook silently degrades -- if the LLM endpoint is unreachable, it skips the check and returns {"continue": true}. No errors, no noise. You can run the other two hooks standalone.


2. Hallucination Detector (hooks/hallucination_detector.py)

A PostToolUse hook that tracks tool failure patterns across a sliding window to detect three hallucination signals:

| Signal | What it detects | Threshold |
|---|---|---|
| Phantom files | Read/Glob/Grep failures spike -- Claude is inventing paths | 50%+ failure rate over last 12 calls |
| Action loops | Same tool + same args repeated consecutively | 3+ identical calls in a row |
| Drift zone | High failure rate + high token usage = hallucination territory | Failures + >65% context used |

Example output:

HALLUCINATION RISK DETECTED
  Phantom files: 4/6 recent file operations failed (67%). Paths may be invented.
    /src/utils/nonexistent_helper.py
    /lib/config/phantom_module.ts
  Action loop: 'Read:/src/missing.py' repeated 3x. Stuck in a retry loop.

  ACTION: Verify claims against actual tool output. Do not trust file paths or
  function names from memory -- re-read the source before referencing it.

In drift zone (failures + high token pressure):

  DRIFT ZONE: High token usage + failures = likely hallucinating.
  ACTION: Run /compact or restart the session. Do NOT continue --
  outputs are unreliable. Save important state first.

How it works:

  • Maintains a per-session sliding window of the last 12 tool calls in /tmp/
  • Detects both hard failures (tool errors) and soft failures ("no such file", "not found", "no matches")
  • Reads actual token counts from the Claude Code transcript JSONL
  • 2-minute cooldown between warnings
  • Zero dependencies -- pure Python stdlib
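The sliding-window analysis above can be sketched as follows. The window size, thresholds, and soft-failure phrases are the defaults documented here; the function names and record fields are assumptions, not the hook's actual code.

```python
from collections import deque

WINDOW_SIZE = 12
FAILURE_THRESHOLD = 0.5
LOOP_THRESHOLD = 3
SOFT_FAILURE_PHRASES = ("no such file", "not found", "no matches")

def is_failure(call: dict) -> bool:
    """Hard failure (tool error) or soft failure (empty-handed output)."""
    if call.get("is_error"):
        return True
    output = str(call.get("output", "")).lower()
    return any(p in output for p in SOFT_FAILURE_PHRASES)

def analyze(window: deque) -> list[str]:
    """Return hallucination signals for the last WINDOW_SIZE tool calls."""
    signals = []
    # Phantom files: failure rate across recent file operations
    file_ops = [c for c in window if c["tool"] in ("Read", "Glob", "Grep")]
    if file_ops:
        failed = sum(is_failure(c) for c in file_ops)
        if failed / len(file_ops) >= FAILURE_THRESHOLD:
            signals.append(
                f"Phantom files: {failed}/{len(file_ops)} file ops failed")
    # Action loop: same tool + same args repeated consecutively
    streak, last = 0, None
    for c in window:
        key = (c["tool"], str(c.get("args")))
        streak = streak + 1 if key == last else 1
        last = key
        if streak >= LOOP_THRESHOLD:
            signals.append(f"Action loop: {key[0]} repeated {streak}x")
            break
    return signals
```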

3. Context Window Monitor (hooks/context_window_monitor.py)

A companion PostToolUse hook that tracks raw token consumption and warns before you hit the wall:

| Level | Threshold | Action |
|---|---|---|
| Warning | 225K tokens (configurable) | "Start wrapping up -- save state and prepare for session restart" |
| Critical | 256K tokens (configurable) | "STOP and restart NOW. Save important state first." |

How it works:

  • Reads real API usage from the session transcript (not estimated -- actual cache_read_input_tokens + input_tokens + cache_creation_input_tokens)
  • Thresholds configurable via environment variables
  • Efficient: reads only the last 50KB of the transcript file
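A sketch of that transcript-reading approach, assuming one JSON object per line with an Anthropic-style usage block. The three token field names come from the list above; the surrounding layout of the transcript is an assumption.

```python
import json
import os

TAIL_BYTES = 50 * 1024  # only read the end of a potentially huge transcript

def current_token_count(transcript_path: str) -> int:
    """Sum the token fields from the most recent usage entry in the JSONL."""
    with open(transcript_path, "rb") as f:
        f.seek(max(0, os.path.getsize(transcript_path) - TAIL_BYTES))
        tail = f.read().decode("utf-8", errors="ignore")
    # Walk backwards: the newest complete line with a usage block wins
    for line in reversed(tail.splitlines()):
        try:
            usage = json.loads(line).get("message", {}).get("usage")
        except (json.JSONDecodeError, AttributeError):
            continue
        if usage:
            return (usage.get("input_tokens", 0)
                    + usage.get("cache_read_input_tokens", 0)
                    + usage.get("cache_creation_input_tokens", 0))
    return 0
```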

How They Work Together

User sends prompt
  └─ Intent drift detector captures goal via LLM (Phase 1)

Claude works...
  ├─ Tool calls succeed → no warnings
  ├─ Tool calls fail → hallucination detector tracks failure rate
  ├─ Every 4th tool call → intent drift detector judges alignment via LLM (Phase 2)
  └─ Every tool call → context monitor checks token count

Failure cascades:
  Context fills → context monitor warns → "wrap up soon"
  Failures spike → hallucination detector → "verify your claims"
  Failures + tokens → "DRIFT ZONE — restart session"
  Actions diverge → intent drift detector → "you've wandered from the goal"

Three layers, three failure modes, one defense system.

Installation

Quick Setup

# Copy hooks
mkdir -p .claude/hooks
cp hooks/hallucination_detector.py .claude/hooks/
cp hooks/context_window_monitor.py .claude/hooks/
cp hooks/intent_drift_detector.py .claude/hooks/

Add to .claude/settings.json:

{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "python3 .claude/hooks/intent_drift_detector.py",
            "timeout": 12
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "python3 .claude/hooks/hallucination_detector.py",
            "timeout": 5
          },
          {
            "type": "command",
            "command": "python3 .claude/hooks/context_window_monitor.py",
            "timeout": 3
          },
          {
            "type": "command",
            "command": "python3 .claude/hooks/intent_drift_detector.py",
            "timeout": 12
          }
        ]
      }
    ]
  }
}

Note: intent_drift_detector.py appears twice -- it handles both UserPromptSubmit (captures goal) and PostToolUse (judges alignment). Same file, different behavior based on the hook event.
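That dual-event behavior can be sketched like this: Claude Code pipes a JSON object to the hook's stdin, and the hook_event_name field distinguishes the two phases. The two handler functions here are placeholders for the real phases, not the hook's actual code.

```python
import json
import sys

def capture_intent(prompt: str) -> None:
    """Phase 1 placeholder: extract and persist the goal via the local LLM."""

def judge_alignment(event: dict) -> dict:
    """Phase 2 placeholder: every 4th tool call, ask the LLM for a verdict."""
    return {"continue": True}

def handle(event: dict) -> dict:
    """Dispatch on the hook event name supplied by Claude Code."""
    name = event.get("hook_event_name")
    if name == "UserPromptSubmit":
        capture_intent(event.get("prompt", ""))
        return {"continue": True}
    if name == "PostToolUse":
        return judge_alignment(event)
    return {"continue": True}  # unknown event: stay out of the way

if __name__ == "__main__":
    print(json.dumps(handle(json.load(sys.stdin))))
```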

Global Setup (all projects)

mkdir -p ~/.claude/hooks
cp hooks/*.py ~/.claude/hooks/
# Add the above config to ~/.claude/settings.json

Without a Local LLM

The intent drift detector requires a local LLM endpoint for judging alignment. Without one, it silently degrades (no errors, no warnings). The other two hooks work independently with zero dependencies.

To set up a local LLM:

  • llama.cpp: ./llama-server -m model.gguf --port 8080
  • Ollama: Set INTENT_DRIFT_LLM_URL=http://localhost:11434
  • Any OpenAI-compatible API: Set INTENT_DRIFT_LLM_URL to your endpoint

Configuration

Intent Drift Detector

| Environment Variable | Default | Description |
|---|---|---|
| INTENT_DRIFT_LLM_URL | http://localhost:8080 | Local LLM endpoint (OpenAI-compatible) |
| INTENT_DRIFT_LLM_KEY | ucis-internal | API key for the LLM |
| INTENT_DRIFT_COOLDOWN | 180 | Seconds between drift warnings |
| INTENT_DRIFT_WINDOW | 8 | Recent tool calls sent to LLM for judgment |
| INTENT_DRIFT_MIN_TOOLS | 5 | Minimum tool calls before first check |
| INTENT_DRIFT_TIMEOUT | 10 | LLM call timeout in seconds |
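Since the hooks are pure stdlib, configuration is presumably plain environment-variable reads with hardcoded fallbacks. A sketch using the defaults from the table above (variable names match the table; the surrounding code is illustrative):

```python
import os

# Defaults mirror the configuration table; override via the environment
LLM_URL = os.environ.get("INTENT_DRIFT_LLM_URL", "http://localhost:8080")
LLM_KEY = os.environ.get("INTENT_DRIFT_LLM_KEY", "ucis-internal")
COOLDOWN = int(os.environ.get("INTENT_DRIFT_COOLDOWN", "180"))
WINDOW = int(os.environ.get("INTENT_DRIFT_WINDOW", "8"))
MIN_TOOLS = int(os.environ.get("INTENT_DRIFT_MIN_TOOLS", "5"))
TIMEOUT = int(os.environ.get("INTENT_DRIFT_TIMEOUT", "10"))
```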

Hallucination Detector

| Variable | Default | Description |
|---|---|---|
| WINDOW_SIZE | 12 | Number of recent tool calls to analyze |
| FAILURE_THRESHOLD | 0.5 | Failure rate that triggers warning (50%) |
| LOOP_THRESHOLD | 3 | Consecutive identical calls before loop detection |
| TOKEN_PRESSURE_PCT | 0.65 | Context usage % that triggers drift zone |
| COOLDOWN_SECONDS | 120 | Minimum seconds between warnings |

Context Window Monitor

| Environment Variable | Default | Description |
|---|---|---|
| CONTEXT_WARN_THRESHOLD | 225000 | Token count for early warning |
| CONTEXT_CRITICAL_THRESHOLD | 256000 | Token count for critical/stop warning |

Origin

These hooks were built for UCIS (Unified Consciousness Integration System), a domain-separated AI consciousness architecture running 8 autonomous agents, 18 MCP servers, and 26,000+ memories across three graph databases. In that environment, Claude Code sessions routinely hit context limits during deep architecture sweeps, hallucination-induced phantom file loops wasted significant compute, and long agentic tasks would drift from the original goal. These hooks eliminated all three problems.

Requirements

  • Python 3.10+
  • Claude Code CLI
  • No pip dependencies (pure stdlib)
  • Optional: local LLM endpoint for intent drift detection (llama.cpp, Ollama, or any OpenAI-compatible API)

License

MIT

About

Claude Code hooks that detect AI hallucination in real time — phantom files, action loops, drift zone detection + context window monitoring
