Observability User Stories

Research into what users need to understand about claude-permit's runtime behavior. These user stories inform the observability features on the roadmap.

Date: 2026-04-09

Context

claude-permit works silently by design — it makes hundreds of decisions per session, but the user sees almost nothing beyond the occasional permission dialog. Everything is captured in audit.jsonl, but that's a raw JSONL file requiring jq to query. These user stories capture what users actually want to know.
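For a sense of what querying the raw log involves today, here is a minimal sketch in Python. The `decision` field name is an assumption for illustration; the actual audit.jsonl schema may differ.

```python
import json
from collections import Counter

# Tally decision outcomes straight from the raw audit log.
# NOTE: "decision" is an assumed field name; check the real schema.
counts = Counter()
with open("audit.jsonl") as f:
    for line in f:
        if line.strip():
            counts[json.loads(line).get("decision", "unknown")] += 1

print(counts.most_common())
```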


Priority Tier 1 — High Value

US-1: Session Scorecard

As a user, I want to see how many permission prompts claude-permit saved me, so I know if it's worth keeping installed.

Example: "You were auto-approved 342 times today, asked 4 times, blocked 2."

US-2: Session Summary

As a user, I want to see a session summary when a Claude Code session ends — a quick scorecard of what happened: total tool calls, auto-approved, prompted, denied.
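A sketch of how either scorecard (US-1 or US-2) could be computed from the audit log, assuming each entry carries `session_id` and `decision` fields; both names, and the decision labels, are illustrative rather than a confirmed schema.

```python
import json
from collections import Counter, defaultdict

# Group decision outcomes by session to build an end-of-session scorecard.
# "session_id", "decision", and the label values are assumed for illustration.
sessions = defaultdict(Counter)
with open("audit.jsonl") as f:
    for line in f:
        if not line.strip():
            continue
        entry = json.loads(line)
        sessions[entry.get("session_id", "?")][entry.get("decision", "?")] += 1

for sid, c in sessions.items():
    total = sum(c.values())
    print(f"{sid}: {total} tool calls, {c['auto_approve']} auto-approved, "
          f"{c['prompt']} prompted, {c['deny']} denied")
```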

US-4: Understand Denials

As a user, when a tool call is blocked, I want to understand why — which rule matched, what the tool was trying to do, and how to override it if it's a false positive.

US-5: See LLM Reasoning

As a user, when the LLM makes a decision, I want to see its reasoning — not buried in a log file, but surfaced at the moment it matters (especially for YELLOW/RED).

US-6: Auto-Promote Notifications

As a user, I want to know when a rule was auto-promoted — something just got permanently added to my allow list; I should be aware of that.

US-10: Rule Hit Counts

As a user, I want a dashboard of my rules and how often each one fires, so I can see which rules are doing the heavy lifting and which are dead weight. A sketch of deriving this follows below.
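One way such a dashboard could be derived, assuming audit entries record the matched rule under a `rule` field (an assumption; the real key may differ). Note that truly dead rules never appear in the log at all, so spotting them also requires the configured rule list.

```python
import json
from collections import Counter

# Count how often each rule fires, most active first.
# "rule" is an assumed field name for the matched rule identifier.
hits = Counter()
with open("audit.jsonl") as f:
    for line in f:
        if not line.strip():
            continue
        entry = json.loads(line)
        if entry.get("rule"):
            hits[entry["rule"]] += 1

for rule, n in hits.most_common():
    print(f"{n:6d}  {rule}")
```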

US-18: Health Check

As a user, I want to know if claude-permit is even running — a simple health check or status indicator confirming hooks are wired up and the binary is responding.
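A rough sketch of what such a check could look like, assuming Claude Code hooks live in `~/.claude/settings.json` under a `hooks` key; the path, key, and overall approach are assumptions, not a confirmed mechanism.

```python
import json
import shutil
from pathlib import Path

# Hypothetical health check: is the binary resolvable, and do the
# Claude Code hook settings reference it? Paths and keys are assumptions.
binary = shutil.which("claude-permit")
print("binary on PATH:", binary or "NOT FOUND")

settings = Path.home() / ".claude" / "settings.json"
if settings.is_file():
    hooks = json.loads(settings.read_text()).get("hooks", {})
    print("hooks reference claude-permit:", "claude-permit" in json.dumps(hooks))
else:
    print("no settings file at", settings)
```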


Priority Tier 2 — Important

US-3: Cumulative Stats

As a user, I want to see cumulative stats over time: am I getting fewer prompts as auto-rules accumulate? Is the system learning?

US-7: LLM Health

As a user, I want to know if the LLM is slow or failing — if LLM calls are timing out or erroring, I'm getting error_passthrough (fail-open) and losing the safety net without knowing it.

US-8: LLM Latency Trends

As a user, I want to see LLM latency trends: is Haiku adding 200ms or 2000ms to my workflow? Is it getting worse?
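A combined sketch for US-7 and US-8, assuming the audit log records per-call LLM latency under `llm_latency_ms` and marks fail-open outcomes with an `error_passthrough` decision label; both names are illustrative.

```python
import json
from statistics import median

# Surface LLM failure rate and latency from the audit log.
# "llm_latency_ms" and the "error_passthrough" label are assumptions.
latencies, errors = [], 0
with open("audit.jsonl") as f:
    for line in f:
        if not line.strip():
            continue
        entry = json.loads(line)
        if "llm_latency_ms" not in entry:
            continue  # rule-matched calls never reached the LLM
        latencies.append(entry["llm_latency_ms"])
        if entry.get("decision") == "error_passthrough":
            errors += 1

if latencies:
    print(f"LLM calls: {len(latencies)}, fail-open: {errors} "
          f"({errors / len(latencies):.1%})")
    print(f"median latency: {median(latencies):.0f} ms, "
          f"worst: {max(latencies):.0f} ms")
```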

US-9: Passthrough Alerts

As a user, I want to be alerted if a suspiciously large number of operations are passing through unmatched — that might mean my rules are stale or misconfigured.
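A minimal sketch of such an alert, assuming unmatched operations are logged with a `passthrough` decision label; the label and the 25% threshold are both arbitrary assumptions.

```python
import json

# Warn when the fraction of unmatched operations crosses a threshold.
THRESHOLD = 0.25
with open("audit.jsonl") as f:
    decisions = [json.loads(line).get("decision") for line in f if line.strip()]

rate = decisions.count("passthrough") / max(len(decisions), 1)
if rate > THRESHOLD:
    print(f"warning: {rate:.0%} of operations unmatched; rules may be stale")
```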

US-11: Auto-Rule Provenance

As a user, I want to see which auto-promoted rules exist and how they got there — with the original LLM reasoning, so I can decide if I trust them.

US-14: Frequent Commands/Paths

As a user, I want to see which commands/paths are most frequent — hot spots in my workflow that might deserve explicit rules.


Priority Tier 3 — Nice to Have

US-12: Near Misses

As a user, I want to see "near misses" — operations that almost matched a deny rule, or that the LLM scored borderline YELLOW/GREEN.

US-13: Tool Type Breakdown

As a user, I want a breakdown by tool type — how many Bash calls vs. Read vs. Write? Are certain tools dominating?

US-15: Compare Sessions

As a user, I want to compare sessions — did this session behave differently from yesterday's? More denials? Different tools?

US-16: Security Digest

As a user, I want a periodic security digest — weekly summary of all RED/YELLOW decisions, auto-promotions, and anything unusual.

US-17: Trust Timeline

As a user, I want to see a "trust timeline" — how my rule set has evolved over time as auto-promotion adds rules.


See Also

  • Data Capture Analysis: maps these stories to available data and identifies gaps
  • Issues tracking implementation: see the labels observability + enhancement