Friction insights in HTML report: surface repeat-prompt patterns + suggest fixes #20

Open
opened 2026-04-27 12:59:39 +00:00 by jbr870 · 0 comments

Summary

Add a "friction insights" view to claude-permit report --html that surfaces patterns where the user keeps being asked to approve the same (or similar) tool calls — and flags candidates for promotion, denial, or system-bug investigation.

Origin

This issue exists because of a real incident: a bug prevented auto-rules.toml from loading, which silently caused the user to re-approve the same kinds of permissions over and over. The bug was only caught because the friction became unbearable. A litmus test for this issue: would this report have surfaced that bug earlier? That's an acceptance criterion below, not a nice-to-have.

The current report (#10) shows the rules and config that exist. It does not show which patterns the user actually struggles with. This issue closes that gap.

User Stories

  • As a user, I want to see which tool-call patterns I keep approving manually, so I can decide whether to promote them to allow rules.
  • As a user, I want to see which patterns the system should be auto-promoting but isn't, so I can catch silent system bugs (like the auto-rules-not-loading incident).
  • As a user, I want recency-weighted signals — "this happened 50 times last week" matters more than "this happened twice last month."
  • As a user, I want concrete, actionable suggestions — ideally the exact rule TOML to paste — not just a list of patterns.

Proposed v1 View

A new "Friction" section in the HTML report: one ranked, sortable table.

Columns:

  • Pattern — the tool + tool_input shape (see grouping below)
  • Hits 7d / 30d / total — three counts
  • Last seen — timestamp
  • Outcome breakdown — % auto-approved / % user-approved / % denied
  • Covered by — name of matching allow rule (or "—" if uncovered)
  • Flags — see below

Default sort: hits in 7d, descending.
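A minimal sketch of how the windowed counts and the default sort could be computed. The entry shape `(pattern, timestamp)` and the function name are hypothetical; the real audit-log schema isn't specified in this issue:

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def friction_counts(entries, now=None):
    """entries: iterable of (pattern_key, tz-aware datetime). Hypothetical shape."""
    now = now or datetime.now(timezone.utc)
    hits_7d, hits_30d, total = Counter(), Counter(), Counter()
    last_seen = {}
    for pattern, ts in entries:
        total[pattern] += 1
        if ts >= now - timedelta(days=7):
            hits_7d[pattern] += 1
        if ts >= now - timedelta(days=30):
            hits_30d[pattern] += 1
        last_seen[pattern] = max(last_seen.get(pattern, ts), ts)
    # Default sort: hits in the last 7 days, descending.
    ranked = sorted(total, key=lambda p: hits_7d[p], reverse=True)
    return ranked, hits_7d, hits_30d, total, last_seen
```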

Heuristic Flags (always on, no LLM)

These are deterministic and run on every report generation:

| Flag | Trigger | Meaning |
|---|---|---|
| 📈 promote candidate | hits ≥ N AND no covering allow rule AND user-approval rate ≥ 90% | "you keep saying yes to this — promote it" |
| 🛑 deny candidate | hits ≥ N AND user-denial rate ≥ 80% | "you keep saying no — add a deny rule" |
| ⚠️ stuck | LLM returned GREEN ≥ N times AND no entry in auto-rules.toml matches | the auto-rules-not-loading bug detector |
| 🔁 over-asked | hits ≥ N over 7d AND covered by an ask rule | "this is paying the dialog tax — should it be allow?" |
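The four flags reduce to pure predicates over per-pattern stats. A sketch under stated assumptions: the field names are hypothetical, and the approval/denial rates are taken over user-decided prompts only, which this issue doesn't pin down:

```python
from dataclasses import dataclass

@dataclass
class PatternStats:
    # Hypothetical per-pattern stats; real field names may differ.
    hits: int
    hits_7d: int
    user_approved: int
    user_denied: int
    covered_by_allow: bool
    covered_by_ask: bool
    llm_green: int
    auto_rule_matches: bool

def flags(s: PatternStats, n: int = 5) -> list[str]:
    out = []
    decided = s.user_approved + s.user_denied  # rates over user-decided prompts (assumption)
    if s.hits >= n and not s.covered_by_allow and decided and s.user_approved / decided >= 0.9:
        out.append("📈 promote candidate")
    if s.hits >= n and decided and s.user_denied / decided >= 0.8:
        out.append("🛑 deny candidate")
    if s.llm_green >= n and not s.auto_rule_matches:
        out.append("⚠️ stuck")  # the auto-rules-not-loading bug detector
    if s.hits_7d >= n and s.covered_by_ask:
        out.append("🔁 over-asked")
    return out
```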

Thresholds (N) configurable via config.toml, with sensible defaults.
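The threshold config might look something like this. Every key name here is a hypothetical placeholder, since the real config.toml schema isn't defined by this issue:

```toml
# Hypothetical key names — the real config.toml schema may differ.
[report.friction]
promote_min_hits = 5         # 📈: hits before a promote candidate is flagged
deny_min_hits = 5            # 🛑: hits before a deny candidate is flagged
stuck_min_green = 3          # ⚠️: GREEN verdicts with no auto-rule match
over_asked_min_hits_7d = 10  # 🔁: 7-day hits for an ask-covered pattern
```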

LLM-Augmented Suggestions (optional, behind --llm flag)

When claude-permit report --html --llm is used, the report additionally invokes claude --print --model haiku once with the friction list as context. The LLM produces, per flagged row:

  • A concrete rule suggestion (TOML snippet ready to paste into config.toml)
  • A short rationale ("why this rule and not a broader one")
  • Risk notes ("but this would also match X, which you may not want")

The report falls back gracefully if the LLM call fails or --llm isn't passed — heuristic flags still work.
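The fallback path can be kept trivially small. A sketch: the `claude --print --model haiku` invocation comes from this issue's text, while the function name, prompt shape, and timeout are assumptions:

```python
import json
import subprocess

def llm_suggestions(friction_rows, use_llm: bool):
    """Return raw LLM output for flagged rows, or None to render heuristics only."""
    if not use_llm:
        return None
    prompt = "Suggest a TOML rule for each flagged pattern:\n" + json.dumps(friction_rows)
    try:
        out = subprocess.run(
            ["claude", "--print", "--model", "haiku"],
            input=prompt, capture_output=True, text=True, timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # degrade gracefully; heuristic flags still render
    if out.returncode != 0:
        return None
    return out.stdout
```

Returning `None` rather than raising keeps the report deterministic and fully rendered whenever the LLM is unavailable.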

Pattern Grouping (v1)

  • Group by (tool_name, exact tool_input string). Safest, slightly noisy.
  • Defer fuzzy grouping (token-prefix, path-prefix) until noise is empirically shown to be a problem.
  • The flag ⚠️ stuck should match by would-this-rule-have-fired logic against auto-rules.toml, not by string equality, so cosmetic differences in inputs don't hide the bug.

Acceptance Criteria

  1. Friction table renders in the HTML report with all columns above.
  2. All four heuristic flags work and are unit-tested.
  3. The ⚠️ stuck flag would have flagged the original auto-rules-loading bug (replay an audit log from that period; the bug-affected patterns must surface). If we can't replay, hand-construct an audit fixture that mimics the failure mode.
  4. With --llm, at least one row gets a concrete TOML suggestion in the rendered output.
  5. Without --llm, the report renders fully and deterministically (snapshot-testable).
  6. Configurable thresholds via config.toml; defaults documented.
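Criterion 3's hand-built fixture can be pinned down directly. A sketch with hypothetical entry fields, mimicking the incident's failure mode (the LLM keeps returning GREEN, yet no auto-rule ever matches, so the user is re-prompted every time):

```python
# Hypothetical audit-entry shape for the auto-rules-not-loading fixture.
fixture = [
    {"tool": "Bash", "input": "git status", "verdict": "GREEN", "auto_rule_matched": False}
    for _ in range(6)
]

def is_stuck(entries, min_green: int = 3) -> bool:
    greens = sum(1 for e in entries if e["verdict"] == "GREEN")
    matched = any(e["auto_rule_matched"] for e in entries)
    return greens >= min_green and not matched
```

The acceptance test then asserts `is_stuck(fixture)` is true, and false once any entry records an auto-rule match.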

Out of Scope

  • Editing rules from the UI (covered by #15)
  • Live updates / server mode (covered by #16)
  • Theming (covered by #17)
  • The general "explain things" copy work (covered by #14, but the friction section will need its own intro)
  • Notification/alerting on flags (CLI/email/slack) — separate concern

Relationship to Other Issues

  • #1 (stats subcommand → folded into report): kept as a sibling, not superseded. #1 owns the pure descriptive stats (latency, hit counts, totals, trends) — neutral data. This issue owns the opinionated friction view (flags, candidates, suggestions). Both render into the same HTML report as separate sections.
  • #2 (named rules) — would make the "Covered by" column meaningfully readable; not a blocker, but a clear quality boost.
  • #5 (surface LLM reasoning in hook output) — same data, different surface (live dialog vs. report). Friction-list rows could expand to show the LLM's most recent reasoning for that pattern.
  • #14 (explanations) — the friction section will need a clear intro; coordinate.
  • #18 (YELLOW: propose alternative) — once shipped, friction-list rows can show "last suggested safer rewrite" alongside the heuristic/LLM suggestions.

Open Questions

  1. Are the default thresholds (N) hardcoded ship-fast values, or do we want a calibration step (e.g. "based on your last 30d of audit, here's what we recommend")?
  2. Should the LLM suggestion pass also receive policy.md so it doesn't propose rules that violate the safety policy?
  3. Should auto-rules.toml carry last_matched_at metadata so we can also flag dormant auto-rules in this same view? (Adjacent to #3 — "enrich auto-rules.toml with provenance metadata.")
Reference
jbr870/claude-permit#20