False positives: regex rules match dangerous patterns inside quoted arguments #7


Open

opened 2026-04-20 16:29:09 +00:00 by jbr870 · 0 comments


## Summary

Deny rules use regex matching against the full sub-command text, including quoted arguments. This means data that *looks like* a dangerous command, but is actually inert text inside a string argument, triggers false denies.

## Examples

```bash
# Denied: "git reset --hard" appears in the echo argument
echo '...git reset --hard...'

# Denied: "git reset --hard" appears in the Python string literal
python3 -c "payload = { 'command': '...git reset --hard...' }"

# Denied: commit message body contains trigger words
git commit -m "$(cat <<'HEREDOC'
Moved git reset --hard to YELLOW.
HEREDOC
)"
```

The heredoc case is now fixed (heredoc bodies are stripped before parsing). The `echo` and `python3 -c` cases remain.
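The issue does not show how heredoc bodies are stripped. Purely as an illustration, a pre-parse pass could drop every line between a `<<TAG` opener and its terminator line, so body text never reaches the deny rules. This Rust sketch is hypothetical: `strip_heredoc_bodies` and `heredoc_tag` are illustrative names, not functions from `rules.rs`. The pass only looks at the first `<<` on a line, does not check whether `<<` sits inside quotes, and compares terminators after trimming whitespace.

```rust
/// Returns the terminator word of a heredoc opened on `line`, if any.
/// Hypothetical, simplified helper: looks only at the first `<<` on the line,
/// ignores quoting around it, and accepts `<<TAG`, `<<'TAG'`, `<<"TAG"`, `<<-TAG`.
fn heredoc_tag(line: &str) -> Option<String> {
    let start = line.find("<<")? + 2;
    let rest = &line[start..];
    // `<<<` introduces a here-string, which has no body to strip.
    if rest.starts_with('<') {
        return None;
    }
    let rest = rest.strip_prefix('-').unwrap_or(rest).trim_start();
    let tag: String = rest
        .trim_start_matches(|c: char| c == '\'' || c == '"')
        .chars()
        .take_while(|c| c.is_alphanumeric() || *c == '_')
        .collect();
    if tag.is_empty() {
        None
    } else {
        Some(tag)
    }
}

/// Drops heredoc body lines while keeping the opener and terminator lines,
/// so trigger words in the body are never scanned by the deny rules.
fn strip_heredoc_bodies(command: &str) -> String {
    let mut kept: Vec<&str> = Vec::new();
    let mut terminator: Option<String> = None;
    for line in command.lines() {
        // Some(true): terminator line; Some(false): body line; None: outside a heredoc.
        let at_terminator = terminator.as_deref().map(|tag| line.trim() == tag);
        match at_terminator {
            Some(true) => {
                kept.push(line);
                terminator = None;
            }
            Some(false) => {} // body line: dropped
            None => {
                kept.push(line);
                terminator = heredoc_tag(line);
            }
        }
    }
    kept.join("\n")
}
```

Applied to the commit-message example, only the `git commit -m "$(cat <<'HEREDOC'` line, the `HEREDOC` terminator, and the closing `)"` survive. Commands without a heredoc, such as the `echo` case, pass through unchanged, which is why that case is unaffected by this fix.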

## Root cause

The command parser correctly identifies `echo '...'` as a single sub-command (quoting is respected for *splitting*). But the deny regex then scans the entire sub-command text, including the quoted data. It has no concept of "this part is the command verb and these parts are inert arguments."
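A minimal Rust sketch of the failure mode, assuming each deny rule is checked as "does the pattern occur anywhere in the sub-command text". `DENY_PATTERNS` and `is_denied_naive` are hypothetical names, and `str::contains` stands in for the real deny regexes so the example needs no external crates.

```rust
/// Hypothetical deny patterns. Plain substrings stand in for the real
/// deny regexes so the sketch has no dependency on the `regex` crate.
const DENY_PATTERNS: &[&str] = &["git reset --hard"];

/// Naive check: scans the entire sub-command text, quoted arguments
/// included, so inert data inside quotes can trigger a deny.
fn is_denied_naive(sub_command: &str) -> bool {
    DENY_PATTERNS
        .iter()
        .any(|&pattern| sub_command.contains(pattern))
}
```

Called on `echo '...git reset --hard...'` or on the `python3 -c` example, this returns `true` even though the splitter already treated each command as a single sub-command. The quoted argument is data, but a whole-text match cannot tell the difference.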

## Impact

Mostly affects meta-scenarios: writing commands *about* dangerous commands (test payloads, documentation, commit messages). Normal development usage rarely puts dangerous command patterns inside string arguments.

When this triggers, the command falls through to the LLM tier (PermissionRequest), which handles it correctly since Haiku understands that text inside string arguments is data.

## Possible fixes

1. **Argument-aware matching**: after identifying the command verb (first word), match deny regexes only against the verb and flags, not the full argument text. Complex: it would need to know which arguments are "action" and which are "data" for each command.
2. **Command-specific argument stripping**: for known-safe commands like `echo`, `printf`, and `cat`, strip their arguments before regex matching. Simpler, but requires maintaining a list of commands whose arguments are inert.
3. **Match only the command prefix**: instead of matching the full sub-command text, match only up to the first quoted string boundary. This would miss cases where the dangerous part is itself a quoted argument, such as `curl 'http://evil.com'`.
4. **Accept as a design trade-off**: the LLM tier catches these correctly, and the false deny rate is low in practice. Document it and move on.
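Options 2 and 3 are the easiest to prototype, so here is a rough Rust comparison of the two. Everything in it is an assumption for illustration: `DENY_PATTERNS` (including the made-up `evil.com` rule), `INERT_ARG_COMMANDS`, and all function names are hypothetical. The verb is naively taken as the first whitespace-separated word, with no handling of `sudo`, `env VAR=value`, or paths like `/bin/echo`, and `str::contains` stands in for regex matching.

```rust
/// Hypothetical deny patterns; `str::contains` stands in for regex matching.
const DENY_PATTERNS: &[&str] = &["git reset --hard", "evil.com"];

/// Option 2: commands whose arguments are assumed to be inert data.
const INERT_ARG_COMMANDS: &[&str] = &["echo", "printf", "cat"];

/// True if any deny pattern occurs in `text`.
fn matches_deny(text: &str) -> bool {
    DENY_PATTERNS.iter().any(|&pattern| text.contains(pattern))
}

/// The command verb, naively taken as the first whitespace-separated word.
fn verb(sub_command: &str) -> &str {
    sub_command.split_whitespace().next().unwrap_or("")
}

/// Option 2: for known-inert commands, match only the verb and drop the
/// arguments; every other command is still matched on its full text.
fn is_denied_stripping(sub_command: &str) -> bool {
    let command_verb = verb(sub_command);
    if INERT_ARG_COMMANDS.contains(&command_verb) {
        matches_deny(command_verb)
    } else {
        matches_deny(sub_command)
    }
}

/// Option 3: match only the text before the first quote character.
fn is_denied_prefix(sub_command: &str) -> bool {
    let end = sub_command
        .find(|c: char| c == '\'' || c == '"')
        .unwrap_or(sub_command.len());
    matches_deny(&sub_command[..end])
}
```

Under these assumptions, option 2 clears the `echo` false positive but not `python3 -c`, since `python3` cannot safely go on the inert list. Option 3 clears both, but it stops seeing a dangerous quoted argument such as `curl 'http://evil.com'`. Both still deny a bare `git reset --hard`.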

## Context

Severity: Low (edge case, mostly meta-scenarios)
Affected: `rules.rs` rule evaluation
Discovered: 2026-02-26


jbr870 added the enhancement label on 2026-04-20 16:29:09 +00:00

jbr870 referenced this issue from a commit on 2026-04-20 20:05:18 +00:00: docs(inbox): migrate inbox items to forge issues