Skip to content

gfql/passes: fix \\b regex bug + DRY consolidation in predicate_pushdown.py (retroactive audit of #1188) #1195

@lmeyerov

Description

@lmeyerov

Context

Retroactive audit of merged PR #1188 (predicate pushdown pass rewrite, commit 23fb3a9) surfaced two issues missed in review.

Bug: `\\b` inside rf-string at `predicate_pushdown.py:178`

```python
if re.search(rf"\\b{re.escape(alias)}\\b", segment)
```

Inside an rf-string, `\\b` is a literal backslash followed by `b`, not a word boundary. Compare to the correct form at line 211 of the same file:

```python
if re.search(rf"\b{re.escape(alias)}\b", expression)
```

Effect: `_refs_for_segment` never matches an alias; the conservative fallback at line 183 (`return original_refs`) always fires. Per-conjunct reference narrowing after an AND split is effectively dead code. Tests pass because the fallback is a superset, hiding the bug.

DRY: Duplicate AND splitter

`predicate_pushdown.py:129` defines `_split_top_level_and` — a quote/paren-aware top-level AND splitter. The same package already has `_split_top_level_and_terms` in `cypher/parser.py:477`, which is strictly more robust: handles `[]`, `{}`, backticks, and escape sequences.

This is the exact same class of miss that PR #1193 fixed during Wave 3 review (see issue #1125 follow-up). Two near-duplicate implementations of "split WHERE body on top-level AND" in one package.

Proposed fixes

  1. Correct the `\\b` regex to `\b`. Add a targeted regression test: a multi-alias OPTIONAL MATCH expression where at least one alias must be detected per-segment (i.e., the fallback-to-`original_refs` path would incorrectly widen safety guards).
  2. Consolidate the two splitters: promote `_split_top_level_and_terms` to a shared module (e.g., `graphistry/compute/gfql/common/expr_split.py`) and delete `_split_top_level_and` from `predicate_pushdown.py`.

Missed-in-review gap

The earlier review of #1188 used the builtin `/review` tool. The retroactive audit ran the expanded multi-dimension review skill and caught both. Process change: for gfql/cypher PRs, default to the multi-dimension skill.

Priority

  • Bug: p2 (silent semantic widening; not load-bearing for correctness under current callers, but could become so)
  • DRY: p3 (cleanup, unblocks deletion of the duplicate splitter)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions