
Website fixes: Perturbation significance asterisks; scope domain toggle to scatter plot#116

Merged
tara-servicenow merged 3 commits into main from fix/perturbation-chart on May 14, 2026

Conversation

@lindsaydbrin lindsaydbrin commented May 14, 2026

Summary

Two related fixes to the Perturbations section of the leaderboard:

  1. Scope the domain toggle to the scatter plot only. The Perturbations section was intended to show pooled plots, not per-domain plots. <Perturbations> and <PerturbationBarChart> no longer take a domain prop and now call getPertValue with 'pooled'.
  2. Fix significance asterisk placement and size. StarMark now places markers just outside the CI cap in the bar's direction (above for positive deltas, below for negative), mirroring the implementation on the stats branch that is used for the paper. Font size now scales with vb.width (clamped to [7, 13]) so *** always fits inside the bar. The asterisk y is also clamped inside the plot pixel range so it stays visible even if a CI cap falls outside the [-0.5, 0.5] axis (this doesn't happen with the current pooled data, but just in case).
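
As a rough illustration of the sizing and clamping described in item 2, here is a minimal sketch. It is not the PR's StarMark code: the helper names, the 0.5 width-to-font-size factor, and the 2 px gap are assumptions chosen for the example. Only the [7, 13] font-size clamp and the above/below rule come from the description.

```typescript
// Hypothetical helpers; names, scale factor, and gap are illustrative assumptions.
interface PlotRange {
  top: number;    // smallest visible pixel y
  bottom: number; // largest visible pixel y
}

function clamp(x: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, x));
}

// Font size grows with bar width but stays within [7, 13].
function starFontSize(barWidth: number, scale = 0.5): number {
  return clamp(barWidth * scale, 7, 13);
}

// Text baseline for the asterisks. SVG y grows downward, so the upper
// CI cap has the smaller pixel y.
function starY(
  delta: number,
  capLoPx: number, // pixel y of the lower CI cap
  capHiPx: number, // pixel y of the upper CI cap
  fontSize: number,
  plot: PlotRange,
  gap = 2,
): number {
  const raw =
    delta >= 0
      ? capHiPx - gap             // just above the upper cap
      : capLoPx + gap + fontSize; // just below the lower cap
  // Keep the whole glyph inside the visible plot area.
  return clamp(raw, plot.top + fontSize, plot.bottom);
}
```

With a plot range of 20 to 300 px, a positive bar whose upper cap sits at y = 80 gets a baseline of 78, while a cap that lands off-axis at y = -10 is pulled back to 30 for a 10 px font.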

Build artifacts under docs/ are regenerated.

Follow-up:

  • TL;DR: Fixed a bug where bars with a value of exactly zero were skipped during indexing, so the asterisks on the bars after them were positioned incorrectly.
  • Details from Claude: Also fixes an index-drift bug where recharts' Bar drops zero-dimension rectangles before building the LabelList context, causing the content callback's index to point into a filtered array. Switched to valueAccessor so we read each row's data straight off the recharts entry.
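
The index drift can be reproduced with plain arrays. The sketch below does not use recharts; the `filter` call is only a stand-in for the step that drops zero-dimension rectangles, and the row shape and model names are made up for the example.

```typescript
// Stand-in data; the row shape and names are assumptions for illustration.
interface Row {
  model: string;
  delta: number;
}

const rows: Row[] = [
  { model: "A", delta: 0.1 },
  { model: "B", delta: 0 }, // zero-height bar
  { model: "C", delta: -0.2 },
];

// Stand-in for the filtering step: zero-height bars are dropped.
const rendered: Row[] = rows.filter((r) => r.delta !== 0);

// Buggy lookup: the callback index counts rendered bars, but it is used
// to index the original, unfiltered array.
const byIndex: string[] = rendered.map((_, i) => rows[i].model);

// Fixed lookup: read the data carried by the rendered entry itself.
const byEntry: string[] = rendered.map((r) => r.model);
```

Here `byIndex` pairs the second rendered bar (model C) with row B's data, while `byEntry` keeps each bar matched to its own row.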

Test plan

  • npm run build — clean (vite build, tsc -b)
  • npx eslint on touched files — clean
  • npm run dev — visually verified asterisks are attached to their bars (above for positive, below for negative) and shrink to fit narrow bars
  • Toggling Pooled / CSM / ITSM / HR re-renders the scatter plot only; Perturbations section stays put

…sk placement

Perturbation results are reported pooled across domains, so the domain
pills at the top of the leaderboard now only re-render the scatter plot;
<Perturbations> and <PerturbationBarChart> no longer take a domain prop
and call getPertValue with 'pooled'.

Rewrite StarMark so significance markers sit just outside the CI cap in
the bar's direction — above for positive deltas, below for negative —
mirroring analysis/eva-bench-stats/plots_perturbations.py. The previous
Math.max(0, upperValue) clamp parked markers near the zero line for
deeply-negative bars, visually disconnecting them from their bars.
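
To see why the old clamp disconnects markers from negative bars, compare the two anchor rules in data space. This is a sketch under assumptions: the function names are hypothetical, and errLo/errHi are treated as non-negative distances from the point estimate, which the source does not confirm.

```typescript
// Previous rule as described above: anchor at the upper CI value, never below zero.
function oldAnchor(upperValue: number): number {
  return Math.max(0, upperValue);
}

// New rule: anchor at the CI cap on the side the bar extends toward.
// Assumes errLo and errHi are non-negative offsets from `point`.
function newAnchor(point: number, errLo: number, errHi: number): number {
  return point >= 0 ? point + errHi : point - errLo;
}

// A deeply negative bar: point -0.25 with a CI half-width of 0.125.
const point = -0.25;
const errLo = 0.125;
const errHi = 0.125;

const oldValue = oldAnchor(point + errHi);       // 0: stuck at the zero line
const newValue = newAnchor(point, errLo, errHi); // -0.375: at the lower cap
```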

Font size now scales with bar width (clamped 7..13) so '***' always fits
inside the bar; the asterisk y is clamped inside the plot pixel range
so it stays visible if a CI cap is outside the [-0.5, 0.5] axis.

Includes the regenerated docs/ build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lindsaydbrin commented with two screenshots (2026-05-14 at 11:47:59 AM and 11:44:17 AM).


…anup

PerturbationBarChart's LabelList content was looking up the row via
data[cp.index]. Recharts' Bar drops zero-dimension rectangles before
building the LabelList context (Bar.js:665), so cp.index is into the
filtered array. With one model whose pooled perturbation deltas are
exactly 0 (EVA-X_pass for Gemini 3 Flash + Gemini 3.1 Flash TTS), every
row after it had its asterisks placed using the wrong row's point/err —
visible as asterisks on the wrong side of the zero line for downstream
negative-delta bars (most clearly Whisper Large v3 on EVA-X pass@1).

Switch from dataKey + cp.index to valueAccessor, which receives the
recharts entry with payload pointing at the original unfiltered row.
The encoded "label|point|errLo|errHi" string travels with the bar's own
viewBox, so placement stays correct regardless of filtering.
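
One way such an encoded string could be built and read back is sketched below. The field order follows the "label|point|errLo|errHi" format named above, but the type and function names are hypothetical, and the PR's actual parsing may differ. A valueAccessor could return the encoded string for each entry's payload, and the label content callback could decode it from the value it receives.

```typescript
// Hypothetical type and helpers; only the field order comes from the commit message.
interface StarInfo {
  label: string; // e.g. "*", "**", "***"; must not contain "|"
  point: number;
  errLo: number;
  errHi: number;
}

function encodeStar(s: StarInfo): string {
  return [s.label, s.point, s.errLo, s.errHi].join("|");
}

// Returns null when the string does not have four fields or a numeric
// field fails to parse.
function decodeStar(value: string): StarInfo | null {
  const parts = value.split("|");
  if (parts.length !== 4) return null;
  const [label, point, errLo, errHi] = parts;
  const nums = [point, errLo, errHi].map(Number);
  if (nums.some((n) => !Number.isFinite(n))) return null;
  return { label, point: nums[0], errLo: nums[1], errHi: nums[2] };
}
```

Because each label carries its own row's values, the decode step never consults an array index, so filtered-out bars cannot shift the data seen by later bars.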

Also drop a dead `${p}_sig` field (set but never read) and refresh a
stale comment that referenced ReferenceLine for the group separators
(they moved to Customized + SeparatorsLayer in an earlier commit).
Regenerate docs/ build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tara-servicenow

I think you need to pull main and rebuild since you have some merge conflicts

# Conflicts:
#	docs/assets/index-B1JUNaN9.js
#	docs/assets/index-DS9XY9eK.js
#	docs/assets/index-ghBLaX6r.js
#	docs/index.html
@tara-servicenow tara-servicenow added this pull request to the merge queue May 14, 2026
Merged via the queue into main with commit 5ca7a77 May 14, 2026
1 check passed
@tara-servicenow tara-servicenow deleted the fix/perturbation-chart branch May 14, 2026 20:05