Website fixes: Perturbation significance asterisks; scope domain toggle to scatter plot #116
Merged
…sk placement
Perturbation results are reported pooled across domains, so the domain
pills at the top of the leaderboard now only re-render the scatter plot;
<Perturbations> and <PerturbationBarChart> no longer take a domain prop
and call getPertValue with 'pooled'.
Rewrite StarMark so significance markers sit just outside the CI cap in
the bar's direction (above for positive deltas, below for negative),
mirroring analysis/eva-bench-stats/plots_perturbations.py. The previous
Math.max(0, upperValue) clamp parked markers near the zero line for
deeply-negative bars, visually disconnecting them from their bars.
Font size now scales with bar width (clamped 7..13) so '***' always fits
inside the bar; the asterisk y is clamped inside the plot pixel range so
it stays visible if a CI cap is outside the [-0.5, 0.5] axis.
Includes the regenerated docs/ build.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
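The placement rule described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's actual StarMark code: `placeStar`, its parameters, the `gap` spacing, and the 0.5 width-to-font factor are all assumptions made for the example. Only the 7..13 clamp, the "outside the CI cap in the bar's direction" rule, and the clamp to the plot pixel range come from the text above.

```typescript
// Illustrative sketch only: placeStar and its parameters are hypothetical names,
// not the PR's actual StarMark implementation.
interface StarPlacement {
  fontSize: number; // font size in px
  y: number;        // text baseline in px (pixel y grows downward)
}

const clamp = (v: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, v));

function placeStar(
  point: number,                 // the bar's delta, in data units
  errLo: number,                 // lower CI bound, in data units
  errHi: number,                 // upper CI bound, in data units
  yScale: (v: number) => number, // maps a data value to a pixel y
  barWidth: number,              // bar width in px
  plotTop: number,               // top edge of the plot area in px
  plotBottom: number,            // bottom edge of the plot area in px
  gap: number = 2,               // assumed spacing between cap and marker
): StarPlacement {
  // Scale with bar width, clamped to 7..13 as in the commit message.
  // The 0.5 factor is an assumption made for this sketch.
  const fontSize = clamp(barWidth * 0.5, 7, 13);

  // Pick the CI cap in the bar's direction: upper cap for positive deltas,
  // lower cap for negative ones.
  const positive = point >= 0;
  const capY = yScale(positive ? errHi : errLo);

  // Above the cap means a smaller pixel y; below the cap means a larger one,
  // plus room for the glyph height.
  const rawY = positive ? capY - gap : capY + gap + fontSize;

  // Keep the marker visible even if the cap falls outside the axis range.
  const y = clamp(rawY, plotTop + fontSize, plotBottom);
  return { fontSize, y };
}

// Usage with an assumed linear scale: data range [-0.5, 0.5] onto 200 px.
const yScale = (v: number): number => (0.5 - v) * 200;
const up = placeStar(0.25, 0.125, 0.375, yScale, 20, 0, 200);
const down = placeStar(-0.25, -0.375, -0.125, yScale, 20, 0, 200);
const offAxis = placeStar(-0.6, -0.75, -0.45, yScale, 20, 0, 200);
```

With this scale the zero line sits at pixel 100, so `up.y` lands above it and `down.y` below it, while `offAxis.y` is pinned to the bottom edge of the plot instead of being drawn out of view.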
…anup
PerturbationBarChart's LabelList content was looking up the row via
data[cp.index]. Recharts' Bar drops zero-dimension rectangles before
building the LabelList context (Bar.js:665), so cp.index is into the
filtered array. With one model whose pooled perturbation deltas are
exactly 0 (EVA-X_pass for Gemini 3 Flash + Gemini 3.1 Flash TTS), every
row after it had its asterisks placed using the wrong row's point/err —
visible as asterisks on the wrong side of the zero line for downstream
negative-delta bars (most clearly Whisper Large v3 on EVA-X pass@1).
Switch from dataKey + cp.index to valueAccessor, which receives the
recharts entry with payload pointing at the original unfiltered row.
The encoded "label|point|errLo|errHi" string travels with the bar's own
viewBox, so placement stays correct regardless of filtering.
Also drop a dead `${p}_sig` field (set but never read) and refresh a
stale comment that referenced ReferenceLine for the group separators
(they moved to Customized + SeparatorsLayer in an earlier commit).
Regenerate docs/ build.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
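The index mismatch and the fix described in this commit message can be illustrated with a small, library-free sketch. Everything here is hypothetical: the `PertRow` shape, the `encodeStar`/`decodeStar` helpers, and the model names are invented for the example. The filtering step only imitates the behaviour the message attributes to Recharts' Bar and does not call Recharts itself.

```typescript
// Illustrative sketch only: PertRow, encodeStar and decodeStar are hypothetical
// names, and the filter below merely imitates dropping zero-dimension bars.
interface PertRow {
  model: string;
  label: string; // significance marker, e.g. "", "*", "**", "***"
  point: number;
  errLo: number;
  errHi: number;
}

// Pack everything the marker needs into one string, in the
// "label|point|errLo|errHi" layout named in the commit message.
function encodeStar(row: PertRow): string {
  return [row.label, row.point, row.errLo, row.errHi].join("|");
}

function decodeStar(encoded: string): Omit<PertRow, "model"> {
  const parts = encoded.split("|");
  return {
    label: parts[0] ?? "",
    point: Number(parts[1]),
    errLo: Number(parts[2]),
    errHi: Number(parts[3]),
  };
}

const rows: PertRow[] = [
  { model: "model-a", label: "**", point: 0.25, errLo: 0.125, errHi: 0.375 },
  { model: "model-b", label: "", point: 0, errLo: 0, errHi: 0 }, // zero-height bar
  { model: "model-c", label: "***", point: -0.25, errLo: -0.375, errHi: -0.125 },
];

// Imitate the filtering: zero-height bars never reach the label layer.
const drawn = rows.filter((r) => r.point !== 0);

// Buggy lookup: the index counts drawn bars, but is used against the full array,
// so every bar after the dropped one reads the wrong row.
const buggyPoints = drawn.map((_, i) => rows[i]?.point);

// Fixed lookup: each drawn bar carries its own encoded payload, so filtering
// cannot shift which row the marker is computed from.
const fixedPoints = drawn.map((r) => decodeStar(encodeStar(r)).point);
```

In this toy data the second drawn bar is `model-c`, yet the buggy lookup reads `model-b`'s point of 0, which would put the marker on the wrong side of the zero line. The encoded payload returns -0.25 regardless of how many rows were dropped before it.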
I think you need to pull main and rebuild since you have some merge conflicts.
# Conflicts:
#	docs/assets/index-B1JUNaN9.js
#	docs/assets/index-DS9XY9eK.js
#	docs/assets/index-ghBLaX6r.js
#	docs/index.html
tara-servicenow approved these changes May 14, 2026


Summary
Two related fixes to the Perturbations section of the leaderboard:
- `<Perturbations>` and `<PerturbationBarChart>` no longer take a `domain` prop and now call `getPertValue` with `'pooled'`.
- `StarMark` now places markers just outside the CI cap in the bar's direction - above for positive deltas, below for negative - mirroring the implementation on the stats branch / for the paper. Font size now scales with `vb.width` (clamped to [7, 13]) so `***` always fits inside the bar. Asterisk `y` is also clamped inside the plot pixel range so it stays visible even if a CI cap is outside the [-0.5, 0.5] axis (which doesn't happen with current pooled data, but just in case).

Build artifacts under `docs/` are regenerated.

Follow-up:
Test plan
- `npm run build`: clean (vite build, tsc -b)
- `npx eslint` on touched files: clean
- `npm run dev`: visually verified asterisks are attached to their bars (above for positive, below for negative) and shrink to fit narrow bars