Fix reproject.merge same-CRS dask path materializing full source per chunk #1576

Open
brendancol wants to merge 1 commit into main from deep-sweep-performance-reproject-2026-05-10

Conversation

@brendancol
Contributor

Summary

_merge_block_adapter called src_data.compute() on the full source
dask array for every output chunk on the same-CRS direct-placement
path. Measured: 256x256 source with 32x32 output chunks materialized
8.9M pixels for 131K source pixels (68x amplification). For an
8192x8192 source with 256x256 output chunks, the amplification would
push driver-side data flow into terabyte territory.
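The scaling behind these numbers can be sketched with a little arithmetic. The model below is an idealized assumption, not the PR's measurement setup: it takes a single source whose grid coincides with the output grid, so the amplification equals the number of output chunks. The function name `full_compute_amplification` and the 8-bytes-per-pixel default are hypothetical. Under this model the small case gives 64x rather than the measured 68x, which depends on the actual test layout.

```python
import math


def full_compute_amplification(src_shape, chunk_shape, bytes_per_pixel=8):
    """Idealized cost of computing the whole source once per output chunk.

    Assumption (hypothetical): one source whose pixel grid coincides with
    the output grid, so every output chunk triggers one full compute.
    """
    src_h, src_w = src_shape
    chunk_h, chunk_w = chunk_shape
    n_chunks = math.ceil(src_h / chunk_h) * math.ceil(src_w / chunk_w)
    src_pixels = src_h * src_w
    materialized = n_chunks * src_pixels
    return {
        "n_chunks": n_chunks,
        "materialized_pixels": materialized,
        "amplification": materialized / src_pixels,
        "materialized_bytes": materialized * bytes_per_pixel,
    }


small = full_compute_amplification((256, 256), (32, 32))
large = full_compute_amplification((8192, 8192), (256, 256))

print(small["amplification"])              # one full compute per chunk
print(large["materialized_bytes"] / 1e12)  # terabytes moved, 8 bytes/pixel
```

Because the amplification grows with the chunk count, the cost grows with the square of the source's linear size at a fixed chunk size.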

The fix adds _place_same_crs_lazy (xrspatial/reproject/__init__.py),
which mirrors _place_same_crs but slices the source window first and
calls .compute() only on that slice. Post-fix amplification is 1.00x.
This mirrors the slicing pattern already used in _reproject_chunk_numpy
and _reproject_chunk_cupy.
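A minimal sketch of the slice-then-compute idea follows. It is not the code from this PR: `LazyStandIn`, `overlap_window`, `place_eager`, and `place_windowed` are hypothetical names, the stand-in class only tracks shapes instead of wrapping a real dask array, and source and output are assumed to share one pixel grid and origin.

```python
class LazyStandIn:
    """Hypothetical stand-in for a lazy 2-D array; it only tracks shapes.

    Slicing is free; compute() records how many pixels were materialized.
    """

    materialized = 0  # class-level counter shared by all slices

    def __init__(self, shape):
        self.shape = shape

    def __getitem__(self, key):
        rows, cols = key
        r0, r1, _ = rows.indices(self.shape[0])
        c0, c1, _ = cols.indices(self.shape[1])
        return LazyStandIn((max(r1 - r0, 0), max(c1 - c0, 0)))

    def compute(self):
        LazyStandIn.materialized += self.shape[0] * self.shape[1]
        return self.shape


def overlap_window(chunk_rows, chunk_cols, src_shape):
    """Clip an output chunk's pixel range to the source extent.

    Returns a (row_slice, col_slice) pair, or None when there is no overlap.
    """
    r0, r1 = max(chunk_rows[0], 0), min(chunk_rows[1], src_shape[0])
    c0, c1 = max(chunk_cols[0], 0), min(chunk_cols[1], src_shape[1])
    if r0 >= r1 or c0 >= c1:
        return None
    return slice(r0, r1), slice(c0, c1)


def place_eager(src, chunks):
    # Pre-fix pattern: the whole source is computed for every output chunk.
    for _ in chunks:
        src.compute()


def place_windowed(src, chunks):
    # Post-fix pattern: slice the overlapping window, then compute only that.
    for rows, cols in chunks:
        window = overlap_window(rows, cols, src.shape)
        if window is not None:
            src[window].compute()


def measure(place, src, chunks):
    """Return materialized pixels divided by source pixels for one strategy."""
    LazyStandIn.materialized = 0
    place(src, chunks)
    return LazyStandIn.materialized / (src.shape[0] * src.shape[1])


source = LazyStandIn((256, 256))
chunks = [((r, r + 32), (c, c + 32))
          for r in range(0, 256, 32) for c in range(0, 256, 32)]

eager_ratio = measure(place_eager, source, chunks)
windowed_ratio = measure(place_windowed, source, chunks)
print(eager_ratio, windowed_ratio)
```

With a real dask array the same shape of code applies, since slicing a dask array stays lazy and only the sliced region is materialized on `.compute()`.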

Surfaced by the 2026-05-10 reproject performance sweep.

Closes #1571.

Test plan

  • New regression test test_merge_dask_same_crs_bounded_materialization traces da.Array.compute() calls and asserts total materialized pixels stay under 3x total source size.
  • Existing test_merge_dask_same_crs_matches_eager continues to pass (bit-equal placement preserved).
  • Full reproject test suite passes (194 tests).
  • Direct measurement: pre-fix 68.00x, post-fix 1.00x ratio of materialized pixels over total source size.
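The tracing approach in the regression test can be illustrated roughly as follows. This is a sketch under stated assumptions, not the test from this PR: the real test traces `da.Array.compute()`, whereas `FakeLazyArray`, `trace_compute`, and `workload` here are hypothetical stand-ins chosen so the example runs without dask.

```python
from unittest import mock


class FakeLazyArray:
    """Hypothetical stand-in for a lazy array with a compute() method."""

    def __init__(self, shape):
        self.shape = shape

    def compute(self):
        return [[0] * self.shape[1] for _ in range(self.shape[0])]


def trace_compute(fn):
    """Run fn() while recording the pixel count of every compute() call."""
    sizes = []
    original = FakeLazyArray.compute

    def traced(self):
        sizes.append(self.shape[0] * self.shape[1])
        return original(self)

    # patch.object restores the original method when the block exits.
    with mock.patch.object(FakeLazyArray, "compute", traced):
        fn()
    return sizes


def workload():
    # Stand-in for the merge: four 32x32 windows of a 64x64 source.
    for _ in range(4):
        FakeLazyArray((32, 32)).compute()


total_source = 64 * 64
sizes = trace_compute(workload)
ratio = sum(sizes) / total_source

# Same style of bound as the test plan: stay under 3x total source size.
assert sum(sizes) < 3 * total_source
print(ratio)
```

A bound of 3x rather than exactly 1x leaves headroom for windows that overlap chunk edges while still failing clearly on a per-chunk full compute.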

…chunk

`_merge_block_adapter` called `src_data.compute()` on the full source
dask array for every output chunk on the same-CRS direct-placement path.
For a 256x256 source split into 32x32 output chunks, this materialized
8.9M pixels for 131K source pixels (68x amplification). At a realistic
8192x8192 source with 256x256 output chunks, the amplification would push
driver-side data flow into terabyte territory.

The fix adds `_place_same_crs_lazy` that mirrors `_place_same_crs` but
slices the source window first and only calls `.compute()` on that
slice. Post-fix ratio is 1.00x (131K pixels materialized for 131K
source). Mirrors the slicing pattern already in
`_reproject_chunk_numpy` and `_reproject_chunk_cupy`.

Adds regression test `test_merge_dask_same_crs_bounded_materialization`
that traces `da.Array.compute()` calls and asserts total materialized
pixels stay under 3x total source size.

Surfaced by the 2026-05-10 reproject performance sweep.
Closes #1571.
github-actions bot added the performance label (PR touches performance-sensitive code) on May 11, 2026
brendancol requested a review from Copilot on May 11, 2026, 11:45

Development

Successfully merging this pull request may close these issues.

reproject.merge: same-CRS dask path materializes full source per output chunk