Fix balanced_allocation dask+cupy TypeError on source_ids #1569
Open
brendancol wants to merge 4 commits into main from
Conversation
Audited the one new commit since pass 10:

- #1559 (769bcf7): centralise GeoTIFF attrs population across all read backends via a single `_populate_attrs_from_geo_info` helper. Pure metadata-handling refactor. The helper dedupes attrs population across the eager numpy, dask, and two GPU read paths so all four backends emit the same key set. No new allocations, no file I/O changes, no kernel changes, no dtype-handling changes -- the helper only writes to an in-memory attrs dict from already-validated geo_info fields.

Re-verified all prior guards intact:

- `MAX_PIXELS_DEFAULT=1e9` at `_reader.py:46`
- `MAX_IFD_ENTRY_COUNT=100_000` at `_header.py:34`
- `MAX_IFDS=256` at `_header.py:45`
- `MAX_TILE_BYTES_DEFAULT=256 MiB` at `_reader.py:68`
- `_MAX_DASK_CHUNKS=50_000` at `__init__.py:1428`
- realpath canonicalisation at `_reader.py:161` and `_vrt.py:160`

Cat 1-6 all clean. No PRs opened.
Re-audit after #1559 (centralise attrs across all read backends). No new HIGH/MEDIUM findings. SAFE/IO-bound holds.

- New `_populate_attrs_from_geo_info` helper runs once per read, not per chunk -- no perf impact on hot paths.
- Probe: `read_geotiff_dask(2560x2560, chunks=256)` yields 400 tasks (4 tasks/chunk for 100 chunks), well under the 1M cap.
- Probe: `read_geotiff_gpu(1024x1024)` returns a `cupy.ndarray` end-to-end with no host round-trip (226 ms incl. write+decode).
Audited the one geotiff commit added since pass 10 -- #1559 (PR 1548), which centralises attrs population across the four read backends. The change is a pure metadata refactor; no data-path arithmetic was touched. Windowed-origin math and cross-backend attrs/data parity verified.
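The dask task-count probe quoted in the audit notes above can be sanity-checked arithmetically; the 4 tasks/chunk figure is taken from the note itself, not independently measured here:

```python
# Back-of-envelope check of the audit note's dask task-count probe:
# a 2560x2560 raster read with 256x256 chunks tiles into a 10x10 grid.
shape = (2560, 2560)
chunk = 256

chunks_y = shape[0] // chunk          # 10 rows of chunks
chunks_x = shape[1] // chunk          # 10 columns of chunks
n_chunks = chunks_y * chunks_x        # 100 chunks total

tasks_per_chunk = 4                   # figure quoted in the note above
total_tasks = n_chunks * tasks_per_chunk

print(n_chunks, total_tasks)          # 100 400 -- well under the 1M cap
```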
`da.unique().compute()` on a dask+cupy array returns a cupy array, which propagates through `np.sort` and then into `source_ids[best_idx]`, making `alloc` a cupy array. The downstream `fric_weight[alloc == sid]` then attempts an implicit numpy conversion and raises `TypeError`. Convert the unique result to numpy via `.get()` before the mask/sort so `source_ids` is always numpy.
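The failure mode described above can be sketched without a GPU by mimicking cupy's implicit-conversion guard. `DeviceArray` below is a hypothetical stand-in, not part of cupy or xarray-spatial:

```python
import numpy as np

class DeviceArray:
    """Minimal stand-in for cupy.ndarray: like cupy, it refuses implicit
    conversion to a host NumPy array (sketch, not real cupy)."""
    def __init__(self, data):
        self._data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        raise TypeError('Implicit conversion to a NumPy array is not allowed. '
                        'Use .get() to construct a NumPy array explicitly.')

    def __eq__(self, other):
        # Comparisons stay "on device", just as alloc == sid does with cupy.
        return DeviceArray(self._data == other)

    def get(self):
        # Explicit device-to-host copy, mirroring cupy.ndarray.get().
        return self._data

fric_weight = np.ones(4)
alloc = DeviceArray([1, 1, 2, 2])   # what alloc becomes when source_ids is cupy
mask = alloc == 1

try:
    np.asarray(mask)                # the conversion fric_weight[mask] triggers
    err = None
except TypeError as exc:
    err = str(exc)

host_mask = mask.get()              # the fix: explicit .get() before masking
```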
Contributor
Pull request overview
Fixes a backend interop bug in balanced_allocation when running on dask+cupy by ensuring extracted source_ids are always NumPy, preventing downstream CuPy implicit NumPy conversion errors.
Changes:
- Convert `da.unique(...).compute()` results to NumPy in `_extract_sources` when the computed array is a CuPy array.
- Update `.claude` sweep state CSVs with new GeoTIFF audit pass entries (security/performance/accuracy tracking metadata).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `xrspatial/balanced_allocation.py` | Forces dask+cupy `da.unique().compute()` output to NumPy so `source_ids` stays NumPy and avoids CuPy implicit conversion errors. |
| `.claude/sweep-security-state.csv` | Records latest GeoTIFF security audit pass metadata. |
| `.claude/sweep-performance-state.csv` | Records latest GeoTIFF performance audit pass metadata. |
| `.claude/sweep-accuracy-state.csv` | Records latest GeoTIFF accuracy audit pass metadata. |
Comment on lines +67 to +68:

```python
if hasattr(uniq, 'get'):  # cupy
    uniq = uniq.get()
```
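Since this check runs after `.compute()`, the result can only be a `numpy.ndarray` or a `cupy.ndarray`, so duck-typing on `.get()` (which only the CuPy class defines) is sufficient. A hedged sketch of the conversion path follows; the helper name and the nodata value are illustrative assumptions, not the PR's actual code:

```python
import numpy as np

def ensure_host(uniq):
    """Coerce a post-compute result to host NumPy. cupy.ndarray exposes
    .get() for the device-to-host copy; numpy arrays pass through.
    (Hypothetical helper name -- the PR inlines this check.)"""
    if hasattr(uniq, 'get'):  # cupy
        uniq = uniq.get()
    return np.asarray(uniq)

class FakeCupy:
    """Test double standing in for cupy.ndarray."""
    def __init__(self, data):
        self._data = np.asarray(data)
    def get(self):
        return self._data

# Mask/sort now happens on the host, so source_ids is always numpy.
uniq = ensure_host(FakeCupy([3, 0, 1, 2]))
source_ids = np.sort(uniq[uniq != 0])   # treating 0 as nodata is an assumption
```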
Summary
`_extract_sources` returned a cupy array on dask+cupy rasters because `da.unique().compute()` yields cupy and `np.sort` did not convert. That propagated into `source_ids[best_idx]`, making `alloc` cupy, and `fric_weight[alloc == sid]` tripped cupy's implicit-numpy-conversion guard. The fix calls `.get()` before the mask/sort so `source_ids` is always numpy.

Test plan
- `pytest xrspatial/tests/test_balanced_allocation.py` -- 17/17 pass across numpy, cupy, dask+numpy, dask+cupy
- `test_two_sources_uniform_friction[dask+cupy]` now passes