POJOs for MultiShard TableReader. #3685

Merged
VardhanThigle merged 1 commit into GoogleCloudPlatform:main from VardhanThigle:graph-size-pojo
Apr 15, 2026
@VardhanThigle VardhanThigle commented Apr 14, 2026

Foundations for Multi-Shard Routing (POJOs)

This is the first child of #3684

Design Decision

This PR introduces the foundational data models and services required to support multi-shard JDBC migrations.

Key Components:

  • JdbcIoWrapperConfigGroup: A container that aggregates multiple JdbcIOWrapperConfig instances. It ensures that all shards in a group share a uniform SQLDialect, simplifying downstream transform logic.
  • DataSourceProvider Interface: Defines the contract for worker-side connection routing.
  • DataSourceManager / DataSourceManagerImpl: A centralized service for managing and caching connection sources across multiple physical shards.
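
The dialect-uniformity check described for JdbcIoWrapperConfigGroup could look roughly like the sketch below. It is a minimal illustration only: `DialectSketch`, `ShardConfigSketch`, and `ConfigGroupSketch` are hypothetical stand-ins, not the classes added in this PR, and the real config objects carry many more fields.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the SQL dialects a shard may use. */
enum DialectSketch { MYSQL, POSTGRESQL }

/** Hypothetical, heavily reduced stand-in for a per-shard JDBC config. */
final class ShardConfigSketch {
  final String shardId;
  final String jdbcUrl;
  final DialectSketch dialect;

  ShardConfigSketch(String shardId, String jdbcUrl, DialectSketch dialect) {
    this.shardId = shardId;
    this.jdbcUrl = jdbcUrl;
    this.dialect = dialect;
  }
}

/** Aggregates shard configs and rejects groups that mix dialects. */
final class ConfigGroupSketch {
  private final List<ShardConfigSketch> shardConfigs;
  private final DialectSketch dialect;

  ConfigGroupSketch(List<ShardConfigSketch> shardConfigs) {
    if (shardConfigs.isEmpty()) {
      throw new IllegalArgumentException("A config group needs at least one shard");
    }
    // The first shard fixes the dialect; every other shard must match it.
    DialectSketch first = shardConfigs.get(0).dialect;
    for (ShardConfigSketch config : shardConfigs) {
      if (config.dialect != first) {
        throw new IllegalArgumentException(
            "Shard " + config.shardId + " uses " + config.dialect + ", expected " + first);
      }
    }
    // Defensive copy so later changes to the caller's list cannot break the invariant.
    this.shardConfigs = Collections.unmodifiableList(new ArrayList<>(shardConfigs));
    this.dialect = first;
  }

  /** The single dialect shared by every shard in the group. */
  DialectSketch dialect() {
    return dialect;
  }

  List<ShardConfigSketch> shardConfigs() {
    return shardConfigs;
  }
}
```

Validating in the constructor means downstream transforms can read one dialect from the group instead of checking each shard.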

Rationale:

To achieve a constant graph size, we must move connection details from pipeline construction time (PTransform fields) to element processing time (DoFn logic). These POJOs provide the structure for this dynamic routing.
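
To make the construction-time versus processing-time distinction concrete, here is a small plain-Java sketch with no Beam dependency. All names (`ReadRequestSketch`, `UrlProviderSketch`, `RoutingReaderSketch`) are hypothetical and are not the PR's API. The point it illustrates is that a single reader object is built once, and the shard to use is read from each element rather than from a per-shard field on the transform.

```java
import java.io.Serializable;

/** Hypothetical element type: a unit of read work that names its own shard. */
final class ReadRequestSketch implements Serializable {
  final String shardId;
  final String tableName;

  ReadRequestSketch(String shardId, String tableName) {
    this.shardId = shardId;
    this.tableName = tableName;
  }
}

/** Hypothetical worker-side lookup, loosely playing the role of a provider contract. */
interface UrlProviderSketch extends Serializable {
  String jdbcUrlFor(String shardId);
}

/**
 * One reader instance serves every shard. It holds only the provider, so the
 * number of reader objects does not grow with the number of shards.
 */
final class RoutingReaderSketch implements Serializable {
  private final UrlProviderSketch provider;

  RoutingReaderSketch(UrlProviderSketch provider) {
    this.provider = provider;
  }

  /** Stands in for a per-element processing method: the shard is resolved here. */
  String describeRead(ReadRequestSketch request) {
    String url = provider.jdbcUrlFor(request.shardId);
    return "read " + request.tableName + " via " + url;
  }
}
```

In a real pipeline the per-element method would fetch a connection rather than build a string; the string keeps the sketch runnable without a database.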


Why it's Safe (Concurrency & Resource Leaks)

  • Thread-Safe Caching: DataSourceManagerImpl implements a thread-safe cache for DataSource instances using ReentrantLock and a double-checked locking pattern. This ensures that even when multiple Beam bundles execute concurrently on the same worker, each physical shard connection pool is initialized exactly once.
  • Resource Management: The DataSourceManager provides a closeAll() method. It is designed to be called from a DoFn's @Teardown or @FinishBundle method so that all opened connection pools are gracefully released, preventing resource leaks on Dataflow workers.
  • Serialization Robustness: Fields within DataSourceManagerImpl are marked transient where appropriate, and the class includes null-safety checks to handle lifecycle transitions in the Dataflow environment.
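
As an illustration of the caching and cleanup behaviour listed above, the following is a minimal sketch of a ReentrantLock-guarded, double-checked cache with a closeAll() method. `PoolCacheSketch` and its factory parameter are hypothetical names, and a generic AutoCloseable stands in for a real connection pool; this is not the code of DataSourceManagerImpl.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

/** Hypothetical per-shard cache; P stands in for a connection pool type. */
final class PoolCacheSketch<P extends AutoCloseable> {
  private final Function<String, P> poolFactory;
  // ConcurrentHashMap gives safe publication for the lock-free first check.
  private final Map<String, P> pools = new ConcurrentHashMap<>();
  private final ReentrantLock lock = new ReentrantLock();

  PoolCacheSketch(Function<String, P> poolFactory) {
    this.poolFactory = poolFactory;
  }

  P get(String shardId) {
    P pool = pools.get(shardId); // first check, without the lock
    if (pool != null) {
      return pool;
    }
    lock.lock();
    try {
      pool = pools.get(shardId); // second check, under the lock
      if (pool == null) {
        pool = poolFactory.apply(shardId);
        if (pool == null) {
          throw new IllegalArgumentException("Unknown shard id: " + shardId);
        }
        pools.put(shardId, pool);
      }
      return pool;
    } finally {
      lock.unlock();
    }
  }

  /** Closes every cached pool and empties the cache. */
  void closeAll() {
    lock.lock();
    try {
      for (P pool : pools.values()) {
        try {
          pool.close();
        } catch (Exception e) {
          // Sketch only: keep closing the remaining pools. Real code would log this.
        }
      }
      pools.clear();
    } finally {
      lock.unlock();
    }
  }
}
```

The second lookup under the lock is what prevents two threads that both missed on the first check from each creating a pool for the same shard. The sketch assumes callers do not request pools while closeAll() is running.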

Testing

The added tests verify:

  • Correct aggregation of shard configurations.
  • Thread-safe lazy initialization of data sources.
  • Proper handling of unknown shard IDs.
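
One way a lazy-initialization check of this kind might be written is sketched below; it is not taken from the PR's test suite. The hypothetical helper `LazyInitProbeSketch.distinctInstances` releases several threads at once against any lookup function and reports how many distinct objects they received. A correctly guarded cache should yield exactly one.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical test helper for probing lazy initialization under contention. */
final class LazyInitProbeSketch {
  private LazyInitProbeSketch() {}

  /** Returns how many distinct instances concurrent callers saw for one shard id. */
  static <T> int distinctInstances(Function<String, T> lookup, String shardId, int threads) {
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    CountDownLatch startGate = new CountDownLatch(1);
    // Identity-based set: two objects count as one only if they are the same reference.
    Set<T> seen = Collections.synchronizedSet(Collections.newSetFromMap(new IdentityHashMap<>()));
    try {
      for (int i = 0; i < threads; i++) {
        executor.submit(() -> {
          startGate.await(); // hold every worker until all are queued
          seen.add(lookup.apply(shardId));
          return null;
        });
      }
      startGate.countDown(); // release all workers together
      executor.shutdown();
      if (!executor.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("Workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for workers", e);
    } finally {
      executor.shutdownNow();
    }
    return seen.size();
  }
}
```

A test would pass the cache's lookup method as `lookup` and assert that the result is 1 and that the underlying factory ran once.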

@gemini-code-assist
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request establishes the foundational infrastructure for multi-shard table reader schema discovery. It provides a robust framework for managing multiple JDBC data sources within a Dataflow worker, focusing on thread-safe connection handling, resource lifecycle management, and configuration validation to support complex multi-shard migration scenarios.

Highlights

  • Configuration Management: Introduced JdbcIoWrapperConfigGroup to encapsulate and validate JDBC shard configurations, ensuring dialect consistency.
  • Data Source Provisioning: Added DataSourceProvider and its implementation to enable dynamic, serializable routing of data sources in multi-shard environments.
  • Lifecycle Management: Implemented DataSourceManager to handle thread-safe, on-demand initialization, caching, and graceful teardown of data source connections.

@codecov
codecov Bot commented Apr 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.67%. Comparing base (72a145d) to head (8128b2a).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3685      +/-   ##
============================================
+ Coverage     52.47%   58.67%   +6.19%     
+ Complexity     6224     2122    -4102     
============================================
  Files          1062      505     -557     
  Lines         64136    29129   -35007     
  Branches       7089     3191    -3898     
============================================
- Hits          33654    17090   -16564     
+ Misses        28198    11061   -17137     
+ Partials       2284      978    -1306     
Components                          Coverage Δ
spanner-templates                   73.76% <100.00%> (+1.64%) ⬆️
spanner-import-export               ∅ <ø> (∅)
spanner-live-forward-migration      80.77% <ø> (-0.09%) ⬇️
spanner-live-reverse-replication    77.46% <ø> (-0.07%) ⬇️
spanner-bulk-migration              89.32% <100.00%> (+0.01%) ⬆️
gcs-spanner-dv                      86.67% <ø> (+0.94%) ⬆️

Files with missing lines                                  Coverage Δ
...dbc/iowrapper/config/JdbcIoWrapperConfigGroup.java     100.00% <100.00%> (ø)
...o/jdbc/uniformsplitter/DataSourceProviderImpl.java     100.00% <100.00%> (ø)
...formsplitter/transforms/DataSourceManagerImpl.java     100.00% <100.00%> (ø)

... and 585 files with indirect coverage changes

@VardhanThigle VardhanThigle marked this pull request as ready for review April 14, 2026 10:26
@VardhanThigle VardhanThigle requested a review from a team as a code owner April 14, 2026 10:26
@VardhanThigle VardhanThigle changed the title from "POJOs for MultiShard TableReader Schema-Discovery." to "POJOs for MultiShard TableReader." Apr 14, 2026
sm745052 previously approved these changes Apr 15, 2026
@sm745052 sm745052 left a comment

LGTM

@VardhanThigle VardhanThigle merged commit 4999e32 into GoogleCloudPlatform:main Apr 15, 2026
15 checks passed