POJOs for MultiShard TableReader. #3685
VardhanThigle merged 1 commit into GoogleCloudPlatform:main from
Summary of Changes (Gemini Code Assist): This pull request establishes the foundational infrastructure for multi-shard table reader schema discovery. It provides a framework for managing multiple JDBC data sources within a Dataflow worker, focusing on thread-safe connection handling, resource lifecycle management, and configuration validation to support multi-shard migration scenarios.
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #3685 +/- ##
============================================
+ Coverage 52.47% 58.67% +6.19%
+ Complexity 6224 2122 -4102
============================================
Files 1062 505 -557
Lines 64136 29129 -35007
Branches 7089 3191 -3898
============================================
- Hits 33654 17090 -16564
+ Misses 28198 11061 -17137
+ Partials 2284 978 -1306
1ef95aa to 26f56b5
26f56b5 to 8128b2a
Foundations for Multi-Shard Routing (POJOs)
This is the first child PR of #3684.
Design Decision
This PR introduces the foundational data models and services required to support multi-shard JDBC migrations.
Key Components:
- JdbcIoWrapperConfigGroup: A container that aggregates multiple JdbcIOWrapperConfig instances. It ensures that all shards in a group share a uniform SQLDialect, simplifying downstream transform logic.
- DataSourceProvider interface: Defines the contract for worker-side connection routing.
- DataSourceManager / DataSourceManagerImpl: A centralized service for managing and caching connection sources across multiple physical shards.

Rationale:
To keep the pipeline graph size constant regardless of the number of shards, we must move connection details from pipeline construction time (PTransform fields) to element processing time (DoFn logic). These POJOs provide the structure for this dynamic routing.
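The worker-side half of this routing can be sketched as a per-worker cache keyed by shard id. This is a minimal sketch under stated assumptions, not the PR's `DataSourceManagerImpl`: the class name `ShardResourceCache`, its `get`/`closeAll`/`size` methods, and the `factory` parameter are hypothetical, the shard id is assumed to travel with each element, and a generic `AutoCloseable` stands in for a pooled `DataSource` so the sketch needs no JDBC driver or pool library. It follows the approach described for the real class: a `ReentrantLock` with double-checked locking so each shard's resource is created once, plus a `closeAll()` that releases everything.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

/**
 * Hypothetical sketch (not the PR's DataSourceManagerImpl): a per-worker cache
 * that lazily creates exactly one pooled resource per shard id.
 * R stands in for a pooled DataSource; any AutoCloseable works.
 */
final class ShardResourceCache<R extends AutoCloseable> {
  // ConcurrentHashMap makes a value put under the lock safely visible to
  // later lock-free reads on the fast path.
  private final Map<String, R> cache = new ConcurrentHashMap<>();
  private final ReentrantLock lock = new ReentrantLock();
  // Hypothetical factory that builds the connection pool for one shard.
  private final Function<String, R> factory;

  ShardResourceCache(Function<String, R> factory) {
    this.factory = Objects.requireNonNull(factory);
  }

  /** Returns the resource for shardId, creating it at most once per cache. */
  public R get(String shardId) {
    // First check: lock-free fast path once the shard has been initialized.
    R existing = cache.get(shardId);
    if (existing != null) {
      return existing;
    }
    lock.lock();
    try {
      // Second check: another thread may have created it while we waited.
      existing = cache.get(shardId);
      if (existing == null) {
        existing = Objects.requireNonNull(factory.apply(shardId), "factory returned null");
        cache.put(shardId, existing);
      }
      return existing;
    } finally {
      lock.unlock();
    }
  }

  /** Closes every cached resource, e.g. from a DoFn @Teardown method. */
  public void closeAll() {
    lock.lock();
    try {
      RuntimeException failure = null;
      for (R resource : cache.values()) {
        try {
          resource.close();
        } catch (Exception e) {
          // Keep closing the remaining shards; report all failures at the end.
          if (failure == null) {
            failure = new RuntimeException("Failed to close shard resource", e);
          } else {
            failure.addSuppressed(e);
          }
        }
      }
      cache.clear();
      if (failure != null) {
        throw failure;
      }
    } finally {
      lock.unlock();
    }
  }

  /** Number of shards that currently have an open resource. */
  public int size() {
    return cache.size();
  }
}
```

In a Beam `DoFn`, a cache like this would typically be created in a `@Setup` method and closed in `@Teardown`, so it lives for the lifetime of the `DoFn` instance instead of being serialized with the transform. The single lock serializes pool creation across all shards, which is cheap because each shard is created only once; `ConcurrentHashMap.computeIfAbsent` would give the same at-most-once guarantee with less code.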
Why it's Safe (Concurrency & Resource Leaks)
- DataSourceManagerImpl implements a thread-safe cache for DataSource instances using ReentrantLock and a double-checked locking pattern. This ensures that even when multiple Beam bundles execute concurrently on the same worker, each physical shard's connection pool is initialized exactly once.
- DataSourceManager provides a closeAll() method, designed to be called during DoFn.teardown() or finishBundle() so that all opened connection pools are released gracefully, preventing resource leaks on Dataflow workers.
- Fields in DataSourceManagerImpl are marked transient where appropriate, and the class includes null-safety checks to handle lifecycle transitions in the Dataflow environment.

Testing
The added tests verify: