diff --git a/contributor-docs/add-integration-or-load-test.md b/contributor-docs/add-integration-or-load-test.md index 4c81c39187..213a225afa 100644 --- a/contributor-docs/add-integration-or-load-test.md +++ b/contributor-docs/add-integration-or-load-test.md @@ -571,6 +571,18 @@ public void testBacklog(){ } ``` +## Set up Test Infrastructure (Spanner) + +If your integration tests target **Spanner**, you need to provision the required +Google Cloud infrastructure before running the tests. Terraform scripts for this +are provided in [`test-infra/terraform/spanner/`](../test-infra/terraform/spanner/). + +This is the recommended path for external contributors who are setting up a fresh +test environment. See the [README](../test-infra/terraform/spanner/README.md) in +that directory for full usage instructions. + +--- + ## Run the Test For manually running a load test execute the following commands on the CLI use the following commands, diff --git a/test-infra/terraform/spanner/README.md b/test-infra/terraform/spanner/README.md new file mode 100644 index 0000000000..df8ff914d5 --- /dev/null +++ b/test-infra/terraform/spanner/README.md @@ -0,0 +1,74 @@ +# Spanner Integration Test Infrastructure (Terraform) + +This directory contains Terraform scripts to provision the Google Cloud infrastructure +required to run Spanner integration tests for Dataflow Templates. + +## Intended Audience + +This setup is intended for external contributors who need to stand up a self-contained +test environment before running the Spanner-related integration tests in the `it/` directory. + +## What Gets Provisioned +The Terraform scripts provision the following resources: +- **Spanner Instance**: A regional Spanner instance. +- **GCS Bucket**: Used for staging artifacts (named by `gcs_bucket_name` or defaults to `project_id-it-infra-bucket`). +- **Datastream Private Connectivity**: Provides a private network connection between Datastream and the VPC. 
*(Note: The ID `"datastream-connect-2"` is currently hardcoded because the integration tests implicitly expect this specific connection name)*. +- **Cloud SQL Instances**: + - PostgreSQL instance (Private IP only, `ENTERPRISE_PLUS` edition, `db-perf-optimized-N-16`) + - MySQL instance (Private IP only, `db-n1-standard-1`) +- **Compute Instance (VM)**: A consolidated VM (`it-infra-vm`) used for running tests and proxying Datastream traffic. It runs Cloud SQL Proxy and initializes the environment with Docker, Maven, Git, OpenJDK, `gh` (GitHub CLI), and `jq`. +- **IAM Roles**: Custom role `it_infra_role` with permissions for Dataflow, GCS, and Compute, bound to the default compute service account. +- **Firewall Rule**: Allows Datastream to access the proxy on ports 3306 and 5432. + +## Prerequisites + +- [Terraform](https://developer.hashicorp.com/terraform/downloads) >= 1.0 +- [gcloud CLI](https://cloud.google.com/sdk/docs/install) authenticated with sufficient permissions +- A Google Cloud project with billing enabled + +## Usage + +1. Authenticate with Google Cloud: + ```shell + gcloud auth application-default login + ``` + +2. Navigate to this directory: + ```shell + cd test-infra/terraform/spanner + ``` + +3. **Configure Remote State**: Before initializing, update the `bucket` property in `backend.tf` to an existing GCS bucket in your project for storing Terraform state. Alternatively, you can pass it dynamically during initialization: + ```shell + terraform init -backend-config="bucket=YOUR_STATE_BUCKET" + ``` + +4. Review the planned changes: + ```shell + terraform plan -var="project_id=YOUR_PROJECT_ID" + ``` + +5. Apply the configuration: + ```shell + terraform apply -var="project_id=YOUR_PROJECT_ID" + ``` + +6. 
Once testing is complete, tear down all provisioned resources: + ```shell + terraform destroy -var="project_id=YOUR_PROJECT_ID" + ``` + +## Variables + +| Variable | Description | Required | Default | +|----------|-------------|----------|---------| +| `project_id` | The GCP project ID to deploy resources into | Yes | N/A | +| `region` | The GCP region to use | No | `us-central1` | +| `zone` | The GCP zone to use | No | `us-central1-a` | +| `network` | The VPC network to use | No | `default` | +| `subnetwork` | The subnetwork to use | No | `default` | +| `spanner_instance_name` | The name of the Spanner instance | No | `it-infra-spanner` | +| `gcs_bucket_name` | The name of the GCS bucket | No | `""` (defaults to `project_id-it-infra-bucket`) | +| `postgres_instance_name` | The name of the PostgreSQL instance | No | `it-infra-pg-db-instance` | +| `mysql_instance_name` | The name of the MySQL instance | No | `it-infra-mysql-db-instance` | +| `it_infra_vm_name` | The name of the consolidated VM for tests and proxy | No | `it-infra-vm` | \ No newline at end of file diff --git a/test-infra/terraform/spanner/main.tf b/test-infra/terraform/spanner/main.tf new file mode 100644 index 0000000000..2f3d0cfae4 --- /dev/null +++ b/test-infra/terraform/spanner/main.tf @@ -0,0 +1,299 @@ +provider "google" { + project = var.project_id + region = var.region + zone = var.zone +} + +# Spanner Instance +resource "google_spanner_instance" "spanner" { + name = var.spanner_instance_name + config = "regional-${var.region}" + display_name = var.spanner_instance_name + processing_units = 50000 +} + +locals { + gcs_bucket_name = var.gcs_bucket_name != "" ? 
var.gcs_bucket_name : "${var.project_id}-it-infra-bucket" +} + +# GCS Bucket +resource "google_storage_bucket" "bucket" { + name = local.gcs_bucket_name + location = var.region + force_destroy = true + uniform_bucket_level_access = true +} + +# Network Configuration + +data "google_compute_image" "ubuntu" { + family = "ubuntu-2204-lts" + project = "ubuntu-os-cloud" +} + +data "google_compute_network" "default" { + name = var.network +} + +# Private Services Access (Required for Cloud SQL Private IP) +resource "google_compute_global_address" "private_ip_alloc" { + name = "private-services-ip-allocation" + purpose = "VPC_PEERING" + address_type = "INTERNAL" + prefix_length = 16 + network = data.google_compute_network.default.id +} + +resource "google_service_networking_connection" "private_vpc_connection" { + network = data.google_compute_network.default.id + service = "servicenetworking.googleapis.com" + reserved_peering_ranges = [google_compute_global_address.private_ip_alloc.name] +} + + +# Datastream Private Connectivity +resource "google_datastream_private_connection" "private_conn" { + display_name = "datastream-connect-2" + location = var.region + private_connection_id = "datastream-connect-2" + + vpc_peering_config { + vpc = data.google_compute_network.default.id + subnet = "10.3.0.0/29" # Should be small /29 range unused in VPC + } +} + + +# Cloud SQL PostgreSQL (Private IP) +resource "google_sql_database_instance" "postgres" { + depends_on = [google_service_networking_connection.private_vpc_connection] + deletion_protection = false + name = var.postgres_instance_name + database_version = "POSTGRES_15" + region = var.region + + settings { + tier = "db-perf-optimized-N-16" + edition = "ENTERPRISE_PLUS" + ip_configuration { + ipv4_enabled = false + private_network = data.google_compute_network.default.id + } + + database_flags { + name = "cloudsql.logical_decoding" + value = "on" + } + database_flags { + name = "max_replication_slots" + value = "1000" + } + 
database_flags { + name = "max_wal_senders" + value = "1000" + } + } +} + +resource "google_sql_user" "postgres" { + name = "postgres" + instance = google_sql_database_instance.postgres.name + password = "Hello@123" +} + + +# Cloud SQL MySQL (Private IP) +resource "google_sql_database_instance" "mysql" { + depends_on = [google_service_networking_connection.private_vpc_connection] + deletion_protection = false + name = var.mysql_instance_name + database_version = "MYSQL_8_0" + region = var.region + + + + settings { + tier = "db-n1-standard-1" + ip_configuration { + ipv4_enabled = false + private_network = data.google_compute_network.default.id + } + backup_configuration { + enabled = true + binary_log_enabled = true + } + } +} + +resource "google_sql_user" "root" { + name = "root" + instance = google_sql_database_instance.mysql.name + host = "%" + password = "Hello@123" +} + +resource "google_compute_address" "it_infra_vm_ip" { + name = "it-infra-vm-ip" + subnetwork = var.subnetwork + address_type = "INTERNAL" + region = var.region +} + +# 
Consolidated VM for Tests and Proxy +resource "google_compute_instance" "it_infra_vm" { + name = var.it_infra_vm_name + machine_type = "n2-standard-16" + allow_stopping_for_update = true + zone = var.zone + tags = ["datastream-proxy"] + + boot_disk { + initialize_params { + image = data.google_compute_image.ubuntu.self_link + size = 100 + } + } + + network_interface { + network = data.google_compute_network.default.id + network_ip = google_compute_address.it_infra_vm_ip.address + access_config { + # Ephemeral public IP + } + } + + service_account { + scopes = ["cloud-platform"] + } + + metadata_startup_script = <<-EOT + #!/bin/bash + export DEBIAN_FRONTEND=noninteractive + export REPO_URL="https://github.com/GoogleCloudPlatform/DataflowTemplates" + + # User provided startup script + user=runner + id -u $user &> /dev/null || sudo useradd $user + + ulimit -n 65536 + sudo sysctl -w vm.max_map_count=262144 + + sudo add-apt-repository ppa:git-core/ppa -y + sudo apt update + sudo apt install git -y + sudo apt install openjdk-17-jdk-headless -y + sudo apt install jq -y + + pushd /opt/ + MAVEN_VER=3.9.9 + sudo wget https://repo.maven.apache.org/maven2/org/apache/maven/apache-maven/$${MAVEN_VER}/apache-maven-$${MAVEN_VER}-bin.tar.gz + sudo tar -xvzf apache-maven-$${MAVEN_VER}-bin.tar.gz apache-maven-$${MAVEN_VER} + sudo rm -f /opt/apache-maven-$${MAVEN_VER}-bin.tar.gz + sudo ln -s /opt/apache-maven-$${MAVEN_VER} /opt/maven + sudo ln -s /opt/maven/bin/mvn /usr/local/bin/mvn + popd + + # Install gh + sudo apt install curl -y \ + && sudo curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | sudo dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg \ + && sudo chmod go+r /usr/share/keyrings/githubcli-archive-keyring.gpg \ + && echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | sudo tee /etc/apt/sources.list.d/github-cli.list > /dev/null \ + && sudo apt 
update \ + && sudo apt install gh -y + + # Install docker + sudo apt update + sudo apt install ca-certificates curl gnupg lsb-release -y + sudo mkdir -p /etc/apt/keyrings + curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg + echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null + sudo apt update + sudo apt install docker-ce docker-ce-cli containerd.io docker-compose-plugin -y + + sudo groupadd docker + sudo gpasswd -a $user docker + + sudo mkdir /home/$user + sudo chown $user /home/$user + + # Install Cloud SQL Proxy + sudo wget https://storage.googleapis.com/cloud-sql-connectors/cloud-sql-proxy/v2.10.1/cloud-sql-proxy.linux.amd64 -O cloud-sql-proxy + sudo chmod +x cloud-sql-proxy + sudo mv cloud-sql-proxy /usr/local/bin/ + + # Run Cloud SQL Proxy for MySQL + cloud-sql-proxy --address 0.0.0.0 ${google_sql_database_instance.mysql.connection_name} --private-ip & + + # Run Cloud SQL Proxy for PG + cloud-sql-proxy --address 0.0.0.0 ${google_sql_database_instance.postgres.connection_name} --private-ip & + + EOT + +} + +data "google_project" "current" {} + +locals { + compute_sa = "${data.google_project.current.number}-compute@developer.gserviceaccount.com" +} + +resource "google_project_iam_custom_role" "it_infra_role" { + project = var.project_id + role_id = "it_infra_role" + title = "it-infra role" + description = "Custom role for Spanner Bulk migrations" + permissions = [ + "compute.firewalls.create", + "compute.firewalls.delete", + "compute.firewalls.update", + "dataflow.jobs.cancel", + "dataflow.jobs.create", + "dataflow.jobs.updateContents", + "iam.roles.get", + "iam.serviceAccounts.actAs", + "resourcemanager.projects.setIamPolicy", + "storage.objects.delete", + "storage.objects.create", + "storage.buckets.create", + "serviceusage.services.use", + 
"serviceusage.services.enable", + ] +} + +resource "google_project_iam_member" "custom_role_binding" { + project = var.project_id + role = google_project_iam_custom_role.it_infra_role.id + member = "serviceAccount:${local.compute_sa}" +} + +resource "google_project_iam_member" "viewer_binding" { + project = var.project_id + role = "roles/viewer" + member = "serviceAccount:${local.compute_sa}" +} + +resource "google_project_iam_member" "storage_admin_binding" { + project = var.project_id + role = "roles/storage.objectAdmin" + member = "serviceAccount:${local.compute_sa}" +} + +resource "google_project_iam_member" "spanner_admin_binding" { + project = var.project_id + role = "roles/spanner.databaseAdmin" + member = "serviceAccount:${local.compute_sa}" +} + +resource "google_compute_firewall" "allow_datastream_to_proxy" { + name = "allow-datastream-to-proxy" + network = data.google_compute_network.default.name + + allow { + protocol = "tcp" + ports = ["3306", "5432"] + } + + source_ranges = ["10.3.0.0/29"] + target_tags = ["datastream-proxy"] +} + diff --git a/test-infra/terraform/spanner/outputs.tf b/test-infra/terraform/spanner/outputs.tf new file mode 100644 index 0000000000..c1a628daf3 --- /dev/null +++ b/test-infra/terraform/spanner/outputs.tf @@ -0,0 +1,23 @@ +output "spanner_instance" { + value = google_spanner_instance.spanner.name +} + +output "gcs_bucket" { + value = google_storage_bucket.bucket.name +} + +output "datastream_private_connection_id" { + value = google_datastream_private_connection.private_conn.id +} + +output "postgres_connection_name" { + value = google_sql_database_instance.postgres.connection_name +} + +output "mysql_connection_name" { + value = google_sql_database_instance.mysql.connection_name +} + +output "it_infra_vm_ip" { + value = google_compute_instance.it_infra_vm.network_interface[0].network_ip +} diff --git a/test-infra/terraform/spanner/variables.tf b/test-infra/terraform/spanner/variables.tf new file mode 100644 index 
0000000000..9a46352632 --- /dev/null +++ b/test-infra/terraform/spanner/variables.tf @@ -0,0 +1,58 @@ +variable "project_id" { + description = "The GCP project ID to deploy resources into" + type = string +} + +variable "region" { + description = "The GCP region to use" + type = string + default = "us-central1" +} + +variable "zone" { + description = "The GCP zone to use" + type = string + default = "us-central1-a" +} + +variable "network" { + description = "The VPC network to use" + type = string + default = "default" +} + +variable "subnetwork" { + description = "The subnetwork to use" + type = string + default = "default" +} + +variable "spanner_instance_name" { + description = "The name of the Spanner instance" + type = string + default = "it-infra-spanner" +} + +variable "gcs_bucket_name" { + description = "The name of the GCS bucket" + type = string + default = "" +} + +variable "postgres_instance_name" { + description = "The name of the PostgreSQL instance" + type = string + default = "it-infra-pg-db-instance" +} + +variable "mysql_instance_name" { + description = "The name of the MySQL instance" + type = string + default = "it-infra-mysql-db-instance" +} + +variable "it_infra_vm_name" { + description = "The name of the consolidated VM for tests and proxy" + type = string + default = "it-infra-vm" +}
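
As a rough sketch of how the variables declared in `variables.tf` might be supplied without repeating `-var` flags, a `terraform.tfvars` file placed next to `main.tf` could look like the following. This file is not part of the change above; it relies on Terraform's standard behavior of loading `terraform.tfvars` automatically. The project ID `my-test-project` is a hypothetical placeholder, and the remaining values only restate the declared defaults.

```hcl
# terraform.tfvars (hypothetical example, not part of this change)

# Required: variables.tf declares no default for project_id.
project_id = "my-test-project" # placeholder, replace with your own project ID

# Optional overrides; the values shown match the defaults in variables.tf.
region     = "us-central1"
zone       = "us-central1-a"
network    = "default"
subnetwork = "default"

# Leaving gcs_bucket_name empty makes main.tf fall back to
# "${var.project_id}-it-infra-bucket".
gcs_bucket_name = ""
```

With such a file in place, `terraform plan` and `terraform apply` would pick up `project_id` without the `-var` argument shown in the README.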