Skip to content

Conversation

@golgeek
Copy link
Contributor

@golgeek golgeek commented Oct 9, 2025

This patch sets the two environment variables required to start using the roachprod-centralized test infrastructure.

Epic: none
Release note: None

@blathers-crl
Copy link

blathers-crl bot commented Oct 9, 2025

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@golgeek golgeek changed the title Ludo/rp centralized use tc run roachtest: use roachprod-centralized in nightly runs Oct 9, 2025
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@golgeek golgeek force-pushed the ludo/rp_centralized_use_tc_run branch 3 times, most recently from 02a5008 to 263d9a2 Compare October 10, 2025 23:22
This patch prepares the ground work for roachprod-centralized by
enriching the vm.Provider interface with two new methods:
- `String()`: returns a human-readable (and map key) for the provider in
  its current configuration (e.g. `{ProviderName}-{project}`,
`{ProviderName}-{AccountNumber}`, ...)
- `IsLocalProvider()`: returns whether or not the provider is local

These two new methods will ease the work of spawning multiple instances
of the same provider, in different configurations.

Epic: none
Release note: None
This patch refactorizes the different providers to bring:
- a uniform way of instantiating a provider
- a better isolation of providers configurations to allow multiple
  providers to be instantiated in parallel

Epic: none
Release note: None
This patch adds provider management for the AWS SST credentials.

This allows each provider instance to generate its own short lived
credentials and pass them as environment variables to the aws CLI
without impacting other provider instances.

Epic: none
Release note: None
In order for roachprod-centralized to use the roachprod library,
and for the roachprod library to use the roachprod-centralized client,
it is required to move the `cloud.Cluster` and `cloud.Clusters` types
in their own package to avoid import cycles. These two types are moved
under `cloud/types` and all imports are aliased as `cloudcluster` to
ensure clarity in the code.

This patch also refactors `cloud.Cloud` so that `ListCloud` can be
executed against a pre-defined set of providers instead of the global
`vm.AllProviderNames()`. This is achieved by storing two providers
slices (one for remote, one for local) in the `cloud.Cloud` type, and
having a `ListCloud` method that runs against these providers. The
public `ListCloud` helper function has been rewritten to init a new
`cloud.Cloud` with the global providers and use the object's method to
avoid duplicating code.

Epic: none
Release note: None
Prior to this patch, only the GCE cloud provider had support for VMs
public DNS at creation/deletion time. All other Cloud Providers were
relying on the `Sync` operation to create DNS entries for all synced
clusters.

This patch brings support for instantiating all Cloud Providers with a
DNS Provider, and populates the required `vm.VM` fields to make DNS
behave the same across all Cloud Providers. The vm.DNSProvider interface
is extended to also contain the DNS methods of the InfraProvider
interface, which will allow a better separation of concerns.

Epic: none
Release note: None
@golgeek golgeek force-pushed the ludo/rp_centralized_use_tc_run branch 10 times, most recently from cee2755 to 4e9bcfe Compare October 17, 2025 22:18
Prior to this patch the GCE provider was 100% built around executing
gcloud commands. While this works fine with the local roachprod binary,
it creates two major issues when using it as a library:
- the output of gcloud commands can be huge and is cached by the kernel
  in the page cache and on top of this, Go has to reserve more or less
  the same memory allocations for json deserialization. The combination
  leads to memory exhaustion on low memory runners.
- using the auto-magical `glcoud dns record-sets import` command can
  fail due to inconsistencies between the state when the bind file is
  created and when the command is executed if other systems update the
  DNS zone at the same time.

This patch introduces the use of the GCP Golang SDK to list instances
(and instance templates) by patching the GCE provider, and manage DNS
records by creating a new SDK based DNS provider implementation. The new
GCE SDK DNS provider is also named `gce` to be retro-compatible with the
`glcoud` based provider, as they have the exact same capabilities. The
`gcloud` DNS provider will be deprecated.

The introduction of the new SDK DNS provider also ensures a better
independence between the GCE provider and the DNS provider by fully
using an interface in the VM provider. This greatly improves stability
in long running operations and improves the overall quality of the code.

Epic: none
Release note: None
Prior to this patch, ListCloud() and SyncDNS() paths in the roachprod
library did not provide a way to pass a context.Context. This was
preventing roachprod-centralized to respect tasks timeouts.

This patch adds a context in the new `cloud.Cloud.ListCloud()` function
and a `SyncDNSWithContext()` function in the `vm.DNSProvider` task. All
non-context versions call the context version by passing an argument
`context.Background()`.

With the exception of the `gcloud` version of the `gce` provider, the
`provider.List()` path has been updated to respect the provided context.

Epic: none
Release note: None
This patch is the initial commit for roachprod-centralized.

Epic: none
Release note: None
This patch starts using the roachprod centralized client from the
roachprod library when properly configured via environment variables.

Epic: none
Release note: None
This patch sets the two environment variables required to start using
the roachprod-centralized test infrastructure.

Epic: none
Release note: None
@golgeek golgeek force-pushed the ludo/rp_centralized_use_tc_run branch from 4e9bcfe to 7fe2ac3 Compare October 17, 2025 23:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants