Test automation: zero-downtime upgrades #1438

Merged 9 commits on Jan 8, 2024
Code review and GKE router support
sjberman committed Jan 8, 2024
commit b08965889fbf049165110221d761cae49a2fabd4
23 changes: 18 additions & 5 deletions tests/Makefile
@@ -2,7 +2,8 @@ TAG = edge
 PREFIX = nginx-gateway-fabric
 NGINX_PREFIX = $(PREFIX)/nginx
 PULL_POLICY=Never
-GW_API_VERSION ?= 1.0.0
+GW_API_PREV_VERSION ?= 1.0.0 ## Supported Gateway API version from previous NGF release
+GW_API_VERSION ?= 1.0.0 ## Supported Gateway API version for NGF under test
 K8S_VERSION ?= latest ## Expected format: 1.24 (major.minor) or latest
 GW_SERVICE_TYPE=NodePort
 GW_SVC_GKE_INTERNAL=false
@@ -30,7 +31,8 @@ load-images: ## Load NGF and NGINX images on configured kind cluster
 kind load docker-image $(PREFIX):$(TAG) $(NGINX_PREFIX):$(TAG)

 test: ## Run the system tests against your default k8s cluster
-go test -v ./suite $(GINKGO_FLAGS) -args --gateway-api-version=$(GW_API_VERSION) --image-tag=$(TAG) \
+go test -v ./suite $(GINKGO_FLAGS) -args --gateway-api-version=$(GW_API_VERSION) \
+--gateway-api-prev-version=$(GW_API_PREV_VERSION) --image-tag=$(TAG) \
 --ngf-image-repo=$(PREFIX) --nginx-image-repo=$(NGINX_PREFIX) --pull-policy=$(PULL_POLICY) \
 --k8s-version=$(K8S_VERSION) --service-type=$(GW_SERVICE_TYPE) --is-gke-internal-lb=$(GW_SVC_GKE_INTERNAL)

@@ -46,9 +48,20 @@ run-tests-on-vm: ## Run the tests on a GCP VM
 create-and-setup-vm: ## Create and setup a GCP VM for tests
 bash scripts/create-and-setup-gcp-vm.sh

-.PHONY: create-vm-and-run-tests
-create-vm-and-run-tests: create-and-setup-vm run-tests-on-vm ## Create and setup a GCP VM for tests and run the tests
-
 .PHONY: cleanup-vm
 cleanup-vm: ## Delete the test GCP VM and delete the firewall rule
 bash scripts/cleanup-vm.sh
+
+.PHONY: create-gke-router
+create-gke-router: ## Create a GKE router to allow egress traffic from private nodes (allows for external image pulls)
+bash scripts/create-gke-router.sh
+
+.PHONY: cleanup-router
+cleanup-router: ## Delete the GKE router
+bash scripts/cleanup-router.sh
+
+.PHONY: setup-gcp-and-run-tests
+setup-gcp-and-run-tests: create-gke-router create-and-setup-vm run-tests-on-vm ## Create and setup a GKE router and GCP VM for tests and run the tests
+
+.PHONY: cleanup-gcp
+cleanup-gcp: cleanup-router cleanup-vm ## Cleanup all GCP resources
40 changes: 31 additions & 9 deletions tests/README.md
@@ -38,14 +38,17 @@ make

 ```text
 build-images Build NGF and NGINX images
-cleanup-vm Delete the test GCP VM and the firewall rule
+cleanup-gcp Cleanup all GCP resources
+cleanup-router Delete the GKE router
+cleanup-vm Delete the test GCP VM and delete the firewall rule
 create-and-setup-vm Create and setup a GCP VM for tests
+create-gke-router Create a GKE router to allow egress traffic from private nodes (allows for external image pulls)
 create-kind-cluster Create a kind cluster
-create-vm-and-run-tests Create and setup a GCP VM for tests and run the tests
 delete-kind-cluster Delete kind cluster
 help Display this help
 load-images Load NGF and NGINX images on configured kind cluster
 run-tests-on-vm Run the tests on a GCP VM
+setup-gcp-and-run-tests Create and setup a GKE router and GCP VM for tests and run the tests
 test Run the system tests against your default k8s cluster
 ```

@@ -101,15 +104,24 @@ make test TAG=$(whoami)
 This step only applies if you would like to run the tests from a GCP based VM.

 Before running the below `make` command, copy the `scripts/vars.env-example` file to `scripts/vars.env` and populate the
-required env vars. The `GKE_CLUSTER_ZONE` needs to be the zone of your GKE cluster, and `GKE_SVC_ACCOUNT` needs to be
-the name of a service account that has Kubernetes admin permissions.
+required env vars. `GKE_SVC_ACCOUNT` needs to be the name of a service account that has Kubernetes admin permissions.

-To create and setup the VM (including creating a firewall rule allowing SSH access from your local machine, and
-optionally adding the VM IP to the `master-authorized-networks` list of your GKE cluster if
-`ADD_VM_IP_AUTH_NETWORKS` is set to `true`) and run the tests, run the following
+In order to run the tests in GCP, you need a few things:
+
+- GKE router to allow egress traffic (used by upgrade tests for pulling images from Github)
+- this assumes that your GKE cluster is using private nodes. If using public nodes, you don't need this.
+- GCP VM and firewall rule to send ingress traffic to GKE
+
+To set up the GCP environment with the router and VM and then run the tests, run the following command:

 ```makefile
-make create-vm-and-run-tests
+make setup-gcp-and-run-tests
 ```
+
+If you just need a VM and no router (this will not run the tests):
+
+```makefile
+make create-and-setup-vm
+```

 To use an existing VM to run the tests, run the following
@@ -179,7 +191,17 @@ For more information of filtering specs, see [the docs here](https://onsi.github
 make delete-kind-cluster
 ```

-2. Delete the cloud VM and cleanup the firewall rule, if required
+2. Delete the GCP components (GKE router, VM, and firewall rule), if required

 ```makefile
+make cleanup-gcp
+```
+
+or
+
+```makefile
+make cleanup-router
+```
+
+```makefile
 make cleanup-vm
9 changes: 7 additions & 2 deletions tests/framework/load.go
@@ -31,6 +31,11 @@ func convertTargetToVegetaTarget(targets []Target) []vegeta.Target {
 return vegTargets
 }

+// Metrics is a wrapper around the vegeta Metrics.
+type Metrics struct {
+vegeta.Metrics
+}
+
 // RunLoadTest uses Vegeta to send traffic to the provided Targets at the given rate for the given duration and writes
 // the results to the provided file
 func RunLoadTest(
@@ -40,7 +45,7 @@ func RunLoadTest(
 desc,
 proxy,
 serverName string,
-) (vegeta.Results, vegeta.Metrics) {
+) (vegeta.Results, Metrics) {
 vegTargets := convertTargetToVegetaTarget(targets)
 targeter := vegeta.NewStaticTargeter(vegTargets...)

@@ -75,5 +80,5 @@ func RunLoadTest(
 }
 metrics.Close()

-return results, metrics
+return results, Metrics{metrics}
 }
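Review note: the `Metrics` wrapper above relies on Go struct embedding, which promotes the embedded type's fields and methods onto the wrapper. The following is a minimal sketch of that mechanism only. `libMetrics` and its fields are hypothetical stand-ins for the third-party type and `runLoad` is an invented producer; neither is taken from this PR.

```go
package main

import (
	"fmt"
	"time"
)

// libMetrics is a hypothetical stand-in for a third-party metrics type.
// Its fields are invented for this sketch.
type libMetrics struct {
	Requests uint64
	Success  float64
	Duration time.Duration
}

// Close mimics a finalizing method on the library type.
func (m *libMetrics) Close() { m.Success = 1.0 }

// Metrics wraps the library type by embedding it, so callers depend on
// this package's type rather than on the library import.
type Metrics struct {
	libMetrics
}

// runLoad is a hypothetical producer that returns the wrapper by value.
func runLoad() Metrics {
	var m libMetrics
	m.Requests = 100
	m.Duration = 2 * time.Second
	m.Close()
	return Metrics{m}
}

func main() {
	m := runLoad()
	// Embedded fields are promoted, so they are reachable directly.
	fmt.Println(m.Requests, m.Success, m.Duration)
	// The embedded value stays addressable under its type name, which is
	// how a wrapper can hand a pointer back to library functions.
	inner := &m.libMetrics
	fmt.Println(inner.Requests == m.Requests)
}
```

Because the wrapper adds no fields, converting between the two is a cheap struct copy, and callers no longer need to import the library themselves.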
10 changes: 8 additions & 2 deletions tests/framework/results.go
@@ -2,6 +2,7 @@ package framework

 import (
 "fmt"
+"io"
 "os"
 "os/exec"
 "path/filepath"
@@ -70,8 +71,13 @@ func GeneratePNG(resultsDir, inputFilename, outputFilename string) ([]byte, erro
 }

 // WriteResults writes the vegeta metrics results to the results file in text format.
-func WriteResults(resultsFile *os.File, metrics *vegeta.Metrics) error {
-reporter := vegeta.NewTextReporter(metrics)
+func WriteResults(resultsFile *os.File, metrics *Metrics) error {
+reporter := vegeta.NewTextReporter(&metrics.Metrics)

 return reporter.Report(resultsFile)
 }
+
+// NewCSVEncoder returns a vegeta CSV encoder.
+func NewCSVEncoder(w io.Writer) vegeta.Encoder {
+return vegeta.NewCSVEncoder(w)
+}
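Review note: `NewCSVEncoder` hands the caller an encoder bound to any `io.Writer`, which the upgrade test then feeds from a `bytes.Buffer`. Below is a rough, standard-library-only sketch of that pattern. The `result` type, the `encoder` function type, and the two-column CSV layout are assumptions made for illustration and do not reproduce vegeta's actual types or output format.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"io"
	"strconv"
)

// result is a hypothetical stand-in for a single load-test result.
type result struct {
	Code    int
	Latency int64 // nanoseconds
}

// encoder is a function type with an Encode method, so a closure can be
// used wherever an object with Encode is expected.
type encoder func(*result) error

// Encode calls the underlying function.
func (e encoder) Encode(r *result) error { return e(r) }

// newCSVEncoder returns an encoder that writes one CSV record per result
// to w and flushes after every record.
func newCSVEncoder(w io.Writer) encoder {
	cw := csv.NewWriter(w)
	return func(r *result) error {
		rec := []string{strconv.Itoa(r.Code), strconv.FormatInt(r.Latency, 10)}
		if err := cw.Write(rec); err != nil {
			return err
		}
		cw.Flush()
		return cw.Error()
	}
}

// encodeAll writes every result to an in-memory buffer and returns the text.
func encodeAll(results []result) (string, error) {
	buf := new(bytes.Buffer)
	enc := newCSVEncoder(buf)
	// Indexing the slice avoids taking the address of the loop variable.
	for i := range results {
		if err := enc.Encode(&results[i]); err != nil {
			return "", err
		}
	}
	return buf.String(), nil
}

func main() {
	out, err := encodeAll([]result{{200, 1500}, {503, 90000}})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Keeping the constructor in the framework package means the test file only needs the framework import, which matches the removal of the direct vegeta import in `upgrade_test.go`.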
6 changes: 6 additions & 0 deletions tests/scripts/cleanup-router.sh
@@ -0,0 +1,6 @@
+#!/bin/bash
+
+source scripts/vars.env
+
+gcloud compute routers nats delete ${GKE_NATS_CONFIG_NAME} --router ${GKE_ROUTER_NAME} --router-region ${GKE_CLUSTER_REGION}
+gcloud compute routers delete ${GKE_ROUTER_NAME} --region ${GKE_CLUSTER_REGION}
1 change: 0 additions & 1 deletion tests/scripts/create-and-setup-gcp-vm.sh
@@ -1,7 +1,6 @@
#!/bin/bash

SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
PARENT_DIR=$(dirname "$SCRIPT_DIR")

source scripts/vars.env

13 changes: 13 additions & 0 deletions tests/scripts/create-gke-router.sh
@@ -0,0 +1,13 @@
+#!/bin/bash
+
+source scripts/vars.env
+
+gcloud compute routers create ${GKE_ROUTER_NAME} \
+--region ${GKE_CLUSTER_REGION} \
+--network default
+
+gcloud compute routers nats create ${GKE_NATS_CONFIG_NAME} \
+--router-region ${GKE_CLUSTER_REGION} \
+--router ${GKE_ROUTER_NAME} \
+--nat-all-subnet-ip-ranges \
+--auto-allocate-nat-external-ips
3 changes: 3 additions & 0 deletions tests/scripts/vars.env-example
@@ -5,6 +5,9 @@ PREFIX=<prefix of the remote image>
 NGINX_PREFIX=<prefix of the remote nginx image>
 GKE_CLUSTER_NAME=<name of deployed GKE cluster>
 GKE_CLUSTER_ZONE=<zone where GKE cluster is deployed>
+GKE_CLUSTER_REGION=<region where GKE cluster is deployed>
+GKE_ROUTER_NAME=<name of the router to create to allow egress traffic from private GKE nodes>
+GKE_NATS_CONFIG_NAME=<name of the nats config to create for the above router>
 GKE_PROJECT=<GCP project>
 GKE_SVC_ACCOUNT=<service account with k8s admin permissions>
 IMAGE=projects/debian-cloud/global/images/debian-11-bullseye-v20231212
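Review note: the router scripts read these variables without checking that they were filled in, so a value left as an `<...>` placeholder would only surface as a `gcloud` error. One possible pre-flight check is sketched below, assuming a plain `KEY=VALUE` file format. `parseEnv` and `unset` are hypothetical helpers for illustration and are not part of this PR.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// parseEnv reads KEY=VALUE lines, skipping blank lines and # comments.
func parseEnv(content string) map[string]string {
	vars := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		vars[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return vars
}

// unset returns, in sorted order, the required keys that are missing,
// empty, or still hold a <placeholder> value.
func unset(vars map[string]string, required []string) []string {
	var missing []string
	for _, k := range required {
		v := vars[k]
		if v == "" || (strings.HasPrefix(v, "<") && strings.HasSuffix(v, ">")) {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	content := "GKE_ROUTER_NAME=my-router\n" +
		"GKE_CLUSTER_REGION=<region where GKE cluster is deployed>\n"
	required := []string{"GKE_CLUSTER_REGION", "GKE_ROUTER_NAME", "GKE_NATS_CONFIG_NAME"}
	fmt.Println(unset(parseEnv(content), required))
}
```

The same check could be written in the shell scripts themselves; Go is used here only to match the rest of the test framework.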
21 changes: 14 additions & 7 deletions tests/suite/system_suite_test.go
@@ -40,8 +40,11 @@ func TestNGF(t *testing.T) {
 }

 var (
-gatewayAPIVersion = flag.String("gateway-api-version", "", "Version of Gateway API to install")
-k8sVersion = flag.String("k8s-version", "latest", "Version of k8s being tested on")
+gatewayAPIVersion = flag.String("gateway-api-version", "", "Supported Gateway API version for NGF under test")
+gatewayAPIPrevVersion = flag.String(
+"gateway-api-prev-version", "", "Supported Gateway API version for previous NGF release",
+)
+k8sVersion = flag.String("k8s-version", "latest", "Version of k8s being tested on")
 // Configurable NGF installation variables. Helm values will be used as defaults if not specified.
 ngfImageRepository = flag.String("ngf-image-repo", "", "Image repo for NGF control plane")
 nginxImageRepository = flag.String("nginx-image-repo", "", "Image repo for NGF data plane")
@@ -71,8 +74,9 @@ const (
 )

 type setupConfig struct {
-chartPath string
-deploy bool
+chartPath string
+gwAPIVersion string
+deploy bool
 }

 func setup(cfg setupConfig, extraInstallArgs ...string) {
@@ -130,7 +134,7 @@ func setup(cfg setupConfig, extraInstallArgs ...string) {
 version = "edge"
 }

-output, err := framework.InstallGatewayAPI(k8sClient, *gatewayAPIVersion, *k8sVersion)
+output, err := framework.InstallGatewayAPI(k8sClient, cfg.gwAPIVersion, *k8sVersion)
 Expect(err).ToNot(HaveOccurred(), string(output))

 output, err = framework.InstallNGF(installCfg, extraInstallArgs...)
@@ -143,6 +147,8 @@ func setup(cfg setupConfig, extraInstallArgs ...string) {
 timeoutConfig.CreateTimeout,
 )
 Expect(err).ToNot(HaveOccurred())
+Expect(podNames).ToNot(BeNil())
+Expect(podNames).ToNot(HaveLen(0))

 if *serviceType != "LoadBalancer" {
 portFwdPort, err = framework.PortForward(k8sConfig, installCfg.Namespace, podNames[0], portForwardStopCh)
@@ -194,8 +200,9 @@ var _ = BeforeSuite(func() {
 localChartPath = filepath.Join(basepath, "deploy/helm-chart")

 cfg := setupConfig{
-chartPath: localChartPath,
-deploy: true,
+chartPath: localChartPath,
+gwAPIVersion: *gatewayAPIVersion,
+deploy: true,
 }

 // If we are running the upgrade test only, then skip the initial deployment.
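Review note: with this change `setup` no longer reads the global version flag. The caller picks the Gateway API version and passes it in through `setupConfig`, so the upgrade test can install the previous release's version first. The sketch below illustrates that flow with the standard `flag` package only. `parseVersions`, `configFor`, and the chart path strings are hypothetical and exist only for this example.

```go
package main

import (
	"flag"
	"fmt"
)

// versions holds the two Gateway API versions passed to the suite.
type versions struct {
	current string
	prev    string
}

// parseVersions parses the two version flags from args. It uses a private
// FlagSet so the sketch does not touch the process-wide flag state.
func parseVersions(args []string) (versions, error) {
	fs := flag.NewFlagSet("suite", flag.ContinueOnError)
	cur := fs.String("gateway-api-version", "", "Gateway API version for NGF under test")
	prev := fs.String("gateway-api-prev-version", "", "Gateway API version for previous NGF release")
	if err := fs.Parse(args); err != nil {
		return versions{}, err
	}
	return versions{current: *cur, prev: *prev}, nil
}

// setupConfig carries the version explicitly instead of reading a global.
type setupConfig struct {
	chartPath    string
	gwAPIVersion string
	deploy       bool
}

// configFor builds the config for one install. The chart paths are
// placeholder strings, not the paths used by the real suite.
func configFor(v versions, previousRelease bool) setupConfig {
	cfg := setupConfig{chartPath: "local-chart", gwAPIVersion: v.current, deploy: true}
	if previousRelease {
		cfg.chartPath = "released-chart"
		cfg.gwAPIVersion = v.prev
	}
	return cfg
}

func main() {
	v, err := parseVersions([]string{
		"--gateway-api-version=1.1.0",
		"--gateway-api-prev-version=1.0.0",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(configFor(v, true).gwAPIVersion, configFor(v, false).gwAPIVersion)
}
```

Passing the version through the config keeps the install step free of hidden inputs, which is what lets the same `setup` serve both the baseline and the upgraded deployment.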
31 changes: 20 additions & 11 deletions tests/suite/upgrade_test.go
@@ -11,7 +11,6 @@ import (

 . "github.com/onsi/ginkgo/v2"
 . "github.com/onsi/gomega"
-vegeta "github.com/tsenart/vegeta/v12/lib"
 coordination "k8s.io/api/coordination/v1"
 core "k8s.io/api/core/v1"
 metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
@@ -63,8 +62,9 @@ var _ = Describe("Upgrade testing", Label("upgrade"), func() {
 teardown()

 cfg := setupConfig{
-chartPath: "oci://ghcr.io/nginxinc/charts/nginx-gateway-fabric",
-deploy: true,
+chartPath: "oci://ghcr.io/nginxinc/charts/nginx-gateway-fabric",
+gwAPIVersion: *gatewayAPIPrevVersion,
+deploy: true,
 }
 setup(cfg, "--values", valuesFile)

@@ -106,7 +106,7 @@ var _ = Describe("Upgrade testing", Label("upgrade"), func() {
 }

 type metricsResults struct {
-metrics *vegeta.Metrics
+metrics *framework.Metrics
 testName string
 scheme string
 }
@@ -158,7 +158,7 @@ var _ = Describe("Upgrade testing", Label("upgrade"), func() {
 }

 buf := new(bytes.Buffer)
-encoder := vegeta.NewCSVEncoder(buf)
+encoder := framework.NewCSVEncoder(buf)
 for _, res := range results {
 res := res
 Expect(encoder.Encode(&res)).To(Succeed())
@@ -183,30 +183,39 @@ var _ = Describe("Upgrade testing", Label("upgrade"), func() {
 // allow traffic flow to start
 time.Sleep(2 * time.Second)

-output, err := framework.UpgradeNGF(cfg, "--values", valuesFile)
+// update Gateway API and NGF
+output, err := framework.InstallGatewayAPI(k8sClient, *gatewayAPIVersion, *k8sVersion)
 Expect(err).ToNot(HaveOccurred(), string(output))
+
+output, err = framework.UpgradeNGF(cfg, "--values", valuesFile)
+Expect(err).ToNot(HaveOccurred(), string(output))

 Expect(resourceManager.ApplyFromFiles([]string{"ngf-upgrade/gateway-updated.yaml"}, ns.Name)).To(Succeed())

 podNames, err := framework.GetNGFPodNames(k8sClient, ngfNamespace, releaseName, timeoutConfig.GetTimeout)
 Expect(err).ToNot(HaveOccurred())
-
-ctx, cancel := context.WithTimeout(context.Background(), 1*time.Minute)
-defer cancel()
+Expect(podNames).ToNot(BeNil())
+Expect(podNames).ToNot(HaveLen(0))

 // ensure that the leader election lease has been updated to the new pods
+leaseCtx, leaseCancel := context.WithTimeout(context.Background(), timeoutConfig.GetTimeout)
+defer leaseCancel()
+
 var lease coordination.Lease
 key := types.NamespacedName{Name: "ngf-test-nginx-gateway-fabric-leader-election", Namespace: ngfNamespace}
-Expect(k8sClient.Get(ctx, key, &lease)).To(Succeed())
+Expect(k8sClient.Get(leaseCtx, key, &lease)).To(Succeed())

 Expect(lease.Spec.HolderIdentity).ToNot(BeNil())
 Expect(podNames).To(ContainElement(*lease.Spec.HolderIdentity))

 // ensure that the Gateway has been properly updated with a new listener
+gwCtx, gwCancel := context.WithTimeout(context.Background(), 1*time.Minute)
+defer gwCancel()
+
 var gw v1.Gateway
 key = types.NamespacedName{Name: "gateway", Namespace: ns.Name}
 Expect(wait.PollUntilContextCancel(
-ctx,
+gwCtx,
 500*time.Millisecond,
 true, /* poll immediately */
 func(ctx context.Context) (bool, error) {