Add Graceful Recovery Baseline Test #1111

Merged 16 commits on Oct 11, 2023
Change nginx to NGINX and adjust test naming
bjee19 committed Oct 11, 2023
commit 82ae192f213265ccf362a10f0cbf8770662685c5
38 changes: 19 additions & 19 deletions tests/graceful-recovery/graceful-recovery.md
@@ -19,73 +19,73 @@ Ensure that NGF can recover gracefully from container failures without any user
5. Follow the [installation instructions](https://github.com/nginxinc/nginx-gateway-fabric/blob/main/docs/installation.md)
to deploy NGINX Gateway Fabric.
6. In a separate terminal track NGF logs by running `kubectl -n nginx-gateway logs -f deploy/nginx-gateway`
-7. In a separate terminal track nginx container logs by running
+7. In a separate terminal track NGINX container logs by running
`kubectl -n nginx-gateway logs -f <NGF_POD> -c nginx`
-8. Exec into the nginx container inside of the NGF pod by running
+8. Exec into the NGINX container inside of the NGF pod by running
`kubectl exec -it -n nginx-gateway <NGF_POD> --container nginx -- bin/sh`
-9. Inside the nginx container, navigate to `/etc/nginx/conf.d` and ensure that
+9. Inside the NGINX container, navigate to `/etc/nginx/conf.d` and ensure that
`http.conf` and `config-version.conf` look correct.
10. In a different terminal, deploy the
[https-termination example](https://github.com/nginxinc/nginx-gateway-fabric/tree/main/examples/https-termination).
-11. Inside the nginx container, check `http.conf` and `config-version.config` to see
+11. Inside the NGINX container, check `http.conf` and `config-version.config` to see
if the configuration and version were correctly updated.
12. Send traffic through the example application and ensure it is working correctly.

## Tests

### Restart nginx-gateway container

-1. Ensure NGF and nginx container logs are set up and traffic flows through the example application correctly.
+1. Ensure NGF and NGINX container logs are set up and traffic flows through the example application correctly.
2. Insert ephemeral container in NGF Pod and kill the nginx-gateway process.
1. `kubectl debug -it -n nginx-gateway <NGF_POD> --image=busybox:1.28 --target=nginx-gateway`
2. run `ps -A`
3. run `kill <nginx-gateway_PID>` (Command should start with `/usr/bin/gateway`)
-3. Check for errors in the NGF and nginx-container logs.
+3. Check for errors in the NGF and NGINX container logs.
4. When the nginx-gateway container is back up, ensure traffic flows through the example application correctly.
-5. Open up the NGF and nginx container logs and check for errors.
-6. Inside the nginx container, check that `http.conf` was not changed and `config-version.conf` had its version set to `2`.
+5. Open up the NGF and NGINX container logs and check for errors.
+6. Inside the NGINX container, check that `http.conf` was not changed and `config-version.conf` had its version set to `2`.
7. Send traffic through the example application and ensure it is working correctly.
8. Check that NGF can still update statuses of resources.
1. Delete the HTTPRoute resources by running `kubectl delete -f cafe-routes.yaml` in `/examples/https-termination`
-2. Inside the terminal which is inside the nginx container, check that `http.conf` and
+2. Inside the terminal which is inside the NGINX container, check that `http.conf` and
`config-version.conf` were correctly updated.
3. Send traffic through the example application using the updated resources and ensure traffic does not flow.
4. Apply the HTTPRoute resources by running `kubectl apply -f cafe-routes.yaml` in `/examples/https-termination`
-5. Inside the terminal which is inside the nginx container, check that `http.conf` and
+5. Inside the terminal which is inside the NGINX container, check that `http.conf` and
`config-version.conf` were correctly updated.
6. Send traffic through the example application using the updated resources and ensure traffic flows correctly.
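
Steps 2.2 and 2.3 of the restart tests pick a PID out of the `ps -A` output by eye. As a rough sketch, that lookup could be scripted along the following lines. The helper name `find_pid` and the four-column layout (`PID USER TIME COMMAND`) are assumptions based on typical busybox output; the test plan itself does not specify either.

```shell
# find_pid PREFIX
# Reads `ps -A` style output on stdin and prints the PID of every process
# whose COMMAND column starts with PREFIX.
# Assumption: columns are PID, USER, TIME, COMMAND, preceded by one header line.
find_pid() {
  awk -v prefix="$1" '
    NR > 1 {
      cmd = $0
      # Drop the first three whitespace-separated columns, keeping COMMAND intact.
      sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+/, "", cmd)
      if (index(cmd, prefix) == 1) print $1
    }
  '
}
```

Inside the ephemeral container this might be used as `kill "$(ps -A | find_pid '/usr/bin/gateway')"`, or with `'nginx: master process'` as the prefix for the NGINX restart test.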

### Restart NGINX container

-1. Ensure NGF and nginx container logs are set up and traffic flows through the example application correctly.
+1. Ensure NGF and NGINX container logs are set up and traffic flows through the example application correctly.
2. Insert ephemeral container in NGF Pod and kill the nginx-master process.
1. If there isn't already an ephemeral container inserted, run:
`kubectl debug -it -n nginx-gateway <NGF_POD> --image=busybox:1.28 --target=nginx-gateway`
2. run `ps -A`
3. run `kill <nginx-master_PID>` (Command should start with `nginx: master process`)
-3. When nginx container is back up, ensure traffic flows through the example application correctly.
-4. Open up the nginx-container logs and check for errors.
-5. Exec back into the nginx container and check that `http.conf` and `config-version.conf` were not changed.
+3. When NGINX container is back up, ensure traffic flows through the example application correctly.
+4. Open up the NGINX container logs and check for errors.
+5. Exec back into the NGINX container and check that `http.conf` and `config-version.conf` were not changed.
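
The "were not changed" check in step 5 is easier to trust when it is a byte-for-byte comparison rather than a visual one. A minimal sketch, assuming copies of both files were saved to two local directories (one before the restart, one after); the function name `conf_unchanged` and the directory layout are hypothetical:

```shell
# conf_unchanged BEFORE_DIR AFTER_DIR
# Compares the two config files named in the test plan between two local
# directories. Prints the name of each file that differs and returns
# non-zero if any file changed.
# Assumption: both directories hold copies taken from /etc/nginx/conf.d,
# one made before the restart and one made after it.
conf_unchanged() {
  before=$1
  after=$2
  rc=0
  for f in http.conf config-version.conf; do
    if ! cmp -s "$before/$f" "$after/$f"; then
      echo "changed: $f"
      rc=1
    fi
  done
  return "$rc"
}
```

One possible way to take the copies is `kubectl exec -n nginx-gateway <NGF_POD> -c nginx -- cat /etc/nginx/conf.d/http.conf > before/http.conf`, repeated for `config-version.conf` and again after the restart.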

### Restart Node with draining

1. Switch over to a one-node Kind cluster. Can run `make create-kind-cluster` from main directory.
2. Run steps 4-12 of the Setup section above using [this guide]
(https://github.com/nginxinc/nginx-gateway-fabric/blob/main/docs/running-on-kind.md) for running on Kind.
-3. Ensure NGF and nginx container logs are set up and traffic flows through the example application correctly.
+3. Ensure NGF and NGINX container logs are set up and traffic flows through the example application correctly.
4. Drain the node of its resources by running `kubectl drain kind-control-plane --ignore-daemonsets --delete-local-data`
5. Delete the node by running `kubectl delete node kind-control-plane`
6. Restart the docker container by running `docker restart kind-control-plane`
-7. Open up both NGF and nginx-container logs and check for errors.
-8. Exec back into the nginx container and check that `http.conf` and `config-version.conf` were not changed.
+7. Open up both NGF and NGINX container logs and check for errors.
+8. Exec back into the NGINX container and check that `http.conf` and `config-version.conf` were not changed.
9. Send traffic through the example application and ensure it is working correctly.
10. Check that NGF can still update statuses of resources.
1. Delete the HTTPRoute resources by running `kubectl delete -f cafe-routes.yaml` in `/examples/https-termination`
-2. Inside the terminal which is inside the nginx container, check that `http.conf` and
+2. Inside the terminal which is inside the NGINX container, check that `http.conf` and
`config-version.conf` were correctly updated.
3. Send traffic through the example application using the updated resources and ensure traffic does not flow.
4. Apply the HTTPRoute resources by running `kubectl apply -f cafe-routes.yaml` in `/examples/https-termination`
-5. Inside the terminal which is inside the nginx container, check that `http.conf` and
+5. Inside the terminal which is inside the NGINX container, check that `http.conf` and
`config-version.conf` were correctly updated.
6. Send traffic through the example application using the updated resources and ensure traffic flows correctly.
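
The order of steps 4 to 6 in the draining test matters: drain, then delete the node, then restart the container. As a small dry-run sketch, the sequence can be printed for review before anything is run. The function name `drain_restart_plan` is hypothetical, and the sketch assumes the Docker container carries the same name as the node, as it does in the commands above.

```shell
# drain_restart_plan [NODE]
# Prints, in order, the commands from steps 4-6 for the given node name.
# Nothing is executed; this is a dry run for review.
# Assumption: the Docker container is named after the node (true for the
# kind-control-plane example used in the test plan).
drain_restart_plan() {
  node=${1:-kind-control-plane}
  printf '%s\n' \
    "kubectl drain $node --ignore-daemonsets --delete-local-data" \
    "kubectl delete node $node" \
    "docker restart $node"
}
```

Newer kubectl releases renamed `--delete-local-data` to `--delete-emptydir-data`, so the first line may need adjusting depending on the client version in use.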

10 changes: 5 additions & 5 deletions tests/graceful-recovery/results/graceful-recover-results.md
@@ -1,15 +1,15 @@
# Test Results

-## Testing when nginx-gateway container restarts
+## Restart nginx-gateway container
Passes test with no errors.

-## Testing when nginx container restarts
+## Restart NGINX container
Passes test with no errors.

-## Testing when the NGF Pod restarts through node shutdown with cleaning up of resources
+## Restart Node with draining
Passes test with no errors.

-## Testing when the NGF Pod restarts through node shutdown without cleaning up of resources
+## Restart Node without draining
Does not work correctly the majority of times and errors after running `docker restart kind-control-plane`.
-NGF Pod is not able to recover as the nginx container logs show this error:
+NGF Pod is not able to recover as the NGINX container logs show this error:
`bind() to unix:/var/run/nginx/nginx-status.sock failed (98: Address in use)`.