
Commit 9c02f3e

syncing Read the Docs to the latest from kubernetes AWS and local integration testing with Let's Encrypt using the cert manager, and rolling custom x509 TLS SSL keys, certs, CA, and CSRs using ansible and loading them as kubernetes secrets on deployment for use with the 4 nginx-ingress endpoints: api.example.com, jupyter.example.com, pgadmin.example.com, and splunk.example.com
1 parent 7b98257 commit 9c02f3e

1 file changed (+194, -4 lines)


webapp/drf_network_pipeline/docs/source/deploy-antinex-on-kubernetes.rst

Lines changed: 194 additions & 4 deletions
@@ -1,12 +1,13 @@
-Deploy a Distributed Stack to Kubernetes
-----------------------------------------
+Deploying a Distributed AI Stack to Kubernetes on Ubuntu
+--------------------------------------------------------

.. image:: https://i.imgur.com/qiyhAq9.png

Install and manage a Kubernetes cluster with helm on a single Ubuntu host. Once running, you can deploy a distributed, scalable python stack capable of delivering a resilient REST service with JWT for authentication and Swagger for development. This service uses a decoupled REST API with two distinct worker backends for routing simple database read and write tasks vs long-running tasks that can use a Redis cache and do not need a persistent database connection. This is handy not only for simple CRUD applications and use cases, but also for serving a secure multi-tenant environment where multiple users manage long-running tasks like training deep neural networks that are capable of making near-realtime predictions.

This guide was built for deploying the `AntiNex stack of docker containers <https://github.com/jay-johnson/train-ai-with-django-swagger-jwt>`__ on a Kubernetes cluster:

- `Cert Manager with Let's Encrypt SSL support <https://github.com/jetstack/cert-manager>`__
- `Redis <https://hub.docker.com/r/bitnami/redis/>`__
- `Postgres <https://github.com/CrunchyData/crunchy-containers>`__
- `Django REST API with JWT and Swagger <https://github.com/jay-johnson/deploy-to-kubernetes/blob/master/api/deployment.yml>`__
@@ -155,6 +156,12 @@ If you want to deploy splunk you can add it as an argument:

    ./deploy-resources.sh splunk

If you want to deploy splunk with Let's Encrypt, make sure to add ``prod`` as an argument:

::

    ./deploy-resources.sh splunk prod

Start Applications
------------------

@@ -176,6 +183,14 @@ If you want to deploy the splunk-ready application builds, you can add it as an

    ./start.sh splunk

If you want to deploy the splunk-ready application builds integrated with Let's Encrypt TLS encryption, just add ``prod`` as an argument:

::

    ./start.sh splunk prod

.. note:: The `Cert Manager <https://github.com/jetstack/cert-manager>`__ is set to staging mode by default and requires the ``prod`` argument to prevent accidentally getting blocked due to Let's Encrypt rate limits.

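A quick way to confirm which mode the cert manager ended up in is to inspect its objects with ``kubectl``. This is only a sketch; it assumes the cert-manager CRDs are installed and uses the ``letsencrypt-issuer`` name described later in this guide:

::

    # list the cluster-wide issuer the cert manager registered
    kubectl get clusterissuer
    # list the certificates it is trying to issue for the ingress hosts
    kubectl get certificates --all-namespaces
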
Confirm Pods are Running
========================

@@ -203,7 +218,7 @@ To apply new Django database migrations, run the following command:
Add Ingress Locations to /etc/hosts
-----------------------------------

-When running locally, all ingress urls need to resolve on the network. Please append the following entries to your local ``/etc/hosts`` file on the ``127.0.0.1`` line:
+When running locally (also known in these docs as ``dev`` mode), all ingress urls need to resolve on the network. Please append the following entries to your local ``/etc/hosts`` file on the ``127.0.0.1`` line:

::

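As a rough sketch of what the appended line ends up looking like (assuming the four nginx-ingress endpoints named in this commit message; the actual entries live in the unchanged lines that follow this hunk and depend on your ingress hostnames):

::

    127.0.0.1 api.example.com jupyter.example.com pgadmin.example.com splunk.example.com
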
@@ -826,6 +841,12 @@ To deploy splunk you can add the argument ``splunk`` to the `./deploy-resources.

    ./splunk/run.sh

Or if you want to use Let's Encrypt for SSL:

::

    ./splunk/run.sh prod

Deploy Splunk-Ready Applications
--------------------------------

@@ -856,6 +877,141 @@ View Ingress Config

    ./splunk/view-ingress-config.sh

Create your own self-signed x509 TLS Keys, Certs and Certificate Authority with Ansible
-----------------------------------------------------------------------------------------

If you have openssl installed, you can use this ansible playbook to create your own certificate authority (CA), keys and certs.

#. Create the CA, Keys and Certificates

   ::

       cd ansible
       ansible-playbook -i inventory_dev create-x509s.yml

#. Check that the CA, x509 keys and certificates for the client and server were created

   ::

       ls -l ./ssl

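Optionally, you can sanity-check what the playbook generated with openssl. This is only a sketch; the file names under ``./ssl`` are assumptions and depend on the playbook's variables:

::

    # file names are assumptions - use whatever the playbook wrote into ./ssl
    openssl x509 -in ./ssl/ca.pem -noout -subject -issuer -dates
    openssl verify -CAfile ./ssl/ca.pem ./ssl/server-cert.pem
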
Deploying Your Own x509 TLS Encryption files as Kubernetes Secrets
--------------------------------------------------------------------

This is a work in progress, but in ``dev`` mode the cert-manager is not in use. Instead, the cluster uses pre-generated x509 TLS SSL files created with the `included ansible playbook create-x509s.yml <https://github.com/jay-johnson/deploy-to-kubernetes/blob/master/ansible/create-x509s.yml>`__. Once created, you can deploy them as Kubernetes secrets using the `deploy-secrets.sh <https://github.com/jay-johnson/deploy-to-kubernetes/blob/master/ansible/deploy-secrets.sh>`__ script and reload them at any time in the future.

Deploy Secrets
==============

Run this to create the TLS secrets:

::

    ./ansible/deploy-secrets.sh

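Under the hood each entry is a standard Kubernetes TLS secret. A rough sketch of what the script does for a single secret (the ``./ssl`` file names are assumptions; the secret name ``tls-client`` comes from the list below):

::

    # equivalent manual command for one secret - file names are assumptions
    kubectl create secret tls tls-client \
        --cert=./ssl/client-cert.pem \
        --key=./ssl/client-key.pem
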
List Secrets
============

::

    kubectl get secrets | grep tls
    tls-client         kubernetes.io/tls     2    15s
    tls-database       kubernetes.io/tls     2    15s
    tls-docker         kubernetes.io/tls     2    15s
    tls-jenkins        kubernetes.io/tls     2    14s
    tls-jupyter        kubernetes.io/tls     2    14s
    tls-k8             kubernetes.io/tls     2    13s
    tls-kafka          kubernetes.io/tls     2    13s
    tls-kibana         kubernetes.io/tls     2    13s
    tls-nginx          kubernetes.io/tls     2    12s
    tls-pgadmin        kubernetes.io/tls     2    12s
    tls-phpmyadmin     kubernetes.io/tls     2    12s
    tls-rabbitmq       kubernetes.io/tls     2    11s
    tls-redis          kubernetes.io/tls     2    11s
    tls-restapi        kubernetes.io/tls     2    11s
    tls-splunk         kubernetes.io/tls     2    10s
    tls-webserver      kubernetes.io/tls     2    10s

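To look inside one of these secrets (any of the names above works), you can pull it back out with ``kubectl``:

::

    # show the keys stored in the secret (tls.crt and tls.key)
    kubectl describe secret tls-client
    # or dump the full object, including the base64-encoded cert and key
    kubectl get secret tls-client -o yaml
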
Reload Secrets
==============

If you want to deploy new TLS secrets at any time, use the reload argument (``-r``) with the ``deploy-secrets.sh`` script. Doing so will delete the original secrets and recreate all of them using the new TLS values:

::

    ./ansible/deploy-secrets.sh -r

Deploy Cert Manager with Let's Encrypt
--------------------------------------

Use these commands to manage the `Cert Manager with Let's Encrypt SSL support <https://github.com/jetstack/cert-manager>`__ within Kubernetes. By default, the cert manager is deployed only in ``prod`` mode. If you run it in production mode, then it will install real, valid x509 certificates from `Let's Encrypt <https://letsencrypt.org/>`__ into the nginx-ingress automatically.

Start with Let's Encrypt x509 SSL Certificates
==============================================

Start the cert manager in ``prod`` mode to enable Let's Encrypt TLS encryption with the command:

::

    ./start.sh prod

Or manually with the command:

::

    ./cert-manager/run.sh prod

If you have splunk, you can just add it to the arguments:

::

    ./start.sh splunk prod

View Logs
=========

When using production mode, make sure to view the logs to ensure you are not being blocked due to rate limiting:

::

    ./cert-manager/logs.sh

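If you prefer raw ``kubectl``, the same logs can be followed straight from the cert-manager pod. This is a sketch; the namespace and pod name depend on how the helm chart deployed the cert manager:

::

    # find the cert-manager pod, then follow its logs
    kubectl get pods --all-namespaces | grep cert-manager
    kubectl logs -f -n kube-system <cert-manager-pod-name>
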
Stop the Cert Manager
---------------------

If you notice things are not working correctly, you can quickly prevent yourself from getting blocked by stopping the cert manager with the command:

::

    ./cert-manager/_uninstall.sh

.. note:: If you get blocked due to rate-limits it will show up in the cert-manager logs like:

::

    I0731 07:53:43.313709 1 sync.go:273] Error issuing certificate for default/api.antinex.com-tls: error getting certificate from acme server: acme: urn:ietf:params:acme:error:rateLimited: Error finalizing order :: too many certificates already issued for exact set of domains: api.antinex.com: see https://letsencrypt.org/docs/rate-limits/
    E0731 07:53:43.313738 1 sync.go:182] [default/api.antinex.com-tls] Error getting certificate 'api.antinex.com-tls': secret "api.antinex.com-tls" not found

Debugging
=========

To reduce debugging issues, the cert manager ClusterIssuer objects use the same name for staging and production mode. This is nice because you do not have to update all the annotations to deploy on production vs staging.

The cert manager starts and defines the issuer name for both production and staging as:

::

    --set ingressShim.defaultIssuerName=letsencrypt-issuer

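For context, that flag is a helm chart value. A rough sketch of the kind of install command ``./cert-manager/run.sh`` wraps (the chart name, release name, and namespace here are assumptions, not the script's exact contents):

::

    helm install stable/cert-manager \
        --name cert-manager \
        --namespace kube-system \
        --set ingressShim.defaultIssuerName=letsencrypt-issuer \
        --set ingressShim.defaultIssuerKind=ClusterIssuer
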
Make sure to set any nginx ingress annotations that need Let's Encrypt SSL encryption to these values:

::

    annotations:
      kubernetes.io/tls-acme: "true"
      kubernetes.io/ingress.class: "nginx"
      certmanager.k8s.io/cluster-issuer: "letsencrypt-issuer"

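Put together, a minimal ingress using these annotations might look like the following sketch (the host, secret name, service name, and port are placeholders based on the endpoints named in this commit, not values from the repository):

::

    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: api-ingress
      annotations:
        kubernetes.io/tls-acme: "true"
        kubernetes.io/ingress.class: "nginx"
        certmanager.k8s.io/cluster-issuer: "letsencrypt-issuer"
    spec:
      tls:
      - hosts:
        - api.example.com
        # cert-manager creates and maintains this secret when running in prod mode
        secretName: api.example.com-tls
      rules:
      - host: api.example.com
        http:
          paths:
          - path: /
            backend:
              serviceName: api
              servicePort: 8080
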
Troubleshooting
---------------

@@ -896,10 +1052,44 @@ Or use the file:
    sudo su
    ./tools/cluster-reset.sh

Or the full reset and deploy once ready:

::

    sudo su
    cert_env=dev; ./tools/reset-flannel-cni-networks.sh; ./tools/cluster-reset.sh; ./user-install-kubeconfig.sh; sleep 30; ./deploy-resources.sh splunk ${cert_env}
    exit
    # as your user
    ./user-install-kubeconfig.sh
    # depending on testing vs prod:
    # ./start.sh splunk
    # ./start.sh splunk prod

Development
-----------

Right now, the python virtual environment is only used to bring in ansible for running playbooks, but it will be used in the future with the kubernetes python client as I start using it more and more.

::

    virtualenv -p python3 /opt/venv && source /opt/venv/bin/activate && pip install -e .

Testing
-------

::

    py.test

or

::

    python setup.py test

License
-------

Apache 2.0 - Please refer to the LICENSE_ for more details

.. _License: https://github.com/jay-johnson/deploy-to-kubernetes/blob/master/LICENSE
