Ansible-based installer for Smart Scaler components and Kubernetes cluster deployment.
- Prerequisites for Deploying K8s Cluster (~2–3 mins)
- Installation Steps for Deploying K8s Cluster (~15–20 mins)
- Prerequisites for Installing SmartScaler Apps (~2 mins)
- Instructions to Deploy SmartScaler Apps (depends on NIM profile: 70B ~20–25 mins, 8B ~10–15 mins, 1B ~10 mins)
- Example Test Run Steps (~15 mins)
- Execution Order Control (optional) (~1 min)
- Destroying the Kubernetes Cluster (~5 mins)
- Documentation Links
- Troubleshooting
Control plane nodes:
- CPU: 8 cores minimum
- RAM: 16GB minimum
- Storage: 500GB minimum (actual requirement depends on the NIM profile's image and NIM cache PVC sizes)
- OS: Ubuntu 22.04+ or compatible Linux distribution
Worker nodes:
- CPU: 8 cores minimum
- RAM: 16GB minimum
- Storage: 500GB minimum (actual requirement depends on the NIM profile's image and NIM cache PVC sizes)
- OS: Same as control plane nodes
- Python 3.x and pip
- Git
- SSH key generation capability
- helm v3.15.0+
- kubectl v1.25.0+
- SSH access between installer machine and all cluster nodes
- Internet connectivity for downloading packages
- Open ports: 6443 (API server), 2379-2380 (etcd), 10250 (kubelet)
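To sanity-check these prerequisites, the commands below are a minimal sketch (the IP placeholder and thresholds are illustrative, not part of the installer):

```bash
# Run on each node: verify CPU, RAM, and disk against the minimums above.
nproc                                          # expect 8 or more cores
free -g | awk '/^Mem:/ {print $2 " GB RAM"}'   # expect 16 GB or more
df -h /                                        # expect at least 500 GB available

# Run from the installer machine: confirm the required ports are reachable
# (replace <control-plane-ip> with your node's address).
nc -zv <control-plane-ip> 6443
nc -zv <control-plane-ip> 10250
```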
# Clone the repository
git clone https://github.com/smart-scaler/smartscaler-apps-installer.git
cd smartscaler-apps-installer
# Install Python3
sudo apt update
sudo apt-get install python3-venv python3-full -y
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate
# Install Python dependencies
chmod +x files/install-requirements.sh
./files/install-requirements.sh
# Install Ansible collections
LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 ansible-galaxy collection install -r requirements.yml --force

# Generate SSH key for cluster access
ssh-keygen -t rsa -b 4096 -f ~/.ssh/k8s_rsa -N ""

# Copy SSH key to each node (repeat for all nodes)
ssh-copy-id -i ~/.ssh/k8s_rsa.pub user@node-ip

Edit user_input.yml with your cluster configuration:
This section defines the settings required to enable and configure a Kubernetes cluster deployment using Ansible.
kubernetes_deployment:
  enabled: true                                   # Enable Kubernetes deployment via Ansible
  api_server:
    host: "PUBLIC_IP"                             # Public IP of Kubernetes API server
    port: 6443                                    # Default secure port
    secure: true                                  # Use HTTPS (recommended)
  ssh_key_path: "/absolute/path/to/.ssh/k8s_rsa"  # SSH private key path
  default_ansible_user: "REPLACE_SSH_USER"        # SSH user (e.g., ubuntu, ec2-user)
  ansible_sudo_pass: ""                           # Optional: sudo password
  control_plane_nodes:
    - name: "master-1"
      ansible_host: "PUBLIC_IP"                   # Public IP for SSH
      ansible_user: "REPLACE_SSH_USER"
      ansible_become: true
      ansible_become_method: "sudo"
      ansible_become_user: "root"
      private_ip: "PRIVATE_IP"                    # Internal/private IP
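Before continuing, you can optionally confirm that the SSH user, key, and IP you are about to configure actually work from the installer machine; a minimal sketch (substitute your real IP and user):

```bash
# BatchMode fails fast if key-based auth is not set up; sudo -n checks passwordless sudo.
ssh -i ~/.ssh/k8s_rsa -o BatchMode=yes REPLACE_SSH_USER@PUBLIC_IP 'hostname && sudo -n true && echo "sudo OK"'
```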
You can quickly update your user_input.yml by replacing only the values in this command based on your environment.
Keep the placeholder keywords (PUBLIC_IP, PRIVATE_IP, etc.) on the left side exactly as-is.
⚠️ Warning: Replace only the values on the right-hand side (192.168.1.100, root, etc.) with your actual environment details. Do not modify the placeholder keywords (PUBLIC_IP, PRIVATE_IP, etc.); they are required for the sed patterns to match.
sed -i \
-e 's|PUBLIC_IP|172.235.157.18|g' \
-e 's|PRIVATE_IP|172.235.157.18|g' \
-e 's|REPLACE_SSH_USER|root|g' \
-e 's|/absolute/path/to/.ssh/k8s_rsa|/root/.ssh/k8s_rsa|g' \
-e '/kubernetes_deployment:/,/^[[:space:]]*[^[:space:]]*enabled:/ s/enabled: false/enabled: true/' \
user_input.yml

✅ This command will:
- Replace the `PUBLIC_IP` and `PRIVATE_IP` placeholders with your node IP
- Set the correct SSH user and key path
- Enable Kubernetes deployment by updating `enabled: false` → `enabled: true`
If you're deploying on a single node and running the command from the same server, you can use the same IP address for both PUBLIC_IP and PRIVATE_IP.
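After running the sed command, a quick hedged check that no placeholders were left behind:

```bash
# Should print nothing if every placeholder has been substituted.
grep -nE 'PUBLIC_IP|PRIVATE_IP|REPLACE_SSH_USER' user_input.yml
```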
# Make the script executable
chmod +x setup_kubernetes.sh
# Run the installation script with sudo
sudo ./setup_kubernetes.sh

# Fix ownership of files generated by the script
sudo chown $(whoami):$(whoami) -R .
# Set the KUBECONFIG environment variable
export KUBECONFIG=output/kubeconfig
# Verify cluster access and node status
kubectl get nodes

# Check cluster status
kubectl get nodes
kubectl cluster-info
# Verify all system pods are running
kubectl get pods --all-namespaces

- Kubernetes cluster must be running and accessible
- kubectl configured with proper kubeconfig
- Helm v3.15.0+ installed
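If the cluster was created in the previous section, you can optionally wait for all nodes to report Ready before starting the apps installation; a minimal sketch:

```bash
export KUBECONFIG=output/kubeconfig
kubectl wait --for=condition=Ready nodes --all --timeout=10m
```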
Set the following environment variables before deployment:
export NGC_API_KEY="your_ngc_api_key"
export NGC_DOCKER_API_KEY="your_ngc_docker_api_key"
export AVESHA_DOCKER_USERNAME="your_avesha_username"
export AVESHA_DOCKER_PASSWORD="your_avesha_password"

Important: Set kubernetes_deployment.enabled to false in user_input.yml before running the apps installation:
kubernetes_deployment:
  enabled: false  # Must be false for apps-only deployment
> ℹ️ **Required Kubeconfig Settings** – Already included above; this section can be skipped.
global_control_plane_ip: "YOUR_MASTER_PUBLIC_IP"  # Provide the public IP for MetalLB/NGINX
global_kubeconfig: "output/kubeconfig"            # Required: Path to kubeconfig file
global_kubecontext: "[email protected]"           # Required: Kubernetes context
use_global_context: true                          # Required: Use global context

You can quickly replace the placeholder values in your user_input.yml configuration using the following sed command:
sed -i \
-e '/kubernetes_deployment:/,/^[[:space:]]*[^[:space:]]*enabled:/ s/enabled: true/enabled: false/' \
user_input.yml

# Verify cluster access
kubectl get nodes
kubectl cluster-info
# Verify required tools
kubectl version --client
helm version
# Verify environment variables
echo $NGC_API_KEY
echo $NGC_DOCKER_API_KEY
echo $AVESHA_DOCKER_USERNAME
echo $AVESHA_DOCKER_PASSWORD

# Deploy with explicit credentials
ansible-playbook site.yml \
-e "ngc_api_key=$NGC_API_KEY" \
-e "ngc_docker_api_key=$NGC_DOCKER_API_KEY" \
-e "avesha_docker_username=$AVESHA_DOCKER_USERNAME" \
-e "avesha_docker_password=$AVESHA_DOCKER_PASSWORD" \
-vvvv

# Check all namespaces
kubectl get namespaces
# Expected namespaces:
# - gpu-operator
# - keda
# - monitoring
# - nim
# - nim-load-test
# - smart-scaler
# Verify component status
kubectl get pods -n gpu-operator
kubectl get pods -n monitoring
kubectl get pods -n keda
kubectl get pods -n nim
kubectl get pods -n smart-scaler
kubectl get pods -n nim-load-test

Expected output:
## Infrastructure Components
# GPU Operator
gpu-operator-666bbffcd-drrwk 1/1 Running 0 96m
gpu-operator-node-feature-discovery-gc-7c7f68d5f4-dz7jk 1/1 Running 0 96m
gpu-operator-node-feature-discovery-master-58588c6967-8pjhc 1/1 Running 0 96m
gpu-operator-node-feature-discovery-worker-xkbk2 1/1 Running 0 96m
# Monitoring
alertmanager-prometheus-kube-prometheus-alertmanager-0 2/2 Running 0 98m
prometheus-grafana-67dc5c9fc9-5jzhh 3/3 Running 0 98m
prometheus-kube-prometheus-operator-775d58dc6b-bgglg 1/1 Running 0 98m
prometheus-kube-state-metrics-856b96f64d-7st5q 1/1 Running 0 98m
prometheus-prometheus-kube-prometheus-prometheus-0 2/2 Running 0 98m
prometheus-prometheus-node-exporter-nm8zl 1/1 Running 0 98m
pushgateway-65497548cc-6v7sv 1/1 Running 0 97m
# Keda
keda-admission-webhooks-7c6fc8d849-9cchf 1/1 Running 0 98m
keda-operator-6465596cb9-4j54h 1/1 Running 1 (98m ago) 98m
keda-operator-metrics-apiserver-dc4dd6d79-gzxpq 1/1 Running 0 98m
# AI/ML
meta-llama3-8b-instruct-pod 1/1 Running 0 97m
nim-k8s-nim-operator-7565b7477b-6d7rs 1/1 Running 0 98m
# Smart Scaler
smart-scaler-llm-inf-5f4bf754dd-6qbm9 1/1 Running 0 98m
# Load Testing Service
locust-load-54748fd47d-tndsr 1/1 Running 0 97m

After deploying the application stack, Prometheus and Grafana can be accessed through the exposed NodePort services using your node's IP address.
Run the following command to list the monitoring services:
kubectl get svc -n monitoring

NAME                                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                         AGE
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 3m21s
prometheus-grafana NodePort 10.233.59.186 <none> 80:32321/TCP 3m30s
prometheus-kube-prometheus-alertmanager ClusterIP 10.233.23.33 <none> 9093/TCP,8080/TCP 3m30s
prometheus-kube-prometheus-operator ClusterIP 10.233.49.28 <none> 443/TCP 3m30s
prometheus-kube-prometheus-prometheus NodePort 10.233.38.213 <none> 9090:30090/TCP,8080:32020/TCP 3m30s
prometheus-kube-state-metrics ClusterIP 10.233.40.63 <none> 8080/TCP 3m30s
prometheus-operated ClusterIP None <none> 9090/TCP 3m21s
prometheus-prometheus-node-exporter ClusterIP 10.233.55.211 <none> 9100/TCP 3m30s
pushgateway ClusterIP 10.233.42.8 <none> 9091/TCP 104s
Assuming your node IP is 192.168.100.10:
- Grafana Dashboard 🔗 http://192.168.100.10:32321
- Prometheus UI 🔗 http://192.168.100.10:30090
⚠️ Note:
- The Grafana UI username/password is admin / prom-operator
- NodePort values (like 32321 for Grafana and 30090 for Prometheus) may differ in your environment. Always verify with `kubectl get svc -n monitoring`.
- Ensure firewall rules or cloud security groups allow traffic to these NodePorts.
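If the NodePorts are not reachable from your workstation (for example, blocked by a security group), one hedged alternative is to port-forward the services locally:

```bash
# Grafana becomes available at http://localhost:3000, Prometheus at http://localhost:9090.
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80 &
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090 &
```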
- Import NIM Dashboard

  Import the following NIM Dashboard JSON into Grafana: https://github.com/smart-scaler/smartscaler-apps-installer/blob/main/files/grafana-dashboards/nim-dashboard.json

  Note: Customize it for your environment and model, if needed.
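If you prefer scripting the import over the Grafana UI, the sketch below uses the Grafana HTTP API; it assumes the default credentials and NodePort shown above, that the raw-file URL derived from the repository link is correct, and that the JSON is a standard dashboard export:

```bash
# Download the dashboard JSON (raw URL assumed from the repository link above).
curl -fsSL -o nim-dashboard.json \
  https://raw.githubusercontent.com/smart-scaler/smartscaler-apps-installer/main/files/grafana-dashboards/nim-dashboard.json

# Wrap it in the payload expected by Grafana's /api/dashboards/db endpoint and post it.
jq -n --slurpfile d nim-dashboard.json '{dashboard: ($d[0] | .id = null), overwrite: true}' \
  | curl -s -u admin:prom-operator -H "Content-Type: application/json" \
      -X POST --data-binary @- http://192.168.100.10:32321/api/dashboards/db
```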
- User Input Configuration Guide - Complete user_input.yml guide
- User Input Reference - All configuration options
- Kubernetes Configuration - Cluster setup details
- Kubernetes Firewall Configuration - Network and firewall setup
- NVIDIA Container Runtime Configuration - GPU runtime setup
- SSH Connection Failed
  - Verify SSH keys are properly copied to all nodes
  - Check SSH user permissions and sudo access

- Cluster Deployment Failed
  - Check system requirements are met
  - Verify network connectivity between nodes
  - Review firewall settings
- Apps Deployment Failed
  - Ensure `kubernetes_deployment.enabled` is set to `false`
  - Verify all environment variables are set
  - Check cluster accessibility with `kubectl get nodes`
- GPU Support Issues
  - Verify NVIDIA drivers are installed on nodes
  - Check that `nvidia_runtime.enabled` is set to `true`
  - Review GPU operator pod status
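As an additional hedged check for GPU issues, confirm whether the nodes actually advertise GPUs to Kubernetes (an empty GPUS column usually means the device plugin is not running yet):

```bash
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
```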
# Check specific namespace issues
kubectl describe pods -n <namespace>
kubectl logs -n <namespace> <pod-name>
# Verify cluster resources
kubectl top nodes
kubectl get events --all-namespaces

For additional support, please refer to the detailed documentation in the docs/ folder or create an issue in the repository.
- Complete the registration process at Avesha EGS Registration to receive the required access credentials.

- After successful registration, Avesha will process your license request and send the license YAML file to your registered email address.

- Before applying the license, ensure that the kubeslice-controller namespace exists:

  kubectl create namespace kubeslice-controller

- Apply the license secret to your controller cluster (see the verification sketch after this list):

  kubectl apply -f egs-license.yaml

- For detailed license setup instructions, refer to 📋 EGS License Setup.
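To confirm the license was applied, a quick sketch (the exact secret name comes from the YAML file Avesha sends you):

```bash
kubectl get secrets -n kubeslice-controller
```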
The deployment process follows a specific execution order defined in user_input.yml. You can control which components to execute by modifying the execution order or using --extra-vars with Ansible.
Load balancing and ingress:
- `metallb_chart` - MetalLB load balancer installation
- `metallb_l2_config` - L2 configuration for MetalLB
- `metallb_ip_pool` - IP pool configuration for MetalLB
- `nginx_ingress_config` - NGINX ingress controller configuration
- `nginx_ingress_chart` - NGINX ingress controller installation
- `cert_manager` - Cert-manager for certificate management (required for AMD GPU operator)

Core platform components:
- `gpu_operator_chart` - NVIDIA GPU operator installation
- `prometheus_stack` - Prometheus monitoring stack
- `pushgateway_manifest` - Prometheus Pushgateway
- `keda_chart` - KEDA autoscaling
- `nim_operator_chart` - NIM operator installation
- `create_ngc_secrets` - NGC credentials setup
- `verify_ngc_secrets` - NGC credentials verification
- `create_avesha_secret` - Avesha credentials setup

AMD GPU support:
- `amd_gpu_operator_chart` - AMD GPU operator for AMD Instinct GPU accelerators
- `amd_gpu_deviceconfig_manifest` - AMD GPU device configuration and settings

EGS (KubeSlice) components:
- `kubeslice_controller_egs` - KubeSlice EGS controller for multi-cluster management
- `kubeslice_ui_egs` - KubeSlice EGS management UI interface
- `egs_project_manifest` - EGS project configuration
- `egs_cluster_registration_worker_1` - Register worker cluster
- `fetch_worker_secret_worker_1` - Fetch worker authentication secrets
- `kubeslice_worker_egs_worker_1` - Install EGS worker components

NIM 70B model stack:
- `nim_cache_manifest_70b` - NIM cache for 70B model
- `wait_for_nim_cache_70b` - Wait for cache initialization
- `nim_cache_wait_job_70b` - Cache wait job
- `nim_service_manifest_70b` - NIM service for 70B model
- `keda_scaled_object_manifest_70b` - KEDA scaling configuration
- `create_inference_pod_configmap_70b` - Inference configuration
- `smart_scaler_inference_70b` - Smart Scaler setup
- `create_locust_configmap_70b` - Load test configuration
- `locust_manifest_70b` - Load testing setup
- `smart_scaler_mcp_server_manifest` - MCP server configuration

NIM 1B model stack:
- `nim_cache_manifest_1b` - NIM cache for 1B model
- `nim_service_manifest_1b` - NIM service for 1B model
- `keda_scaled_object_manifest_1b` - KEDA scaling configuration
- `create_inference_pod_configmap_1b` - Inference configuration
- `smart_scaler_inference_1b` - Smart Scaler setup
- `create_locust_configmap_1b` - Load test configuration
- `locust_manifest_1b` - Load testing setup

NIM 8B model stack:
- `nim_cache_manifest_8b` - NIM cache for 8B model
- `nim_service_manifest_8b` - NIM service for 8B model
- `keda_scaled_object_manifest_8b` - KEDA scaling configuration
- `create_inference_pod_configmap_8b` - Inference configuration
- `smart_scaler_inference_8b` - Smart Scaler setup
- `create_locust_configmap_8b` - Load test configuration
- `locust_manifest_8b` - Load testing setup
To execute specific components, use the execution_order variable with a list of components:
# Execute only GPU operator and monitoring stack
sudo ansible-playbook site.yml \
--extra-vars "execution_order=['gpu_operator_chart','prometheus_stack']" \
-e "ngc_api_key=$NGC_API_KEY" \
-e "ngc_docker_api_key=$NGC_DOCKER_API_KEY" \
-e "avesha_docker_username=$AVESHA_DOCKER_USERNAME" \
-e "avesha_docker_password=$AVESHA_DOCKER_PASSWORD" \
-vv
# Execute AMD GPU operator setup (alternative to NVIDIA)
sudo ansible-playbook site.yml \
--extra-vars "execution_order=['cert_manager','amd_gpu_operator_chart','amd_gpu_deviceconfig_manifest']" \
-e "ngc_api_key=$NGC_API_KEY" \
-e "ngc_docker_api_key=$NGC_DOCKER_API_KEY" \
-e "avesha_docker_username=$AVESHA_DOCKER_USERNAME" \
-e "avesha_docker_password=$AVESHA_DOCKER_PASSWORD" \
-vv
# Execute EGS installation
sudo ansible-playbook site.yml \
--extra-vars "execution_order=['cert_manager','kubeslice_controller_egs','kubeslice_ui_egs','egs_project_manifest','egs_cluster_registration_worker_1','fetch_worker_secret_worker_1','kubeslice_worker_egs_worker_1']" \
-e "ngc_api_key=$NGC_API_KEY" \
-e "ngc_docker_api_key=$NGC_DOCKER_API_KEY" \
-e "avesha_docker_username=$AVESHA_DOCKER_USERNAME" \
-e "avesha_docker_password=$AVESHA_DOCKER_PASSWORD" \
-vv
# Execute only NGINX ingress setup
sudo ansible-playbook site.yml \
--extra-vars "execution_order=['nginx_ingress_config','nginx_ingress_chart']" \
-e "ngc_api_key=$NGC_API_KEY" \
-e "ngc_docker_api_key=$NGC_DOCKER_API_KEY" \
-e "avesha_docker_username=$AVESHA_DOCKER_USERNAME" \
-e "avesha_docker_password=$AVESHA_DOCKER_PASSWORD" \
-vv
# Execute all NIM 70B components
sudo ansible-playbook site.yml \
--extra-vars "execution_order=['nim_cache_manifest_70b','wait_for_nim_cache_70b','nim_cache_wait_job_70b','nim_service_manifest_70b','keda_scaled_object_manifest_70b','create_inference_pod_configmap_70b','smart_scaler_inference_70b','create_locust_configmap_70b','locust_manifest_70b']" \
-e "ngc_api_key=$NGC_API_KEY" \
-e "ngc_docker_api_key=$NGC_DOCKER_API_KEY" \
-e "avesha_docker_username=$AVESHA_DOCKER_USERNAME" \
-e "avesha_docker_password=$AVESHA_DOCKER_PASSWORD" \
-vv

💡 Tip: Components are executed in the order they appear in the list. Make sure to list dependent components in the correct order and include all required credentials.
To completely remove the Kubernetes cluster and clean up all resources, run the following command from the root directory:
ansible-playbook kubespray/reset.yml -i inventory/kubespray/inventory.ini

This command will:
- Remove all Kubernetes components from the nodes
- Clean up all cluster-related configurations
- Reset the nodes to their pre-Kubernetes state
⚠️ Warning: This action is irreversible. Make sure to back up any important data before proceeding with the cluster destruction.
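What you back up depends on your environment; one minimal, hedged sketch is to save the kubeconfig and a dump of the deployed resources before resetting:

```bash
export KUBECONFIG=output/kubeconfig
cp output/kubeconfig ~/kubeconfig.backup
kubectl get all --all-namespaces -o yaml > ~/pre-reset-resources.yaml
helm list --all-namespaces > ~/pre-reset-helm-releases.txt
```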
Each test run can include multiple cycles, with each cycle typically lasting around 1 hour. Running multiple cycles helps in evaluating consistency and observing Smart Scaler's behavior over time.
Follow these steps to (re)start a clean test cycle:
Scale the Locust deployment replicas to 0:
kubectl scale deployment locust-load-70b --replicas=0 -n nim-load-test

Scale the NIM LLM deployment replicas to 1:
kubectl scale deployment meta-llama3-70b-instruct --replicas=1 -n nim

Ensure the HorizontalPodAutoscaler (HPA) replica count is also set to 1:
kubectl get hpa -n nim

Wait for some time (5-20 minutes) to allow both Smart Scaler and the HPA to fully scale down and stabilize at 1 replica, then check the HPA again to confirm it reports 1 replica:

kubectl get hpa -n nim
Note:
- Verify and edit the ScaledObject, if needed (typically required when switching from HPA to Smart Scaler)

Edit the ScaledObject resource:
kubectl edit scaledobjects llm-demo-keda-70b -n nim

Set the spec.metadata fields with the following data:
- metadata:
    metricName: smartscaler_hpa_num_pods
    query: smartscaler_hpa_num_pods{ss_app_name="nim-llama",ss_deployment_name="meta-llama3-8b-instruct",job="pushgateway",ss_app_version="1.0", ss_cluster_name="nim-llama", ss_namespace="nim", ss_tenant_name="tenant-b200-local"}
    serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
    threshold: "1"

Check and reset the spec.maxReplicaCount to 8.
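If you prefer not to use the interactive editor for this, a hedged one-liner (assuming the ScaledObject name above) is:

```bash
# Sketch: set spec.maxReplicaCount on the KEDA ScaledObject without opening an editor.
kubectl patch scaledobject llm-demo-keda-70b -n nim --type merge -p '{"spec":{"maxReplicaCount":8}}'
```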
Note:
- Verify and edit the ScaledObject, if needed (typically required when switching from Smart Scaler to HPA)

Edit the ScaledObject resource:

kubectl edit scaledobjects llm-demo-keda-70b -n nim

Set the spec.metadata fields with the following data:
Note: The threshold value differs per model and GPU, based on the PSE values.
- For B200: llama3.1 70b, threshold: 80
- For B200: llama3.1 8b, threshold: 200
- metadata:
    metricName: smartscaler_hpa_num_pods
    query: sum(num_requests_running) + sum(num_requests_waiting)
    serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
    threshold: "80"

Check that the current replica count is set to 1 and the model pod is running and ready:
kubectl get hpa -n nim
kubectl get pods -n nim

Scale the Locust replicas up to 1 to initiate the next test cycle:
kubectl scale deployment locust-load-70b -n nim-load-test --replicas=1

Observe metrics and scaling behavior using the NIM Dashboard.
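While the dashboard loads, one simple hedged way to watch the scaling from the command line (run each in its own terminal):

```bash
kubectl get hpa -n nim -w     # watch the HPA/KEDA replica target change
kubectl get pods -n nim -w    # watch model pods scale up and down
```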