This week’s guest blog is authored by Lee, one of our in-house DevOps engineers who works on developing our cloud products and streamlining our deployment processes.
At UKFast, we use Kubernetes for a variety of applications, such as websites and public-facing APIs. Historically, these applications were deployed on virtual machines (VMs) and scaled manually behind a traditional load balancer. Now that our applications run in Kubernetes, we take advantage of the Kubernetes Horizontal Pod Autoscaler (HPA), which automatically scales pods based on pre-configured metric thresholds such as CPU utilisation.
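Under the hood, the HPA's scaling decision boils down to a simple ratio, documented upstream as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A minimal sketch of that calculation in shell (the numbers here are purely illustrative):

```shell
# HPA scaling rule: desired = ceil(current_replicas * current_metric / target_metric)
current_replicas=3
current_cpu=90   # observed average CPU utilisation across pods (% of requests)
target_cpu=50    # our targetCPUUtilizationPercentage

# integer ceiling division
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "$desired"  # ceil(3 * 90 / 50) = 6 replicas
```

So with pods averaging 90% CPU against a 50% target, the HPA would scale our deployment from 3 to 6 replicas.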
Below, we’ll walk through configuring and testing a basic HPA configuration, using pre-defined CPU utilisation thresholds. It is assumed that you have a basic understanding of Kubernetes and CLI tools.
First, we’ll deploy an application to test the functionality of the HPA. For this example, we’re using the image chiahan1123/docker-hpa-example, which artificially generates CPU load by performing expensive arithmetic.
We’ll be deploying this application into a pre-existing Kubernetes cluster, with all resources placed in the namespace loadtestdemo. Let’s go ahead and apply a set of configurations for our application, which includes a Deployment, a Service and an Ingress:
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: testapp
  namespace: loadtestdemo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: testapp
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: testapp
    spec:
      containers:
      - image: chiahan1123/docker-hpa-example
        imagePullPolicy: Always
        name: testapp
        ports:
        - containerPort: 80
          protocol: TCP
        resources:
          limits:
            cpu: 200m
          requests:
            cpu: 100m
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: testapp-service
  namespace: loadtestdemo
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: testapp
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: testapp-ingress
  namespace: loadtestdemo
spec:
  rules:
  - host: testapp.ukfast.co.uk
    http:
      paths:
      - path: /
        backend:
          serviceName: testapp-service
          servicePort: 80
EOF
To verify our deployment, we can retrieve running pods in the loadtestdemo namespace to ensure there is one running:
kubectl get pod -n loadtestdemo
NAME                       READY   STATUS    RESTARTS   AGE
testapp-7dbd668469-b2trj   1/1     Running   0          25s
Next, we’ll create our HPA. For this example, we’ll set the CPU utilisation threshold to 50% (targetCPUUtilizationPercentage), with a maximum of 10 replicas (maxReplicas):
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: testapp
  namespace: loadtestdemo
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: testapp
  targetCPUUtilizationPercentage: 50
EOF
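On newer clusters, the stable autoscaling/v2 API expresses the same CPU target via a metrics list rather than targetCPUUtilizationPercentage. An equivalent sketch, assuming your cluster serves the v2 API:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: testapp
  namespace: loadtestdemo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: testapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization       # percentage of the pods' CPU requests
        averageUtilization: 50
```

Note that, in either API version, CPU utilisation is calculated against the container resource requests we set on the Deployment, so those requests are required for the HPA to work.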
We can retrieve our new HPA with kubectl:
kubectl get hpa -n loadtestdemo
NAME      REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
testapp   Deployment/testapp   0%/50%    1         10        1          10s
Our test application is now deployed and configured for auto scaling. Next, we’ll test that the HPA functions as expected. For this demonstration, we’ll use kubectl alongside the UKFast CLI, which drives our load testing service.
First, we’ll retrieve the ID of the domain we’re testing:
ukfast loadtest domain list
+--------------------------------------+----------------------+
| ID | NAME |
+--------------------------------------+----------------------+
| d1d1ee4c-c6bc-464f-ab5f-1d47c3507f70 | testapp.ukfast.co.uk |
+--------------------------------------+----------------------+
Next, we’ll retrieve the available scenarios for the load test:
ukfast loadtest scenario list
+--------------------------------------+---------+--------------------------------+
| ID | NAME | DESCRIPTION |
+--------------------------------------+---------+--------------------------------+
| 7d541984-d198-4815-aa9a-6cd2894c0090 | Smash | Increases to maximum users |
| | | by 50% of the run time, then |
| | | maintain until the end |
| cb6427fb-b98d-481d-a173-eaa6d15d6703 | Step up | Increase the users in 5 equal |
| | | steps over a period of time |
| ce1873a4-048e-44d9-92b4-aab62fd41a42 | Incline | Gradually ramp the users up to |
| | | 100% over a period of time |
+--------------------------------------+---------+--------------------------------+
For this test, we’ll use the Smash scenario with 1000 users for a duration of 5m:
ukfast loadtest test create --domain-id d1d1ee4c-c6bc-464f-ab5f-1d47c3507f70 \
  --name testapp-loadtest --protocol https \
  --scenario-id 7d541984-d198-4815-aa9a-6cd2894c0090 \
  --authorisation-name Lee --authorisation-company UKFast \
  --authorisation-position Developer --authorisation-agreement-version v1.0 \
  --duration 5m --number-of-users 1000
+--------------------------------------+------------------+----------+------+-----------------+----------+
| ID                                   | NAME             | PROTOCOL | PATH | NUMBER OF USERS | DURATION |
+--------------------------------------+------------------+----------+------+-----------------+----------+
| 18fb4101-45b7-4b22-868c-b7292bb3f388 | testapp-loadtest | https    | /    | 1000            | 00:05:00 |
+--------------------------------------+------------------+----------+------+-----------------+----------+
We’re now ready to run the test that we’ve just created, as below:
ukfast loadtest job create --test-id 18fb4101-45b7-4b22-868c-b7292bb3f388 --run-now
+--------------------------------------+---------------------+-------------------+---------+
| ID | JOB START TIMESTAMP | JOB END TIMESTAMP | STATUS |
+--------------------------------------+---------------------+-------------------+---------+
| 3e6b357b-80b3-4f30-857d-36dba5509c6d | | | Pending |
+--------------------------------------+---------------------+-------------------+---------+
Whilst the test is running, we can verify the functionality of our HPA by monitoring the replica count using kubectl:
watch -n1 kubectl get pod -n loadtestdemo
We can also monitor the results of the load test via the UKFast CLI:
ukfast loadtest job results show 3e6b357b-80b3-4f30-857d-36dba5509c6d \
  --graph-virtualusers --graph-latency
As can be seen from the load test results above, the latency of our application remains stable for the majority of the test, with the asterisk (*) marking the point at which our HPA started to scale up the deployment replicas:
kubectl get pod -n loadtestdemo
NAME                       READY   STATUS              RESTARTS   AGE
testapp-7dbd668469-6gw5l   1/1     Running             0          5m3s
testapp-7dbd668469-bknf6   0/1     ContainerCreating   0          1s
testapp-7dbd668469-h9xxc   0/1     ContainerCreating   0          1s
We can also verify the scaling of our deployment by observing events:
kubectl get events -n loadtestdemo
10m   Normal   ScalingReplicaSet   deployment/testapp   Scaled up replica set testapp-7dbd668469 to 3
Once the test completes and CPU utilisation falls back below the configured threshold, our HPA will automatically scale the deployment replicas back down:
kubectl get events -n loadtestdemo
62s   Normal   ScalingReplicaSet   deployment/testapp   Scaled down replica set testapp-7dbd668469 to 1
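By default, the HPA waits out a stabilisation window (five minutes) before scaling down, which prevents replica counts from thrashing as load fluctuates. On clusters serving the autoscaling/v2 API, this is tunable via the behavior field; a sketch of the relevant fragment of an HPA spec:

```yaml
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 120  # default is 300
      policies:
      - type: Pods
        value: 2           # remove at most 2 pods...
        periodSeconds: 60  # ...per minute
```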
That concludes the deployment of a basic HPA into Kubernetes, which we have tested using a load testing tool. Kubernetes HPAs are highly customisable and can be tailored to most application scaling requirements. Custom and external metrics can be used to further extend HPAs, allowing developers to provide their own source of metrics data should this be required.
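As an illustration, a v2 HPA driven by a custom per-pod metric might include a metrics entry like the sketch below; the metric name requests_per_second is hypothetical and would need to be served by a custom metrics adapter (e.g. prometheus-adapter):

```yaml
  metrics:
  - type: Pods
    pods:
      metric:
        name: requests_per_second  # hypothetical metric exposed via a metrics adapter
      target:
        type: AverageValue
        averageValue: "100"        # target average per pod
```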
Interested in learning more about containers or Kubernetes HPA? Why not take a look at UKFast’s Dedicated Container Platform.