Dreaming with Kubernetes in Kubernetes (K-in-K)

Have you ever felt like you were dreaming inside a dream? Seems like a good way to get extra sleep cycles? Or, more to the point of this blog, have you ever wanted to quickly spin up a Kubernetes cluster (or lots of them) on another Kubernetes cluster? Here are a few use cases for what I will call Kubernetes-in-Kubernetes (K-in-K).
- Try out a new version of Kubernetes on an existing Kubernetes cluster
- Create sandboxes where team members can try out and test bug fixes for Kubernetes
- Run a Kubernetes application on a dedicated Kubernetes cluster without spending a lot of time and resources provisioning a complete cluster
Once you start thinking about it, there are quite a few uses for running a Kubernetes cluster on another Kubernetes cluster.
Fortunately for all of us, it is not too difficult to run Kubernetes on Kubernetes. There are quite a few GitHub repositories, such as here and here, that show you how to run a multi-node Kubernetes cluster where each node is simply a Docker container running on your laptop. Since Kubernetes already runs its workloads in Docker containers, it seems natural to take the next step of running a multi-node Kubernetes cluster on a Kubernetes cluster, where each node is a Pod. There is an existing project that already does that here. In my effort to make it easier to understand, I have simplified the Dockerfiles and the required scripts. I have published all the code for my work here; it has been heavily influenced by the Git repositories linked above.
In this blog post, I will walk you through the details of what I did and why I did it. I have used Argo, an amazing open-source project, to deploy Kubernetes on Kubernetes (full disclosure: I am on the team that built Argo 😊). Argo is a workflow engine that runs on Kubernetes. Each step of this workflow is a Pod and Argo supports running deployments and sequencing them. When you run this workflow on Argo, the workflow looks like this:

The first two steps are executed in parallel, and each of them generates some configuration for the Kubernetes master. The output of these first two steps is consumed by the subsequent step, which starts the Kubernetes master inside a Pod. Argo's workflow engine supports extracting files from one container and copying them to another container as artifacts. A short snippet from the workflow shows how this works: the `GENERATE_TOKEN` step outputs a file `token` that is copied out as an artifact. It is then copied in to the `MASTER_DEPLOYMENT` step. The `MASTER_DEPLOYMENT` step creates the deployment from the YAML spec `k8s-master-deployment`, and the input artifact is mounted inside the deployment container at the location specified by the `path` field.
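Roughly, the artifact hand-off looks like the sketch below, written here in present-day Argo Workflows syntax (the workflow itself uses Argo's own DSL, and the image name and the token mount path are assumptions, not the exact values from the repo):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: k-in-k-
spec:
  entrypoint: main
  templates:
  - name: main
    steps:
    - - name: generate-token              # the GENERATE_TOKEN step
        template: generate-token
    - - name: master-deployment           # the MASTER_DEPLOYMENT step
        template: master-deployment
        arguments:
          artifacts:
          - name: token
            from: "{{steps.generate-token.outputs.artifacts.token}}"

  - name: generate-token
    container:
      image: ubuntu:xenial
      command: [sh, -c]
      args:
        # write a kubeadm-style bootstrap token (xxxxxx.xxxxxxxxxxxxxxxx)
        - >-
          echo "$(od -An -tx1 -N3 /dev/urandom | tr -d ' \n').$(od -An -tx1 -N8 /dev/urandom | tr -d ' \n')"
          > /tmp/token
    outputs:
      artifacts:
      - name: token                       # the token file exported as an artifact
        path: /tmp/token

  - name: master-deployment
    inputs:
      artifacts:
      - name: token
        path: /kube-cluster/token         # assumed mount location inside the master container
    container:
      image: k8s-master:latest            # hypothetical image built from the Dockerfile below
      command: ["/usr/bin/supervisord", "-n"]
```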
Now, let's take a look at the Dockerfile for the master and minion deployments, shown below. I start with a plain `ubuntu:xenial` base Docker image and install `docker-ce` and `supervisord`. Docker is installed so that the Docker daemon can run inside this container (Docker-in-Docker), and supervisord acts as a simpler systemd replacement: it is easier to work with and pulls in fewer dependencies. The next step is to download and extract the Kubernetes server binaries. Finally, I copy in a handful of scripts and the supervisord configuration. That is it! A simple Dockerfile for a container that runs as a Kubernetes master.
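The Dockerfile in the repository is only a little longer than the sketch below; the package versions, download URLs, and script paths here are illustrative rather than the exact ones used:

```dockerfile
# Rough sketch of the master/minion image; versions, URLs, and paths are illustrative.
FROM ubuntu:xenial

# docker-ce (to run the Docker daemon inside the container) and supervisord
# (a lightweight process manager used here instead of systemd)
RUN apt-get update && \
    apt-get install -y apt-transport-https ca-certificates curl gnupg2 supervisor && \
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add - && \
    echo "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable" \
        > /etc/apt/sources.list.d/docker.list && \
    apt-get update && \
    apt-get install -y docker-ce

# Kubernetes server binaries (kubelet, kubeadm, kubectl, apiserver, ...)
ENV K8S_VERSION v1.7.0
RUN curl -fsSL "https://dl.k8s.io/${K8S_VERSION}/kubernetes-server-linux-amd64.tar.gz" \
        | tar -xz -C /opt && \
    cp /opt/kubernetes/server/bin/kube* /usr/bin/

# Startup scripts and the supervisord configuration
COPY scripts/ /kube-cluster/
COPY supervisord.conf /etc/supervisord.conf

CMD ["/usr/bin/supervisord", "-n", "-c", "/etc/supervisord.conf"]
```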
The supervisord configuration is quite simple. I have extracted the key lines from the full config that need some explanation and posted them below. As you can see, there are three programs started by supervisord. Supervisord itself runs in non-daemon mode, as required for Docker containers. When kubelet is started, I pass the path to `kubelet.conf` and mark this file as required with the `--require-kubeconfig=true` option. Initially, the file is not present, so kubelet dies. Supervisord will quickly respawn kubelet, but it marks a program as `FATAL` if it fails after 5 retries. Eventually, kubeadm runs as part of the `start_master` program and generates this file. Until then, I need kubelet to keep respawning, which I achieve with a large value for `startretries`. There are more elegant ways to do this, but this works!
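A sketch of that configuration is below, assuming the three programs are `dockerd`, `kubelet`, and the `start_master` script; the kubelet flags are trimmed to the ones discussed above, and the real config carries a few more options:

```ini
; Illustrative sketch; program names and trimmed flags are assumptions.
[supervisord]
nodaemon=true                        ; required when supervisord is PID 1 in a container

[program:dockerd]
command=/usr/bin/dockerd
autorestart=true

[program:kubelet]
; kubelet exits until kubeadm writes /etc/kubernetes/kubelet.conf, so give it
; a very large retry budget instead of letting supervisord mark it FATAL.
command=/usr/bin/kubelet --kubeconfig=/etc/kubernetes/kubelet.conf --require-kubeconfig=true
autorestart=true
startretries=100000

[program:start_master]
command=/kube-cluster/start_master.sh
startsecs=0
autorestart=false
```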
Next, we will look at the `start_master` script. This script performs the following steps (a condensed sketch of the whole script follows the list):
- Generate certificates: `kubeadm` will generate certificates for the apiserver and other components if no certs are found in `/etc/kubernetes/pki`, but it will use certs if they already exist in that path. We generate certs here because we want to make sure that the apiserver has SANs that allow the inner Kubernetes master to be accessible from anywhere in the outer Kubernetes cluster. We do this by adding a DNS SAN for the Kubernetes service object together with its namespace. In Argo deployments, you specify an Application name that maps to a Kubernetes namespace. Argo also creates service objects inside this namespace for `internal_routes` defined in the Argo DSL. For this service object (say `test-master` in the `inception` namespace) to be reachable from anywhere, we need to add `test-master.inception` as a DNS SAN in the certificate. This is done in the `generate_certificates` function.
- Kubeadm init: the `kubeadm init` command is passed some options so that init works inside a container and CNI networking works properly. The `--pod-network-cidr` option specifies the `/16` CIDR range for Kubernetes Pods. This range needs to be separate from the outer Kubernetes' range. In my example, the outer Kubernetes uses `192.168.0.0/16` as the Pod CIDR, so in the inner Kubernetes I chose `10.100.0.0/16`. Kubeadm requires a token, and we pass the token generated in step 1 to the `--token` option.
- Kube-proxy fixup: kube-proxy is a `DaemonSet`, started by kubeadm, that manages Pod networking. For it to run inside a container, it needs to be passed some extra options: `--masquerade-all --conntrack-max=0 --conntrack-max-per-core=0`. The smart folks behind the GitHub repositories referenced above figured this out and I shamelessly copied it. Also, you need to kill the kube-proxy Pod for these options to take effect, as the DaemonSet update policy is `OnDelete`.
- CNI provider setup: Finally, I download the `calico` config file, edit it to ensure that the Calico IP pool matches the `--pod-network-cidr`, and apply it. For some strange reason, I need to delete the kube-dns Pod and restart it, otherwise it has trouble talking to the apiserver on the default service range's first IP (`10.96.0.1`). After the restart, it works fine.
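Here is the condensed sketch of the `start_master` flow promised above. It is not the script from the repository: the certificate generation is summarized as a comment, the kube-proxy patch assumes the DaemonSet layout and labels of that era's kubeadm, and `CALICO_MANIFEST_URL` is a parameter you point at the Calico manifest you want to use.

```bash
#!/bin/bash
# Condensed, illustrative sketch of the start_master steps; paths, labels,
# and manifest details may differ from the published script.
set -e

POD_CIDR="10.100.0.0/16"             # must not overlap the outer cluster's Pod CIDR
TOKEN="$(cat /kube-cluster/token)"   # token artifact copied in by the workflow
CALICO_MANIFEST_URL="${CALICO_MANIFEST_URL:?set to the Calico manifest URL for your Calico version}"

# 1. Generate certificates: the real script pre-populates /etc/kubernetes/pki
#    with certs whose SANs include test-master.inception (the
#    <service>.<namespace> name of the inner master); openssl details omitted.

# 2. kubeadm init with a Pod CIDR distinct from the outer cluster's and the
#    pre-generated bootstrap token.
kubeadm init --pod-network-cidr="${POD_CIDR}" --token="${TOKEN}"
export KUBECONFIG=/etc/kubernetes/admin.conf

# 3. kube-proxy fixup: append the flags it needs to run inside a container,
#    then delete its Pod so the OnDelete DaemonSet rolls out the change.
kubectl -n kube-system patch daemonset kube-proxy --type=json -p='[
  {"op":"add","path":"/spec/template/spec/containers/0/command/-","value":"--masquerade-all"},
  {"op":"add","path":"/spec/template/spec/containers/0/command/-","value":"--conntrack-max=0"},
  {"op":"add","path":"/spec/template/spec/containers/0/command/-","value":"--conntrack-max-per-core=0"}]'
kubectl -n kube-system delete pod -l k8s-app=kube-proxy

# 4. CNI setup: rewrite the Calico manifest's default IP pool to match
#    POD_CIDR, apply it, and bounce kube-dns so it can reach the apiserver
#    service IP (10.96.0.1) again.
curl -fsSL "${CALICO_MANIFEST_URL}" \
  | sed "s#192.168.0.0/16#${POD_CIDR}#g" \
  | kubectl apply -f -
kubectl -n kube-system delete pod -l k8s-app=kube-dns
```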
And that is it! You now have a Kubernetes master node that is running as a Pod.
Starting a Minion Pod
The same script can be used to start a minion, but it does fewer things: a `kubeadm join` request is sent using the generated token. Since this is a standard Kubernetes deployment, you can scale it and new minion nodes (as Pods) will be added. I also have a step in the Argo workflow that extracts the `kubeconfig` file so that any other step in the workflow can use it to connect to the inner Kubernetes. In the workflow, I have a `UNIT_TEST` step that uses this file to create an Nginx deployment and expose it as a NodePort. It then curls the master Pod's IP on the NodePort and checks that the connection works.
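Condensed into one listing, the minion join and the `UNIT_TEST` check look roughly like the sketch below; the `test-master.inception:6443` address, the kubeconfig path, and the `MASTER_POD_IP` variable are assumptions standing in for values the workflow supplies:

```bash
#!/bin/bash
# Minion side: join the inner cluster using the generated token. Newer kubeadm
# versions also require --discovery-token-ca-cert-hash (or the unsafe skip flag).
TOKEN="$(cat /kube-cluster/token)"
kubeadm join --token="${TOKEN}" test-master.inception:6443

# UNIT_TEST side: use the extracted kubeconfig to stand up nginx, expose it as
# a NodePort, and curl the master Pod's IP on that port.
export KUBECONFIG=/kube-cluster/kubeconfig
MASTER_POD_IP="${MASTER_POD_IP:?IP of the inner master Pod}"
kubectl run nginx --image=nginx --replicas=1          # creates a Deployment with the kubectl of that era
kubectl expose deployment nginx --port=80 --type=NodePort
NODE_PORT="$(kubectl get svc nginx -o jsonpath='{.spec.ports[0].nodePort}')"
curl -sf "http://${MASTER_POD_IP}:${NODE_PORT}/"
```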
Summary
Creating a Kubernetes cluster from Pods is quite easy and requires only simple Docker images, simple scripts, and a few tricks. Argo makes it easy to orchestrate these deployments and test them. Together, you can go deeper and deeper into Kubernetes. Just make sure you have your Totem!
Abhinav Das is a member of the technical staff at Applatix, a startup committed to helping users realize the value of containers and Kubernetes in their day-to-day work.