Devops malarkey

Success. Failure. Cake.

Not Invented Here (Kubernetes Remix)

For the last yea-many months, I’ve been hacking on actual Kubernetes. It has been/is being really very interesting, and that’s only partly the precursor to an ‘interesting times’ riff. I have learned many things.

  • Anyone who says you don’t need any ops experience/people is lying through their teeth. The doc about how to make yr site ‘production ready’ advises you to carve out your own netblock(s) from within a lump of 10/8, allocate subnets for various systems, string them together with load balancers, and consider capacity plans, redundancy, logging, monitoring, backups, security and access control. None of these things are beyond the wit of most people, but they’re more likely to be back-of-envelope stuff for yr average jobbing Unix admin than for most other technical types.

  • Deployment tools/strategies seem, thus far at least, to consider it a job well done if they start/bounce/replace a single container. Which is, well, really rather a Noddy job.

  • There is a thing called Helm, which is jolly handy. It’s a package manager for Kubernetes. If you’ve come from an ops background, then you’ll be familiar with the idea that a running (web)app is only half of the job, and generating something versioned and deployable is a much more useful target. Yes, I know containers are versioned and deployable, but see how far you get with that attitude with K8S as a target. Just a simple matter of YAML, is it? Fine. I’ll wait while you grovel through a pile of similar-looking files, looking for the right version tag. (There’s a sketch of what Helm buys you instead just below.)
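
For flavour, and assuming Helm v2 (which was current at the time), this is roughly what ‘versioned and deployable’ buys you; the release name is invented for illustration:

helm list                   # every release, with a REVISION column
helm rollback some-thing 3  # put release 'some-thing' back to revision 3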

The first thing can be solved (FAVO ‘solved’) with Terrorform. The only nice thing I can say about Terrorform is that it works, mostly as advertised. A quite startling feature of the thing is that if you want to change something about a K8S cluster on GCloud with Terrorform, it will blow the cluster away and build you a new one. Now I guess this is probably less of a surprise if you’ve a set of redundant clusters in multiple zones, but it’s still not the sort of behaviour that one might hope for.
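
To be fair, Terraform does at least warn you first: in the plan output of the era, attributes that can’t be changed in place were flagged ‘(forces new resource)’, and the whole resource got a -/+ marker. Hence a paranoid habit worth acquiring (a sketch, not gospel):

terraform plan | grep -i 'forces new resource'  # spot the destroy-and-recreate before it happens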

After that, Helm is a useful tool.

However, The Product is a collection of Helm charts flying in close formation, and rightly so, since generating a monolith is a daft idea and being able to tinker with subsystems in isolation is a useful thing. So I found myself typing ‘helm install thing, helm install other-thing, helm upgrade this-thing’ quite a lot, which became trivially fat-fingerable. As I discovered when a routine upgrade toasted a beta cluster.
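
Concretely, and with release names and a chart repo invented for illustration, a deploy was a pile of lines like these, each one a chance to typo a release name or point the wrong values at the wrong chart (Helm v2, hence --name):

helm install --name this-thing -f helm-values/this-thing.yaml our-repo/this-thing
helm install --name other-thing -f helm-values/other-thing.yaml our-repo/other-thing
helm upgrade this-thing our-repo/this-thing -f helm-values/this-thing.yaml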

I poked around at a couple of the available things which alleged they could make this sort of malarkey go away, and they were either complete workflows which assume their way is best (Ho ho ho. No.) or just seemed to be complicated YAML generators. I am really very much not interested in generating very similar piles of impenetrable YAML that I have to keep in version control. The impenetrable YAML should only ever exist on the target cluster and not really be the thing that is fiddled with by humans, unless they like stabbing themselves in the leg with a fork or something.

So I lashed up a thing that ran helm commands in the right order for me, made sure that the cluster I thought I was looking at was the actual target, did some elementary access config and made sure the storage classes I expected were where they should be. Then I could put far fewer things into the git repository that defined a given site/cluster and repeat myself as seldom as possible.
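
The tool itself isn’t reproduced here, but its pre-flight checks amount to something like this sketch (WANTED_CONTEXT, the secret name and the storage class name are stand-ins for whatever your manifest declares):

# Refuse to touch anything if kubectl is pointed at the wrong cluster
[ "$(kubectl config current-context)" = "$WANTED_CONTEXT" ] || { echo "wrong cluster"; exit 1; }
# Check that the secrets and storage classes we depend on actually exist
kubectl get secret dockerhub >/dev/null || exit 1
kubectl get storageclass ssd >/dev/null || exit 1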

Here’s a really simple example that installs Prometheus, Grafana, a cert-manager and an Ingress. Obviously you’d not do this in an environment where you wanted Prometheus to work, since there’s nothing for it to monitor.

---

context: (your kubectl context goes here. D'you think I'd leave one of ours to look at?)

secrets:
  - dockerhub

logfile: notgrove.log
debug: false

charts:
  naunton:
    values: helm-values/nginx.yaml
    chart:  stable/nginx-ingress
  shuthonger:
    values: helm-values/kube-lego.yaml
    chart:  stable/kube-lego
  turkdean:
    values: helm-values/prometheus.yaml
    chart:  stable/prometheus
  queenhill:
    values: helm-values/grafana.yaml
    chart:  stable/grafana

Because I come from Puppet-land, that config file gets called manifest.yaml.

  • context is the name of the target cluster. The tool expects to have the right permissions to access same.
  • secrets are, er, a list of k8s secrets you may need to have configured before Helm will work.
  • logfile should be obvious.
  • debug ditto.
  • charts is a list of Helm charts: their local config values that differ from the defaults, and where to find the charts themselves. (An example values file is sketched just below.)
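
Those values files are nothing exotic; they’re exactly what you’d otherwise pass to helm with -f. For the naunton entry above, helm-values/nginx.yaml might hold no more than this (hypothetical contents, though controller.replicaCount and controller.service.loadBalancerIP are real stable/nginx-ingress keys):

cat > helm-values/nginx.yaml <<'EOF'
# Only the deviations from the chart defaults live here
controller:
  replicaCount: 2
  service:
    loadBalancerIP: 203.0.113.10
EOF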

That last bit implies there’s some directory structure. Here it is:

drwxrwxr-x  5 jhr jhr 4096 Jul  9 13:22 .
drwxrwxr-x 22 jhr jhr 4096 Oct  2 10:39 ..
drwxrwxr-x  2 jhr jhr 4096 Jul  9 13:22 access
drwxrwxr-x  2 jhr jhr 4096 Jul  9 13:22 cluster-setup
drwxrwxr-x  2 jhr jhr 4096 Jul  9 13:22 helm-values
-rw-rw-r--  1 jhr jhr  467 Jul  9 13:22 manifest.yaml
-rw-rw-r--  1 jhr jhr 1514 Jul  9 13:22 README.md

  • access contains cluster config to do with roles and/or pod security.
  • cluster-setup contains things like storage class config.
  • helm-values are (somewhat unsurprisingly) where the values.yaml files for the Helm charts live.

As you might guess, the tool processes the contents of those directories in order, so you can be fairly sure that your storage classes will exist before something in a Helm chart tries to allocate a PersistentVolumeClaim that you’d prefer was SSD instead of default spinny rust.
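
For example, the cluster-setup directory might hold a storage class like this minimal sketch for GKE (the provisioner and pd-ssd type are the real GCE ones; calling the class ‘ssd’ is my assumption):

kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
EOF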

And that implies there’s a tool with some options:

km [OPTION] ... (install|DELETE|upgrade)

-h, --help:
  show help

-f, --file [name]:
  filename of manifest. Usually manifest.yaml

-n, --namespace [namespace]:
  k8s namespace to create and use for this installation.

-j, --jfdi
  Don't stop and ask for confirmation that you're about to do something terrible.

install, upgrade or delete. Actions. Yes.
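
So, given the manifest above, a run looks something like:

# Install the lot into a 'monitoring' namespace, pausing for confirmation first
km -f manifest.yaml -n monitoring install

# Or, once you trust it and/or yourself
km -j -f manifest.yaml -n monitoring upgrade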

That’s quite enough typing for one evening. I should imagine that I’ll return to this to explain it better, show what it looks like running, and point at the relevant GitHub repos.