Devops malarkey

Success. Failure. Cake.

Small Parts Isolated and Deployed

It seems that one of the things that people new to Puppet (and sometimes by extension, automated CD/CI rigs) try to do is brickhammer their existing deployment chains into the thing. You can go look at the mailing list and about once a week, someone will go ‘I need Puppet to manage this thumping great source directory which we will distribute to $list-of-servers and then build in situ. How do I make Puppet do a ./configure && make && make install?’

To which the answer is ‘No’, and the response to that is stropping, because $reasons.

If you or your organisation still want to do that sort of thing, my suggestion is that you bin the terrible Unix systems you’re using and try one of the many free (or indeed expensive) versions that come with 1990s features like a package-management system. Mind, if you’re using Gentoo for production systems then I can’t help you. Please stop reading; there is nothing for you here.

Of course you can’t package up everything you might wish to bung on a server from a distance. There are also going to be rules-lawyers hunting out corner cases in order to prove me wrong. Which, I don’t know, seems to be the broken behaviour pattern of those who’re somehow proud of keeping some ancient and spavined code-management technique alive into the C21st. Don’t do that either; you’re just making your own life hard. Or you’re working for an organisation that’s doing the same, and why are you doing that?

Our own rules are entirely arbitrary and look like this:

Rebuilt Debian packages and/or backports and/or wonky Ruby code that has a config file and an initscript are served as .debs from our own repo. Building your own Debian repository is desperately simple.
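
(For anyone doubting the ‘desperately simple’: a reprepro repository needs little more than a conf/distributions stanza along these lines — the names, codename and layout here are illustrative, not ours:)

# conf/distributions under the repository root, e.g. /srv/apt
Origin: example.org
Label: example.org internal
Codename: wheezy
Architectures: amd64 i386 source
Components: main
Description: Rebuilds, backports and wonky Ruby code
SignWith: yes

… after which something like ‘reprepro -b /srv/apt includedeb wheezy foo_1.0_amd64.deb’ puts a package in, and clients need a single sources.list line pointing at the result. (SignWith assumes there’s a default GPG key to sign with; leave it out for an unsigned repository.)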

Website code is managed through the magic of git, or the nearly-magic of svn. Not via puppet. The site furniture is instantiated via some puppet, but deploys happen via MCollective. Sinatra-based webapps also fit here, even though they’re wonky Ruby code with config files and initscripts. We may fix this. Or not. Who can say?

Tomcat apps are emitted from the end of a Jenkins-based chain and largely manage themselves. Getting Puppet involved just seems to confuse things.

The new special case that prompted this ramble is a Java app that’s going to sit on some edge servers. The last thing that happens in that Jenkins chain is that the app is packaged up as a .deb. Ok, a Java-style .deb, so the file-layout would make a Debian packager shit themselves with hatred, but still. Since our package generation has been mostly ‘by hand’ up until now, I’d never bothered with hacking up the auto-upload bits of reprepro. For the Jenkins stuff to work properly, I had to fix that. Thus when there’s a new build of the Java app, it appears moments later (depending on cronjob) in our Debian repository.
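
(The ‘auto-upload bits’, for the curious, are reprepro’s processincoming machinery. Roughly — and the paths here are made up — you describe an incoming queue in conf/incoming:)

Name: default
IncomingDir: /srv/apt/incoming
TempDir: /srv/apt/tmp
Allow: wheezy

(Jenkins drops the .changes file and the .deb it references into IncomingDir, and a cronjob runs ‘reprepro -b /srv/apt processincoming default’ every few minutes, which is where the ‘moments later (depending on cronjob)’ comes from.)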

At that point, I thought it would be a good thing to have the repository-uploader send a message to the event-logger so we could see that there was a new version of code and something should probably be done about it. Not long after that, I realised that the ‘something’ might as well be automated, too. So actually, the repository-uploader will emit a message to a relevant topic on our message-bus, which will trigger an ‘apt-get update’ on the servers where that app is installed. If we’re feeling brave and the Puppet code that manages the app has ‘ensure => latest’ in the package statement, then they’ll go on and install that newly updated version.
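
(The Puppet end of that is nothing more exotic than the usual package resource. A minimal sketch, with an invented package name:)

# 'latest' makes the agent upgrade the package whenever the freshly-updated
# repository offers a newer version; 'installed' would leave upgrades to us.
package { 'edge-java-app':
  ensure => latest,
}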

Which is kind of exactly the behaviour one would expect from a continuous deployment rig.

I Had That Janov Bloke in the Back of My Cab Once

Here’s a non-technical thing that’s been wandering round my head: brogrammers are more or less exactly what you can expect in an environment run by old-school Unix admins. Or rather, they emerged as a species in reaction to an environment which itself was a reaction.

I guess I’d better unpack that and provide some material so people can go TL;DR.

Brogrammers.

So (i). Brogrammer. You can go look that up on the internet, because that’s what sensible people do. If they come across a term or statement they’re not sure about, they can poke about the internet for a bit, gather information from several sources and perhaps come to a useful conclusion. It’s not, y’know, required, but it’s nice when it happens and makes them look much less like dicks than the sort of people who’ll just stand there going ‘No! Tell me what you mean!’

I would also ask you to go read this: what your culture really says, because it crystallised (or began the process of precipitation or whatever) a lot of what this ramble may or may not be about. I have no particular axe to grind with that piece because I am a white English bloke in my mid forties, and if I’ve been a participant in any of the scenarios listed I’ve not had the wit to realise it. It rings true, though. True enough that I suspect the ‘if’ in that preceding sentence is a ‘when’.

Finding Places to Put Things

I suspect this blog-thing will just contain sporadic apologies for lack of content for most of its lifetime.

Anyway.

This time the excuses have been brought to you by the words ‘fail’, ‘power’, ‘generator’, ‘contactor’, ‘250A supply’, ‘melted’ and the phrases ‘boot that filer from a different Vol0’, ‘can you smell smoke?’ and ‘Oh hell not again’.

As you might imagine, it’s been busy and the DR plan has been tested and found interesting.

We’re still Barberising and Hiera-ing up our shonky collection of Puppet modules. I’d say that they’re getting less shonky by the day, but it’s taking longer than that. I hesitate to talk about ‘patterns’, because… Actually, I think that’s an example of self-taught-hacker anti-intellectualism, which is just as much rubbish as its opposite.

So. The Barberis(ing|ed) pattern is a fine thing and, when used in combination with the wonder that is Hiera, allows us to do more things in simpler code.

However. One of the modules that I’d been putting off refactoring (so ‘patterns’ are suspect but ‘refactoring’ is fine, eh?) was the one that manages our NSD install and thus the DNS for quite a number of domains, some of which contain rather popular websites.

NSD is the authoritative-only nameserver daemon written by NLnet Labs, who are a top bunch of chaps. We abandoned BIND after there were one too many vulnerability notices.

I’d been putting off the work because the v0.1 module just drops the entirety of the zone-files directory under ../files/ and lets Puppet do the work of synchronising the files across the nameservers. It’s not as if it’s a terrible thing to do at first glance - Puppet’s file-serving means you can stop faffing with hand-brewed rsync scripts for managing the out-of-band DNS data, and if you’ve got your Puppet tree in a sensible SCM, you get version control ‘for free’.
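
(Concretely, the v0.1 approach is just a recursive file resource pointed at the module’s files directory. A sketch, with illustrative paths and an assumed Service['nsd3'] defined elsewhere in the module:)

# v0.1: serve the whole zonefile directory straight out of the module tree.
file { '/var/lib/nsd3':
  ensure  => directory,
  recurse => true,
  source  => 'puppet:///modules/nsd/zones',
  notify  => Service['nsd3'],
}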

However (again), great lumps of org-specific data like that shouldn’t really, we are told, be held within the module tree. It’s not necessarily obvious where the data should go, though. Nor is it terribly obvious how you connect it back to the Puppet module and have changes in the one signal the other to perform tasks.

Well, it is if you look at the right corners of the Internet, but this thing is mostly me groping around and trying stuff out as a warning to others.

NSD installation and management goes in the now-Barberised NSD module.

This also deposits code that rebuilds the NSD config file when a domain is added or removed. And indeed the out-of-band master list of domains, which semi-obviously has to travel separately from the zonefiles for $reasons.
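
(I won’t reproduce the real rebuild code here, but as a guess at its shape — filenames invented, NSD 3-era commands assumed — it’s the sort of script that regenerates the zone: stanzas from the master domain list, sanity-checks the result and then reloads NSD:)

#!/bin/sh
# Sketch only: rebuild nsd.conf from the out-of-band domain list.
set -e
{
  cat /etc/nsd3/nsd.conf.head                 # the static server: section
  while read -r domain; do
    printf 'zone:\n\tname: "%s"\n\tzonefile: "/var/lib/nsd3/%s"\n' "$domain" "$domain"
  done < /etc/nsd3/domains.list               # the master list of domains
} > /etc/nsd3/nsd.conf.new
nsd-checkconf /etc/nsd3/nsd.conf.new          # bail out before breaking the nameserver
mv /etc/nsd3/nsd.conf.new /etc/nsd3/nsd.conf
nsdc rebuild && nsdc reload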

(It’s about this time that someone-who-is-not-me would be going ‘Why isn’t all this domain gubbins in a nice database somewhere? Then all zone maintenance would be a simple “SELECT mumble FROM yinglebart WHERE tewkesbury ISNT something”’, which would be very shortly before I hauled out the sarcasm-throwing machine.)

The zonefiles live in a git repo of their own. That repo is cloned down onto the master DNS server(s) and kept current via the magic of post-commit hooks. Meanwhile, there’s a file resource in the NSD module which looks like this:

# Audit the file's content; when Puppet sees it change, the notify fires
# and the rebuild runs.
file { '/var/lib/nsd3/.git/HEAD':
  audit   => content,
  notify  => Exec['rebuild'],
}

# refreshonly means this only runs when something (the file above) notifies it.
exec { 'rebuild':
  command     => '/etc/nsd3/code/refresh.sh',
  refreshonly => true,
}

… Which is lifted wholesale from here. Either we’ve found one of the non-terrible use cases for this hack, or I’ll be writing another rambling post in a few months when I’ve had a better idea.
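
(The other half of the plumbing — how the clone on the nameserver gets freshened in the first place — is the hook mentioned above. Something of roughly this shape would do it, written here as a server-side post-receive hook; hostnames and paths are invented:)

#!/bin/sh
# Sketch of a post-receive hook on the central zone repository: have each
# nameserver's clone pull the new commits.
for host in dns-master-1 dns-master-2; do
  ssh "$host" 'cd /var/lib/nsd3 && git pull --ff-only'
done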

Actually Just Testing Something Else

You’d think, after all this time, that I could bosh things together and have them perform some semblance of useful work, right?

You’d think…

… Argh. ‘-’ characters in directory names? Surely not…

Leveraged Neatness Is Not Always Your Friend

I must begin by apologising for using the word ‘leveraged’. It’s terrible marketing-speak. However, it mostly fits in context and anyway I am using it in an ironic sense. (Not that that’s even a valid excuse for the likes of me, but there you go.)

There are certain aspects of the admin trades where semi-obsessive neatness and/or attention to detail are useful things to have. It’s one of the times where not being able to leave the house without checking that you’ve turned the gas off twice and locked the front door three times can be a positive thing to admit in an interview. [NB: Self-deprecating hyperbole.]

It can also lead to the sort of unfortunate failure modes that, if you’re lucky, you’re not familiar with.

(This was all kicked off by one of the network guys asking me some hard questions about the nature of packaging and dependencies, which set me off on a bit of a ramble. Thus network types get it first. Beard-fondling admins get their kicking later.)

EMFcamp

For reasons that seemed jolly sensible at the time (ie - someone dared me to do it) I gave a much-modified version of The Puppetcamp Talk at EMFcamp. Modified in that it was four months later and we’d made some things work better, abandoned others and had some mostly-bright ideas for new stuff.

Thus linked here somewhere should be the PDF version of the slides. Because I am that sort of forward-thinking sod (actually, I’d seen this be a requirement at 28c3, and it seemed sensible, so…), the PDF was also the version I presented, because the Nice People didn’t have the relevant podule to connect a MacBook to the big screen in the tent.

In Which There Is Silence and Excuses for Same

Hello. Where on earth did the last few months go? I guess I could blame it having been Grand Tour season (the Giro, the Tour and now the Vuelta) which has meant that I’ve been watching sweaty men in lycra pedal up the sides of mountains for three weeks at a time.

I guess I can also blame one of those periods when several mildly concerning things happen in rapid order like a conspiring mob of buses, and when the smoke clears you’re somewhere grim like Burnt Oak, Oldland Common or Quedgeley.

Anyway. Let’s see what happens next.

A Hazelnut in Every Bite

An obvious question is ‘Why on earth bother with all this message-wanging gubbins? Isn’t mail and/or SSH good enough for you?’ to which the glib answer is ‘No it isn’t.’

A longer answer involves spotting the really obvious problem in my last post. That is ‘Ok, so you’ve bodged in some code that’ll auto-update your live puppetmaster tree on a commit to that repository. What are the chances that said commit is b0rked and causes an epic cake-fail?’

Well, in theory you’re committing to a develop branch which only one or two of your machines are following, because that’s the entire point of having the dynamic branch rig in the first place. However, PEBCAK happens and perhaps you should have a Jenkins instance that sanity-checks (as much as is possible, anyway) the puppet code that’s just been committed.

Hurrah! Problem solved! Let’s have one of those!

I don’t know how you’d plumb something like that into another environment, but this is how it works here:

  • Configure the stomp-jenkins daemon to listen for puppet-environments commits on future.git.commits (c0dez available in the usual place)
  • Configure Jenkins for Puppet and puppet-lint (a sketch of that check step follows this list)
  • Configure Jenkins to emit a message on (say) future.jenkins.success if the tests pass. (c0dez for that available ditto)
  • Configure the stomp-git daemon on your puppetmaster to listen on future.jenkins.success
  • Er, profit.
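
(The check step itself needn’t be anything cleverer than a shell build step of roughly this shape — the manifests/ and modules/ layout is an assumption:)

#!/bin/sh
# Sketch: syntax-check every manifest in the tree, then lint the lot.
set -e
find manifests modules -name '*.pp' -print0 | xargs -0 -n1 puppet parser validate
find manifests modules -name '*.pp' -print0 | xargs -0 puppet-lint --fail-on-warnings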

Obviously enough, chaining in extra bits or constructing side-chains is relentlessly trivial.

You could probably do it with a massive make or rake file, too.

One Very Important Thought

I have been hacking on stomp-git over the past couple of days. Mostly because there may or may not be a lurking nasty in the way it sometimes stops listening on its, er, listen topic, but also because it has needed a good deal of de-Futuring and general fettling so it doesn’t make proper coders claw their own faces off in horror.

And because I had forgotten how I’d made it work when it came to the svn->git rollout of one of our major sites.

Thus it seems as good a time as any to explain how I think it should work and indeed why it works like that.

Java-based Diversionary Tactics

The other week, I stood in front of a room filled with my notional peers and allowed as how ‘ActiveMQ setup was a bit of a pig, but once you’d got it working it was pretty simple.’

[FX: Pause for hollow laughter]

In retrospect, that statement was obviously going to come back and bite me just as soon as it had located its special big false teeth. Thus it came to pass on Wednesday night that two of the brokers had a meltdown, which in turn broke some experimental Nagios-over-Stomp code and so kept the poor sod who was on call in a state of near panic as everything reportedly failed.

Oops.

So when I pitch up on Thursday AM, I am welcomed by the whole team merrily grousing about ‘your effing brokers’ (some software objects are like children and pets: when they’re well behaved and/or looking cute for visitors, they’re our children or pets; when they’ve just left a deposit behind the telly, they’re your children or pets). Indeed it was a right mess. One broker had used all the memory allocated to it and, as if for spite, had run out of filehandles too. It was all a bit odd. Some poking about revealed [AMQ-jira ticket], which more-or-less explains itself, and the fact that it looked an awful lot as though an experimental client wasn’t responding to messages as quickly as one might hope.