Kubernetes and Weave - a no-go by default

Estimated reading time: 5 mins

The default Weave deployment for Kubernetes is not as secure as it should and could be. What does this mean? It means that if you set up a Kubernetes cluster via kubeadm and install the Weave Works overlay network following the recommended installation guide, any host that installs Weave manually can join this overlay network!

Details

We review Kubernetes every couple of months, because there is some (small) uncertainty about whether Docker Swarm will stay alive or not. Therefore it is always good to know what the alternatives are, like Kubernetes, and how they fit into an existing environment. Docker Swarm is unmatched in how simple it is to manage - we run more than 2300 containers today - but it is not as hyped as Kubernetes. So we have to stay up to date on how we could use Kubernetes to replace Docker Swarm if we had to. And every time we take a look at it, our review boils down to two main problems: networking and storage. Today we will focus on the network integration, specifically CNI.

There are a lot of Kubernetes CNI plugins out there, but only some of them are widely used: Calico, Weave and Flannel. Weave is, of course, a nice solution to easily get up and running with CNI capabilities inside Kubernetes. But there is a major drawback.

The problem

If you follow the default installation guide for Weave within Kubernetes, there is no password used to protect your overlay network against rogue network peers!

In the following output, the host atlxkube474 is not part of the Kubernetes cluster. But it can easily join the Weave network created for Kubernetes by specifying one of the main peers during weave launch.

Kubernetes cluster nodes:

[19:52 atlxkube471 ~]# kubectl get nodes
NAME          STATUS   ROLES    AGE   VERSION
atlxkube471   Ready    master   12d   v1.15.3
atlxkube472   Ready    <none>   11d   v1.15.3
atlxkube473   Ready    <none>   11d   v1.15.3

Suspicious Weave host:

[19:54 atlxkube474 ~]# kubeclt

Command 'kubeclt' not found, did you mean:

  command 'kubectl' from snap kubectl (1.15.3)

See 'snap info <snapname>' for additional versions.

[19:55 atlxkube474 ~][127]# weave launch 10.x.x.1
... truncated output ...
INFO: 2019/09/16 17:55:22.483698 sleeve ->[10.x.x.2:6783|c6:cf:11:33:a1:ca(atlxkube472)]: Effective MTU verified at 1438
[19:56 atlxkube474 ~]# eval $(weave env)
[19:56 atlxkube474 ~]# weave status

        Version: 2.5.2 (up to date; next check at 2019/09/17 01:10:02)

        Service: router
       Protocol: weave 1..2
           Name: da:09:b8:ee:77:3e(atlxkube474)
     Encryption: disabled
  PeerDiscovery: enabled
        Targets: 1
    Connections: 3 (3 established)
          Peers: 4 (with 12 established connections)
 TrustedSubnets: none

        Service: ipam
         Status: ready
          Range: 10.32.0.0/12
  DefaultSubnet: 10.32.0.0/12

        Service: dns
         Domain: weave.local.
       Upstream: 10.x.x.50, 10.x.x.51, 10.x.x.52
            TTL: 1
        Entries: 0

        Service: proxy
        Address: unix:///var/run/weave/weave.sock

        Service: plugin (legacy)
     DriverName: weave

And now, anyone can run a container that joins the single Weave overlay network and do anything that is possible:

[20:00 atlxkube471 ~]# kubectl get pods -n deployment-v1 -o wide
NAME                                READY   STATUS    RESTARTS   AGE    IP          NODE          NOMINATED NODE   READINESS GATES
nginx-deployment-5754944d6c-z486x   1/1     Running   0          6d9h   10.44.0.1   atlxkube472   <none>           <none>

[19:59 atlxkube474 ~]# docker run --name a1 -ti weaveworks/ubuntu
root@a1:/# ping 10.44.0.1
PING 10.44.0.1 (10.44.0.1) 56(84) bytes of data.
64 bytes from 10.44.0.1: icmp_seq=1 ttl=64 time=2.00 ms
64 bytes from 10.44.0.1: icmp_seq=2 ttl=64 time=0.672 ms
^C
--- 10.44.0.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.672/1.337/2.002/0.665 ms
root@a1:/#

This example shows that any host can really, really easily join the Weave overlay network! This is not a secure-by-default design!

Fix it

I know that it is possible to set a password for Weave, which is used to encrypt the network traffic and to prevent unknown hosts from joining the Weave overlay network created for Kubernetes. This is described here.
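
For a plain (non-Kubernetes) Weave peer this is just a launch option; a minimal sketch (the password file path is only a placeholder) looks like this:

# Standalone Weave peer with encryption enabled - a sketch, the password file is a placeholder
weave launch --password "$(cat /path/to/weave-password)" 10.x.x.1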

Let's do this for our Kubernetes installation right now. Thanks to summerswallow-whi for opening a kops issue which already addresses this. The issue is still open (May 2018), 🙁, but it provides a lot of information on how you can harden your Weave setup.

I tried it on my own, and the following steps are enough to add some protection to your Weave overlay.

First, create a password for your Weave overlay and save it to a file:

# < /dev/urandom tr -dc A-Za-z0-9 | head -c16 > weave-password

Now create a Kubernetes secret:

# kubectl create secret -n kube-system generic weave-password --from-file=./weave-password

Add this setting to the Weave Kubernetes daemonset by editing it (under the weave-net container spec):

...
  template:
    metadata:
      creationTimestamp: null
      labels:
        name: weave-net
    spec:
      containers:
      - command:
        - /home/weave/launch.sh
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: WEAVE_PASSWORD
          valueFrom:
            secretKeyRef:
              key: weave-password
              name: weave-password
        image: docker.io/weaveworks/weave-kube:2.5.2
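
If you prefer not to edit the DaemonSet interactively, a kubectl patch should lead to the same result. The following is only a sketch and assumes that the container inside the weave-net DaemonSet is named weave, as in the upstream manifest:

# Sketch: add WEAVE_PASSWORD to the "weave" container via a strategic merge patch
kubectl -n kube-system patch daemonset weave-net --patch '
spec:
  template:
    spec:
      containers:
      - name: weave
        env:
        - name: WEAVE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: weave-password
              key: weave-password
'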

If you now try to join the Weave overlay network, you will see the following failure:

[20:19 atlxkube474 ~]# weave launch 10.x.x.1
24ef2192c30e3c5d9372469eb1fb456e348cdd9efe4b8cda27c3ba1e756ba73c
[20:19 atlxkube474 ~]# docker logs weave
...
INFO: 2019/09/16 18:19:50.677943 ->[10.x.x.1:6783] connection shutting down due to error during handshake: no password specificed, but peer requested an encrypted connection

Yes! This is what we want! And nothing less!

Conclusion

Weave uses, in my opinion, an insecure default setup. This violates GDPR Article 25, at the very least. Encryption should be a default today! This is one of the points where Docker Swarm is much, much better. Docker Swarm creates a VXLAN overlay network for each service by default (not just one single overlay for everything, like Weave does)! 😎 Furthermore, you cannot(!) join a Docker Swarm without knowing the join token, and therefore you cannot infiltrate an existing Docker Swarm overlay network! It is secure by default! No additional screws to tighten!

It is hard to believe, but even Weave Works itself does not provide Kubernetes documentation about how to harden your Kubernetes Weave setup (Kubernetes Secrets, Kubernetes DaemonSet, …). Even the current edition of the book Kubernetes: Up and Running, Second Edition does not mention anything like this.

I’d rather not know how many insecure setups are out there - hopefully everyone trusts his or her own network. 🙄

Sometimes we think that the whole industry is just fueling the whole Kubernetes thing to sell more and more consulting services…

But, let’s see…

Posted on: Mon, 16 Sep 2019 01:39:06 +0200 by Mario Kleinsasser

  • Kubernetes
Mario Kleinsasser
Doing Linux since 2000 and containers since 2009. Like to hack new and interesting stuff. Containers, Python, DevOps, automation and so on. Interested in science and I like to read (if I find the time). Einstein said "Imagination is more important than knowledge. For knowledge is limited." - I say "The distance between faith and knowledge is infinite. (c) by me". Interesting contacts are always welcome - nice to meet you out there - if you like, do not hesitate and contact me!

Multi-Project Pipelines with GitLab-CE

Estimated reading time: 9 mins

This year at the DevOps Gathering 2019 conference in Bochum, Alexander and I met Peter Leitzen, who is a backend engineer at GitLab, and together we chatted about our on-premises GitLab-CE environment and how we are running GitLab Multi-Project Pipelines without GitLab-EE (GitLab Enterprise Edition). We promised him that we would write a blog post about our setup, but as so often, it took some time until we were able to visualize and describe our setup - sorry! But now, here we go.

We will not share our concrete implementation here, as it makes no sense: everyone will have a different setup, different knowledge, or is using another programming language. Nevertheless, we will describe what we are doing (the idea) and not how we are doing it - you can use whatever programming language you like to communicate with the GitLab API (because it doesn’t matter).

Some background story

We have a lot of projects in our private GitLab and of course a lot of users - at the time of writing approximately 1500 projects and around 400 users, since we have been using GitLab for more than five years. Not only developers use GitLab, but also colleagues who just want to version their configuration files, and much more. With GitLab-EE it is possible to run multi-project pipelines - but this is a premium feature (Silver level) which costs $19 per GitLab user per month. Only some of our 400 GitLab-enabled users need GitLab multi-project pipelines, but sadly there is no way to subscribe only some users to GitLab Premium. 🙄

Back then we were sure (and it is still a fact today) that we would need multi-project pipelines for our new Docker Swarm environments (we started more than two years ago). Together with the developers we decided that we would like to have classic source code Git repositories and separate deploy Git repositories, for various reasons.

You can imagine the latter as the marriage of a car’s body and its chassis - after the pipeline run you will have the sum of the source code and the deploy repository: the container image.

The idea

Let’s look at the idea (click on the picture to the left). We have numbered the individual steps to explain them in more detail. Let’s start with some additional information regarding the environment in general.

Overview

On the left side of the picture is our GitLab server installation, which is responsible for all the repositories. We have been fully committed to a 100% GitOps approach since we started our container journey. Nothing happens to our applications within our Docker Swarm environments without a change in the corresponding Git repository. Later you will see that we have built something with some similarities to Kubernetes Helm, but much simpler. 😂 Let’s dive into the details.

One

Point one shows a public(!) repository which we use as the source for the GitLab CI/CD templates. GitLab introduced CI/CD templates with the release of GitLab v11.7 and we use them. Inside the templates, Alex has built up a general structure for the different CI/CD stages. For example, the .gitlab-ci.yml file from one of our real-world projects looks just like this:

include: 'https://<our-gitlab-server>/public-gitops-ci-templates/gitops-ci-templates/raw/master/maven-api-template.yml'

image: <our-gitlab-registry>/devops-deploy/maven:3.6-jdk-12-se3

Yes, you read that right! That’s all! That’s the whole .gitlab-ci.yml of a productive Git project. All the magic is centrally managed by public GitLab templates. In this case we are looking at a project of our developers. The nice thing about GitLab templates is that values can be overwritten, like the image variable in this example! Simple but not complex - and in any case it is possible to take a look into the template to know what is going on, because it is not hidden abstraction magic. By the way: there is absolutely no sensitive information in the templates! Everything that is project specific is managed by the affected project itself. That’s the link to point number two.
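
Before we move on to point two: to give a rough idea of what such a shared template can look like, here is a heavily simplified, hypothetical sketch (not our real maven-api-template.yml - the stages and commands are just examples):

# Hypothetical sketch of a shared CI/CD template - not our real file
image: maven:3.6-jdk-11        # default, can be overridden by the including project

stages:
  - build
  - deploy

build:
  stage: build
  script:
    - mvn -B clean package

.deploy-template: &deploy-template
  stage: deploy
  script:
    - echo "generate the deploy config and call the GitLab Helper Rest Service here"

deploy:
  <<: *deploy-template
  only:
    - master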

Two

As shown, the .gitlab-ci.yml in the source code Git repository imports one of the centrally maintained global templates. And as said before, all relevant parameters for the templates are provided by the source code repository. There are a lot of secret variables there, and therefore Alex made a special pipeline which bootstraps a new Git project if needed. This is not part of this post because it is very specific and depends on your needs. But it is important to know (and for the explanation) that during the creation of a new source code project, two GitLab project variables called SCI_DEPLOY_PRIVATE_KEY and SCI_DEPLOY_PRIVATE_KEY_ID are created and initialized.
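
Just to illustrate the idea of that bootstrap (a hypothetical sketch, not our real pipeline - the project id, token and file names are made up), the official GitLab API offers everything that is needed:

# Sketch: generate a key pair for the new source project (project id 42 is made up)
ssh-keygen -t ed25519 -N "" -f sci_deploy_key

# register the public key as a deploy key (here on the source project) to obtain its id
KEY_ID=$(curl -s -H "PRIVATE-TOKEN: $ADMIN_TOKEN" -X POST \
  "https://<our-gitlab-server>/api/v4/projects/42/deploy_keys" \
  --data-urlencode "title=sci-deploy" \
  --data-urlencode "key=$(cat sci_deploy_key.pub)" | jq -r '.id')

# store the private key and the key id as CI/CD variables of the source project
curl -s -H "PRIVATE-TOKEN: $ADMIN_TOKEN" -X POST \
  "https://<our-gitlab-server>/api/v4/projects/42/variables" \
  --data-urlencode "key=SCI_DEPLOY_PRIVATE_KEY" --data-urlencode "value=$(cat sci_deploy_key)"
curl -s -H "PRIVATE-TOKEN: $ADMIN_TOKEN" -X POST \
  "https://<our-gitlab-server>/api/v4/projects/42/variables" \
  --data-urlencode "key=SCI_DEPLOY_PRIVATE_KEY_ID" --data "value=$KEY_ID"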

Now, the GitLab CI/CD pipeline of the source repository is handed to the GitLab Runner and the job is started. The job itself uses a self-made service which kicks off the GitLab Multi-Project Pipelines functionality. Therefore, we have to head over to number three - the self-written GitLab Helper Rest Service.

Three (Part 1)

In the step before, the GitLab CI/CD pipeline of the source repository was triggered, and at this point we imagine that the deploy job from the source repository is called. The deploy stage is part of the .gitlab-ci.yml template mentioned before. Now, in the template, the following happens (this is the hardest part of all):

First of all, a SCI_JWT_TOKEN is generated to ensure a solid level of trust. The communication between the pipeline runner and our GitLab Helper Rest Service is TLS encrypted, but we would like to be sure that only valid calls are processed. Take a look at the script line below. Here is the point where the SCI_DEPLOY_PRIVATE_KEY comes back again. The little tool /usr/local/bin/ci-tools is part of the Docker image that is used during the GitLab deploy pipeline run. It is self-written and does nothing more than generate and sign a JWT token. It is written in Go - simple and efficient. The important parameters the tool needs can be seen in the script line below.

In summary, we have a signed JWT token which includes the currently running job number of this specific project which is referenced by its ID.

...
.deploy-template: &deploy-template
  script:
    ...
    - export SCI_JWT_TOKEN=$(echo "$SCI_DEPLOY_PRIVATE_KEY" | /usr/local/bin/ci-tools crypto jwt --claims "jobId=$CI_JOB_ID" --claims "projectId=$CI_PROJECT_ID" --key-stdin)
...

After the SCI_JWT_TOKEN is generated, it is used to call the GitLab Helper Rest Service via curl (TLS encrypted). Please notice the deployKey REST method in the REST call. It is also important to see that a variable called SCI_DOCKER_PROJECT_ID is used as a REST parameter. The variable SCI_DOCKER_PROJECT_ID references the Docker deploy project - this is number four in our overview! The GitLab Helper Rest Service now creates and enables a GitLab deploy key if it isn’t already enabled there. That’s the trick to enable GitLab Multi-Project Pipelines automatically! The GitLab Helper Rest Service verifies that the jobId=$CI_JOB_ID transmitted with the JWT token is valid by looking up the CI_JOB_ID via the GitLab API.

...
    - curl -i -H "Authorization:Bearer $SCI_JWT_TOKEN" -H "Consumer-Token:$SCI_INIT_RESOURCE_REPO_TOKEN" -X POST "${SCI_GITLAB_SERVICE_URL}/deployKey/enable/${SCI_DOCKER_PROJECT_ID}/${SCI_DEPLOY_PRIVATE_KEY_ID}" --fail --silent --connect-timeout 600
...
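
Behind that REST method, the helper service essentially calls the official GitLab API with its administrative token. A minimal sketch of the two underlying calls (verify the job, then enable the deploy key) could look like this - it is only an illustration, not our real implementation:

# Sketch of what the GitLab Helper Rest Service does internally (admin token assumed)
# 1) verify that the job id from the JWT claims really exists in the calling project
curl -s -H "PRIVATE-TOKEN: $ADMIN_TOKEN" \
  "https://<our-gitlab-server>/api/v4/projects/$CI_PROJECT_ID/jobs/$CI_JOB_ID"

# 2) enable the already existing deploy key on the deploy project
curl -s -X POST -H "PRIVATE-TOKEN: $ADMIN_TOKEN" \
  "https://<our-gitlab-server>/api/v4/projects/$SCI_DOCKER_PROJECT_ID/deploy_keys/$SCI_DEPLOY_PRIVATE_KEY_ID/enable"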

Sadly, the GitLab Helper Rest Service has to have administrative global permissions inside the GitLab installation to handle these tasks - that’s the price to pay.

But we are not finished yet - here comes more cool stuff!

Three (Part 2)

Now the source code repository pipeline has done the setup for the deploy pipeline. Based on the template variable configuration, the .gitlab-ci.yml for the deploy repository pipeline is automatically generated! You can imagine this as some kind of Kubernetes Helm, just for GitLab and simply integrated inside the GitLab repositories - GitOps! Each and every thing is stored inside the repositories! The last thing to do is to push the generated config into the deploy repository. This is easy, because the GitLab deploy key was set up just before. 😇

And now some additional cool magic. Due to the commit into the deploy repository in the last step, the GitLab Helper Rest Service would be able to just run the pipeline which would be automatically created, but then we would lose the information about who triggered the pipeline. Therefore, an additional GitLab Helper Rest Service REST call is issued (see below). This one reads out the user who created the jobId=$CI_JOB_ID in the source code repository. After this is done, the GitLab Helper Rest Service impersonates exactly this user, creates the deploy pipeline in the deploy repository as this user, and runs it 😎 - the deploy pipeline runs as the same user as the pipeline in the source code repository. Nice!

...
    - curl -i -H "Authorization:Bearer $SCI_JWT_TOKEN" -H "Consumer-Token:$SCI_INIT_RESOURCE_REPO_TOKEN" -X POST "$SCI_GITLAB_SERVICE_URL/pipeline/create/$SCI_DOCKER_PROJECT_ID" --data "branch=$SCI_DOCKER_SERVICE_BRANCH_NAME" --fail 
...
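
Again, the heavy lifting maps to GitLab API calls which the helper service can issue with its administrative token; a simplified, hypothetical sketch of the impersonation part:

# Sketch: look up who created the job, then create the pipeline in the deploy project as that user
USERNAME=$(curl -s -H "PRIVATE-TOKEN: $ADMIN_TOKEN" \
  "https://<our-gitlab-server>/api/v4/projects/$CI_PROJECT_ID/jobs/$CI_JOB_ID" | jq -r '.user.username')

curl -s -X POST -H "PRIVATE-TOKEN: $ADMIN_TOKEN" -H "Sudo: $USERNAME" \
  "https://<our-gitlab-server>/api/v4/projects/$SCI_DOCKER_PROJECT_ID/pipeline?ref=$SCI_DOCKER_SERVICE_BRANCH_NAME"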

In addition, this is also a security feature, because the user who runs the source code pipeline must also be a member of the deploy repository. Otherwise the pipeline cannot be created by the GitLab Helper Rest Service. This enables us to have developers who are able to push to the source code repository but are not able to run deploys - simply because they are not members of the deploy repository.

Four

This is the deploy Git repository. It is used to run the deploys and it is part of our way to run GitLab Multi-Project Pipelines.

Five

The deploy pipeline takes the Docker Swarm config, which is generated by the source code pipeline run and pushed to the deploy repository by our GitLab Helper Rest Service (like Kubernetes Helm), and updates the Docker Swarm stack (and Docker services).
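
In its simplest form, this last step is not much more than a docker stack deploy against the Swarm; a sketch, where the stack name and compose file are just placeholders:

# Sketch of the deploy job - stack name and compose file are placeholders
docker stack deploy --compose-file docker-stack.yml --with-registry-auth my-application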

Conclusion

This blog post gives you an idea of how to build Multi-Project Pipeline functionality with only GitLab-CE (on-premises).

The cost of this is that you have to create an external service which will have administrative GitLab permissions. Warning: Do not write such a service if you are unfamiliar with how to do it in a secure manner!

Hopefully GitLab will enable Multi-Project Pipelines also for GitLab-CE users for free in the future!

If you have questions or would like to say thank you, please contact us! If you like this blog post, please share it! Our bio-pages and contact information are linked below!

Alex, Mario

Posted on: Sun, 15 Sep 2019 01:39:06 +0200 by Alexander Ortner , Mario Kleinsasser

  • GitLab
Alexander Ortner
Alexander O. Ortner is the team leader of a software development team within the IT department of the STRABAG BRVZ construction company. Before joining STRABAG SE he obtained a DI from the Department of Applied Informatics, Klagenfurt, in 2011 and another DI from the Department of Mathematics, Klagenfurt, in 2008. He has been a software engineer for more than 10 years and, besides the daily business, is mainly responsible for introducing new secure cloud-ready application architecture technologies. He is furthermore a contributor to introducing a fully automated DevOps environment for a highly diverse set of applications.

Why Open Source is great - Fix an issue in under 48 hours

Estimated reading time: 4 mins

This is a follow-up post to Reviving a near-death Docker Swarm cluster, where we showed that a Docker Swarm cluster can be hurt badly if DNS does not work (because of a storage hiccup). Therefore it was obvious that we had to enable caching in our coreDNS servers.

A short recap of the situation: We use coreDNS with etcd as the storage backend for the DNS records. This is a common use case - it is the same as in Kubernetes. We use the same concept as Kubernetes does, but for slightly different purposes, because etcd has an easy-to-use API. We started to use coreDNS way before it came to Kubernetes as a DNS service. We also helped to implement the APEX records, and we also did some bug triage in the past.

Enabling caching in coreDNS is simple: just add the cache statement to the Corefile, as documented in the plugin.
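
For reference, a minimal Corefile sketch with caching enabled could look like this (zone, endpoints and cache TTL are only examples, not our production config):

# Minimal Corefile sketch - zone, endpoints and cache TTL are examples only
example.com {
    etcd {
        path /dns-internal
        endpoint http://etcd1:2379 http://etcd2:2379 http://etcd3:2379
    }
    cache 30
    errors
    log
}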

The problem

So yes, we enabled caching, and some minutes later our monitoring system showed several systems which were not able to do their Puppet agent run anymore. This happened on a Tuesday afternoon around 3pm. After the monitoring alerted us to the problem, we already guessed that it had something to do with the cache we had enabled shortly before in our coreDNS instances. A rollback would have been possible without any problems, because we run coreDNS inside containers, their image is built via GitLab CI/CD, and the docker run is issued by Puppet on the given hosts. So a rollback is pretty easy! But we didn’t roll back, because only some of our hosts had a DNS resolve error - the rest (hundreds) were running fine!

Analyzing the problem

We suspended the rollback to a previous Corefile (coreDNS config) and took a closer look at the affected hosts. Shortly after, we knew that only older Linux OSes were hit by this problem. Bernhard started to search the internet for this specific problem, because we also got the following log output from an rsync run (and a similar one from the Puppet agent):

rsync: getaddrinfo: rsync.example.com 873: No address associated with hostname
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.9]

We found two GitHub issues, this one in coreDNS and this one in Kubernetes, and in addition a Stack Overflow post.

OK, there was something strange going on with some old clients. We decided to share our information inside the issues above. You can read the issues and the pull requests if you want the full details. In short, after we chatted on GitHub, I mailed privately with Miek Gieben, who is one of the coreDNS maintainers, to share some tcpdumps with him. DNS is really something you don’t want to mess around with that deep. It’s ugly, and I feel great respect for those who work in this field, like Miek does! Kudos to all of you!

The result

After chatting via e-mail, we quickly came to the conclusion that the switching of the authoritative/non-authoritative flag - that is one(!) single bit in the header of the UDP datagram of a DNS query response - confuses older clients, because at first they get an authoritative answer and on the next query (within the TTL of the record) they get a non-authoritative answer. Some older DNS client code gets screwed up at this point.
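
You can actually watch this bit with dig; the flags line in the response header shows it (the host names below are placeholders):

# Check the AA (authoritative answer) flag in the response header
dig @coredns.example.com rsync.example.com +noall +comments
# first query (cache miss):          ;; flags: qr aa rd ra; ...
# repeat within the TTL (cache hit): ;; flags: qr rd ra; ...   <- the aa bit is gone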

Miek provided a PR for this, I opened up an issue, and on Thursday morning I did a manual build of coreDNS including this PR and everything worked fine. As a mitigation in between, Bernhard rolled out a hosts entry for our Puppet master domain on all affected hosts! Thanks! But some hosts with quite old software were still affected. Therefore this PR works much better.

Thank you!

We would like to say “Thank you!” to all who are working in Open Source, and in this case especially to Chris O’Haver, Miek Gieben and Stefan Budeanu! This shows why Open Source is unbeatable when it comes to problems. Of course you have to have the know-how to work together like we did in this case, but you have the opportunity to do it! Don’t be afraid and try it! Getting a fix for a problem within 48 hours is absolutely impressive and stunning! I am sure that this is not possible with closed source.

Posted on: Fri, 14 Jun 2019 04:39:06 +0200 by Mario Kleinsasser

  • Culture

About X vs Y in tech

Estimated reading time: 4 mins

Actually, I was going to write a blog post about “Why Docker might be enough”, but soon I caught myself: this blog post would be one of those posts where someone (me) tries to convince somebody to use something (in this case Docker) instead of another thing (Kubernetes, maybe). Bad, because in this case the article wouldn’t be more than one of those “X vs Y in tech” blog posts the internet is already flooded with.

Why is it so hard to accept one another’s position?

One reason might be that we have been trained to compare everything, all the time, since we were young. For example, at school we got our marks, and we all know that a one is better than a five (in school systems with numeric marks). Pretty clear, isn’t it? I think it’s not, because it’s only one of probably many ways to look at it. At school, one mark does not make a report card. There are many different subjects, and if someone is bad at math, he or she might be good at PE.

Another example is sports. There is always a “versus”, in every game. And most of the time there is one, and only one, winner. The second or third place hardly matters. This is something which is trained over years, since our early days. Therefore, we are conditioned such that in most of our discussions only the first place matters. “The winner takes it all, the loser’s standing small.” [ABBA]

And the same point of view is usually taken if we are talking about X vs Y in tech. Often, only technical parameters are compared, for example: Kubernetes clusters can scale out to more than 5000 nodes, versus Docker Swarm is able to handle about 1000 safe and sound. So, ding ding, point for Kubernetes! Pretty clear, isn’t it? I think it’s not, because a lot of other parameters are not taken into account. One important thing is experience, the other is the environment.

About experiences and environments

As we all proceed throughout our tech lives, we have different experiences, good ones and bad ones. And of course, as we all proceed, we prefer the solutions which worked well. That’s why we tend to convince others of our solution. That’s human, but sometimes it would be helpful if we first tried to understand the environment of the person we are discussing with. An example: At KubeCon EU 2019 my colleagues and I talked with a lot of people. In nearly every chat, sooner or later we were asked where and how we use Kubernetes. Our answer was: We are currently not using Kubernetes (neither in the cloud nor as an on-premises enterprise solution), we use native Docker Swarm. The reactions ranged from wondering to horrified - until we, the Devs and Ops, explained our environment, our experience, our knowledge and also the why (we simply don’t need it at the moment) behind our current decision - which is not set in stone!

Co-operations would be better

In our spare time, we (Bernhard and I and some other colleagues) also play computer games for fun. And you know what? We always play co-operative games, where three or four people have to co-operate to achieve the given goal. It is much more satisfying than playing a player-versus-player game. If you are interested in the background of co-operative behavior, take a look at the Prisoner’s dilemma.

Summary

Instead of focusing on the “versus” when comparing two (or more) things, I think it would be much more helpful to focus on the co-operation or the synergy between the two objects. Like in the picture of this post, it might be absolutely OK that two persons are right about the same thing at the same time, but with different points of view on the object of interest. But if they co-operate, they can find a way where both can still be right. In this case, the common result will exceed the individual achievement.

Maybe, next time I will write an article about “The synergy of container technologies”… 😂

Posted on: Sat, 08 Jun 2019 04:39:06 +0200 by Mario Kleinsasser

  • Culture

Reviving a near-death Docker Swarm cluster

Estimated reading time: 3 mins

or why a storage update can hurt your cluster badly.

Today, shortly before our working day ended, one of our Docker Swarm clusters, the test environment cluster, nearly died. This wasn’t a Docker Swarm fault, but a coincidence of several different causes. At this point, we would like to share how we handle such situations. Therefore, here comes the timeline!

16:30

At this time a developer contacted us because he had a problem deploying his application in our Docker Swarm test cluster. The application container hadn’t started correctly and he didn’t know why.

16:35

So we had a look at the service the developer mentioned with the docker service ps command, to get some additional information.

ID                  NAME                IMAGE NODE                        DESIRED STATE       CURRENT STATE             ERROR                         PORTS
0cnafdmxdrvf        wmc_wmc.1           ...   xyz123                      Running             Assigned 8 minutes ago                                  
qd2muj3pdy1d         \_ wmc_wmc.1       ...   abc123                      Remove              Running ... ago                                 
me85ue4xii3f         \_ wmc_wmc.1       ...   3pp5zmsfe3jz2n5o54azylgtf   Shutdown            ... ago                   "task: non-zero exit (143)"   
ssbthaef0093         \_ wmc_wmc.1       ...   smqghxgmbkyxi5dn9odd9r39v   Shutdown            ... ago                                 

The service hung on remove! That’s never a good sign, as the removal of a container should be done pretty fast. If something like this happens, strange things are going on.

16:40

A look into journalctl --since 2h -u docker.service showed that around 16:22 the Docker Swarm Raft was broken. Now the question was - why? In the logs we saw that at this point in time a Docker Swarm deployment was running. Which is OK, since we are using GitLab as our GitOps/CI tool.

16:45

On the host where the remove wasn’t finished, we found additional information in the journalctl --since 2h -u docker.service log.

Jun 03 16:22:14 abc123 dockerd[1400]: time="2019-06-03T16:22:14.014844062+02:00" level=error msg="heartbeat to manager { sm.example.com:2377} failed" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" method="(*session).heartbeat" module=node/agent node.id=vti4nk735ubv9gk9x126dk9tj session.id=cgpntgnefhtqqj5f5yrci4023 sessionID=cgpntgnefhtqqj5f5yrci4023

This log message says that the Docker host can’t connect to the Docker Swarm cluster manager, which is not really good, but it pointed us to the next subsystem - DNS.

16:50

A look at the DNS logs of our coreDNS showed many of the following log messages.

2019-06-03T16:22:18.168+02:00 [ERROR] plugin/errors: 0 ns.dns.api.example.com. NS: context deadline exceeded
2019-06-03T16:22:23.347+02:00 [ERROR] plugin/errors: 0 el01.api.example.com. A: context deadline exceeded
2019-06-03T16:22:23.410+02:00 [ERROR] plugin/errors: 0 sl.example.com. A: context deadline exceeded 

So DNS wasn’t working correctly. Our database for coreDNS is our etcd cluster…

16:55

The etcd cluster logs showed the following.

2019-06-03 16:22:08.156352 W | etcdserver: read-only range request "key:\"/dns-internal/com/example/sm/\" range_end:\"/dns-internal/com/example/sm0\" " took too long (3.243964451s) to execute
2019-06-03 16:22:17.059472 E | etcdserver/api/v2http: got unexpected response error (etcdserver: request timed out)

Our etcd cluster couldn’t read the data from the disk/storage. A quick phone call to our storage colleagues informed us that they were doing planned firmware upgrades and therefore the storage controllers had to perform planned “failovers”.

Conclusion

We use a DNS name in our Ansible scripts to join the Docker worker hosts to the Docker manager hosts, and the DNS information is stored somewhere in the Docker Raft database. We were also not using a DNS caching mechanism in our coreDNS installation, which can cause really bad outages in this situation, because the DNS name isn’t resolvable. Our Docker Swarm test cluster was in an inconsistent state at this point, and we had to restart our 3 managers one after another so that they were able to rebuild/validate themselves with the cluster information of the running stacks/services/tasks. With Docker 18.09.3 this works pretty well, and we had our cluster control back in less than half an hour (including the analysis). No running services were affected except the one with the deploy problem at the beginning.
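
For the record, such a rolling manager restart looks roughly like this (a sketch; whether you restart the Docker daemon or reboot the node depends on the situation):

# On each manager, one after another (sketch)
systemctl restart docker
# wait until all managers show up as Leader/Reachable again before restarting the next one
docker node ls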

Posted on: Mon, 03 Jun 2019 15:39:06 +0200 by Mario Kleinsasser , Bernhard Rausch

  • Docker
Bernhard Rausch
CloudSolutionsArchitect/SysOps; loves to get things ordered the right way: "A tidy house, a tidy mind."; configuration management fetishist; loves backups; impressed by Docker; always up for getting in contact with interesting people - do not hesitate to write a comment or to contact me!