Build a Docker Swarm on AWS with Ansible in 1 minute and 47 seconds

Estimated reading time: 8 mins

Is it possible to build a five-node (3 manager nodes, 2 worker nodes) Docker Swarm in under 2 minutes? Yes it is! Some weeks ago, Henning Jacobs, who works at Zalando Technology, posted a Tweet referencing an article he wrote called “Why Kubernetes?”. This article is a response to another post, “Maybe You Don’t Need Kubernetes”, written by Matthias Endler, who works at Trivago. There are always pros and cons to every solution, but I missed Docker Swarm in his article. And there was another thing that got me thinking. He wrote that “[…] creating a cluster on DigitalOcean takes less than 4 minutes and is reasonably cheap ($30/month for 3 small nodes with 2 GiB and 1 CPU each).”. In addition, he wrote that at Zalando they “[…] run 100+ Kubernetes clusters […]”.

Therefore I asked myself how long it would take to set up a Docker Swarm cluster with 3 managers and 2 workers on AWS myself, and furthermore, whether it would be possible to start 101 (100+) Docker Swarm clusters too. Short answer: yes, it is 😎! But let's start with the idea. As a side note, “3 small nodes” are not a production setup for Kubernetes, whereas 3 manager nodes and 2 worker nodes are a production setup for Docker Swarm.

The plan

Every time I do some creative brainstorming, I take pen and paper to order my thoughts. You can have a look at the picture on the left to see what this means in this case 😁. After some thinking I was pretty sure that some things would have to be done with Ansible in parallel. Since we have been using Ansible at work for the last year and a half, I already knew that I would have to use some tricks to get things up and running fast. Setting up compute resources takes the most time: you have to specify your needs and, of course, you have to wait until you can access the compute resource to install additional software like Docker on it. The simplest way to parallelize something under Linux is to use BASH forks. Obviously this is resource intensive, but more on that point later!

After some testing it was clear to me that I would use a script to run multiple Ansible Playbooks in parallel. I ended up with the following BASH script:

#!/bin/bash
export AWS_ACCESS_KEY=AK....
export AWS_SECRET_KEY=ai....

# create 5 Docker nodes - 3 managers and 2 workers
# This is the first manager where all other nodes will join
ansible-playbook --extra-vars "swarmnodename=$1-mm" aws_ec2_create_docker_swarm_node.yml &

# Add two additional managers and tag them
ansible-playbook --extra-vars "swarmnodename=$1-mn1" aws_ec2_create_docker_swarm_node.yml &
ansible-playbook --extra-vars "swarmnodename=$1-mn2" aws_ec2_create_docker_swarm_node.yml &

# Add two worker nodes and tag them
ansible-playbook --extra-vars "swarmnodename=$1-wn1" aws_ec2_create_docker_swarm_node.yml &
ansible-playbook --extra-vars "swarmnodename=$1-wn2" aws_ec2_create_docker_swarm_node.yml &

wait

# Join them together - first get join tokens from main manager and store them
ansible-playbook --extra-vars "swarmnodename=$1-mm" aws_ec2_get_docker_swarm_join_token.yml

# Add additional managers 
ansible-playbook --extra-vars "swarmnodename=$1-mn1" aws_ec2_join_swarm_as_manager.yml &
ansible-playbook --extra-vars "swarmnodename=$1-mn2" aws_ec2_join_swarm_as_manager.yml &

# Add additional workers
ansible-playbook --extra-vars "swarmnodename=$1-wn1" aws_ec2_join_swarm_as_worker.yml &
ansible-playbook --extra-vars "swarmnodename=$1-wn2" aws_ec2_join_swarm_as_worker.yml &

wait

This script uses the variable $1, which is provided by a wrapper script (to create n Docker Swarms) that is a simple counter loop. First of all, I need the mm node, the master manager node. The master manager node is the node where the docker swarm init command is issued once the EC2 instances are created. To make things easier, the script waits at the first wait statement until all five creation playbooks have finished; the & at the end of the ansible-playbook calls indicates that they run in parallel. The aws_ec2_get_docker_swarm_join_token.yml run then fetches the join tokens for manager and worker joins and stores them, and the four join playbook runs afterwards use them to register the nodes as manager or worker, again in parallel.
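
The wrapper script itself is not listed in this post. A minimal sketch of what it could look like, assuming the per-swarm script above is saved as create_swarm.sh (both file names are my assumption):

#!/bin/bash
# Hypothetical wrapper: create n Docker Swarms by calling the per-swarm
# script above with a simple counter as $1, so every cluster gets unique
# node names like 1-mm, 1-mn1, ..., 2-mm, ...
for i in $(seq 1 "$1"); do
    ./create_swarm.sh "$i" &
done
wait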

Some tricks

To get this up and running smoothly and fast, I had to use some tricks 🤩😁 - they might be useful for you out there!

Trick #1: “dynamic” inventory

The EC2 instances use dynamic IP addresses. Therefore neither the script nor the Ansible Playbooks can rely on static Ansible inventories! There are dynamic inventory scripts for Ansible and AWS (and many others) out there, and they are officially supported, but they are often not that fast. Thankfully, there is ec2_instance_facts for AWS to filter (find) instances which meet certain requirements. If instances are found, we add them to an in-memory Ansible inventory. Look at the List instances and Add all instance public IPs to host group tasks in the Ansible Playbook below.

Trick #2: Name your instances

The second trick is to tag the instances you create with names that are dynamic but predictable. This way we can create not just one Docker Swarm cluster with this Ansible Playbook, we can create hundreds if we like. Look at the tag:name filter in the Ansible Playbook below.
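
The creation playbook aws_ec2_create_docker_swarm_node.yml is not listed in this post. A minimal sketch of how the launch and tagging part could look, assuming the classic ec2 module and placeholder values of my own for AMI, instance type and security group:

# Sketch (not the original playbook): launch one EC2 instance and tag it
# with the dynamic but predictable node name
- hosts: localhost
  gather_facts: no
  connection: local
  vars:
    ssh_key_name: pwd-m4r10k
    region: eu-central-1
  tasks:
    - name: Launch and tag a swarm node
      ec2:
        region: "{{ region }}"
        key_name: "{{ ssh_key_name }}"
        instance_type: t3.nano       # placeholder; the post mentions nano instances
        image: ami-xxxxxxxx          # placeholder Ubuntu AMI
        group: docker-swarm          # placeholder security group
        wait: yes
        instance_tags:
          name: "{{ swarmnodename }}"  # e.g. 1-mm, 1-mn1, 1-wn2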

Trick #3: Save Ansible data (variables) to a local file

This is huge! You can save Ansible output to a local file and afterwards load the data from this file to use it in another playbook. Look at the two save-join-command tasks at the end of the Ansible Playbook below. If you are clever, you can create very smart playbooks. In this case, I save the Docker join tokens to files that are named according to the Docker Swarm cluster that is currently being created. Therefore, you can use this information during the parallel creation of Docker Swarms!

# get the main manager
- hosts: localhost
  gather_facts: no
  connection: local
  vars:
    ssh_key_name: pwd-m4r10k
    region: eu-central-1
    ansible_user: ubuntu
  tasks:
    - name: List instances
      ec2_instance_facts:
        region: "{{ region }}"
        filters:
          "tag:name": "{{ swarmnodename }}"
          instance-state-name: running
      register: ec2

    - name: Add all instance public IPs to host group
      add_host:
        name: "{{ item.public_ip_address }}"
        groups:
          - ec2training
      with_items: "{{ ec2.instances }}"

- hosts: ec2training
  gather_facts: no
  vars:
    ansible_user: ubuntu
  tasks:
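    # 'docker swarm join-token <role>' prints an explanatory sentence plus
    # the actual join command; grep join keeps only the command line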
    - name: Get join command for manager
      shell: docker swarm join-token manager | grep join
      become: yes
      register: joinmanager

    - name: Get join command for worker
      shell: docker swarm join-token worker | grep join
      become: yes
      register: joinworker

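    # Write the whole registered result to a local file; the join playbooks
    # load it back with include_vars and use its stdout field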
    - name: Save manager join command local
      local_action: copy content={{ joinmanager }} dest=/tmp/{{ swarmnodename }}-join-as-manager

    - name: Save worker join command local
      local_action: copy content={{ joinworker }} dest=/tmp/{{ swarmnodename }}-join-as-worker

Trick #4: Load Ansible data (variables) from a local file

It is easy (once you have found out how) to load Ansible data from local files saved previously. Look at the Load vars and Join as manager tasks in the Ansible Playbook below.

Trick #5: Use Ansible built-in functions

Ansible comes with a lot of handy functions. In this example, I use split to extract the number of the Docker Swarm this Ansible Playbook is running for, in order to load the correct Docker Swarm join command. See the file path of the include_vars task below.

# find the node that should join the swarm
- hosts: localhost
  gather_facts: no
  connection: local
  vars:
    ssh_key_name: pwd-m4r10k
    region: eu-central-1
    ansible_user: ubuntu
  tasks:
    - name: List instances
      ec2_instance_facts:
        region: "{{ region }}"
        filters:
          "tag:name": "{{ swarmnodename }}"
          instance-state-name: running
      register: ec2

    - name: Add all instance public IPs to host group
      add_host:
        name: "{{ item.public_ip_address }}"
        groups:
          - ec2training
      with_items: "{{ ec2.instances }}"

- hosts: ec2training
  gather_facts: no
  vars:
    ansible_user: ubuntu
  tasks:
    - name: Load vars
      include_vars:
        file: /tmp/{{ swarmnodename.split("-")[0] }}-mm-join-as-manager
        name: joinmanager
      register: input

    - name: Join as manager
      shell: "{{ joinmanager.stdout }}"
      become: yes
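
The aws_ec2_join_swarm_as_worker.yml playbook is not listed here; it should only differ in the file it loads and the registered variable name. A sketch of the changed tasks, under that assumption:

    - name: Load vars
      include_vars:
        file: /tmp/{{ swarmnodename.split("-")[0] }}-mm-join-as-worker
        name: joinworker

    - name: Join as worker
      shell: "{{ joinworker.stdout }}"
      become: yes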

The video

Here is the video of the run, which took 1 minute and 47 seconds.

Create 100+ Docker Swarms

AWS raised the limit of EC2 nano instances from 28 (default) to 550 - the only thing you have to do for this is to open a support ticket. The next problem is that the BASH fork mechanism really exhausts the resources of our Ansible host - and this is OK, as forking all the Python processes is expensive. Just for a test, I put in 32 cores and 64 GB of memory (VMware). With this configuration I was able to start the creation of 101 Docker Swarms, but then I got locked out of the AWS API - “Too many requests” 😂 Maybe in the future I will try to create 10 or 20 Docker Swarms at a time to stay below this limit.
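
A sketch of how such a batched wrapper could look, assuming the per-swarm script from above (create_swarm.sh) and a batch size of 10 (both are my assumptions):

#!/bin/bash
# Hypothetical batched wrapper: create 100 swarms in batches of 10 to
# stay below the AWS API request limits
for batch in $(seq 0 9); do
    for i in $(seq 1 10); do
        ./create_swarm.sh "$((batch * 10 + i))" &
    done
    wait  # let one batch finish before the next one starts
done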

Conclusion

Ansible in combination with AWS and Docker Swarm is pretty awesome! It was a lot of fun to optimize the playbook runs to get this up and running in parallel. I will upload the playbooks to a GitLab repository in the next few weeks. If you need them earlier, let me know!

Have fun!

Posted on: Tue, 23 Apr 2019 10:21:00 +0200 by Mario Kleinsasser
Doing Linux since 2000 and containers since 2009. I like to hack on new and interesting stuff: containers, Python, DevOps, automation and so on. Interested in science, and I like to read (if I find the time). My motto is "Imagination is more important than knowledge. [Einstein]". Interesting contacts are always welcome - nice to meet you out there - if you like, do not hesitate to contact me!