Following the blog post GCP, Ansible, GitLab and Puppet - Part I, here comes part II. As you may have noticed, Puppet is not used in the setup for now. I stripped it out, so it is not in use for the Google Cloud Platform instances at the moment. As usual, this will be a longer post, and it is probably the final part from a technical point of view. Maybe I will write a third one after we have used the setup for a couple of months. If you need details about any of these steps, let me know via Twitter.
This post is divided into multiple sections covering different topics. It also includes the GitLab pipeline schedules, which run in an Ansible container executed by GitLab scheduled CI/CD.
As described in part I, we use Ansible to set up the GCP instances. I've reworked the Ansible playbooks so that they can call webhooks during the play run, which lets us track and monitor some statistics with Zabbix. For now, these playbooks use Ansible 2.8. Ansible 2.9 introduced some really important changes to the Ansible GCP modules, so plays written for Ansible 2.8 are not compatible with Ansible 2.9. The main differences are in the facts modules, which were renamed, and in the behavior of the labeling process. Maybe I will post an update on this.
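For readers who are already on Ansible 2.9, here is a rough, untested sketch of what the network lookup used further down might look like after the rename. It assumes the `_facts` modules became `_info` modules and that the result list is returned under `resources` instead of `items`; the labeling change is not covered here, so please check the module documentation for your version before relying on it.

```yaml
# Sketch for Ansible 2.9, not tested against the setup in this post.
- name: get info on a network
  gcp_compute_network_info:            # gcp_compute_network_facts in 2.8
    filters:
      - name = "{{ gcp_network_vpc }}"
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
  register: network

- name: debug
  debug:
    var: network['resources'][0]       # 2.8 returned this list under 'items'
```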
As I wrote in part I, creating the GCP instance is easy if you create the network at the same time as the instance. If not, as in our case, you have to fetch the information about the already existing VPC network within your project. The trick is to handle the result correctly: the module returns a list under the key items, and because items is also the name of a dictionary method, dot notation will not give you that list. You have to use bracket notation and pick the first entry, for example: subnetwork['items'][0].
Next, the instances must be labeled. Later, this will help us find all running and stopped instances without touching any instances that were created alongside the GitLab runners in the same VPC network inside the same GCP account. As mentioned above, labeling instances in Ansible 2.8 is a little complicated, because it cannot be done during instance creation. This is why we have to use the GCP resource URL to update the labels afterwards.
- hosts: localhost
  connection: local
  gather_facts: no
  vars:
    type: gitlab-runner
    region: europe-west3
    zone: europe-west3-a
    gcp_instance_name: "{{ nodename }}"
    gcp_project: yourprojectid
    gcp_cred_kind: serviceaccount
    gcp_cred_file: yourserviceaccount.json
    gcp_network_vpc: hub-private-vpc
    gcp_network_subnetwork_vpc: private-subnet
  tasks:
    # Look up the already existing VPC network and subnetwork.
    - name: get info on a network
      gcp_compute_network_facts:
        filters:
          - name = "{{ gcp_network_vpc }}"
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: network

    - name: debug
      debug:
        var: network['items'][0]

    - name: get info on a subnet-network
      gcp_compute_subnetwork_facts:
        filters:
          - name = "{{ gcp_network_subnetwork_vpc }}"
        project: "{{ gcp_project }}"
        region: "{{ region }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: subnetwork

    - name: debug
      debug:
        var: subnetwork['items'][0]

    # Create the boot disk and the preemptible instance.
    - name: create a disk
      gcp_compute_disk:
        name: "{{ gcp_instance_name }}-disk"
        size_gb: 50
        source_image: projects/ubuntu-os-cloud/global/images/family/ubuntu-1804-lts
        zone: "{{ zone }}"
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
        state: present
      register: disk

    - name: create a instance
      gcp_compute_instance:
        name: "{{ gcp_instance_name }}-instance"
        machine_type: n1-highcpu-8
        scheduling:
          preemptible: 'true'
        disks:
          - auto_delete: 'true'
            boot: 'true'
            source: "{{ disk }}"
        network_interfaces:
          - network: "{{ network['items'][0] }}"
            subnetwork: "{{ subnetwork['items'][0] }}"
            access_configs:
              - name: External NAT
                type: ONE_TO_ONE_NAT
        zone: "{{ zone }}"
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
        state: present
      register: result

    - name: debug
      debug:
        msg: "{{ result.selfLink }}"

    # Labels are applied afterwards via the resource URL of the new instance.
    - name: Add labels on an existing instance (using resource_url)
      gce_labels:
        project_id: "{{ gcp_project }}"
        credentials_file: "{{ gcp_cred_file }}"
        labels:
          type: gitlab-runner
        resource_url: "{{ result.selfLink }}"
        state: present
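Because the playbook above always takes the first entry of each lookup, a filter that matches nothing (or more than one resource) only shows up later as a confusing error. A small guard between the lookups and the disk creation could catch that early. This is just a sketch under the assumption that both lookups should match exactly one resource; the message text is mine.

```yaml
- name: make sure each lookup matched exactly one resource
  assert:
    that:
      # Bracket notation on purpose: network.items would resolve to the dict method.
      - (network['items'] | default([]) | length) == 1
      - (subnetwork['items'] | default([]) | length) == 1
    fail_msg: "Expected exactly one network and one subnetwork, check the filters"
```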
After the GCP instance has been created, it has to be configured. There is nothing special here: we install Docker and handle some configuration tasks, like changing the DNS resolvers, disabling netplan, and some other minor changes. We need those changes because we run the GitLab runners within a private VPC connected via VPN. The only really important thing here is the linting of the netplan configuration. Editing YAML with Ansible is §$%&!
- hosts: localhost
  connection: local
  gather_facts: no
  vars:
    type: gitlab-runner
    region: europe-west3
    zone: europe-west3-a
    gcp_instance_name: "{{ nodename }}"
    gcp_project: yourprojectid
    gcp_cred_kind: serviceaccount
    gcp_cred_file: yourserviceaccount.json
    gcp_network_vpc: hub-private-vpc
    gcp_network_subnetwork_vpc: private-subnet
  tasks:
    - name: get info on all instances
      gcp_compute_instance_facts:
        zone: "{{ zone }}"
        filters:
          - labels.type:gitlab-runner
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: allinstances

    - name: debug
      debug:
        msg: "{{ allinstances }}"

    # networkIP is the internal address; the runners are reached through the VPN.
    - name: Add all instance internal IPs to host group
      add_host:
        name: "{{ item.networkInterfaces.0.networkIP }}"
        groups:
          - gcpinstances
      with_items: "{{ allinstances['items'] }}"

- hosts: gcpinstances
  gather_facts: no
  tasks:
    # Runs on the control machine, same effect as the former local_action.
    - name: Wait for SSH to come up
      wait_for:
        host: "{{ inventory_hostname }}"
        port: 22
        delay: 1
        timeout: 180
      delegate_to: localhost

- hosts: gcpinstances
  vars:
    ansible_user: sa_your_sa_user_id
  gather_facts: yes
  tasks:
    - name: Pinging on "{{ inventory_hostname }}"
      ping:

    - name: Configure systemd resolved configuration
      copy:
        src: /var/opt/ansible-prod/files/gcp-gitlab-runner-systemd/resolved.conf
        dest: /etc/systemd/resolved.conf
      become: yes

    - name: Disable cloud.cfg
      copy:
        src: /var/opt/ansible-prod/files/gcp-gitlab-runner-systemd/99-disable-network-config.cfg
        dest: /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg
      become: yes

    - name: Disable netplan dns
      blockinfile:
        path: /etc/netplan/50-cloud-init.yaml
        insertafter: '.*dhcp4: true'
        # The inserted lines must sit at the same depth as the dhcp4 key in the
        # target file. 12 spaces are assumed here; adjust this to your file.
        block: |2
                      dhcp4-overrides:
                          use-dns: no
      become: yes

    - name: Add Docker GPG key
      apt_key:
        url: https://download.docker.com/linux/ubuntu/gpg
      become: yes

    - name: Add Docker APT repository
      apt_repository:
        repo: "deb [arch=amd64] https://download.docker.com/linux/ubuntu {{ ansible_distribution_release }} stable"
      become: yes

    - name: Install Docker
      apt:
        name: "docker-ce=5:19.03.5~3-0~ubuntu-bionic"
        state: present
        update_cache: yes
      become: yes

    - name: Copy Docker gitlab-runner systemd file
      copy:
        src: /var/opt/ansible-prod/files/gcp-gitlab-runner-systemd/docker-gitlab-runner.service
        dest: /etc/systemd/system/docker-gitlab-runner.service
      become: yes

    - name: Copy Docker gitlab-runner systemd start file
      template:
        src: /var/opt/ansible-prod/files/gcp-gitlab-runner-systemd/docker-gitlab-runner-start.sh.j2
        dest: /usr/local/bin/docker-gitlab-runner-start.sh
        mode: "0744"
      become: yes

    - name: Copy Docker gitlab-runner systemd stop file
      copy:
        src: /var/opt/ansible-prod/files/gcp-gitlab-runner-systemd/docker-gitlab-runner-stop.sh
        dest: /usr/local/bin/docker-gitlab-runner-stop.sh
        mode: "0744"
      become: yes

    - name: Enable Docker gitlab-runner systemd service
      systemd:
        name: docker-gitlab-runner
        daemon_reload: yes
        enabled: yes
        state: started
        masked: no
      become: yes
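Since the netplan edit in the playbook above is the fragile part, it can help to let netplan itself check the result right after the "Disable netplan dns" task. The following is only a sketch: it assumes that `netplan generate` exits with an error when the files in /etc/netplan cannot be parsed, and the task name is mine.

```yaml
- name: Check that netplan still accepts the edited configuration
  command: netplan generate
  become: yes
  changed_when: false   # used as a check only, so never report a change
```

If the inserted block ends up with the wrong indentation, the play then stops at this task instead of leaving a broken network configuration for the next reboot.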
The next thing we need is an Ansible playbook to stop all running instances. This will later be used by the GitLab scheduled CI/CD pipeline. The playbook is split into two parts, because otherwise it is not possible to trigger a webhook for every machine that is stopped. We use this webhook to submit the result of the playbook to Elastic, where we can retrieve a neat statistic on how often we stopped the GitLab runners. You can find this in the second of the YAML files below. The catch is that a task using the Ansible with_items statement can run only one single module, but hey, you can use an include there and include multiple tasks from another file! Yes!
In the first part of the Ansible playbook, we trigger some webhooks to monitor the status of the GCP GitLab runners, and we will do the same when starting the GitLab runners. With this, we get nice graphs showing how many GitLab runners are still running, how many had to be restarted, and so on.
Here comes the first part of the Ansible playbook
- hosts: localhost
  connection: local
  gather_facts: no
  vars:
    type: gitlab-runner
    region: europe-west3
    zone: europe-west3-a
    gcp_instance_name: "{{ nodename }}"
    gcp_project: yourprojectid
    gcp_cred_kind: serviceaccount
    gcp_cred_file: yourserviceaccount.json
    gcp_network_vpc: hub-private-vpc
    gcp_network_subnetwork_vpc: private-subnet
  tasks:
    # Report the state before stopping anything.
    - name: Get NOTRUNNING instances
      gcp_compute_instance_facts:
        zone: "{{ zone }}"
        filters:
          - labels.type:gitlab-runner AND status:TERMINATED
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: notrunning

    - name: Webhook call
      uri:
        url: https://your-webhook-url/webhook/gcp-pre-terminated
        method: POST
        body: "{{ notrunning }}"
        body_format: json

    - name: Get RUNNING instances
      gcp_compute_instance_facts:
        zone: "{{ zone }}"
        filters:
          - labels.type:gitlab-runner AND status:RUNNING
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: running

    - name: Webhook call
      uri:
        url: https://your-webhook-url/webhook/gcp-pre-running
        method: POST
        body: "{{ running }}"
        body_format: json

    # One include per running instance: stop it and report it (second file).
    - name: STOP all running instances
      include_tasks: gcp_compute_stopall_webhook_gitlabrunner.yml
      with_items: "{{ running['items'] }}"

    # Report the state after the stop run.
    - name: Get RUNNING instances
      gcp_compute_instance_facts:
        zone: "{{ zone }}"
        filters:
          - labels.type:gitlab-runner AND status:RUNNING
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: running

    - name: Webhook call
      uri:
        url: https://your-webhook-url/webhook/gcp-post-running
        method: POST
        body: "{{ running }}"
        body_format: json
Here comes the second part of the Ansible playbook
---
# Included once per instance; "item" is the instance from the calling loop.
- name: STOP running instance
  gcp_compute_instance:
    name: "{{ item.name }}"
    status: TERMINATED
    zone: "{{ zone }}"
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
    state: present

- name: Trigger webhook
  uri:
    url: https://your-webhook-url/webhook/gcp-glr
    method: POST
    body: "{{ item }}"
    body_format: json
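One caveat with the include pattern above: the included file depends on the implicit loop variable `item`, which gets confusing as soon as one of the included tasks loops itself. A possible variant, shown here only as a sketch, names the loop variable explicitly. The file name `stop_one_instance.yml` and the variable name `instance` are made up for this example.

```yaml
# Calling playbook: name the loop variable instead of relying on "item".
- name: STOP all running instances
  include_tasks: stop_one_instance.yml        # hypothetical file name
  loop: "{{ running['items'] }}"
  loop_control:
    loop_var: instance                        # hypothetical variable name
    label: "{{ instance.name }}"              # keeps the play output short

# Inside stop_one_instance.yml the tasks would then use the named variable:
# - name: Trigger webhook
#   uri:
#     url: https://your-webhook-url/webhook/gcp-glr
#     method: POST
#     body: "{{ instance }}"
#     body_format: json
```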
As you may have noticed, we are using preemptible instances for the GitLab runners, because they are cheap and work perfectly for CI/CD jobs, in our case QF tests (web UI frontend tests). Preemptible GCP instances can be stopped by GCP at any time. Therefore we run a GitLab CI/CD scheduled pipeline every 5 minutes to keep the runners alive. Of course there are better ways to achieve this, like starting the machine on every GitLab job run, but currently this is not easy to integrate. The following playbooks mirror the ones above, except that they start GitLab runners that have been stopped.
Here comes the first part of the Ansible playbook
- hosts: localhost
  connection: local
  gather_facts: no
  vars:
    type: gitlab-runner
    region: europe-west3
    zone: europe-west3-a
    gcp_instance_name: "{{ nodename }}"
    gcp_project: yourprojectid
    gcp_cred_kind: serviceaccount
    gcp_cred_file: yourserviceaccount.json
    gcp_network_vpc: hub-private-vpc
    gcp_network_subnetwork_vpc: private-subnet
  tasks:
    # Report the state before starting anything.
    - name: Get NOTRUNNING instances
      gcp_compute_instance_facts:
        zone: "{{ zone }}"
        filters:
          - labels.type:gitlab-runner AND status:TERMINATED
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: notrunning

    - name: Webhook call
      uri:
        url: https://your-webhook-url/webhook/gcp-pre-terminated
        method: POST
        body: "{{ notrunning }}"
        body_format: json

    - name: Get RUNNING instances
      gcp_compute_instance_facts:
        zone: "{{ zone }}"
        filters:
          - labels.type:gitlab-runner AND status:RUNNING
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: running

    - name: Webhook call
      uri:
        url: https://your-webhook-url/webhook/gcp-pre-running
        method: POST
        body: "{{ running }}"
        body_format: json

    # One include per stopped instance: start it and report it (second file).
    - name: START all not running instances
      include_tasks: gcp_compute_keepalive_webhook_gitlabrunner.yml
      with_items: "{{ notrunning['items'] }}"

    # Report the state after the keepalive run.
    - name: Get RUNNING instances
      gcp_compute_instance_facts:
        zone: "{{ zone }}"
        filters:
          - labels.type:gitlab-runner AND status:RUNNING
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: running

    - name: Webhook call
      uri:
        url: https://your-webhook-url/webhook/gcp-post-running
        method: POST
        body: "{{ running }}"
        body_format: json
Here comes the second part of the Ansible playbook
# Included once per instance; "item" is the instance from the calling loop.
- name: Start all not running instances / Report Webhook
  gcp_compute_instance:
    name: "{{ item.name }}"
    status: RUNNING
    zone: "{{ zone }}"
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
    state: present

- name: Webhook call
  uri:
    url: https://your-webhook-url/webhook/gcp-glr
    method: POST
    body: "{{ item }}"
    body_format: json
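Because the keepalive runs every five minutes, it may be worth making sure that an unreachable webhook endpoint cannot prevent the remaining runners from being started. The following is a sketch of how the per-instance webhook task could be made tolerant. The timeout value and the decision to ignore errors are my assumptions, not part of the setup described here.

```yaml
- name: Webhook call (monitoring only, should not block the keepalive)
  uri:
    url: https://your-webhook-url/webhook/gcp-glr
    method: POST
    body: "{{ item }}"
    body_format: json
    timeout: 10          # seconds, assumed value
  ignore_errors: yes     # the failure is still shown, but the play continues
```

The trade-off is that a silent monitoring gap becomes possible, so the missing data points in Zabbix or Elastic are then the only hint that the webhook was down.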
First, the webhooks described above result in a nice Zabbix graph where you can see how often the GCP GitLab runners were not running. The graph shows one full week, and there were only two occasions on which some of the currently five GCP GitLab runner instances were not running. We use the GCP GitLab runners from 7 am until 6 pm. The crossing lines mark the points where the keepalive and stopall schedules start in the morning and in the evening.
Inside Elastic, we can see the Ansible output of the playbook runs.
With the playbooks in place, we want to run them. A simple cron job would do, of course, but doing it with GitLab gives us some benefits. First, we can use the Git repository where our Ansible playbook sources are stored. Second, we can use secure variables for the GCP credentials. Last and most important, we can build an Ansible image that contains exactly the Ansible version these playbooks need. Later we can easily migrate the playbooks to a newer version without breaking the existing ones, and of course we can move the Ansible playbook sources to a separate repository.
The Dockerfile for the Ansible image is pretty simple: it just installs Python pip, Ansible, and the dependencies needed for the GCP modules.
FROM ubuntu:18.04
RUN apt update && apt install -y python-pip openssh-client git && pip install requests google-auth google-api-python-client ansible==2.8.8
The stopall and the keepalive branches contain mostly the same files; only the triggered scripts differ.
Here is the .gitlab-ci.yml:
image: <your-gitlab-registry>/image:gcp-2.8.8-2020-02-03-01

variables:
  DOCKER_DRIVER: overlay

services:
  - docker:dind

keepalive:
  stage: build
  script:
    - ./keepalive.sh
  tags:
    - docker-build
  only:
    - manual

keepalive-schedule:
  stage: build
  script:
    - ./keepalive.sh
  tags:
    - docker-build
  only:
    - schedules
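The two jobs above differ only in their `only:` rule, so the shared part could be factored out with a YAML anchor, which GitLab CI supports. This is just a sketch of that idea; the hidden key name `.keepalive_base` is made up, and the behavior should be identical to the file above.

```yaml
# Hidden job (leading dot) that only serves as a template.
.keepalive_base: &keepalive_base
  stage: build
  script:
    - ./keepalive.sh
  tags:
    - docker-build

keepalive:
  <<: *keepalive_base
  only:
    - manual

keepalive-schedule:
  <<: *keepalive_base
  only:
    - schedules
```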
And the keepalive.sh contains the following:
#!/bin/bash
git clone "https://gitlab+deploy-token-31:${ANSIBLE_REPOSITORY_DEPLOY_TOKEN}@<your-ansible-source-repository>/ansible/legacy.git" /var/opt/ansible-prod
cd /var/opt/ansible-prod/plays/gcp-compute-gitlabrunner-prod
./gcp_keepalive_gitlab-runner.sh
Finally, you need to configure some schedules. This will look like this:
And the result will be:
This post should give you an idea of how you can run GitLab runners in GCP with preemptible instances. If you need more details, reach out to me on Twitter (or somewhere else)! Happy hacking!
Icons made by itim2101 from www.flaticon.com.
Mario