GitLab Runners on preemptible GCP instances, created with Ansible and managed by Puppet, can be a cost-efficient way to boost your GitLab pipelines.
But, step by step. This will be a blog series about our attempt to build GitLab Runners on GCP (Google Cloud Platform) with preemptible instances, so that our colleagues from the software development department can run their (CPU-intensive) QF-tests on these instances directly from GitLab. Currently, we are already using GitLab Runners for QF-tests on premises on VMware ESX, but the tests are highly CPU-intensive.
Therefore, we have decided to try to find a way to use the cloud for us - GCP preemptible instances might really be a good fit - let's see.
Part I will focus on the Ansible setup, because the network connection between GCP and on-premises is not established yet. There are some pitfalls with the 2.8.2 version of Ansible and some things you have to know. As often, there are people out there who have already done some work on this, and as always, I will mention them for their great work. For the readers: I know that Ansible 2.8.2 is not the latest version - but keep in mind that if you have around 190 plays like we have, you can be sure that an upgrade will break some of them. So one does not simply upgrade Ansible - it needs a time schedule.
The first thing we need is a way to bootstrap infrastructure within GCP. We will do this with Ansible. There are many ways to solve this task - Puppet Bolt, Terraform, Cloud Native, >name your tool here< - but we do it with Ansible, because, as written above, we already have a lot of plays.
To use Ansible with GCP, we have to create a GCP Service Account. Just follow the GCP documentation, it is straightforward. The important thing here is that we need a GCP Service Account Key File, which is basically a JSON file containing the login information of the GCP Service Account. This file is usable by the Ansible GCP modules later!
Furthermore, we need some special permissions for this GCP Service Account, because a newly created Service Account does not have any permissions, according to the least-privilege model of GCP. If you would like to automate pretty much everything, it makes sense to apply the GCP role Compute Admin.
In addition, we have to apply some more roles, as we need SSH access after the compute instance is created. The roles we need for this are Compute OS Admin Login and Service Account User.
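If you would like to script this part too, the Ansible GCP modules can create the account and its key file as well. A minimal sketch, assuming you bootstrap with your own application default credentials - the account name ansible-provisioner is made up, and the role bindings themselves are still applied via the Cloud Console or gcloud as described in the GCP documentation:

- name: create a service account
  gcp_iam_service_account:
    name: ansible-provisioner@a-test-project.iam.gserviceaccount.com
    display_name: Ansible provisioning account
    project: a-test-project
    auth_kind: application   # bootstrap with your own application default credentials
    state: present
  register: serviceaccount

- name: create a key file for the service account
  gcp_iam_service_account_key:
    service_account: "{{ serviceaccount }}"
    private_key_type: TYPE_GOOGLE_CREDENTIALS_FILE
    path: /somewhere/serviceaccount.json
    project: a-test-project
    auth_kind: application
    state: present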
Thanks to Alex Dzyoba for the summary of the whole process. If you need or would like to read some details, head over to his blog post about How to configure OS Login in GCP for Ansible! Further details about Setting up OS Login are available via the provided link.
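The short version: OS Login is switched on via instance (or project) metadata. On the Ansible side, this boils down to one extra entry in the metadata parameter of the gcp_compute_instance task - a minimal sketch of just that fragment (the play later in this post only sets a type entry; whether you enable OS Login per instance like this or project-wide is up to you):

metadata:
  # let GCP manage SSH access via IAM (Compute OS Admin Login)
  # instead of per-instance SSH keys
  enable-oslogin: 'TRUE'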
We start with the first task and create a GCP compute instance. The examples provided by the Ansible documentation about the gcp_compute_instance module are a little bit difficult to read. Not because of the features, but because the GCP Ansible module was reworked. Before Ansible 2.8, it was a single module. Today it is split up into multiple modules to ease the maintenance. But most of the examples on the internet describe the old way. For example, in the older Ansible GCP module you were able to just use the name of the network YAML parameter as a simple string. In the newer Ansible GCP module version, this is an object and not a string anymore!
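For comparison - a sketch only, with a made-up network name:

# old style (legacy gce module): the network is a plain string
network: my-network

# new style (gcp_compute_instance after the rework): the network is an
# object, e.g. the result of a gcp_compute_network_facts lookup
network_interfaces:
  - network: "{{ network['items'][0] }}"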
That's a little bit problematic. I mean, it's logical that you create a disk object for your new install reflecting your needs (OS, size, ...), but why do I have to create a network, and why does someone have to create an IP address, if we would just like to use an IP from the shared ephemeral IP address pool of GCP?
If you set up a GCP project, you will get the default network - which you will delete - and then you create your custom network. This is a one-time task for most projects. There is no need to open a new address space for every compute instance. I know that Ansible will take care of this step and will not create the network if it already exists, but if you have some more Playbooks, you might end up with the same network in multiple accounts. This is OK if you do not have an on-premises VPN (internet bubbles only), but we create our projects before we create compute instances, and the network setup is created during the project provisioning.
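For completeness, that one-time setup can look like this in Ansible - a minimal sketch; the CIDR range is made up, and the credential variables are the same as in the runner play below:

- name: create the custom network (one-time, during project provisioning)
  gcp_compute_network:
    name: hub-private-vpc
    auto_create_subnetworks: false   # we manage the subnets ourselves
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
    state: present
  register: custom_network

- name: create a subnet in the custom network
  gcp_compute_subnetwork:
    name: private-subnet
    ip_cidr_range: 10.10.0.0/24      # made-up example range
    network: "{{ custom_network }}"
    region: "{{ region }}"
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
    state: present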
OK, so how can you solve this? Well, we can get the information about the already existing network configuration. But be careful! The module names changed between Ansible 2.8 and Ansible 2.9. For example, gcp_compute_instance_facts in 2.8 was renamed to gcp_compute_instance_info in 2.9. Why the hell? Normally we talk about facts, because that is the common naming for variables containing any kind of operating-system-provided facts, but now we use info. Yes, sure, why not? 🙄🧐
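If you are already on 2.9, the network lookup from the play below simply gets the new name - same parameters, renamed module (and, as far as I can tell, the info modules return their list under resources instead of items, so check the documentation of your version):

- name: get info on a network (Ansible 2.9 naming)
  gcp_compute_network_info:
    filters:
      - name = "{{ gcp_network_vpc }}"
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
  register: network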
In the following listing of the Playbook, you can ignore the debug blocks. You will of course notice some crazy statements like network: "{{ network['items'][0] }}" in the network_interfaces section - this is because the network YAML parameter of the gcp_compute_instance module will only take exactly ONE object, but the gcp_compute_network_facts module returns a list 😶 - so you have to access the first element of the returned array. Be careful: the module returns its list under a key called items, which collides with the built-in items() method of dictionaries - this can have some impact if you use loops (or dot notation) in Ansible!
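A minimal demonstration of that pitfall (debug only):

# works: bracket notation reads the 'items' key from the result
- name: loop over the returned networks
  debug:
    msg: "{{ item.name }}"
  loop: "{{ network['items'] }}"

# does NOT do what you expect: dot notation resolves to the dict's
# built-in items() method instead of the 'items' key
# loop: "{{ network.items }}"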
Finally, here is the whole play to create a GCP compute instance with an existing network and Ansible 2.8:
- hosts: localhost
  connection: local
  gather_facts: no
  vars:
    type: gitlab-runner
    region: europe-west3
    zone: europe-west3-a
    gcp_instance_name: "{{ nodename }}"
    gcp_project: a-test-project
    gcp_cred_kind: serviceaccount
    gcp_cred_file: /somewhere/serviceaccount.json
    gcp_network_vpc: hub-private-vpc
    gcp_network_subnetwork_vpc: private-subnet
  tasks:
    - name: get info on a network
      gcp_compute_network_facts:
        filters:
          - name = "{{ gcp_network_vpc }}"
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: network

    - name: debug
      debug:
        var: network['items'][0]

    - name: get info on a subnet-network
      gcp_compute_subnetwork_facts:
        filters:
          - name = "{{ gcp_network_subnetwork_vpc }}"
        project: "{{ gcp_project }}"
        region: "{{ region }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: subnetwork

    - name: debug
      debug:
        var: subnetwork['items'][0]

    - name: create a disk
      gcp_compute_disk:
        name: "{{ gcp_instance_name }}-disk"
        size_gb: 50
        source_image: projects/ubuntu-os-cloud/global/images/family/ubuntu-1804-lts
        zone: "{{ zone }}"
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
        state: present
      register: disk

    - name: create an instance
      gcp_compute_instance:
        name: "{{ gcp_instance_name }}-instance"
        machine_type: n1-standard-1
        scheduling:
          preemptible: 'true'
        disks:
          - auto_delete: 'true'
            boot: 'true'
            source: "{{ disk }}"
        metadata:
          type: 'gitlab-runner'
        network_interfaces:
          - network: "{{ network['items'][0] }}"
            subnetwork: "{{ subnetwork['items'][0] }}"
            access_configs:
              - name: External NAT
                type: ONE_TO_ONE_NAT
        zone: "{{ zone }}"
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
        state: present
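To run the play, pass the node name as an extra variable, for example: ansible-playbook gcp-gitlab-runner.yml -e nodename=runner01 (the file name is of course made up) - nodename is the only input the play expects from outside.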
This is the first blog post of a series. Needless to say, you should always keep your key files and credentials safe! Use Ansible Vault or some other method to protect them. It might take some time until the next blog post on this topic, but stay tuned. If you have questions, reach out to us on Twitter!
Icons made by itim2101 from www.flaticon.com.
Mario