CI/CD pipeline with Google Compute Engine and GitHub Actions, part II

In the previous blog post, I reviewed how CI/CD can keep a team in sync and save time by automatically merging code changes. Today I will look at the value of failing fast, the second claim I made in the original post: “A CI/CD pipeline gives developers the power to fail fast and recover even faster…”.

In this post, I will expand on the previously developed GitHub Actions to automate the testing and deployment of Airflow DAGs to Compute Engine on GCP.

How does our CI/CD pull workflow work?

  • A data engineer creates a pull request in GitHub.
  • A CI GitHub Actions workflow, triggered by the pull request, retrieves the code changes.
  • Because Python is a flexible language in terms of formatting, it is often recommended to make your Python code PEP8-compliant. PEP8 is the standard style guide for Python code that sets out rules for, for example, line length, indentation, multi-line expressions, naming conventions, etc. To check your Python code for PEP8 violations, you can use different libraries, for example, Pylint, pycodestyle, Flake8, etc.

In our scenario, when the code is submitted for review, it is automatically checked against the PEP8 formatting requirements using flake8.

  • The reviewer merges the feature branch with the master branch. Updated/new code is pulled automatically to the Virtual Machine.

Prerequisites:

  • A Google Cloud Platform account >> Google Compute Engine service;
  • A GitHub account. The structure of my repository is as follows: the folder ‘tests’ >> ‘tests.py’ contains a custom test to check for DAG import errors; the folder ‘current_dags’ is the directory with the latest Python files containing DAG definitions.
  • dags/
    ├─ tests/
    │  ├─ tests.py
    ├─ current_dags/
    ├─ requirements.txt
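The post does not show the contents of ‘tests.py’; a common approach uses Airflow’s DagBag, but even a plain importlib-based check catches DAG files that fail at import time. The sketch below is Airflow-free, and the function names are my own:

```python
# Hypothetical sketch of tests/tests.py: try to import every DAG file
# in current_dags/ and collect any exceptions raised at import time.
import importlib.util
from pathlib import Path


def collect_import_errors(dag_folder):
    """Return a {file: error} dict for every .py file that fails to import."""
    errors = {}
    for path in sorted(Path(dag_folder).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)  # runs the file's top-level code
        except Exception as exc:
            errors[path.name] = repr(exc)
    return errors


def test_no_import_errors():
    errors = collect_import_errors("current_dags")
    assert not errors, f"DAG import errors found: {errors}"
```

An equivalent Airflow-native test would build a `DagBag` over ‘current_dags’ and assert that `dag_bag.import_errors` is empty.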

Steps:

GitHub:

(1) Please review the steps taken in CI/CD pipeline with Google Compute Engine and GitHub Actions, part I to create the workflow that pulls the changes to your GCP Compute Engine instance. We will make two changes to that workflow: first, change the trigger event from ‘pull request opened on the master branch’ to ‘workflow Test DAGs completed’; second, rename the workflow to ‘sync_dags’.
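Assuming the workflow file from part I, the re-triggered version could look roughly like the sketch below. The `workflow_run` trigger and the success check are standard GitHub Actions syntax; the final step is only a placeholder for the VM sync command set up in part I:

```yaml
# Hypothetical sketch of .github/workflows/sync_dags.yml
name: sync_dags

on:
  workflow_run:
    # Run only after the 'Test DAGs' workflow finishes
    workflows: ["Test DAGs"]
    types: [completed]

jobs:
  sync:
    runs-on: ubuntu-latest
    # workflow_run also fires on failures, so gate on success explicitly
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    steps:
      # Placeholder: the actual gcloud/ssh 'git pull' step comes from part I
      - name: Pull changes on the Compute Engine VM
        run: echo "ssh into the VM and git pull (as configured in part I)"
```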

(2) Create the second workflow, ‘test_dags.yml’, and name it ‘Test DAGs’.

The workflow contains the following steps:
(1) Define a Python version for the workflow runner — in our case, ‘3.8’.
(2) Install the project dependencies:
* upgrade the Python package installer pip
* look for a requirements.txt file and install the dependencies it lists: flake8 and markupsafe
(3) Run flake8 to catch particular linting errors. I’ve selected the specific error codes to test with --select and listed the files and folders that flake8 should skip with --exclude= (see the flake8 documentation for the full option syntax).
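Putting these three steps together, ‘test_dags.yml’ might look roughly like the following sketch. The action versions and the --exclude value are my assumptions; the selected error codes are the ones this post tests:

```yaml
# Hypothetical sketch of .github/workflows/test_dags.yml
name: Test DAGs

on:
  pull_request:
    branches: [master]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # (1) Define the Python version for the runner
      - uses: actions/setup-python@v4
        with:
          python-version: '3.8'
      # (2) Install the project dependencies
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      # (3) Lint with flake8, checking only the selected error codes;
      # --exclude=venv is a placeholder for your own ignored paths
      - name: Lint with flake8
        run: flake8 . --select=E111,E116,E125,E127,E129,F401,F841 --exclude=venv
```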

Linting errors tested:

E111 rule or code indentation is not a multiple of four:

def get_name(self):
    if self.first_name and self.last_name:
      return self.first_name + ' ' + self.last_name
    else:
      return self.last_name

E116 rule or unexpected indentation in comments:

    # 'httpd/unix-directory'
mimetype = 'application/x-directory'

E125 rule or a continuation line with the same indent as the next logical line:

if user is not None and user.is_admin or \
    user.name == 'Grant':
    blah = 'yeahnah'

E127 rule or a continuation line over-indented for visual indent (not aligned with the opening bracket):

print("Python", ("Hello",
                        "World"))

E129 rule or a visually indented line with the same indent as the next logical line:

if (row < 0 or module_count <= row or
    col < 0 or module_count <= col):
    raise Exception("%s,%s - %s" % (row, col, self.moduleCount))

F401 rule or importing a module that is never used (each unused module below would be flagged):

import collections, os, sys

F841 rule or defining a local variable in a function but never using it.
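No snippet is given for F841; a minimal illustration, with a hypothetical function name of my own, could be:

```python
def compute_total(prices):
    # F841: local variable 'discount' is assigned to but never used
    discount = 0.1
    return sum(prices)
```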

To catch these errors even earlier, you can run flake8 locally. If you use Visual Studio Code, you can install the Flake8 extension.

A myriad of other tests can be added to your GitHub Actions workflows, such as DAG import error testing, connection testing, etc. The test above tackles the simplest use case: flagging syntax and formatting errors such as missing parentheses, unmatched quotes, and incorrect indentation, and then synchronizing the updated code with the production Virtual Machine once the branch is merged.

