Airflow integration testing

Background

This code is for tutorial purposes of the Agile Actors Data Chapter. It continues, in a more advanced manner, from Basic Integration Testing, in the spirit of Testing in Airflow Part 2 — Integration Tests and End-To-End Pipeline Tests.

In summary, the project sets up a PostgreSQL database into which financial transactions are inserted. Airflow scans the table and, when it finds new transactions, ingests them into a MinIO store (an S3-compatible object store). It then records in a second PostgreSQL database that the ingestion has been completed, which prevents double ingestions.
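
To make that flow concrete, here is a minimal sketch of a DAG with this shape. It is only an illustration: the DAG id, task names and placeholder bodies below are hypothetical and do not necessarily match the actual code under dags/.

from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule=None, start_date=datetime(2023, 1, 1), catchup=False)
def transaction_ingestion_sketch():
    @task
    def scan_transactions() -> list[dict]:
        # In the real DAG: query the finance PostgreSQL database for
        # transactions not yet recorded in the ingestions database.
        return []

    @task
    def ingest_to_minio(transactions: list[dict]) -> str:
        # In the real DAG: serialize the new transactions and upload them
        # to the MinIO bucket through its S3-compatible API.
        return "placeholder/object/key"

    @task
    def mark_ingested(object_key: str) -> None:
        # In the real DAG: record the ingestion in the second PostgreSQL
        # database so the same transactions are never ingested twice.
        pass

    mark_ingested(ingest_to_minio(scan_transactions()))


transaction_ingestion_sketch()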

For convenient development in PyCharm, set up interpreter paths for plugins and dags. Also run

pip install -r requirements-docker.txt

for autocompletion.

*** NOTE ***: The code has been tested with both Docker Desktop and Colima on macOS, so FOSS friends should have an enjoyable experience.

General testing

First install dependencies

pip install -r requirements-dev.txt

and run

  tox

Integration testing

Tested with Python 3.11.3.

First install dependencies

pip install -r requirements-dev.txt

and then build the "good" image

docker-compose build

Now testing is as easy as executing

 docker-compose down -v
 rm -rf logs
 pytest -vvv -s --log-cli-level=DEBUG tests/integration/test_ingestions.py

The integration tests (not yet complete) do the following; a rough sketch of their structure follows the list:

  • create a test MinIO bucket
  • create a test database with test user for financial transactions
  • create a test database with test user for ingestions
  • populate them with test data, execute the Airflow DAG and run assertions
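
A hypothetical sketch of that structure is shown below; fixture names, connection strings and the trigger mechanism are placeholders, not the actual contents of tests/integration/test_ingestions.py.

import pytest


@pytest.fixture
def finance_db():
    # Would create a throwaway database and test user for financial
    # transactions, yield a connection string, then drop everything.
    yield "postgresql://test_user:test_pass@localhost:5432/test_financedb"


@pytest.fixture
def minio_bucket():
    # Would create a test bucket through the S3-compatible API and
    # remove it again on teardown.
    yield "test-bucket"


def test_new_transactions_are_ingested(finance_db, minio_bucket):
    # 1. Insert a few test transactions into the finance database.
    # 2. Trigger the DAG inside the Airflow container (for example with
    #    "docker compose exec" and "airflow dags test").
    # 3. Assert that the expected object landed in the MinIO bucket and
    #    that the ingestion was recorded in the ingestions database.
    ...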

Manual testing

Here we run manually the steps that the integration tests automate. To maintain state, uncomment the volume sections in the docker-compose file. Please comment them out again before running the integration tests.

Start the deployment by running

docker-compose up

Now you need to create the users, tables and bucket.

Create the bucket

Connect to MinIO and create a bucket named bucket; this is the name referenced by the Airflow variable. The credentials are in docker-compose.yaml. Leave the MinIO console open to watch the ingestions arrive.
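
If you prefer scripting this over clicking through the MinIO console, a minimal sketch with the minio Python client (pip install minio) looks like the following. The endpoint and credentials below are placeholders; take the real values from docker-compose.yaml.

from minio import Minio

# Placeholder endpoint and credentials; use the values from docker-compose.yaml.
client = Minio(
    "localhost:9000",
    access_key="CHANGE_ME",
    secret_key="CHANGE_ME",
    secure=False,  # assuming the local deployment serves plain HTTP
)

# The Airflow variable refers to a bucket literally named "bucket".
if not client.bucket_exists("bucket"):
    client.make_bucket("bucket")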

Create users and tables.

Financial transactions

Connect with DBeaver to PostgreSQL using the credentials in the corresponding docker-compose.yaml section.

CREATE DATABASE financedb;

(Of course, we could have set up everything in docker compose, as done here.) Now you can run an SQL script in order to set up the financial database.
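
If you would rather apply that script from Python than from DBeaver, something along these lines works; the connection details and the script path are placeholders, not the actual values used in this repository.

import psycopg2

# Placeholder credentials and path; take the real values from
# docker-compose.yaml and the SQL scripts shipped with the repository.
with psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="financedb",
    user="CHANGE_ME",
    password="CHANGE_ME",
) as conn:
    with conn.cursor() as cur:
        with open("sql/setup_finance.sql") as setup_script:
            cur.execute(setup_script.read())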

When you are finished

Run an SQL script in order to tear down the financial database.

Remove it by running

DROP DATABASE financedb;

Ingestions database

First we need a database

CREATE DATABASE ingestiondb;

which we can dispose of when we no longer need it

DROP DATABASE ingestiondb;

Here Alembic has you covered

alembic upgrade head

and to nuke it when you're finished

alembic downgrade base
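
The actual migrations live in the repository's Alembic directory; purely as an illustration of what upgrade and downgrade do, a revision of this kind looks roughly like the following (the table definition is hypothetical, not the project's real ingestions schema).

"""Hypothetical example revision; the real migrations and schema live in the
repository's Alembic directory."""
import sqlalchemy as sa
from alembic import op

# Revision identifiers used by Alembic.
revision = "0001_example"
down_revision = None


def upgrade() -> None:
    # Applied by "alembic upgrade head".
    op.create_table(
        "ingestions",
        sa.Column("id", sa.Integer, primary_key=True),
        sa.Column("object_key", sa.String(255), nullable=False),
        sa.Column("ingested_at", sa.DateTime, nullable=False),
    )


def downgrade() -> None:
    # Reverted by "alembic downgrade base".
    op.drop_table("ingestions")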

Run the DAG

Visit the Airflow web server and activate your DAG. Extra info can be found in the official documentation of the Apache Airflow project.

Have fun!
